Bug 8545 - No easy way to disable agents
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.19.0
Assignee: Frida Flodin
Keywords: linma_tester, relnotes
Blocks: 1015
Reported: 2025-03-14 13:16 CET by Frida Flodin
Modified: 2025-03-31 16:02 CEST

Acceptance Criteria:
MUST
* It must be possible to disable a specific agent from getting new sessions. This should be done by changing the config file.
* A user should still be able to reconnect to a disconnected session on the disabled agent.
* The new config change must take place after the master service has been restarted.
SHOULD
* It should be clear in the logs why an agent is not getting any new sessions.
* The new config change should take place immediately when the file is saved.
* It should be possible to disable a number of agents.
* It should be clear in 'tlctl load' and 'tlwebadm status load' that the agent is disabled.
COULD
* It should be possible to disable entire subclusters.



Description Frida Flodin cendio 2025-03-14 13:16:05 CET
If an administrator wants to stop an agent from getting new sessions, they currently have to resort to the solution for bug 7610. This works fine, but some administrators might find it too permanent, and you have to remember which subcluster the agent belonged to.

It would be nice if this could be done in a more straightforward way.
Comment 3 Frida Flodin cendio 2025-03-19 09:35:23 CET
We did a first draft of updating tlctl load list to show disabled agents. We had some discussions about the layout/design, and maybe this is not the best solution, but here is how it looks right now:
> AGENT      USERS  STATUS  
> ==========================
> 127.0.0.1      5  UP
> 127.0.0.2      2  DISABLED
> 127.0.0.3      -  DOWN
> 127.0.0.4      0  UP
A drawback with this design is that in most scenarios, all listed agents will be "UP", meaning the extra column just adds extra noise.

This is something we can come back to later, and we came up with some alternative designs:

1. Only show STATUS column if at least one is DOWN/DISABLED:
> AGENT      USERS  STATUS  
> ==========================
> 127.0.0.1      5 
> 127.0.0.2      2  DISABLED
> 127.0.0.3      -  DOWN
> 127.0.0.4      0 
> 
> AGENT      USERS
> ================
> 127.0.0.1      5
> 127.0.0.2      2
> 127.0.0.3      2
> 127.0.0.4      0
2. Use some kind of symbol to indicate DISABLED/DOWN:
> AGENT         USERS  
> ===================
> 127.0.0.1         5
> 127.0.0.2  (*)    2
> 127.0.0.3  (x)    -
> 127.0.0.4         0
3. No extra header column, and show DISABLED/DOWN:
> AGENT      USERS
> ================
> 127.0.0.1      5
> 127.0.0.2      2  DISABLED
> 127.0.0.3      -  DOWN
> 127.0.0.4      0
The word "DISABLED" might be changed as well to something more accurate.
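For comparison, alternative 1 could be rendered roughly as follows. This is only a sketch of the layout idea, not the tlctl code; the function name `format_load_table` and the tuple layout `(name, users, status)` are made up for illustration, and a user count of `None` stands for an agent whose load is unknown.

```python
def format_load_table(agents):
    """Render (name, users, status) rows; users is None when unknown.

    The STATUS column is only shown if at least one agent is not "UP".
    """
    show_status = any(status != "UP" for _, _, status in agents)
    name_width = max([len("AGENT")] + [len(name) for name, _, _ in agents])

    header = f"{'AGENT':<{name_width}}  {'USERS':>5}"
    if show_status:
        header += "  STATUS"
    lines = [header, "=" * len(header)]

    for name, users, status in agents:
        shown_users = "-" if users is None else str(users)
        line = f"{name:<{name_width}}  {shown_users:>5}"
        # Agents that are UP get an empty status cell to reduce noise.
        if show_status and status != "UP":
            line += f"  {status}"
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_load_table([
        ("127.0.0.1", 5, "UP"),
        ("127.0.0.2", 2, "DISABLED"),
        ("127.0.0.3", None, "DOWN"),
        ("127.0.0.4", 0, "UP"),
    ]))
```

With only "UP" agents in the input, the same function prints the two-column table without a STATUS header.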
Comment 15 Samuel Mannehed cendio 2025-03-26 16:16:46 CET
Tested on CentOS 8 using build 3959.

> MUST
> * It must be possible to disable a specific agent from getting new sessions. This should be done by changing the config file.
Yep, it works.

> * A user should still be able to reconnect to a disconnected session on the disabled agent.
Yes.

> * The new config change must take place after the master service has been restarted.
Yes.

> SHOULD
> * It should be clear in the logs why an agent is not getting any new sessions.
When the vsmserver is started (or restarted) the following is printed in the logs:

2025-03-26 16:07:09 WARNING vsmserver.loadinfo: Draining agents: ['lab-206.lkpg.cendio.se']

> * The new config change should take place immediately when the file is saved.
No, it requires the master service to be restarted. This is in line with how most things currently work in ThinLinc.

> * It should be possible to disable a number of agents.
Yes, and if all agents are draining, the client shows "no agents server was available", and the logs say:

2025-03-26 16:12:42 WARNING vsmserver.loadinfo: All agents for user cendio3 are marked as draining.
2025-03-26 16:12:42 WARNING vsmserver: No working agents found trying to start new session for cendio3

> * It should be clear in 'tlctl load' and 'tlwebadm status load' that the agent is disabled.
Yes, it is. And both show the information in a similar manner - an unlabeled column.

> COULD
> * It should be possible to disable entire subclusters.
No, this was not added for now.

---

The Administrator documentation was also updated to reflect this change.

Adding a GUI for this will be done on bug 1015.

Note that the configuration parameter /vsmserver/draining_agents will probably be moved to /agents/draining as part of bug 8553.
Comment 18 Linn cendio 2025-03-31 16:02:55 CEST
Tested on Fedora 41 with server build 3962. I also used Ubuntu 22.04 as an agent machine for cluster testing. 

> MUST
> ✅ It must be possible to disable a specific agent from getting new sessions. This should be done by changing the config file.
> 
> * A user should still be able to reconnect to a disconnected session on the disabled agent.
Yes, when I specify an agent under /agents/draining, I can no longer create new sessions on it, but I could still reconnect to an existing session. 

Note that the hostname under /agents/draining has to match the hostname specified in /subcluster/<cluster name>. If the names do not match (e.g. one uses the DNS name and the other the IP), I was still able to create a new session on the drained agent.
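The behaviour I saw is consistent with the names being compared as plain strings, without any DNS resolution. I have not confirmed this in the code, so treat the following as a sketch under that assumption. The function name and the dictionary layout are hypothetical.

```python
def unmatched_draining(subclusters, draining):
    """Return draining entries that match no configured agent name.

    Assumption: names are compared as exact strings, so a DNS name and
    the IP address of the same machine count as different agents.
    """
    configured = {
        agent for agents in subclusters.values() for agent in agents
    }
    return [name for name in draining if name not in configured]


if __name__ == "__main__":
    # The subcluster lists the agent by DNS name, the draining list by IP.
    subclusters = {"Default": ["lab-206.lkpg.cendio.se", "127.0.0.1"]}
    draining = ["10.48.2.82", "127.0.0.1"]
    print(unmatched_draining(subclusters, draining))
```

An entry returned by this check would have no effect on session placement, which matches what I observed for the mismatched name.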

> ✅ The new config change must take place after the master service has been restarted.
Yes, after restart the config change is active.

> SHOULD
> ✅ It should be clear in the logs why an agent is not getting any new sessions.
I think so, and the following scenarios are logged:
  1) When starting the vsmserver, a list of draining agents is logged:
>    WARNING vsmserver.loadinfo: Draining agents: ['127.0.0.1', '10.48.2.82']
  2) When a drained agent is not part of any subcluster:
>    WARNING vsmserver.loadinfo: Tried to configure draining of agent '10.48.2.82', but it was not found in any subcluster configuration.
  3) When a user can't log in due to their available agents being drained:
>    WARNING vsmserver.loadinfo: All agents for user tester are marked as draining.
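To summarise the observed behaviour: reconnects ignore the draining list, new sessions skip draining agents, and a warning is logged when nothing is left. The sketch below models only that. It is not the vsmserver implementation; the function name, the logger name and picking the first usable agent (instead of doing real load balancing) are all simplifications of mine.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("sketch.loadinfo")


def choose_agent(user, user_agents, draining, existing_session_agent=None):
    """Pick an agent for a user, or return None if none is usable."""
    # Reconnecting to an existing session is allowed on a draining agent.
    if existing_session_agent is not None:
        return existing_session_agent

    # New sessions may only be placed on agents that are not draining.
    usable = [agent for agent in user_agents if agent not in draining]
    if not usable:
        log.warning("All agents for user %s are marked as draining.", user)
        return None

    # Simplification: take the first usable agent, ignoring actual load.
    return usable[0]
```

For example, `choose_agent("tester", ["127.0.0.1", "10.48.2.82"], {"127.0.0.1", "10.48.2.82"})` returns `None` and logs the warning, while passing `existing_session_agent="127.0.0.1"` returns that agent even though it is draining.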

> ❌ The new config change should take place immediately when the file is saved.
No, the vsmserver service has to be restarted.

> ✅ It should be possible to disable a number of agents.
Yes. Tested by draining 2 agents, and was unable to start a new session for a user only connected to those agents.

> ✅ It should be clear in 'tlctl load' and 'tlwebadm status load' that the agent is disabled.
Yes. Tested by having an agent DOWN and one DRAINING, and things looked sane in both tlctl and tlwebadm. When all agents are up, the status column is hidden.

> COULD
> ❌ It should be possible to disable entire subclusters.
No, at the moment there is no convenience functionality for draining whole subclusters.

---

Also checked the commits, documentation and release notes. They look good.
