If users want to perform maintenance on an agent, or remove it permanently, we recommend removing that agent from the configuration and waiting out the existing sessions. The idea is that this prevents new sessions from being created on the agent, so it will eventually be unused. This works most of the time, but not always.

When multiple sessions per user are allowed, we try to make sure all of a user's sessions end up on the same agent. This happens even if the agent has been removed from the configuration. So a user can keep adding new sessions to the agent we want to get out of the cluster, as long as that user keeps at least one session alive on it at all times.

It is unclear whether this is desired behaviour or not. Multiple sessions per user is not a generally stable situation to begin with, and it gets worse if the user has those sessions spread out over multiple hosts (assuming the home directory is shared). So we really want all of a user's sessions to be on the same machine. On the other hand, this is not something we fully guarantee today. We only try to group the sessions. If the agent with the existing sessions is unresponsive, or fails to create a session, then we'll gladly create the new session on a different agent.
I see at least three options for how this should work:

* Let the user keep creating sessions on the removed agent (like today). The user is happy, but the sysadmin might be confused as to why new sessions start after the removal.

* Prevent such users from creating any new sessions anywhere until the session on the removed agent has ended. This would require a good error message. The user will not be happy, but the behavior would be "safe" with regard to the principle of keeping multiple sessions on the same agent.

* Let the user create new sessions on other agents. This would make the principle of keeping multiple sessions on the same agent "best effort". See bug 7621 for logging in these cases. The user will be happy in this case, but might run into confusing issues in the session.

The situation we have right now is probably the worst option. I'm leaning towards the last option, doing this as "best effort", as the way to go.
The user will now get a new session on a different agent in this scenario. This can lead to the problems that come with having multiple sessions on different agents. However, removing agents from the configuration now works reliably.
I added a log warning for this scenario as well.
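To illustrate the new behaviour, here is a minimal sketch in Python of how the agent priority could be computed. It is not the actual implementation: the function name `prioritized_agents`, its parameters, the logger name and the wording of the warning are all assumptions made for the example. It only shows the idea of preferring the configured agent where the user already has the most sessions, and falling back to other configured agents (with a warning) when the existing sessions are on a removed agent.

```python
import logging
from collections import Counter

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("agent_selection_sketch")


def prioritized_agents(user, user_session_agents, configured_agents):
    """Return the agents to try for a new session, most preferred first.

    user_session_agents: one entry per existing session of the user,
        naming the agent that hosts it (hypothetical input format).
    configured_agents: agents currently present in the configuration.
    """
    counts = Counter(user_session_agents)

    # Agents that already host the user's sessions and are still
    # configured, ordered by number of sessions (most first).
    preferred = [agent for agent, _ in counts.most_common()
                 if agent in configured_agents]

    # Agents hosting sessions that are no longer in the configuration.
    # These are never offered for new sessions in this sketch.
    removed = [agent for agent in counts if agent not in configured_agents]
    if removed:
        log.warning("User %s has sessions on agents not in the "
                    "configuration (%s); a new session may end up on "
                    "a different agent", user, ", ".join(removed))

    # Remaining configured agents act as the fallback.
    rest = [agent for agent in configured_agents if agent not in preferred]
    return preferred + rest
```

For example, a user with two sessions on a removed agent `a1` and one on `a2` would be offered `a2` first and then any other configured agent, and `a1` would never be picked for the new session.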
I have now fixed the issues pointed out via email. Mostly this concerned simplifying the logic of getting the prioritized agents, but the log warning was also simplified.
Tested on Ubuntu 20.04. I could reproduce the issue on tl-4.12.0, and it works as described in comment #5 after installing the nightly build. I also made sure that the normal behavior still works, that is, a user gets new sessions on the agent where they have the most sessions. The log message looks good. Closing.