Bug 284 - Limit the number of concurrent users or sessions (per cluster, per agent)
Summary: Limit the number of concurrent users or sessions (per cluster, per agent)
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: 1.0.1
Hardware: PC Linux
Importance: P2 Enhancement
Target Milestone: 4.18.0
Assignee: Emelie
URL:
Keywords: aleze_tester, linma_tester, relnotes, tobfa_tester
Duplicates: 5966
Depends on:
Blocks:
 
Reported: 2003-05-07 10:49 CEST by Peter Åstrand
Modified: 2024-11-20 09:21 CET
CC List: 8 users

See Also:
Acceptance Criteria:
MUST:
* Must be able to limit the number of users per agent on a ThinLinc cluster
* Users must not be able to log in when the limit is reached
* The default limit must not affect uninterested sysadmins
* When the limit is reached, it should be communicated in the log
SHOULD:
* Should be able to configure the limit in Web Admin
* End users receive an error message
* The config variable and its usage should be documented
COULD:
* End users should get an informative error message
  - New client
  - Older client
  - New Web Access
  - Older Web Access


Attachments

Description Peter Åstrand cendio 2003-05-07 10:49:37 CEST
It might be useful to be able to limit the number of concurrent users. This has
nothing to do with licensing, but might be useful anyway. For example, a system
administrator might want to reduce the server load. 

It should be easy to implement this, since the vsmserver already has knowledge
of the number of concurrent users. All we need to do is to add a parameter to
vsmserver.conf, and use it in vsmserver.
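
A minimal sketch of the proposed check (hypothetical names; the actual vsmserver internals may differ):

# Hypothetical sketch: the limit would be read from vsmserver.conf,
# with 0 meaning "no limit".
def may_start_session(active_usernames, max_users):
    # Allow a new login only while the user count is below the limit.
    return max_users == 0 or len(active_usernames) < max_users

# e.g. may_start_session({"alice", "bob"}, 2) -> False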
Comment 1 Peter Åstrand cendio 2004-04-05 09:33:03 CEST
To be useful, this should probably be settable *per*agent*; not per cluster.
That's a bit harder than just checking this on the VSM server. 

Perhaps not very useful after all. Re-targeting. 
Comment 2 Karl Mikaelsson cendio 2016-11-08 14:22:16 CET
*** Bug 5966 has been marked as a duplicate of this bug. ***
Comment 5 Pierre Ossman cendio 2020-10-09 09:57:07 CEST
Also note bug 4429 which may remove some of the need for this feature, depending on how it turns out.
Comment 6 Pierre Ossman cendio 2020-10-29 10:33:40 CET
A use case that's been described to us is users making heavy use of GPUs. The amount of RAM is a limiting factor and many applications are designed to get exclusive access and don't gracefully deal with memory running out.

So our users want to work around this by limiting the number of users per agent, and hence per GPU. In some cases to the extreme of one user per agent. For them it is a better user experience to be denied a session than applications crashing or erroring out.
Comment 11 Pierre Ossman cendio 2022-10-04 10:06:31 CEST
Another use case is security, where the admin wants to minimize the risk of data leaking between users by only having a single user per system.

Note that this assumes that there is no substantial risk of things being left behind by a user as an agent can be reused for a different user once the first user logs out.
Comment 17 Emelie cendio 2024-10-14 15:36:06 CEST
(In reply to Peter Åstrand from comment #1)
> To be useful, this should probably be settable *per*agent*; not per cluster.
> That's a bit harder than just checking this on the VSM server. 
As the years have passed, we have decided that limiting based on subcluster instead of agent is a suitable way forward. 

There are a few reasons for this:
 1) It is easier to implement
 2) We haven't had customers ask for this setting at the individual agent level;
    configuring it per subcluster (i.e. limiting each agent in the subcluster to
    X users) is enough


Additionally, we haven't decided whether this setting will limit users or sessions per agent and subcluster. We will likely choose the easiest path forward, as most setups only use one session per user.
Comment 26 Emelie cendio 2024-10-22 11:31:44 CEST
A new configuration parameter was added: /vsmserver/subclusters/<name>/max_users_per_agent. It controls the maximum number of concurrent users allowed per agent within the subcluster. A value of 0 indicates unlimited users. When the limit is reached, the agent will no longer be considered by the load balancer for new users.

The feature was implemented in the load balancer as a filtering step: agents that have reached the limit are filtered out and are no longer available for new users. The filtering is done after the load balancer has already selected a subcluster, which means the user is denied if no agents in that subcluster remain available.

Imagine a scenario with two subclusters: one default (1) and one with user associations (2). If the max_users_per_agent limit is reached on all agents in subcluster 2, then a user associated with subcluster 2 who attempts to log in will be denied, even if agents are available in subcluster 1. A user without subcluster associations will be able to log in to subcluster 1 as usual.
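
An illustrative sketch of the filtering step described above (hypothetical names; not the shipped code, which the tracebacks below show only in obfuscated form):

# Hypothetical sketch of the per-subcluster filtering, assuming e.g.
#   /vsmserver/subclusters/Default/max_users_per_agent=2
def filter_agents_with_limit_reached(agents, users_on_agent, limit):
    # agents: candidate agent hosts (subcluster already selected)
    # users_on_agent: dict mapping agent host -> set of usernames
    # limit: max_users_per_agent for the subcluster; 0 = unlimited
    if limit == 0:
        return list(agents)
    return [a for a in agents if len(users_on_agent.get(a, set())) < limit]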
Comment 27 Emelie cendio 2024-10-22 12:15:43 CEST
The testing setup for this was a cluster with multiple agents in different configurations. First, the default value for max_users_per_agent (unlimited) was verified not to affect the usual behaviour. Then, given max_users_per_agent=1, four agents and five users, it was verified that the first four users ended up on different agents and that the fifth user was denied.

Then, multiple subclusters were set up with different combinations of agents and user associations to verify the behaviour both when agents belonged to several subclusters and when users were associated with multiple subclusters. When an agent belongs to several subclusters, the lowest limit applies to that agent. Associating a user with several subclusters is undefined behaviour, and the administrator will be warned about this.

Lastly, multisession was enabled to verify that several sessions for the same user on one agent did not count against the limit. When the same user gets sessions on two different agents, both sessions will count towards the limit.
Comment 32 Samuel Mannehed cendio 2024-10-22 17:28:44 CEST
Note that the subcluster popup screenshot in the TAG will have to be updated to include these changes:

https://www.cendio.com/resources/docs/tag-devel/html/tlwebadm_vsm.html
Comment 35 Samuel Mannehed cendio 2024-10-25 10:30:57 CEST
A new, more informative error message was added to the ThinLinc client as well as to ThinLinc Web Access. It is shown when a new user tries to log in and the number of concurrent users exceeds the max_users_per_agent limit configured by the system administrator. The following error message is shown: "You have exceeded a limit configured by your system administrator." The system now differentiates between general agent unavailability ("No agent server was available") and this specific user-limit-exceeded condition.
Comment 38 Linn cendio 2024-10-28 16:31:49 CET
While we were adding a new error message for max_users_per_agent, we also added a few general error messages to the client as part of this bug.

These general error messages are not currently in use; this change only creates the infrastructure for them. The idea is future-proofing: when we want to start using one of these error messages, all supported clients will already have the infrastructure in place, so no extra backwards compatibility work should be needed.
Comment 39 Linn cendio 2024-10-28 17:14:35 CET
The only thing left for this bug is generating new screenshots. Testing can start before this is done, marking as resolved.
Comment 42 Linn cendio 2024-10-29 11:00:07 CET
Looked through the changes to the code and documentation, as well as the updated screenshot and release notes. Looks all good (after fixing a small typo in the documentation).
Comment 43 Tobias cendio 2024-10-29 14:45:46 CET
Tested using server build #3761 on RHEL9 and client build #3652 on Fedora40.

Under singlesession circumstances, users are successfully logged in or rejected
as intended, based on the value of max_users_per_agent configured to -something,
0, or +something. However, under multisession circumstances, things aren't
working as intended anymore.

For instance, on an empty agent, if max_users_per_agent is set to 1 combined
with multiple allowed sessions, a single user will be rejected upon its second
log in. It appears we're not accounting for which specific user is attempting to
log in when filtering agents based on user limits.
Comment 47 Samuel Mannehed cendio 2024-10-30 13:44:29 CET
(In reply to Tobias from comment #43)
> Tested using server build #3761 on RHEL9 and client build #3652 on Fedora40.
> 
> Under singlesession circumstances, users are successfully logged in or rejected
> as intended, based on the value of max_users_per_agent configured to -something,
> 0, or +something. However, under multisession circumstances, things aren't
> working as intended anymore.
> 
> For instance, on an empty agent, if max_users_per_agent is set to 1 combined
> with multiple allowed sessions, a single user will be rejected upon its second
> log in. It appears we're not accounting for which specific user is attempting
> to log in when filtering agents based on user limits.

Good catch! There were indeed some holes in our logic, this was fixed in r41289.
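
For illustration, the corrected condition could look roughly like this (hypothetical names; the actual fix is in r41289): an agent that already hosts a session for the requesting user gains no new user, so the limit should not exclude it.

def agent_has_room(users_on_agent, username, limit):
    if limit == 0:
        return True  # 0 means unlimited
    if username in users_on_agent:
        # An extra session for an existing user adds no new user,
        # so the agent stays available to that user even at the limit.
        return True
    return len(users_on_agent) < limit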
Comment 48 Tobias cendio 2024-11-04 14:39:51 CET
Getting a traceback in the following scenario:
- Two subclusters with non-overlapping agents and users, as in Subcl1=(Agent1, User1) and Subcl2=(Agent2, User2).
- Both subclusters exhibit zero or positive max_users_per_agent

User1 logs in seamlessly as intended, while User2 is left with a frozen client. The loadbalancer has crashed according to the traceback log.

It appears to be the case that upon User2's log in, the filter function is handed a session list (now containing User1's session) incompatible with the list of possible agent hosts for User2.

> 2024-11-04 14:37:53 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/loadbalancer.py", line 80, in _filter_agents_with_limit_reached
> 2024-11-04 14:37:53 ERROR vsmserver:     iIIiii1iI [ iiIi1 ] . add ( iII1II11iI1 )
> 2024-11-04 14:37:53 ERROR vsmserver: KeyError: 'Agent1'
Comment 51 Samuel Mannehed cendio 2024-11-05 09:30:54 CET
(In reply to Tobias from comment #48)
> Getting a traceback in the following scenario:
> - Two subclusters with non-overlapping agents and users, as in
> Subcl1=(Agent1, User1) and Subcl2=(Agent2, User2).
> - Both subclusters exhibit zero or positive max_users_per_agent
> 
> User1 logs in seamlessly as intended, while User2 is left with a frozen
> client. The loadbalancer has crashed according to the traceback log.
> 
> It appears to be the case that upon User2's log in, the filter function is
> handed a session list (now containing User1's session) incompatible with the
> list of possible agent hosts for User2.
> 
> > 2024-11-04 14:37:53 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/loadbalancer.py", line 80, in _filter_agents_with_limit_reached
> > 2024-11-04 14:37:53 ERROR vsmserver:     iIIiii1iI [ iiIi1 ] . add ( iII1II11iI1 )
> > 2024-11-04 14:37:53 ERROR vsmserver: KeyError: 'Agent1'

Good find! This was fixed in r41299. There were no unit tests for the scenario where specific users or groups were associated with subclusters.
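
For illustration, the defensive counting could look roughly like this (hypothetical names; the actual fix is in r41299): sessions hosted on agents outside the requesting user's candidate set must be skipped instead of indexed blindly.

def users_per_candidate_agent(candidate_agents, sessions):
    users = {agent: set() for agent in candidate_agents}
    for session in sessions:
        # A session on an agent in another subcluster (e.g. User1 on
        # Agent1 while placing User2 on Agent2) would otherwise raise
        # KeyError, as seen in the traceback above.
        if session["agent"] in users:
            users[session["agent"]].add(session["user"])
    return users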
Comment 52 Tobias cendio 2024-11-05 15:31:39 CET
This was tested using server build #3781 on RHEL9 and client build #3666 on
Fedora40.

Testing was divided into two parts: first part with a single subcluster and two
agents, and the second part with two subclusters and two agents.

Single subcluster, two agents
-----------------------------

Confirmed that users starting multiple sessions will have all their sessions
hosted on the same agent where their first session was started. In other words,
new sessions remain correctly prioritized on agents with existing sessions of
the same user. If the agent user limit is set to a negative number while active
sessions are running, no new sessions can be created, but the active sessions
remain.

This was confirmed to be independent of current agent ratings and the max users
limit, although beholden to the max sessions limit, as expected. In addition,
the max users limit is confirmed to be respected on all agents in the
subcluster.

Two subclusters, two agents
---------------------------

Confirmed that when the agents are associated with one subcluster each, the
correct max users limit is respected by each agent. In the second case, when the
agents are associated with both subclusters, it was confirmed that the lowest of
the user limits set by each subcluster is respected by the agents.
Comment 53 Linn cendio 2024-11-05 15:42:59 CET
Found another error when max_users_per_agent is not an int. I input max_users_per_agent=rr manually into the hconf file, which gave the following traceback when I tried to log in:
> 2024-11-05 15:02:31 ERROR vsmserver: handle: <Handle NewSessionHandler.get_user_groups_finished(<Future>)>
> 2024-11-05 15:02:31 ERROR vsmserver: ----------------------------------------
> 2024-11-05 15:02:31 ERROR vsmserver: Traceback (most recent call last):
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/hiveconf.py", line 219, in get_integer
> 2024-11-05 15:02:31 ERROR vsmserver:     return int(self._value)
> 2024-11-05 15:02:31 ERROR vsmserver: ValueError: invalid literal for int() with base 10: 'rr'
> 2024-11-05 15:02:31 ERROR vsmserver: 
> 2024-11-05 15:02:31 ERROR vsmserver: During handling of the above exception, another exception occurred:
> 2024-11-05 15:02:31 ERROR vsmserver: 
> 2024-11-05 15:02:31 ERROR vsmserver: Traceback (most recent call last):
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/loadbalancer.py", line 92, in _filter_agents_with_limit_reached
> 2024-11-05 15:02:31 ERROR vsmserver:     I111II111I1I . append ( self . vsmserver . hive . get_integer ( iiIiII11i11II % OOOO0O0ooO0O , 0 ) )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/hiveconf.py", line 477, in get_integer
> 2024-11-05 15:02:31 ERROR vsmserver:     return self._get_value(parampath, default, Parameter.get_integer)
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/hiveconf.py", line 468, in _get_value
> 2024-11-05 15:02:31 ERROR vsmserver:     return method(param)
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/hiveconf.py", line 221, in get_integer
> 2024-11-05 15:02:31 ERROR vsmserver:     raise BadIntegerFormat()
> 2024-11-05 15:02:31 ERROR vsmserver: thinlinc.hiveconf.BadIntegerFormat
> 2024-11-05 15:02:31 ERROR vsmserver: 
> 2024-11-05 15:02:31 ERROR vsmserver: During handling of the above exception, another exception occurred:
> 2024-11-05 15:02:31 ERROR vsmserver: 
> 2024-11-05 15:02:31 ERROR vsmserver: Traceback (most recent call last):
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/usr/lib64/python3.6/asyncio/events.py", line 145, in _run
> 2024-11-05 15:02:31 ERROR vsmserver:     self._callback(*self._args)
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 73, in get_user_groups_finished
> 2024-11-05 15:02:31 ERROR vsmserver:     self . allowed_thinlinc_user ( )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 87, in allowed_thinlinc_user
> 2024-11-05 15:02:31 ERROR vsmserver:     self . allowed_thinlinc_user_finished ( True )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 92, in allowed_thinlinc_user_finished
> 2024-11-05 15:02:31 ERROR vsmserver:     self . check_session_limit ( )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 102, in check_session_limit
> 2024-11-05 15:02:31 ERROR vsmserver:     self . check_license ( )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 112, in check_license
> 2024-11-05 15:02:31 ERROR vsmserver:     self . license_check_ok ( )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 121, in license_check_ok
> 2024-11-05 15:02:31 ERROR vsmserver:     self . find_best_agent ( )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 128, in find_best_agent
> 2024-11-05 15:02:31 ERROR vsmserver:     ( self . agents_to_try , OooO0O00o0 ) = self . parent . loadbalancer . get_best_agents ( self . username , self . user_groups )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/loadbalancer.py", line 129, in get_best_agents
> 2024-11-05 15:02:31 ERROR vsmserver:     username )
> 2024-11-05 15:02:31 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/loadbalancer.py", line 93, in _filter_agents_with_limit_reached
> 2024-11-05 15:02:31 ERROR vsmserver:     except self . vsmserver . hive . BadIntegerFormat :
> 2024-11-05 15:02:31 ERROR vsmserver: AttributeError: 'Folder' object has no attribute 'BadIntegerFormat'
> 2024-11-05 15:02:31 ERROR vsmserver: ----------------------------------------
Comment 55 Emelie cendio 2024-11-05 15:54:42 CET
Solved the error mentioned in comment 53 in r41304. Tested and verified on Fedora 40 that we get an error message instead of a traceback.
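
For illustration (hypothetical function name; paths, calls, and messages taken from the traceback in comment 53 and the log excerpt in comment 56, the actual fix being r41304): BadIntegerFormat lives in the hiveconf module, not on the hive object, and an unparsable value should fall back to no limit with a warning.

import logging
from thinlinc import hiveconf

def read_max_users_per_agent(hive, subcluster):
    path = "/vsmserver/subclusters/%s/max_users_per_agent" % subcluster
    try:
        return hive.get_integer(path, 0)
    except hiveconf.BadIntegerFormat:
        # Catch the module-level exception; looking it up on the hive
        # (a Folder object) caused the AttributeError above.
        logging.getLogger("vsmserver.loadinfo").warning(
            "Invalid 'max_users_per_agent' for subcluster '%s', "
            "expected integer. Setting no limit.", subcluster)
        return 0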
Comment 56 Linn cendio 2024-11-05 16:59:38 CET
Tested the following on SLES 15 with server build 3769. All tests were done with a setup of a single agent, on the same machine as the master.

Limits set by max_users_per_agent are respected
-----------------------------------------------
* One session per user:
 ✓ max_users_per_agent=0 does not limit the number of allowed users
 ✓ Limit is respected when setting a positive value (I tested max_users_per_agent=1
   and =2)
 ✓ Setting max_users_per_agent to a negative value does not give errors in the log
 ✓ Setting max_users_per_agent to an invalid value (e.g. max_users_per_agent=rr)
   correctly shows a warning in vsmserver.log (tested with server build
   3785):
> WARNING vsmserver.loadinfo: Invalid 'max_users_per_agent' for subcluster 'Default', expected integer. Setting no limit.
* Multi session:
 - Setup: 2 users, each with 2 running sessions and max_users_per_agent=4
 ✓ When a user with an existing session logs in to a new session, the new session
   is created
 ✓ When a user with no running sessions logs in, the login is cancelled and the
   error message below is shown:
> ThinLinc login failed.
> (You have exceeded a limit configured by your system administrator. Contact them for assistance.)

Interaction between limits by licenses and limits by max_users_per_agent
------------------------------------------------------------------------
 ✓ The hard limit for licenses can be hit when max_users_per_agent=0, i.e. our
   license handling still works as before (shows error message [2])

* Hitting the limit for max_users_per_agent simultaneously as hitting the hard
  limit for licenses
 * One session per user:
   - Setup: 2 licenses, 2 users logged in, max_users_per_agent=2
   ✓ Logging in with a new user gives error message [2]

 * Multi session:
   - Setup: 4 licences, 2 user with 2 sessions each logged in,
     max_users_per_agent=4
   ✓ When a user with no running sessions logs in, the login is cancelled and
     error message [2] is shown
   ✓ When a user with an existing session logs in to a new session (from another
     machine so it consumes a licence), the login is cancelled and error message
     [2] is shown

[2] Error message:
> ThinLinc login failed
> (Cannot allocate Licences)
Comment 57 Linn cendio 2024-11-05 17:05:43 CET
Tested the AC of this bug, as well as looked through the code for the straggler commits. With that everything is tested, closing!

> MUST:
> ✅ Must be able to limit the number of users per agent on a ThinLinc cluster
> 
> ✅ Users must not be able to log in when the limit is reached
Yes, see comment 52 and comment 56 for details on what was tested.

> ✅ The default limit must not affect uninterested sysadmins
The default value for this is set to 0, which means unlimited. If the default value is left unchanged, ThinLinc places sessions in the same way as before the max_users_per_agent parameter was added.

> ✅ When limit is reached, it should be communicated in the log
Yes, when the limit is set to 1 I see this in vsmserver.log:
> WARNING vsmserver.loadinfo: Agent 127.0.01 has reached maximum number of users (1)
> WARNING vsmserver.loadinfo: All agents for user cendio2 have reached maximum number of users
> WARNING vsmserver: No working agents found trying to start new session for cendio2

> SHOULD:
> ✅ Should be able to configure the limit in Web Admin
Yes, changing the value in Web Admin updates the value in vsmserver.hconf. Also, Web Admin has input validation that protects against non-numbers and negative values.

If you set Max users per agent to a negative value and click save, neither the radio button for "No limit" nor the input box will be selected. When clicking save and then checking the entry, the option "No limit" is selected. (In this scenario, some data will also disappear from the entry after clicking save. See bug 8103 for details.)

> ✅ The config-variable and its usage should be documented
Yes, there is a short explanation of the parameter in vsmserver.hconf, and it is documented in the TAG under Subcluster configuration. The updates to the Web Admin interface are also documented.

> SHOULD:
> ✅ End users receive an error message
> 
> COULD:
> ✅ End users should get an informative error message
Depending on how the login is done, the end user gets different error messages: 
   
Message shown for new client builds and Web Access:
> ThinLinc login failed.
> (You have exceeded a limit configured by your system administrator. Contact them for assistance.)
Older client builds:
> ThinLinc login failed.
> (No agent server was available)
Older Web Access:
Not applicable, Web Access is always the same version as the server that runs it.
Comment 58 Linn cendio 2024-11-06 09:44:12 CET
(In reply to Linn from comment #56)
> Tested the following on SLES 15 with server build 3769. All tests were done
> with a setup of a single agent, on the same machine as the master.
> 
> Limits set by max_users_per_agent are respected
> -----------------------------------------------
> * One session per user:
> ...
> * Multi session:
>  - Setup: 2 users, each with 2 running sessions and max_users_per_agent=4
>  ✓ When a user with an existing session logs in to a new session, the new
>    session is created
>  ✓ When a user with no running sessions logs in, the login is cancelled and
>    the error message below is shown:
> > ThinLinc login failed.
> > (You have exceeded a limit configured by your system administrator. Contact them for assistance.)
Correction to the "Setup" step for the multi session tests: it should be max_users_per_agent=2, not =4 as stated above.
