I'm not entirely sure how to reproduce this; I did not manage to trigger it again.
I was testing HA (Build 2005) on two SLES 12 machines. I started a session via one master (Node 1), then tried to reconnect via the other master (Node 2). I got this traceback on Node 2:
> 2021-04-15 10:49:39 INFO vsmserver.license: Updating license data from disk to memory
> 2021-04-15 10:49:39 INFO vsmserver.license: License summary: 5 concurrent users. Hard limit of 6 concurrent users.
> 2021-04-15 10:49:39 INFO vsmserver.session: Loaded 1 sessions for 1 users from file
> 2021-04-15 10:49:48 ERROR vsmserver.session: Unhandled exception trying to verify sessions on VSM Agent 10.48.2.253:904: <class 'KeyError'> 'error' Traceback (most recent call last):
> File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 105, in ooOOOoOO0
> obj . handle_read_event ( )
> File "/usr/lib64/python3.4/asyncore.py", line 442, in handle_read_event
> File "/usr/lib64/python3.4/asynchat.py", line 151, in handle_read
> File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 426, in found_terminator
> self . handle_response ( )
> File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 458, in handle_response
> self . handle_returnvalue ( )
> File "/opt/thinlinc/modules/thinlinc/vsm/call_verifysessions.py", line 39, in handle_returnvalue
> self . callback ( self . returnvalue , * self . cbparams [ 0 ] , ** self . cbparams [ 1 ] )
> File "/opt/thinlinc/modules/thinlinc/vsm/handler_sessionchange.py", line 125, in verify_sessions_finished
> if IiIIII111 [ 'error' ] or not IiIIII111 [ 'alive' ] :
> KeyError: 'error'
> 2021-04-15 10:49:48 WARNING vsmserver.HA: Tried to transfer session change (delete,cendio/10.48.2.253:10) to other node but other node reported HA_NOSUCHSESSION
And this was found in the agent log at the same time:
> 2021-04-15 10:49:48 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
> 2021-04-15 10:49:48 WARNING vsmagent.sessions: Broken session for user cendio, tl-session process 7686 does not exist
> 2021-04-15 10:50:23 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
The issue is that the 'error' key is not always present in the verification result, but handler_sessionchange assumes it is in some cases.
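A minimal sketch of the failure mode and one defensive way to write the check, assuming the verification result is a plain dict per session. The helper name `session_is_dead` is hypothetical and not taken from the ThinLinc source; only the 'error' and 'alive' keys come from the traceback above. Treating a missing 'alive' key as "not alive" is also an assumption.

```python
def session_is_dead(result):
    """Return True if the verification result says the session is gone.

    Hypothetical helper: `result` stands in for the per-session dict that
    verify_sessions_finished inspects.
    """
    # result['error'] raises KeyError when the key is absent, which is what
    # the traceback shows. dict.get() returns None instead, so a missing
    # 'error' is treated as "no error reported".
    has_error = bool(result.get('error'))
    # Assumption: a result without an 'alive' key counts as not alive.
    is_alive = bool(result.get('alive', False))
    return has_error or not is_alive
```

With this form, a result such as `{'alive': True}` no longer raises and is treated as a live session. Whether a missing key should instead be handled as an error is a design decision for the real fix.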
This seems to have been broken since r32077 on bug 5489. So I assume we didn't properly test things then.
The scenario for this code is when there is conflicting information between the masters. This is extremely unlikely to happen, but should be possible to simulate using these steps:
1. Stop one master
2. Create a session
3. Stop the second master
4. Kill the session
5. Start the master stopped in step 1.
6. Create a session, making sure it gets the same agent and display as in step 2.
7. Start the second master
At this point both masters want to report to the other that they have a new session, but both claim the same agent and display, which is impossible. This is why the masters verify the sessions: to see which one is still running and which isn't.
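The conflict the steps above produce can be illustrated with a small sketch. This is not how vsmserver stores sessions; the dict layout, the session labels, and the function name `find_conflicts` are all assumptions made for illustration. Each master's view is modelled as a mapping from an (agent, display) slot to some session identifier.

```python
def find_conflicts(local_sessions, remote_sessions):
    """Return the (agent, display) slots that both masters claim
    for different sessions.

    Hypothetical model: each argument maps (agent, display) to an
    opaque session identifier.
    """
    conflicts = []
    for slot, local_id in local_sessions.items():
        remote_id = remote_sessions.get(slot)
        # Same slot, different session: only one of them can be running,
        # so the agent has to be asked which one is still alive.
        if remote_id is not None and remote_id != local_id:
            conflicts.append(slot)
    return sorted(conflicts)


# Node 1 remembers the session from step 2, Node 2 the one from step 6.
node1 = {("10.48.2.253", 10): "session-from-step-2"}
node2 = {("10.48.2.253", 10): "session-from-step-6"}
conflicting_slots = find_conflicts(node1, node2)
```

Every slot returned here is one where a verification call to the agent is needed, and that is the code path where the KeyError above occurs.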