Bug 7683 - HA new session conflict handling crashes
Summary: HA new session conflict handling crashes
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server (show other bugs)
Version: trunk
Hardware: PC Unknown
: P2 Normal
Target Milestone: LowPrio
Assignee: Bugzilla mail exporter
Depends on:
Reported: 2021-04-15 14:32 CEST by Frida
Modified: 2021-06-08 13:28 CEST (History)
0 users

See Also:
Acceptance Criteria:


Description Frida cendio 2021-04-15 14:32:22 CEST
Not entirely sure how to reproduce this, did not manage to do it again. 

I was testing HA (Build 2005) on two SLES 12 machines, I had started a session from one master (Node 1) then trying to reconnect via the other master (Node 2). Got this traceback on Node 2:

> 2021-04-15 10:49:39 INFO vsmserver.license: Updating license data from disk to memory
> 2021-04-15 10:49:39 INFO vsmserver.license: License summary: 5 concurrent users. Hard limit of 6 concurrent users. 
> 2021-04-15 10:49:39 INFO vsmserver.session: Loaded 1 sessions for 1 users from file
> 2021-04-15 10:49:48 ERROR vsmserver.session: Unhandled exception trying to verify sessions on VSM Agent <class 'KeyError'> 'error' Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 105, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.4/asyncore.py", line 442, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.4/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 426, in found_terminator
>     self . handle_response ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 458, in handle_response
>     self . handle_returnvalue ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_verifysessions.py", line 39, in handle_returnvalue
>     self . callback ( self . returnvalue , * self . cbparams [ 0 ] , ** self . cbparams [ 1 ] )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_sessionchange.py", line 125, in verify_sessions_finished
>     if IiIIII111 [ 'error' ] or not IiIIII111 [ 'alive' ] :
> KeyError: 'error'
> .
> 2021-04-15 10:49:48 WARNING vsmserver.HA: Tried to transfer session change (delete,cendio/ to other node but other node reported HA_NOSUCHSESSION

And this was found in the agent log at the same time:
> 2021-04-15 10:49:48 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
> 2021-04-15 10:49:48 WARNING vsmagent.sessions: Broken session for user cendio, tl-session process 7686 does not exist
> 2021-04-15 10:50:23 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio

The issue is that the 'error' key is not always present but handler_sessionchange takes that for granted in some cases.
Comment 1 Pierre Ossman cendio 2021-04-16 09:33:45 CEST
This seems to have been broken since r32077 on bug 5489. So I assume we didn't properly test things then.

The scenario for this code is when there is conflicting information between the masters. This is extremely unlikely to happen, but should be possible to simulate using these steps:

 1. Stop one master
 2. Create a session
 3. Stop the second master
 4. Kill the session
 5. Start the master in 1.
 6. Create a session, making sure it gets the same agent and display as in 2.
 7. Start the second master

At this point both masters want to report to the other they have a new session, however they claim the same agent and display, which is impossible. Hence why the master(s) verify the sessions to see which is still running and which isn't.

Note You need to log in before you can comment on or make changes to this bug.