Bug 7683 - HA new session conflict handling crashes
Summary: HA new session conflict handling crashes
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server (show other bugs)
Version: trunk
Hardware: PC Unknown
: P2 Normal
Target Milestone: LowPrio
Assignee: Bugzilla mail exporter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-15 14:32 CEST by Frida
Modified: 2023-12-20 16:22 CET (History)
1 user (show)

See Also:
Acceptance Criteria:


Attachments

Description Frida cendio 2021-04-15 14:32:22 CEST
Not entirely sure how to reproduce this, did not manage to do it again. 

I was testing HA (Build 2005) on two SLES 12 machines, I had started a session from one master (Node 1) then trying to reconnect via the other master (Node 2). Got this traceback on Node 2:

> 2021-04-15 10:49:39 INFO vsmserver.license: Updating license data from disk to memory
> 2021-04-15 10:49:39 INFO vsmserver.license: License summary: 5 concurrent users. Hard limit of 6 concurrent users. 
> 2021-04-15 10:49:39 INFO vsmserver.session: Loaded 1 sessions for 1 users from file
> 2021-04-15 10:49:48 ERROR vsmserver.session: Unhandled exception trying to verify sessions on VSM Agent 10.48.2.253:904: <class 'KeyError'> 'error' Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 105, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.4/asyncore.py", line 442, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.4/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 426, in found_terminator
>     self . handle_response ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 458, in handle_response
>     self . handle_returnvalue ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_verifysessions.py", line 39, in handle_returnvalue
>     self . callback ( self . returnvalue , * self . cbparams [ 0 ] , ** self . cbparams [ 1 ] )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_sessionchange.py", line 125, in verify_sessions_finished
>     if IiIIII111 [ 'error' ] or not IiIIII111 [ 'alive' ] :
> KeyError: 'error'
> .
> 2021-04-15 10:49:48 WARNING vsmserver.HA: Tried to transfer session change (delete,cendio/10.48.2.253:10) to other node but other node reported HA_NOSUCHSESSION


And this was found in the agent log at the same time:
> 2021-04-15 10:49:48 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio
> 2021-04-15 10:49:48 WARNING vsmagent.sessions: Broken session for user cendio, tl-session process 7686 does not exist
> 2021-04-15 10:50:23 INFO vsmagent.session: Verified connectivity to newly started Xvnc for cendio


The issue is that the 'error' key is not always present but handler_sessionchange takes that for granted in some cases.
Comment 1 Pierre Ossman cendio 2021-04-16 09:33:45 CEST
This seems to have been broken since r32077 on bug 5489. So I assume we didn't properly test things then.


The scenario for this code is when there is conflicting information between the masters. This is extremely unlikely to happen, but should be possible to simulate using these steps:

 1. Stop one master
 2. Create a session
 3. Stop the second master
 4. Kill the session
 5. Start the master in 1.
 6. Create a session, making sure it gets the same agent and display as in 2.
 7. Start the second master

At this point both masters want to report to the other they have a new session, however they claim the same agent and display, which is impossible. Hence why the master(s) verify the sessions to see which is still running and which isn't.
Comment 2 Adam Halim cendio 2023-12-18 17:24:31 CET
I was able to reproduce this error following the steps in comment #1 using build r40664. The error messages are printed once per second and stopped after /var/lib/vsm/session was removed on both nodes.

Log from primary master:
> 2023-12-18 17:17:44 DEBUG vsmserver: Handling method 'sessionchange' from ('10.48.2.88', 1022)
> 2023-12-18 17:17:44 DEBUG vsmserver.HA: Handling session change (new, cendio/lab-170:10)
> 2023-12-18 17:17:44 DEBUG2 vsmserver.HA: Session for cendio on agent lab-170 with display 10 exists in session database - validating both infos
> 2023-12-18 17:17:44 WARNING vsmserver.HA: Error updating other HA node: Operation timed out
> 2023-12-18 17:17:44 ERROR vsmserver: Exception in callback SessionChangeHandler._verify_sessions_done(<Future>)
> 2023-12-18 17:17:44 ERROR vsmserver: handle: <Handle SessionChangeHandler._verify_sessions_done(<Future>)>
> 2023-12-18 17:17:44 ERROR vsmserver: ----------------------------------------
> 2023-12-18 17:17:44 ERROR vsmserver: Traceback (most recent call last):
> 2023-12-18 17:17:44 ERROR vsmserver:   File "/usr/lib64/python3.4/asyncio/events.py", line 125, in _run
> 2023-12-18 17:17:44 ERROR vsmserver:     self._callback(*self._args)
> 2023-12-18 17:17:44 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_sessionchange.py", line 134, in _verify_sessions_done
> 2023-12-18 17:17:44 ERROR vsmserver:     if iIiIiiIIIiII [ 'error' ] or not iIiIiiIIIiII [ 'alive' ] :
> 2023-12-18 17:17:44 ERROR vsmserver: KeyError: 'error'
> 2023-12-18 17:17:44 ERROR vsmserver: ----------------------------------------
> 2023-12-18 17:17:45 ERROR vsmserver: Exception in callback ServerProtocol._callback_done(<Future>)
> 2023-12-18 17:17:45 ERROR vsmserver: handle: <Handle ServerProtocol._callback_done(<Future>)>
> 2023-12-18 17:17:45 ERROR vsmserver: ----------------------------------------
> 2023-12-18 17:17:45 ERROR vsmserver: Traceback (most recent call last):
> 2023-12-18 17:17:45 ERROR vsmserver:   File "/usr/lib64/python3.4/asyncio/events.py", line 125, in _run
> 2023-12-18 17:17:45 ERROR vsmserver:     self._callback(*self._args)
> 2023-12-18 17:17:45 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 345, in _callback_done
> 2023-12-18 17:17:45 ERROR vsmserver:     assert future is self . _response
> 2023-12-18 17:17:45 ERROR vsmserver: AssertionError
> 2023-12-18 17:17:45 ERROR vsmserver: ----------------------------------------
Log from secondary master:
> 2023-12-18 17:16:49 WARNING vsmserver.HA: Error updating other HA node: Operation timed out
> 2023-12-18 17:16:49 DEBUG vsmserver: Handling method 'sessionchange' from ('10.48.2.170', 1018)
> 2023-12-18 17:16:49 DEBUG vsmserver.HA: Handling session change (new, cendio/lab-170:10)
> 2023-12-18 17:16:49 DEBUG2 vsmserver.HA: Session for cendio on agent lab-170 with display 10 exists in session database - validating both infos
> 2023-12-18 17:16:49 ERROR vsmserver: Exception in callback SessionChangeHandler._verify_sessions_done(<Future>)
> 2023-12-18 17:16:49 ERROR vsmserver: handle: <Handle SessionChangeHandler._verify_sessions_done(<Future>)>
> 2023-12-18 17:16:49 ERROR vsmserver: ----------------------------------------
> 2023-12-18 17:16:49 ERROR vsmserver: Traceback (most recent call last):
> 2023-12-18 17:16:49 ERROR vsmserver:   File "/usr/lib64/python3.4/asyncio/events.py", line 125, in _run
> 2023-12-18 17:16:49 ERROR vsmserver:     self._callback(*self._args)
> 2023-12-18 17:16:49 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_sessionchange.py", line 134, in _verify_sessions_done
> 2023-12-18 17:16:49 ERROR vsmserver:     if iIiIiiIIIiII [ 'error' ] or not iIiIiiIIIiII [ 'alive' ] :
> 2023-12-18 17:16:49 ERROR vsmserver: KeyError: 'error'
> 2023-12-18 17:16:49 ERROR vsmserver: ----------------------------------------
> 2023-12-18 17:16:50 ERROR vsmserver: Exception in callback ServerProtocol._callback_done(<Future>)
> 2023-12-18 17:16:50 ERROR vsmserver: handle: <Handle ServerProtocol._callback_done(<Future>)>
> 2023-12-18 17:16:50 ERROR vsmserver: ----------------------------------------
> 2023-12-18 17:16:50 ERROR vsmserver: Traceback (most recent call last):
> 2023-12-18 17:16:50 ERROR vsmserver:   File "/usr/lib64/python3.4/asyncio/events.py", line 125, in _run
> 2023-12-18 17:16:50 ERROR vsmserver:     self._callback(*self._args)
> 2023-12-18 17:16:50 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 345, in _callback_done
> 2023-12-18 17:16:50 ERROR vsmserver:     assert future is self . _response
> 2023-12-18 17:16:50 ERROR vsmserver: AssertionError
Comment 3 Adam Halim cendio 2023-12-20 16:22:20 CET
The reason this crash happens is that the response from handler_verifysessions contains either the error key, or the alive key, but never both at the same time. In handler_sessionchange, we check if the error key, or if the alive key is present, which results in a crash if the error key is not present.

The values for both keys are booleans, and a simple fix would simply be to set the values for both keys no matter what.

Note You need to log in before you can comment on or make changes to this bug.