Bug 7592 - Crash in a delayed call kills vsmserver/vsmagent
Summary: Crash in a delayed call kills vsmserver/vsmagent
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server (show other bugs)
Version: trunk
Hardware: PC Unknown
: P2 Normal
Target Milestone: 4.16.0
Assignee: Adam Halim
URL:
Keywords: prosaic, samuel_tester
Depends on:
Blocks:
 
Reported: 2020-11-19 09:34 CET by Pierre Ossman
Modified: 2023-10-25 10:45 CEST (History)
4 users (show)

See Also:
Acceptance Criteria:
MUST * Not kill vsmserver/vsmagent when delayed call crashes SHOULD * Log when delayed call crashes


Attachments

Description Pierre Ossman cendio 2020-11-19 09:34:04 CET
The crash in bug 6178 is fatal, which is probably not a good idea:

> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmagent", line 23, in <module>
>     VSMAgent ( sys . argv )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 169, in __init__
>     self . loop ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 423, in loop
>     I11I11i1I = self . run_delayed_calls ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 381, in run_delayed_calls
>     oOO0OOOo . func ( * oOO0OOOo . args , ** oOO0OOOo . kw )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 397, in wait_on_vnc_port
>     self . wait_on_vnc_port , O0ooooOOoo0O )
>   File "/opt/thinlinc/modules/thinlinc/vsm/portopencheck.py", line 40, in __init__
>     self . connect ( addr )
>   File "/usr/lib64/python2.7/asyncore.py", line 354, in connect
>     self.handle_connect_event()
>   File "/usr/lib64/python2.7/asyncore.py", line 455, in handle_connect_event
>     self.handle_connect()
>   File "/opt/thinlinc/modules/thinlinc/vsm/portopencheck.py", line 55, in handle_connect
>     self . callback ( Ii1iI )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 386, in wait_on_vnc_port
>     self . xvnc_ready ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 420, in xvnc_ready
>     self . parent . update_sshd_permitopen ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 407, in update_sshd_permitopen
>     OOoo = locale_encode ( pwd . getpwuid ( iIiIiIiI ) . pw_name )
>   File "/opt/thinlinc/modules/thinlinc/locale.py", line 19, in locale_encode
>     return s . encode ( i1ii1 )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
> 

We should probably try to limp along in a case like this, just like we do when a handler crashes.
Comment 1 Pierre Ossman cendio 2021-04-20 14:30:11 CEST
We want to log the crash though, which is currently problematic as this code is too low level to have access to the logging. Some redesign is needed first.
Comment 2 Linn cendio 2022-05-19 09:05:31 CEST
When doing bug 7847, the function update_sshd_permitopen was restructured so the call to it is inside a try/catch. This likely solves this bug as well since that should remove the fatalness of the crashes, but testing is needed.

About the logging, we do log a message and the traceback if we catch such a crash, so I don't believe there are any issues on that part.
Comment 3 Adam Halim cendio 2023-10-04 11:20:30 CEST
Tested using the latest build, and triggering a crash in a delayed call does not kill the server. This was done by creating a fake agent and raising an exception in the timeout handler in xmlrpc.py. This will be triggered when trying to get load info. The triggered exception is not logged at all.
Comment 5 Adam Halim cendio 2023-10-11 14:41:17 CEST
(In reply to Adam Halim from comment #3)
> The triggered exception is not logged at all.
This is not true, the exception was logged but without any timestamp. This has now been fixed.

Tested by raising an exception in _periodic_update in loadbalancer.py.
Comment 6 Adam Halim cendio 2023-10-11 14:43:37 CEST
The acceptance criteria are met.

MUST
  * Not kill vsmserver/vsmagent when delayed call crashes ✓

SHOULD
  * Log when delayed call crashes ✓
Comment 8 Adam Halim cendio 2023-10-13 09:00:29 CEST
The following is logged when raising an exception in delay_connect() in portopencheck.py:

> 2023-10-12 16:19:19 ERROR vsmagent: Unhandled event loop exception: <class 'Exception'> Exception!
> Exception in callback CheckTCPPortOpen.delay_connect()
> handle: <Handle CheckTCPPortOpen.delay_connect() created at /opt/thinlinc/modules/thinlinc/vsm/portopencheck.py:32>
> source_traceback: Object created at (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 137, in find_free_display
>     self . set_sessionkey ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 200, in set_sessionkey
>     self . set_vncpassword ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 242, in set_vncpassword
>     self . correct_system_sockets ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 305, in correct_system_sockets
>     self . check_homedir ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 346, in check_homedir
>     self . start_session ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 363, in start_session
>     self . wait_on_vnc_port ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 390, in wait_on_vnc_port
>     CheckTCPPortOpen ( self . parent . log , ( "127.0.0.1" , i1II ) ,
>   File "/opt/thinlinc/modules/thinlinc/vsm/portopencheck.py", line 32, in __init__
>     get_event_loop ( ) . call_soon ( self . delay_connect )
> Traceback (most recent call last):
>   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
>     self._context.run(self._callback, *self._args)
>   File "/opt/thinlinc/modules/thinlinc/vsm/portopencheck.py", line 35, in delay_connect
>     raise Exception("Exception!")
> Exception: Exception!
Comment 14 Samuel Mannehed cendio 2023-10-20 18:35:50 CEST
I used the patch in attachment 1164 [details] to provoke the bug on a 4.15.0 installation. I then used the patch in attachment 1165 [details] to verify that things behave much better now.

The method I used to provoke the system was a `raise` in vsmagent's housekeeping(). The new behavior seems very robust – vsmagent doesn't crash and can still be used to start new sessions etc. The traceback is logged in a clear and detailed manner.

I reviewed the commits, they look good. The resulting function is logical and readable.

> MUST
>  * Not kill vsmserver/vsmagent when delayed call crashes

Yes.

> SHOULD
>  * Log when delayed call crashes

Yes.

All good, closing.
Comment 15 Samuel Mannehed cendio 2023-10-23 12:21:01 CEST
Missed to verify logging without devel-mode. This needs to be checked before closing the bug.
Comment 16 Samuel Mannehed cendio 2023-10-25 10:44:09 CEST
Looks good.

Disabled devel-mode:

> 2023-10-24 19:55:53 ERROR vsmagent: Exception in callback VSMAgent.housekeeping()
> 2023-10-24 19:55:53 ERROR vsmagent: handle: <TimerHandle when=12357.369740724 VSMAgent.housekeeping()>
> 2023-10-24 19:55:53 ERROR vsmagent: ----------------------------------------
> 2023-10-24 19:55:53 ERROR vsmagent: Traceback (most recent call last):
> 2023-10-24 19:55:53 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-10-24 19:55:53 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-10-24 19:55:53 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 133, in housekeeping
> 2023-10-24 19:55:53 ERROR vsmagent:     raise NotImplementedError("SAMUEL error")
> 2023-10-24 19:55:53 ERROR vsmagent: NotImplementedError: SAMUEL error
> 2023-10-24 19:55:53 ERROR vsmagent: ----------------------------------------

Enabled devel-mode:

> 2023-10-24 19:55:15 ERROR vsmagent: Exception in callback VSMAgent.housekeeping()
> 2023-10-24 19:55:15 ERROR vsmagent: handle: <TimerHandle when=12319.643435075 VSMAgent.housekeeping() created at /opt/thinlinc/modules/thinlinc/vsm/vsmagent.py:144>
> 2023-10-24 19:55:15 ERROR vsmagent: ----------------------------------------
> 2023-10-24 19:55:15 ERROR vsmagent: Object created at (most recent call last):
> 2023-10-24 19:55:15 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 74, in <module>
> 2023-10-24 19:55:15 ERROR vsmagent:     iIi1ii1I1iI11 ( )
> 2023-10-24 19:55:15 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 56, in iIi1ii1I1iI11
> 2023-10-24 19:55:15 ERROR vsmagent:     iiI1111IIi1 = VSMAgent ( oOooo0OOO , Iio0 )
> 2023-10-24 19:55:15 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 115, in __init__
> 2023-10-24 19:55:15 ERROR vsmagent:     self . housekeeping ( False )
> 2023-10-24 19:55:15 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 144, in housekeeping
> 2023-10-24 19:55:15 ERROR vsmagent:     get_event_loop ( ) . call_later ( 10 , self . housekeeping )
> 2023-10-24 19:55:15 ERROR vsmagent: ----------------------------------------
> 2023-10-24 19:55:15 ERROR vsmagent: Traceback (most recent call last):
> 2023-10-24 19:55:15 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-10-24 19:55:15 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-10-24 19:55:15 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 133, in housekeeping
> 2023-10-24 19:55:15 ERROR vsmagent:     raise NotImplementedError("SAMUEL error")
> 2023-10-24 19:55:15 ERROR vsmagent: NotImplementedError: SAMUEL error
> 2023-10-24 19:55:15 ERROR vsmagent: ----------------------------------------
Comment 17 Samuel Mannehed cendio 2023-10-25 10:45:45 CEST
Note that devel-mode was disabled by removing "export PYTHONDEVMODE=1" from /opt/thinlinc/libexec/python3.

Note You need to log in before you can comment on or make changes to this bug.