Bug 7636 - asyncore is deprecated
Summary: asyncore is deprecated
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Other
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.16.0
Assignee: Pierre Ossman
URL:
Keywords: relnotes
Depends on: 7648 8224 8230 8232 8242
Blocks:
Reported: 2021-02-03 12:55 CET by Pierre Ossman
Modified: 2023-12-28 16:41 CET

See Also:
Acceptance Criteria:
MUST:
* ThinLinc no longer relies on the system asyncore module
* The main loop is asyncio based so that asyncio tasks and callbacks can be scheduled

SHOULD:
* There is no coroutine syntax in ThinLinc code (new or old syntax)
* Everything uses asyncio terminology rather than asyncore's
* There is no ThinLinc code that is redundant with asyncio functionality
* Everything is converted to asyncio models and abstractions (e.g. protocols)


Attachments

Description Pierre Ossman cendio 2021-02-03 12:55:47 CET
Python has deprecated asyncore in favour of asyncio. This is a major problem for us, as asyncore is the foundation for vsmserver and vsmagent. We need to change those to get away from the asyncore dependency.

The recommended replacement, asyncio, is only fully stable from Python 3.6 (it was provisional before that). We are currently targeting 3.4, so we cannot use it until we've raised our system requirements. That likely means when SLE 12 and Ubuntu 16.04 are EOL in 2024.

Note that the Python developers originally planned on removing asyncore in Python 3.8, but seem to have come to their senses and are now keeping it around until they feel that everyone no longer has to support old versions of Python. We'll see what that means in practice.
Comment 1 William Sjöblom cendio 2022-01-26 11:34:39 CET
Here is the upstream issue discussing the matter: https://bugs.python.org/issue28533

It seems like the planned removal of asyncore is for Python 3.11. Since we are already at Python 3.10 in the current Fedora 35, this issue becomes somewhat more pressing.

Worth noting is that the Python documentation states that asyncio has shipped with Python since version 3.4 on a provisional basis (see the first couple of sentences in https://docs.python.org/3.5/library/asyncio.html), which might be of value when making a decision on this matter.
Comment 2 Samuel Mannehed cendio 2022-01-26 12:10:49 CET
I'm getting a deprecation warning now on Fedora 35 (Python 3.10.1):

> $ /opt/thinlinc/sbin/tl-show-licenses 
> /opt/thinlinc/modules/thinlinc/vsm/asyncbase.py:21: DeprecationWarning: The asyncore module is deprecated. The recommended replacement is asyncio
>   import asyncore
Comment 3 Pierre Ossman cendio 2022-01-26 12:48:53 CET
So upstream tried dropping this in 3.11:

https://github.com/python/cpython/commit/9bf2cbc4c498812e14f20d86acb61c53928a5a57

But it got reverted again:

https://github.com/python/cpython/commit/cf7eaa4617295747ee5646c4e2b7e7a16d7c64ab

The reasoning is found here:

https://github.com/python/steering-council/issues/86

Basically their rules say that they should have a deprecation notice for two releases before anything is allowed to be removed. That notice was added to 3.11, and also backported to 3.10.1.

That probably means that 3.12 is the earliest it will be removed. It could be delayed further, but I would say that is unlikely as they've already had a commit applied to remove it before.
Comment 4 Pierre Ossman cendio 2022-01-26 12:52:07 CET
Ubuntu 16.04 is now out of normal support (since April 2021), so the limiting factor here is SLES 12. I'll poke SUSE and see if I can get an update on getting a newer Python on that platform.
Comment 5 Samuel Mannehed cendio 2022-01-26 13:17:41 CET
(In reply to Samuel Mannehed from comment #2)
> I'm getting a deprecation warning now on Fedora 35 (Python 3.10.1):
> 
> > $ /opt/thinlinc/sbin/tl-show-licenses 
> > /opt/thinlinc/modules/thinlinc/vsm/asyncbase.py:21: DeprecationWarning: The asyncore module is deprecated. The recommended replacement is asyncio
> >   import asyncore

This only shows up in manual builds; it won't show up for customers.
Comment 6 Pierre Ossman cendio 2022-07-05 13:11:20 CEST
That warning also shows up when tl-collect-licensestats is run, which results in a mail from cron every time.
Comment 7 Samuel Mannehed cendio 2023-08-29 19:41:18 CEST
Fedora plans to include Python 3.12 in Fedora 39 [1], which will be released Q4 2023. That would likely translate into RHEL 10 (Q2 2025) [2].

Ubuntu 22.04 shipped with Python 3.10 [3] (which was released a few months prior [4]). Judging from the past, it is possible that Python 3.12 will be included in the next LTS (24.04) [5].

As far as SLES goes, I don't think we need to be concerned. SLES 15 ships with Python 3.6, which is already old.

[1]: https://fedoraproject.org/wiki/Changes/Python3.12
[2]: https://access.redhat.com/articles/3078
[3]: https://changelogs.ubuntu.com/changelogs/pool/main/p/python3.10/python3.10_3.10.2-1/changelog
[4]: https://devguide.python.org/versions/
[5]: https://wiki.ubuntu.com/Releases
Comment 11 Pierre Ossman cendio 2023-09-07 13:31:53 CEST
We've decided to go with the recommended approach of switching to asyncio. We won't fully embrace the asyncio style, though, in that we'll stick to its callback API rather than its coroutine API.

The reason for this is to make the system more robust, as it is easy to lose track of the control flow when using coroutines. For the servers, the robustness is more important than simpler code. Other parts of the product might use asyncio in the future.
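
To illustrate the difference, here is a minimal sketch of the callback style in question (the names are made up for illustration, not ThinLinc code): work is scheduled with loop.call_later() and results are consumed via add_done_callback(), with no async/await anywhere.

import asyncio

def _query_done(future):
    # A plain function reacting to the result; the control flow stays
    # explicit rather than being hidden inside a coroutine.
    try:
        print("load:", future.result())
    except Exception as e:
        print("failed:", e)

def start_query(loop):
    future = loop.create_future()
    # Something else would normally complete the future; fake it here.
    loop.call_later(0.1, future.set_result, "load data")
    future.add_done_callback(_query_done)
    return future

loop = asyncio.new_event_loop()
start_query(loop)
loop.call_later(0.2, loop.stop)
loop.run_forever()
loop.close()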

This decision hopefully also means we can continue supporting Python 3.4. We'll need to test and see how it works, as the Python project hasn't documented the changes between 3.4 and 3.6 in detail.


Also worth noting is that we cannot define our own coroutines without raising the requirement to at least Python 3.5, as the async/await syntax isn't available before then, and Python is dropping support for abusing "yield from" to write coroutines soon.
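
For reference, these are the two coroutine syntaxes being referred to (illustrative snippets, not ThinLinc code):

import asyncio

# Old style (Python 3.4): generator-based coroutines abusing "yield from".
# The asyncio.coroutine decorator this relies on was deprecated in 3.8 and
# removed in 3.11, so it is shown commented out:
#
#   @asyncio.coroutine
#   def old_style():
#       yield from asyncio.sleep(1)

# New style (Python 3.5+): native async/await syntax.
async def new_style():
    await asyncio.sleep(1)

asyncio.run(new_style())  # asyncio.run() itself requires Python 3.7+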
Comment 16 Pierre Ossman cendio 2023-09-12 14:52:23 CEST
It looks like asyncio.ensure_future() is useful, and maybe critical, for bridging coroutines and callbacks. Unfortunately, it was renamed from asyncio.async() in 3.4.4, and the older name is no longer present in modern Python.

So it might be problematic to support Python 3.4.0, as we currently state. SLES 12 seems to have Python 3.4.6, though, so we can probably get away with raising the requirement a bit.
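
A sketch of the bridging in question (illustrative names, not ThinLinc code): ensure_future() wraps a coroutine in a Task, which callback-style code can then treat like any other future.

import asyncio

async def fetch_load():
    await asyncio.sleep(0.1)
    return 42

def on_done(task):
    print("got", task.result())

async def main():
    # ensure_future() bridges the coroutine world to the callback world.
    # Before Python 3.4.4 this function was named asyncio.async().
    task = asyncio.ensure_future(fetch_load())
    task.add_done_callback(on_done)
    await task

asyncio.run(main())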
Comment 17 Pierre Ossman cendio 2023-09-13 15:04:53 CEST
Apparently, there is a big mess around reference counting and asyncio tasks. The documentation warns about this:

https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task

> Important
> 
> Save a reference to the result of this function, to avoid a task disappearing
> mid-execution. The event loop only keeps weak references to tasks. A task that
> isn’t referenced elsewhere may get garbage collected at any time, even before
> it’s done. For reliable “fire-and-forget” background tasks, gather them in a
> collection: 

There are some upstream issues discussing this:

https://github.com/python/cpython/issues/88831
https://github.com/python/cpython/issues/91887

And a lot of discussion on Stack Overflow, where this post explains the problem best:

https://stackoverflow.com/questions/71938799/python-asyncio-create-task-really-need-to-keep-a-reference/76823668#76823668

This is likely something we need to keep in mind.
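
For reference, the workaround the documentation suggests looks roughly like this (a minimal sketch with made-up names, not ThinLinc code): keep a strong reference to every fire-and-forget task until it is done.

import asyncio

_background_tasks = set()

def spawn(coro):
    # Hold a strong reference so the loop's weak reference isn't the only
    # thing keeping the task alive, and drop it once the task finishes.
    task = asyncio.ensure_future(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task

async def main():
    spawn(asyncio.sleep(0.1))
    await asyncio.sleep(0.2)  # give the background task time to finish

asyncio.run(main())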
Comment 18 Pierre Ossman cendio 2023-09-13 16:14:31 CEST
As mentioned in the comments on that Stack Overflow post, I think this can only happen if your code is already buggy. The short of it is this:

1. The loop holds STRONG references to everything that is scheduled to run, whether now or later, i.e. everything registered via loop.call_*().

2. If a task isn't scheduled to run, that means it is waiting for something.

3. A task can only wait for one kind of thing: a future. So the task must be waiting for a future.

4. The task has registered a callback with said future, meaning the future holds a STRONG reference to the task.

5. If that future is ever to complete, then someone must be holding a reference to it. Likely a STRONG one.

If it's unreferenced, then the task can never continue, and maybe it's a good thing it gets mopped up by the garbage collector?
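
Point 4 can be demonstrated in isolation (a sketch, not ThinLinc code): a task blocked on a live future survives garbage collection even with no other references, because the future holds the task's wake-up callback.

import asyncio
import gc
import weakref

async def waiter(fut):
    return await fut

async def main():
    fut = asyncio.get_running_loop().create_future()
    ref = weakref.ref(asyncio.ensure_future(waiter(fut)))
    await asyncio.sleep(0)    # let the task start waiting on fut
    gc.collect()
    print(ref() is not None)  # True: the future keeps the task alive
    fut.set_result(None)
    await asyncio.sleep(0)    # let the task finish cleanly

asyncio.run(main())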
Comment 28 Adam Halim cendio 2023-10-19 17:29:22 CEST
Here is a summary of how error handling/logging is done in 4.15.0 compared to the current build.

In general, 4.15.0 has no higher-level catch-all to prevent the process from dying in case of programmer error; a crash in AsyncProcess.handle_read(), for example, would kill the server. Behaviour is otherwise similar between 4.15.0 and 4.15post, with some information being less detailed in some places, but more detailed in others.

In the current implementation, it seems that as long as the initial startup works without issue, any uncaught exceptions will be handled by the exception handler in asyncbase.py.
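
For reference, asyncio's loop-level catch-all works roughly like this (a sketch of the mechanism only; the actual handler in asyncbase.py is not shown here):

import asyncio

def catch_all(loop, context):
    # context carries the exception plus whatever the loop knows about
    # where it happened (handle, future, message, ...).
    print("ERROR unhandled:", context.get("exception") or context["message"])

loop = asyncio.new_event_loop()
loop.set_exception_handler(catch_all)
loop.call_soon(lambda: 1 / 0)  # a programmer error in a callback
loop.call_later(0.1, loop.stop)
loop.run_forever()
loop.close()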

extproc
----------------------------------------
4.15.0:
A timeout in /opt/thinlinc/etc/sessionstartup.d/ is logged as 
> 2023-10-17 09:42:42 WARNING vsmserver.extproc: Timeout running ['/opt/thinlinc/etc/sessionstartup.d/test.sh']
> 2023-10-17 09:42:42 INFO vsmserver.extproc: ['/opt/thinlinc/etc/sessionstartup.d/test.sh'] exited with exit code -15
and a startup script that crashes is logged as
> 2023-10-17 09:40:20 INFO vsmserver.extproc: ['/opt/thinlinc/etc/sessionstartup.d/test.sh'] exited with exit code 127
> 2023-10-17 09:40:20 WARNING vsmserver.extproc: ['/opt/thinlinc/etc/sessionstartup.d/test.sh'] wrote to stderr: /opt/thinlinc/etc/sessionstartup.d/test.sh: line 2: bogus: command not found

4.15post:
This is very similar to how logging is done currently, where a timeout is logged as 
> 2023-10-17 14:23:31 ERROR vsmserver: Timeout running /opt/thinlinc/etc/sessionstartup.d/test.sh
and a crash is logged as 
> 2023-10-17 14:24:35 WARNING vsmserver: /opt/thinlinc/etc/sessionstartup.d/test.sh wrote to stderr: /opt/thinlinc/etc/sessionstartup.d/test.sh: line 3: bogus: command not found
> 
> 2023-10-17 14:24:35 INFO vsmserver: /opt/thinlinc/etc/sessionstartup.d/test.sh exited with exit code 127

4.15.0:
Exceptions that occur when running a subprocess are logged as follows:
> 2023-10-19 14:43:11 ERROR vsmagent.extproc: OSError: "Exception!" when trying to start ss
This was tested by raising an OSError() before running subprocess.Popen().

4.15post:
This behaviour is similar in the current implementation, but the responsibility for error handling has been moved to subclasses of VSMAgentHandler::handle(). The error is now also logged on both the server and the client.
> 2023-10-18 09:16:32 ERROR vsmagent: Exception when running ss: Exception!
> 
> 2023-10-18 09:16:32 WARNING vsmserver.session: Exception when getting user groups: <class 'Exception'>
Inside subprocess_run(), there are three layers where an exception could occur. The first is the outermost layer of the function, before any async calls are made. If something crashes here, the following message is logged (tested by raising an exception at the beginning of subprocess_run()):

> 2023-10-19 14:59:22 ERROR vsmserver: Unhandled XMLRPC exception: <class 'Exception'> Exception! Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 486, in _handle_request
>     self . handle_method ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/usersocketserverchannel.py", line 86, in handle_method
>     ii1iIi1i11i ( self . vsmserver , sock = self . socket ) . handle ( self . methodname ,
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_getusersessions.py", line 64, in handle
>     iii111 = get_user_groups ( iiIi1 , self . log . session )
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 123, in get_user_groups
>     process_result = subprocess_run(*args, env={"USER": username})
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 37, in subprocess_run
>     raise Exception("Exception!")
> Exception: Exception!
The second layer is inside the asynchronous function, but outside any try/except block. Raising an exception at the beginning of get_result() gives the following log:

> 2023-10-19 15:03:02 ERROR vsmserver: Exception in callback subprocess_run.<locals>.get_result(<Process 26534>, <Future pendi...extproc.py:78>, encoding=None, errors=None)(<Task finishe.../tasks.py:670>) at /opt/thinlinc/modules/thinlinc/vsm/extproc.py:37
> 2023-10-19 15:03:02 ERROR vsmserver: handle: <Handle subprocess_run.<locals>.get_result(<Process 26534>, <Future pendi...extproc.py:78>, encoding=None, errors=None)(<Task finishe.../tasks.py:670>) at /opt/thinlinc/modules/thinlinc/vsm/extproc.py:37 created at /usr/lib64/python3.11/asyncio/events.py:80>
> 2023-10-19 15:03:02 ERROR vsmserver: ----------------------------------------
> 2023-10-19 15:03:02 ERROR vsmserver: Object created at (most recent call last):
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 74, in <module>
> 2023-10-19 15:03:02 ERROR vsmserver:     iIi1ii1I1iI11 ( )
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 63, in iIi1ii1I1iI11
> 2023-10-19 15:03:02 ERROR vsmserver:     OoO . loop ( )
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 224, in loop
> 2023-10-19 15:03:02 ERROR vsmserver:     self . run_forever ( )
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 607, in run_forever
> 2023-10-19 15:03:02 ERROR vsmserver:     self._run_once()
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 1914, in _run_once
> 2023-10-19 15:03:02 ERROR vsmserver:     handle._run()
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-10-19 15:03:02 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-10-19 15:03:02 ERROR vsmserver: ----------------------------------------
> 2023-10-19 15:03:02 ERROR vsmserver: Traceback (most recent call last):
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-10-19 15:03:02 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-10-19 15:03:02 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 38, in get_result
> 2023-10-19 15:03:02 ERROR vsmserver:     raise Exception("Exception!")
> 2023-10-19 15:03:02 ERROR vsmserver: Exception: Exception!
> 2023-10-19 15:03:02 ERROR vsmserver: ----------------------------------------
Lastly, an exception that occurs inside a try/except block is set on the future, and is then handled by VSMAgentHandler::handle():
> 2023-10-18 09:16:32 ERROR vsmagent: Exception when running ss: Exception!
> 2023-10-18 09:16:32 WARNING vsmserver.session: Exception when getting user groups: <class 'Exception'>
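
A sketch of that last pattern (made-up names, not the actual ThinLinc code): the exception is caught where it happens, set on the future, and surfaces in whichever callback consumes the future's result.

import asyncio

def subprocess_run_sketch(loop):
    fut = loop.create_future()

    def get_result():
        try:
            raise Exception("Exception!")  # the simulated crash
        except Exception as e:
            fut.set_exception(e)           # caught and handed to the caller

    loop.call_soon(get_result)
    return fut

def handle(fut):
    try:
        fut.result()
    except Exception as e:
        print("ERROR Exception when running ss:", e)

loop = asyncio.new_event_loop()
subprocess_run_sketch(loop).add_done_callback(handle)
loop.call_later(0.1, loop.stop)
loop.run_forever()
loop.close()
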
xmlrpc
----------------------------------------
4.15.0
The following is the behaviour when a crash occurs in the callback of an RPC call, tested by raising an exception in call_unbindports.py::handle_returnvalue():

> 2023-10-18 14:50:35 ERROR vsmserver.session: Unhandled exception trying to unbind ports for display 14 on VSM Agent 127.0.0.1:904: <class 'Exception'> Exception! Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 105, in handle_read
>     obj.handle_read_event()
>   File "/usr/lib64/python3.11/asyncore.py", line 425, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.11/asynchat.py", line 158, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 424, in found_terminator
>     self.handle_response()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 456, in handle_response
>     self.handle_returnvalue()
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_unbindports.py", line 41, in handle_returnvalue
>     raise Exception("Exception!")
> Exception: Exception!
> . Marking agent as down.
Note that the agent is not actually marked as down, and the session is created as usual.

4.15post
The entire VSMXMLRPCCall class has been removed in the current implementation, and its functionality has been moved to a more general function, vsmxmlrpccall.py::call_method(). The equivalent callback is now loginhandler_common.py::_unbind_ports_done(), which is also responsible for the error handling. Currently, none of the handlers print a traceback, something all the previous calls did. Raising an exception in _unbind_ports_done() gives the following error message:

> 2023-10-18 14:52:41 WARNING vsmserver: Error unbinding ports for session 127.0.0.1:13: Exception!

LicenseHandler
----------------------------------------
The function update_licenses() is first called synchronously, and is later called repeatedly at a set interval from the event loop. To test the synchronous case, an exception was simply raised at the beginning of the function. For the asynchronous case, I added a boolean so that the exception is only raised the second time the function is called. When the function is run synchronously and crashes, we get the following error message:

4.15.0
> 2023-10-16 13:16:16 INFO vsmserver.license: Updating license data from disk to memory
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmserver", line 18, in <module>
>     OOoOoo000O00 ( )
>   File "/opt/thinlinc/sbin/vsmserver", line 15, in OOoOoo000O00
>     VSMServer ( sys . argv )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 114, in __init__
>     self . licensehandler . update_licenses ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/licensehandler.py", line 188, in update_licenses
>     raise Exception("Exception!")
> Exception: Exception!
When the function is run in an asynchronous context and crashes, the following error message is produced:
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmserver", line 18, in <module>
>     OOoOoo000O00 ( )
>   File "/opt/thinlinc/sbin/vsmserver", line 15, in OOoOoo000O00
>     VSMServer ( sys . argv )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 168, in __init__
>     self . loop ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 426, in loop
>     OooO0O00o0 = self . run_delayed_calls ( )
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 384, in run_delayed_calls
>     I1I1IIi1IIIi . func ( * I1I1IIi1IIIi . args , ** I1I1IIi1IIIi . kw )
>   File "/opt/thinlinc/modules/thinlinc/vsm/licensehandler.py", line 190, in update_licenses
>     raise Exception("Exception!")
> Exception: Exception!
4.15post
The synchronous behaviour is unchanged in the current implementation
> 2023-10-16 13:12:51 INFO vsmserver.license: Updating license data from disk to memory
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmserver", line 74, in <module>
>     iIi1ii1I1iI11 ( )
>   File "/opt/thinlinc/sbin/vsmserver", line 56, in iIi1ii1I1iI11
>     iiI1111IIi1 = VSMServer ( oOooo0OOO , Iio0 )
>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 77, in __init__
>     self._update_licenses()
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 154, in _update_licenses
>     self.licensehandler.update_licenses()
>   File "/opt/thinlinc/modules/thinlinc/vsm/licensehandler.py", line 186, in update_licenses
>     raise Exception("Exception!")
> Exception: Exception!
but in the async context, the error is caught by the error handler in asyncbase.py:
> 2023-10-16 13:31:00 INFO vsmserver.license: Updating license data from disk to memory
> 2023-10-16 13:31:00 ERROR vsmserver: Exception in callback VSMServer._update_licenses()
> 2023-10-16 13:31:00 ERROR vsmserver: handle: <TimerHandle when=18730.975908411 VSMServer._update_licenses() created at /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:156>
> 2023-10-16 13:31:00 ERROR vsmserver: ----------------------------------------
> 2023-10-16 13:31:00 ERROR vsmserver: Object created at (most recent call last):
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 74, in <module>
> 2023-10-16 13:31:00 ERROR vsmserver:     iIi1ii1I1iI11 ( )
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 56, in iIi1ii1I1iI11
> 2023-10-16 13:31:00 ERROR vsmserver:     iiI1111IIi1 = VSMServer ( oOooo0OOO , Iio0 )
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 77, in __init__
> 2023-10-16 13:31:00 ERROR vsmserver:     self._update_licenses()
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 156, in _update_licenses
> 2023-10-16 13:31:00 ERROR vsmserver:     get_event_loop().call_later(10, self._update_licenses)
> 2023-10-16 13:31:00 ERROR vsmserver: ----------------------------------------
> 2023-10-16 13:31:00 ERROR vsmserver: Traceback (most recent call last):
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-10-16 13:31:00 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 154, in _update_licenses
> 2023-10-16 13:31:00 ERROR vsmserver:     self.licensehandler.update_licenses()
> 2023-10-16 13:31:00 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/licensehandler.py", line 188, in update_licenses
> 2023-10-16 13:31:00 ERROR vsmserver:     raise Exception("Exception!")
> 2023-10-16 13:31:00 ERROR vsmserver: Exception: Exception!
> 2023-10-16 13:31:00 ERROR vsmserver: ----------------------------------------

CheckTCPPortOpen
----------------------------------------
To test what happens in the case of a timeout, I put a sleep(50) in handle_connect_event(), and the operation timed out after 40 seconds as expected:

4.15.0
> 2023-10-16 14:44:27 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-10-16 14:45:07 WARNING vsmserver: Timeout trying to start new session for cendio on VSM Agent 127.0.0.1:904
> 2023-10-16 14:45:07 WARNING vsmserver.loadinfo: Agent 127.0.0.1 failed to start session, Increasing penalty points by 5 to 5
> 2023-10-16 14:45:07 WARNING vsmserver: No working agents found trying to start new session for cendio
The class still depends on asyncore and will be tested again after it has been fully converted to asyncio.
Comment 30 Linn cendio 2023-10-27 16:35:40 CEST
(In reply to Adam Halim from comment #28)
> Here is a summary of how error handling/logging is done 4.15.0 compared to
> the current build.
> ...
> extproc
> ----------------------------------------
> 4.15.0:
> A timeout in /opt/thinlinc/etc/sessionstartup.d/ is logged as 
> > 2023-10-17 09:42:42 WARNING vsmserver.extproc: Timeout running ['/opt/thinlinc/etc/sessionstartup.d/test.sh']
> > 2023-10-17 09:42:42 INFO vsmserver.extproc: ['/opt/thinlinc/etc/sessionstartup.d/test.sh'] exited with exit code -15
> ...
> 4.15post:
> This is very similar to how logging is done currently, where a timeout is
> logged as 
> > 2023-10-17 14:23:31 ERROR vsmserver: Timeout running /opt/thinlinc/etc/sessionstartup.d/test.sh
When updating extproc to use asyncio, the timeout logging was accidentally changed from WARNING to ERROR. This was changed back in commit r40578, and the updated logging for a timeout is:
> 2023-10-27 16:30:50 WARNING vsmserver: Timeout running /opt/thinlinc/etc/sessionstartup.d/10-echo-startup.sh
Comment 32 Linn cendio 2023-11-01 09:32:09 CET
xmlrpc.py
---------
Timeouts
========
To start the countdown for a timeout, the function _activate_timeout() is called. I added sleep calls right after the countdown had been started.

4.15.0
A timeout in XMLRPCClientChannel is logged as:
> 2023-10-31 12:57:17 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-10-31 12:58:57 WARNING vsmserver: Timeout getting load from VSM Agent 127.0.0.1:904, marking as down
> 2023-10-31 12:58:57 WARNING vsmserver: No working agents found trying to start new session for cendio
> 2023-10-31 12:58:57 WARNING vsmserver: Timeout getting load from VSM Agent 127.0.0.1:904, marking as down
A timeout in XMLRPCServerChannel is logged as:
When calling sleep in __init__:
> 2023-10-31 14:43:54 WARNING vsmserver: Timeout getting load from VSM Agent 127.0.0.1:904, marking as down
> 2023-10-31 14:44:45 WARNING vsmserver: Timeout communicating with socket ('127.0.0.1', 35116), closing
When calling sleep in handle_response:
> 2023-10-31 16:20:11 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-10-31 16:20:21 WARNING vsmserver: Timeout getting load from VSM Agent 127.0.0.1:904, marking as down
> 2023-10-31 16:20:21 WARNING vsmserver: No working agents found trying to start new session for cendio

4.15post:
After the changes, ClientProtocol no longer uses _activate_timeout internally. This kind of timeout is instead handled by VSMXMLRPCCall.call_method, which was tested in comment 28.

A timeout when calling sleep in ServerProtocol.connection_made is logged as:
> 2023-10-31 16:38:34 WARNING vsmserver.loadinfo: Error getting load from agent 127.0.0.1: Operation timed out
> 2023-10-31 16:38:34 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
> Executing <Handle ServerProtocol.connection_made(<_SelectorSoc...e, bufsize=0>>) created at /usr/lib64/python3.9/asyncio/selector_events.py:774> took 50.032 seconds
A timeout when calling sleep in ServerProtocol._write_response is logged as:
> 2023-10-31 17:09:41 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
> Executing <Handle ServerProtocol._callback_done(<Future finis...server.py:309>) created at /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:391> took 50.051 seconds
The last line in both cases looks very similar to the one found in comment 31. I did not dig deeper into why/when this kind of logging occurs, or whether it is only shown in devmode.
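
For what it's worth, these "Executing <Handle ...> took N seconds" lines look like asyncio's slow-callback warning, which the loop emits in debug mode when a callback blocks longer than loop.slow_callback_duration (0.1 seconds by default). A minimal reproduction sketch:

import asyncio
import time

loop = asyncio.new_event_loop()
loop.set_debug(True)                # enables the "Executing ... took" warnings
loop.slow_callback_duration = 0.05  # threshold in seconds
loop.call_soon(time.sleep, 0.2)     # a callback that blocks the loop
loop.call_later(0.3, loop.stop)
loop.run_forever()
loop.close()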
Comment 34 Samuel Mannehed cendio 2023-11-10 11:46:57 CET
I'm also getting log lines in vsmserver.log similar to those described in comment 32:

> 2023-11-10 11:40:51 INFO vsmagent.session: Verified connectivity to newly started Xvnc for test9
> Executing <Handle ReqSessionHandler._checked_vnc_socket(<Task finishe.../tasks.py:685>) created at /usr/lib64/python3.12/asyncio/events.py:84> took 0.144 seconds
> Executing <TimerHandle when=8007.594749 VSMAgent.housekeeping() created at /opt/thinlinc/modules/thinlinc/vsm/vsmagent.py:135> took 0.134 seconds
Comment 35 Linn cendio 2023-11-10 14:39:27 CET
Differences in error handling between XMLRPCClientChannel and ClientProtocol in xmlrpc

Some of the error handling has been moved out of XMLRPCClientChannel's constructor into call_method, which was covered when testing xmlrpc calls in comment 28. Below are the categories of exceptions I found.

XML marshalling exceptions
==========================

4.15.0
There are two places that can raise errors regarding XML marshalling in 4.15.0. I have manually raised exceptions to trigger them. In the logging below, some values of variables have been replaced with "..." to save space.

Logging when raising an exception in XMLRPCClientChannel.create_xmlrpc_call:
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 369, in create_xmlrpc_call
>     raise ValueError("LINN custom error")
> ValueError: LINN custom error
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmserver", line 18, in <module>
>     OOoOoo000O00 ( )
>   File "/opt/thinlinc/sbin/vsmserver", line 15, in OOoOoo000O00
>     VSMServer ( sys . argv )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 168, in __init__
>     self . loop ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 426, in loop
>     OooO0O00o0 = self . run_delayed_calls ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 384, in run_delayed_calls
>     I1I1IIi1IIIi . func ( * I1I1IIi1IIIi . args , ** I1I1IIi1IIIi . kw )
>   File "/opt/thinlinc/modules/thinlinc/vsm/loadbalancer.py", line 198, in periodic_update
>     GetLoadCall ( OO00OOooO , ii , self . periodic_update ,
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_getload.py", line 25, in __init__
>     self . xmlrpc_call ( 'get_load' )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 335, in xmlrpc_call
>     self . create_xmlrpc_call ( methodname , params )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 371, in create_xmlrpc_call
>     raise XMLMarshallingError ( [ str ( o0o0Oo00000o ) , params , methodname ] )
> thinlinc.vsm.xmlrpc.XMLMarshallingError: ['LINN custom error', (), 'get_load']
Logging when raising an exception in XMLRPCClientChannel.handle_response:
> 2023-11-01 14:00:33 ERROR vsmserver: Exception in error handler for <thinlinc.vsm.call_verifysessions.VerifySessionsCall 127.0.0.1:904 at 0x7f5f6f
> 89b430, callback: <bound method SessionStore.verify_sessions_finished of <thinlinc.vsm.sessionstore.SessionStore object at 0x7f5f6f86a100>>>: <class 'IndexError'> tuple index out of range
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 451, in handle_response
>     raise ValueError("LINN custom error")
> ValueError: LINN custom error
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 104, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.9/asyncore.py", line 420, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.9/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 425, in found_terminator
>     self . handle_response ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 453, in handle_response
>     raise XMLDeMarshallingError ( [ str ( o0o0Oo00000o ) , self . payload ] )
> thinlinc.vsm.xmlrpc.XMLDeMarshallingError: ['LINN custom error', b"..."]
We see that neither of these exceptions is logged in the correct format, since most or all lines are missing the logging prefix with date and time.

4.15post
In 4.15post there are still two places that raise errors regarding XML marshalling and demarshalling. In the logging below, some values of variables have been replaced with "..." to save space.

When raising an exception in ClientProtocol._write_request:
> 2023-11-01 14:54:28 ERROR vsmserver: handle: <Handle call_method.<locals>.IiI1I(<Task finishe...xmlrpc.py:472>) at /opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py:435 created at /usr/lib64/python3.9/asyncio/events.py:80>
> 2023-11-01 14:54:28 ERROR vsmserver: ----------------------------------------
> 2023-11-01 14:54:28 ERROR vsmserver: Object created at (most recent call last):
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 74, in <module>
> 2023-11-01 14:54:28 ERROR vsmserver:     iIi1ii1I1iI11 ( )
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 63, in iIi1ii1I1iI11
> 2023-11-01 14:54:28 ERROR vsmserver:     OoO . loop ( )
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 218, in loop
> 2023-11-01 14:54:28 ERROR vsmserver:     self . run_forever ( )
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 601, in run_forever
> 2023-11-01 14:54:28 ERROR vsmserver:     self._run_once()
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 1897, in _run_once
> 2023-11-01 14:54:28 ERROR vsmserver:     handle._run()
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-01 14:54:28 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-01 14:54:28 ERROR vsmserver: ----------------------------------------
> 2023-11-01 14:54:28 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 100, in _write_request
> 2023-11-01 14:54:28 ERROR vsmserver:     raise ValueError("LINN custom error")
> 2023-11-01 14:54:28 ERROR vsmserver: ValueError: LINN custom error
> 2023-11-01 14:54:28 ERROR vsmserver:
> 2023-11-01 14:54:28 ERROR vsmserver: The above exception was the direct cause of the following exception:
> 2023-11-01 14:54:28 ERROR vsmserver:
> 2023-11-01 14:54:28 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-01 14:54:28 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 447, in IiI1I
> 2023-11-01 14:54:28 ERROR vsmserver:     IiI111111i1 = i1i1 . call_method ( methodname , * params )
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 82, in call_method
> 2023-11-01 14:54:28 ERROR vsmserver:     self . _write_request ( methodname , * params )
> 2023-11-01 14:54:28 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 102, in _write_request
> 2023-11-01 14:54:28 ERROR vsmserver:     raise XMLMarshallingError ( "Error encoding XML-RPC call %s%r: %s" %
> 2023-11-01 14:54:28 ERROR vsmserver: thinlinc.vsm.xmlrpc.XMLMarshallingError: Error encoding XML-RPC call get_load(): LINN custom error
> 2023-11-01 14:54:28 ERROR vsmserver: ----------------------------------------
When raising an exception in ClientProtocol._handle_response:
> 2023-11-01 15:02:54 WARNING vsmserver: Error checking if agent 127.0.0.1 is alive: Error parsing XML-RPC response b"...": LINN custom error
When using asyncio, the traceback has the correct logging format, and the demarshalling error is now logged as a regular log line instead of a traceback.
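
For reference, the underlying failures being wrapped here can be reproduced with the standard library alone (XMLMarshallingError itself is ThinLinc's own wrapper):

import xmlrpc.client

# Marshalling: not every Python object can be encoded as XML-RPC.
try:
    xmlrpc.client.dumps((object(),), "get_load")
except TypeError as e:
    print("encode failed:", e)

# Demarshalling: garbage input fails to parse as an XML-RPC response.
try:
    xmlrpc.client.loads(b"not xml at all")
except Exception as e:
    print("decode failed:", e)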


Exceptions when handling data
=============================

4.15.0
There are also some exceptions that can be raised when incoming data is handled, found in XMLRPCClientChannel.found_terminator.

Logging for XMLRPCClientChannel.collect_incoming_data when incoming data is too long:
> 2023-11-10 09:35:12 WARNING vsmserver.loadinfo: Connection to 127.0.0.1:904 closed before receiving response in request for loadinfo. Marking as down.
Logging when XMLRPCClientChannel.found_terminator raises InvalidHTTPResponse due to an encoding error:
> 2023-11-07 12:30:29 WARNING vsmserver.loadinfo: HTTP Error from VSM Agent 127.0.0.1:904 in request for loadinfo. Marking as down.
Logging when XMLRPCClientChannel.found_terminator raises HTTPResponseNotOK (I set the response code to 44444):
> 2023-11-07 15:57:42 ERROR vsmserver: Exception in error handler for <thinlinc.vsm.call_getload.GetLoadCall 127.0.0.1:904 at 0x7f36fc6c6400, callback: <bound method NewSessionHandler.find_best_agent of <thinlinc.vsm.handler_newsession.NewSessionHandler connected b'\x006eac4' at 0x7f36fc6c6160>>>: <class 'IndexError'> tuple index out of range
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 104, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.9/asyncore.py", line 420, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.9/asynchat.py", line 171, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 414, in found_terminator
>     raise HTTPResponseNotOK ( self . response )
> thinlinc.vsm.xmlrpc.HTTPResponseNotOK: ['HTTP/1.1', '44444', 'OK']
The first two exceptions are logged appropriately, but the last one does not follow the logging format for the traceback.

4.15post
The replacement for 4.15.0's XMLRPCClientChannel.found_terminator and XMLRPCClientChannel.collect_incoming_data can be found in ClientProtocol._process_data.

Logging for ClientProtocol._process_data when incoming data is too long:
> 2023-11-10 10:03:26 WARNING vsmserver: Error checking if agent 127.0.0.1 is alive: Excessive response data
> 2023-11-10 10:03:26 WARNING vsmserver: Marking agent 127.0.0.1 as down
Logging when ClientProtocol._process_data raises InvalidHTTPResponse due to an encoding error (note that in a real scenario the HTTP status line would have to include a non-ASCII character to trigger this error):
> 2023-11-07 16:41:21 WARNING vsmserver: Error checking if agent 127.0.0.1 is alive: Invalid HTTP status line: b'HTTP/1.1 200 OK\r\n'
> 2023-11-07 16:41:21 WARNING vsmserver: Marking agent 127.0.0.1 as down
Logging when ClientProtocol._process_data raises HTTPResponseNotOK due to a bad response code (tested by setting the response code to 44444):
> 2023-11-07 16:45:26 WARNING vsmserver: Error checking if agent 127.0.0.1 is alive: Request rejected by server: '44444' 'OK\r\n'
> 2023-11-07 16:45:26 WARNING vsmserver: Marking agent 127.0.0.1 as down
All these exceptions are now logged in a very similar format, explaining what went wrong and when it happened, as well as logging that the agent is marked as down.
Comment 36 Linn cendio 2023-11-13 15:45:28 CET
Here are the differences in error handling between XMLRPCServerChannel and ServerProtocol in xmlrpc. In the logging below, some values of variables have been replaced with "..." to save space.

XML marshalling exceptions
==========================

4.15.0
There are two places that can raise errors regarding XML marshalling in 4.15.0. I have manually raised exceptions to trigger them.

Logging when raising an exception in XMLRPCServerChannel.handle_response:
> 2023-11-13 10:39:40 ERROR vsmserver: Unhandled XMLRPC exception: <class 'thinlinc.vsm.xmlrpc.XMLMarshallingError'> ['LINN custom error', ('/var/run/thinlinc/master/1000',), None]
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 608, in handle_response
>     raise TypeError("LINN custom error")
> TypeError: LINN custom error
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 104, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.9/asyncore.py", line 420, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.9/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 565, in found_terminator
>     self . handle_request ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 580, in handle_request
>     self . handle_method ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserverchannel.py", line 120, in handle_method
>     O0o0OOoOOOO ( self . vsmserver , sock = self . socket ) . handle ( self . methodname ,
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_createusersocket.py", line 28, in handle
>     self . create_socket ( Ooo )
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_createusersocket.py", line 44, in create_socket
>     self . handle_response ( i11IiIIiii1II )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 610, in handle_response
>     raise XMLMarshallingError ( [ str ( o0o0Oo00000o ) , params , None ] )
> thinlinc.vsm.xmlrpc.XMLMarshallingError: ['LINN custom error', ('/var/run/thinlinc/master/1000',), None]
Logging when raising an exception in XMLRPCServerChannel.handle_request:
> 2023-11-13 10:27:52 ERROR vsmserver: Unhandled XMLRPC exception: <class 'thinlinc.vsm.xmlrpc.XMLDeMarshallingError'> ['LINN custom error', b"..."] Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 576, in handle_request
>     raise ValueError("LINN custom error")
> ValueError: LINN custom error
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 104, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.9/asyncore.py", line 420, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.9/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 565, in found_terminator
>     self . handle_request ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 578, in handle_request
>     raise XMLDeMarshallingError ( [ str ( o0o0Oo00000o ) , self . payload ] )
> thinlinc.vsm.xmlrpc.XMLDeMarshallingError: ['LINN custom error', b"..."]
> 
> 2023-11-13 10:27:54 WARNING vsmserver.loadinfo: Connection to 127.0.0.1:904 closed before receiving response in request for loadinfo. Marking as down.
We see that the traceback in both cases has the wrong logging format, since the lines are missing the logging prefix.

4.15post
Logging when raising an exception in ServerProtocol._write_response:
> 2023-11-13 11:19:12 ERROR vsmserver: handle: <Handle ServerProtocol._callback_done(<Future finis...server.py:309>) created at /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:391>
> 2023-11-13 11:19:12 ERROR vsmserver: ----------------------------------------
> 2023-11-13 11:19:12 ERROR vsmserver: Object created at (most recent call last):
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 74, in <module>
> 2023-11-13 11:19:12 ERROR vsmserver:     iIi1ii1I1iI11 ( )
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 63, in iIi1ii1I1iI11
> 2023-11-13 11:19:12 ERROR vsmserver:     OoO . loop ( )
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 218, in loop
> 2023-11-13 11:19:12 ERROR vsmserver:     self . run_forever ( )
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 601, in run_forever
> 2023-11-13 11:19:12 ERROR vsmserver:     self._run_once()
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 1897, in _run_once
> 2023-11-13 11:19:12 ERROR vsmserver:     handle._run()
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-13 11:19:12 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 391, in _handle_method_done
> 2023-11-13 11:19:12 ERROR vsmserver:     response . set_result ( call . result ( ) )
> 2023-11-13 11:19:12 ERROR vsmserver: ----------------------------------------
> 2023-11-13 11:19:12 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 387, in _write_response
> 2023-11-13 11:19:12 ERROR vsmserver:     raise TypeError("LINN custom error")
> 2023-11-13 11:19:12 ERROR vsmserver: TypeError: LINN custom error
> 2023-11-13 11:19:12 ERROR vsmserver: 
> 2023-11-13 11:19:12 ERROR vsmserver: The above exception was the direct cause of the following exception:
> 2023-11-13 11:19:12 ERROR vsmserver: 
> 2023-11-13 11:19:12 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-13 11:19:12 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 370, in _callback_done
> 2023-11-13 11:19:12 ERROR vsmserver:     self . _write_response ( II1I1Ii1IIi )
> 2023-11-13 11:19:12 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 390, in _write_response
> 2023-11-13 11:19:12 ERROR vsmserver:     raise XMLMarshallingError ( "Error encoding XML-RPC response %r: %s" %
> 2023-11-13 11:19:12 ERROR vsmserver: thinlinc.vsm.xmlrpc.XMLMarshallingError: Error encoding XML-RPC response '/var/run/thinlinc/master/1000': LINN custom error
> 2023-11-13 11:19:12 ERROR vsmserver: ----------------------------------------
Logging when raising an exception in ServerProtocol._handle_request:
> 2023-11-13 11:11:19 WARNING vsmserver.loadinfo: Error getting load from agent 127.0.0.1: [Errno 111] Connect call failed ('127.0.0.1', 904)
> 2023-11-13 11:11:19 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
After switching to asyncio, the first traceback is now logged with the correct logging format. The other traceback has been shortened to a regular log message, and the agent is still marked as down.


Exceptions when handling data
=============================

There are also some exceptions that can be raised when incoming data is handled.

4.15.0
Logging for XMLRPCServerChannel.found_terminator when incoming data is too long:
> 2023-11-13 13:29:48 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent 127.0.0.1:904 in request for loadinfo. Marking as down.
> 2023-11-13 13:30:37 WARNING vsmserver: Timeout communicating with socket ('127.0.0.1', 36764), closing
Logging when raising a UnicodeDecodeError in XMLRPCServerChannel.found_terminator:
> 2023-11-13 10:44:43 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent 127.0.0.1:904 in request for loadinfo. Marking as down.
We see that overlong incoming data triggered a timeout, and that a Unicode error triggered a regular log line.

4.15post
Logging for ServerProtocol.data_received when incoming data is too long:
> 2023-11-13 13:32:12 WARNING vsmserver.loadinfo: Error getting load from agent 127.0.0.1: [Errno 111] Connect call failed ('127.0.0.1', 904)
> 2023-11-13 13:32:12 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
> 2023-11-13 13:33:32 ERROR vsmserver: Exception in callback call_method.<locals>.OO(<Future finis...xmlrpc.py:477>) at /opt/thinlinc/modules/thinlinc/vsm/vsmxmlrpccall.py:140
> 2023-11-13 13:33:32 ERROR vsmserver: handle: <Handle call_method.<locals>.OO(<Future finis...xmlrpc.py:477>) at /opt/thinlinc/modules/thinlinc/vsm
> /vsmxmlrpccall.py:140 created at /opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py:465>
> 2023-11-13 13:33:32 ERROR vsmserver: ----------------------------------------
> 2023-11-13 13:33:32 ERROR vsmserver: Object created at (most recent call last):
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 74, in <module>
> 2023-11-13 13:33:32 ERROR vsmserver:     iIi1ii1I1iI11 ( )
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 63, in iIi1ii1I1iI11
> 2023-11-13 13:33:32 ERROR vsmserver:     OoO . loop ( )
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 218, in loop
> 2023-11-13 13:33:32 ERROR vsmserver:     self . run_forever ( )
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 601, in run_forever
> 2023-11-13 13:33:32 ERROR vsmserver:     self._run_once()
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 1897, in _run_once
> 2023-11-13 13:33:32 ERROR vsmserver:     handle._run()
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-13 13:33:32 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 465, in i1iiiIii
> 2023-11-13 13:33:32 ERROR vsmserver:     I1II1IIIi . set_exception ( i11iIii )
> 2023-11-13 13:33:32 ERROR vsmserver: ----------------------------------------
> 2023-11-13 13:33:32 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmxmlrpccall.py", line 144, in OO
> 2023-11-13 13:33:32 ERROR vsmserver:     II . set_result ( call . result ( ) )
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 463, in i1iiiIii
> 2023-11-13 13:33:32 ERROR vsmserver:     I1II1IIIi . set_result ( call . result ( ) )
> 2023-11-13 13:33:32 ERROR vsmserver: thinlinc.vsm.xmlrpc.ConnectionClosed: Connection closed unexpectedly
> 2023-11-13 13:33:32 ERROR vsmserver: 
> 2023-11-13 13:33:32 ERROR vsmserver: During handling of the above exception, another exception occurred:
> 2023-11-13 13:33:32 ERROR vsmserver: 
> 2023-11-13 13:33:32 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-13 13:33:32 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-13 13:33:32 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmxmlrpccall.py", line 146, in OO
> 2023-11-13 13:33:32 ERROR vsmserver:     II . set_exception ( OO0000 )
> 2023-11-13 13:33:32 ERROR vsmserver: asyncio.exceptions.InvalidStateError: invalid state
> 2023-11-13 13:33:32 ERROR vsmserver: ----------------------------------------
> 2023-11-13 13:33:32 WARNING vsmserver.loadinfo: Error getting load from agent 127.0.0.1: Operation timed out
> 2023-11-13 13:33:32 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
Logging when raising a UnicodeDecodeError in ServerProtocol.data_received:
> 2023-11-13 11:27:40 WARNING vsmserver.loadinfo: Error getting load from agent 127.0.0.1: Request rejected by server: '400' 'Invalid request\r\n'
> 2023-11-13 11:27:40 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
When incoming data is too long, the logging still warns about the failed connection and that a timeout occurred, but now it also includes a traceback. InvalidStateError can occur when trying to set a result on a Future that already has one. A personal theory is that the cancel call on the Future is what triggers this "double result call", but I have not confirmed this.

When raising a UnicodeDecodeError, the logging contains much of the same info as in 4.15.0, and in my opinion communicates this scenario even better.
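
The InvalidStateError above is easy to reproduce in isolation (a sketch), which at least is consistent with the double-completion theory:

import asyncio

async def main():
    fut = asyncio.get_running_loop().create_future()
    fut.set_result("first")
    try:
        fut.set_exception(RuntimeError("second"))  # already completed
    except asyncio.InvalidStateError as e:
        print("InvalidStateError:", e)

asyncio.run(main())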
Comment 41 Adam Halim cendio 2023-11-14 17:02:18 CET
The portopencheck module has been rewritten in bug 8242. In 4.15.0 it consisted of the classes CheckTCPPortOpen and CheckUnixSocketOpen; in 4.15post, these classes have been replaced with the functions check_tcp_port_open() and check_unix_socket_open(), which provide the same functionality.

The following is a summary of how the module behaves differently between 4.15.0 and 4.15post when it comes to error handling.

This was tested using Fedora 38 (Python 3.11.4).

Timeout handling
================

4.15post:
Putting an asynchronous sleep inside check_tcp_port_open() will cause a timeout (tested with a 26 second sleep), which results in the agent's penalty points being increased by five, the client seeing "ThinLinc login failed. (No agent server was available)", and this being logged:
> 2023-11-14 08:00:16 WARNING vsmagent.session: Xvnc start for username cendio failed (timeout waiting for vnc port)

4.15.0
The check_tcp_port_open() equivalent in 4.15.0 is instantiating CheckTCPPortOpen.
To trigger a timeout, I simply added a delayed call to delay_connect with the same timeout value as the timeout handler.
I could verify that both delay_connect and handle_timeout were called by logging in the functions.
The result was the same as in 4.15post.

4.15post
The handler that uses portopencheck also has a timeout of its own: a parent function calls check_tcp_port_open() repeatedly until the expected value is returned, which lets the handler progress (see the sketch after the log line below).
We can trigger this timeout by forcing check_tcp_port_open() to always return False, causing it to be called repeatedly until the timeout is reached.
Triggering a timeout this way has the same effect as putting an asynchronous sleep inside the function as described previously (the same branch in the parent function is reached):
> 2023-11-14 08:05:12 WARNING vsmagent.session: Xvnc start for username cendio failed (timeout waiting for vnc port)
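
A sketch of that polling pattern (simplified, with assumed names; the real check is asynchronous and lives in portopencheck):

import asyncio
import socket

def check_tcp_port_open(host, port):
    # Simplified stand-in: True if something is listening on host:port.
    with socket.socket() as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

def wait_for_port(loop, host, port, deadline, done):
    if check_tcp_port_open(host, port):
        done(True)
    elif loop.time() >= deadline:
        done(False)  # "timeout waiting for vnc port"
    else:
        # Not open yet: re-schedule the check until the deadline passes.
        loop.call_later(0.5, wait_for_port, loop, host, port, deadline, done)

def report(ok, loop):
    print("port open:", ok)
    loop.stop()

loop = asyncio.new_event_loop()
wait_for_port(loop, "127.0.0.1", 5901, loop.time() + 5,  # 25 s in the text above
              lambda ok: report(ok, loop))
loop.run_forever()
loop.close()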

4.15.0
We can achieve the same behaviour in 4.15.0 by making sure CheckTCPPortOpen always passes SOCKET_NOT_INUSE to its callback, which ensures CheckTCPPortOpen is instantiated repeatedly until the timeout is reached. The result is the same as in 4.15post.

4.15post
If a call to check_(tcp|unix)_port_open() takes >40 seconds (which shouldn't be possible in a real-world scenario, since there are mechanisms to ensure a timeout of 25 seconds), xmlrpc will cancel the handler after 40 seconds. The client will receive the message "ThinLinc login failed. (No agent server was available)", and the following is output in the logs:
> 2023-11-14 15:34:19 WARNING vsmserver: Error requesting a new session from agent 127.0.0.1: Operation timed out
> 2023-11-14 15:34:19 WARNING vsmserver.loadinfo: Agent 127.0.0.1 failed to start session, Increasing penalty points by 5 to 5
> 2023-11-14 15:34:19 WARNING vsmserver: No working agents found trying to start new session for cendio

> 2023-11-14 15:34:19 ERROR vsmagent: Exception in callback ServerProtocol._callback_done(<Future cance...magent.py:249>)
> 2023-11-14 15:34:19 ERROR vsmagent: handle: <Handle ServerProtocol._callback_done(<Future cance...magent.py:249>) created at /opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py:241>
> 2023-11-14 15:34:19 ERROR vsmagent: ----------------------------------------
> 2023-11-14 15:34:19 ERROR vsmagent: Object created at (most recent call last):
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 63, in iIi1ii1I1iI11
> 2023-11-14 15:34:19 ERROR vsmagent:     OoO . loop ( )
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 218, in loop
> 2023-11-14 15:34:19 ERROR vsmagent:     self . run_forever ( )
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 607, in run_forever
> 2023-11-14 15:34:19 ERROR vsmagent:     self._run_once()
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 1914, in _run_once
> 2023-11-14 15:34:19 ERROR vsmagent:     handle._run()
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-11-14 15:34:19 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/selector_events.py", line 1117, in _call_connection_lost
> 2023-11-14 15:34:19 ERROR vsmagent:     super()._call_connection_lost(exc)
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/selector_events.py", line 896, in _call_connection_lost
> 2023-11-14 15:34:19 ERROR vsmagent:     self._protocol.connection_lost(exc)
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 241, in connection_lost
> 2023-11-14 15:34:19 ERROR vsmagent:     self . _response . cancel ( )
> 2023-11-14 15:34:19 ERROR vsmagent: ----------------------------------------
> 2023-11-14 15:34:19 ERROR vsmagent: Traceback (most recent call last):
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-11-14 15:34:19 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-11-14 15:34:19 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 348, in _callback_done
> 2023-11-14 15:34:19 ERROR vsmagent:     assert future is self . _response
> 2023-11-14 15:34:19 ERROR vsmagent:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
> 2023-11-14 15:34:19 ERROR vsmagent: AssertionError
> 2023-11-14 15:34:19 ERROR vsmagent: ----------------------------------------

4.15.0
To test what happens if CheckTCPPortOpen takes more than 40 seconds to finish, I simply changed the delayed call to happen after 1000 seconds (I also set the timeout handler to trigger after 1000 seconds).
The results are the same as in 4.15post, except that there is nothing logged in vsmagent.log.

Exception handling
==================

Exception handling has changed between 4.15.0 and 4.15post.
In 4.15.0, the CheckTCPPortOpen class took more responsibility for exception handling: it had an internal handle_error() method that was automatically invoked if an uncaught exception was raised.
In 4.15post, the caller is responsible for any uncaught exceptions, as they are set on the future object that is returned.
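To make the difference concrete, here is a minimal sketch of the 4.15post model, written without coroutine syntax to match the code base. check_tcp_port_open() is the real function name; everything else (the simulated outcome, the callback) is illustrative:
> import asyncio
>
> loop = asyncio.new_event_loop()
>
> # Illustrative stand-in for check_tcp_port_open(): expected errors
> # (OSError) are absorbed internally, anything unexpected is set on the
> # returned future and becomes the caller's problem.
> def check_tcp_port_open(host, port):
>     future = loop.create_future()
>     def finish():
>         try:
>             raise OSError("connection refused")  # simulated outcome
>         except OSError:
>             future.set_result(False)
>         except Exception as e:
>             future.set_exception(e)
>     loop.call_soon(finish)
>     return future
>
> def on_checked(future):
>     # result() re-raises any unexpected exception; if the callback lets
>     # it escape, asyncio's default exception handler logs it, which is
>     # the traceback style seen in the log excerpts above.
>     print("port open:", future.result())
>
> fut = check_tcp_port_open("127.0.0.1", 22)
> fut.add_done_callback(on_checked)
> loop.run_until_complete(fut)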

4.15post
Inside check_(tcp|unix)_port_open(), all expected exceptions (OSError) seem to be handled.
Simply raising an OSError inside either function will have different effects depending on the caller, as it is up to the caller to manage the results from the future.
If some other unexpected exception is unpacked from the future, the exception is propagated to the default exception handler in asyncio:
> 2023-11-14 12:24:03 ERROR vsmagent: Exception in callback ReqSessionHandler._checked_vnc_socket(<Task finishe.../tasks.py:670>)
> 2023-11-14 12:24:03 ERROR vsmagent: handle: <Handle ReqSessionHandler._checked_vnc_socket(<Task finishe.../tasks.py:670>) created at /usr/lib64/python3.11/asyncio/events.py:80>
> 2023-11-14 12:24:03 ERROR vsmagent: ----------------------------------------
> 2023-11-14 12:24:03 ERROR vsmagent: Object created at (most recent call last):
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 74, in <module>
> 2023-11-14 12:24:03 ERROR vsmagent:     iIi1ii1I1iI11 ( )
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 63, in iIi1ii1I1iI11
> 2023-11-14 12:24:03 ERROR vsmagent:     OoO . loop ( )
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 218, in loop
> 2023-11-14 12:24:03 ERROR vsmagent:     self . run_forever ( )
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 607, in run_forever
> 2023-11-14 12:24:03 ERROR vsmagent:     self._run_once()
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 1914, in _run_once
> 2023-11-14 12:24:03 ERROR vsmagent:     handle._run()
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-11-14 12:24:03 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-11-14 12:24:03 ERROR vsmagent: ----------------------------------------
> 2023-11-14 12:24:03 ERROR vsmagent: Traceback (most recent call last):
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/events.py", line 80, in _run
> 2023-11-14 12:24:03 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py", line 413, in _checked_vnc_socket
> 2023-11-14 12:24:03 ERROR vsmagent:     IiI111IIi = socket_status . result ( )
> 2023-11-14 12:24:03 ERROR vsmagent:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
> 2023-11-14 12:24:03 ERROR vsmagent:   File "/usr/lib64/python3.11/asyncio/tasks.py", line 479, in wait_for
> 2023-11-14 12:24:03 ERROR vsmagent:     return fut.result()
> 2023-11-14 12:24:03 ERROR vsmagent:            ^^^^^^^^^^^^
> 2023-11-14 12:24:03 ERROR vsmagent: ValueError: ADAM custom error
> 2023-11-14 12:24:03 ERROR vsmagent: ----------------------------------------

4.15.0
Inside the CheckTCPPortOpen class, OSError exceptions are also handled, with different outcomes depending on the callback passed to the constructor.
If any unexpected exception occurs in the class, it is handled by handle_error(), which gives the following output in vsmagent.log:
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmagent", line 18, in <module>
>     OOoOoo000O00 ( )
>   File "/opt/thinlinc/sbin/vsmagent", line 15, in OOoOoo000O00
>     VSMAgent ( sys . argv )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 168, in __init__
>     self . loop ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 426, in loop
>     OooO0O00o0 = self . run_delayed_calls ( )
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 384, in run_delayed_calls
>     I1I1IIi1IIIi . func ( * I1I1IIi1IIIi . args , ** I1I1IIi1IIIi . kw )
>   File "/opt/thinlinc/modules/thinlinc/vsm/portopencheck.py", line 36, in delay_connect
>     raise ValueError("ADAM error")
> ValueError: ADAM error
The agent is also given 5 penalty points.

4.15post
By forcing check_unix_port_open() to always return True for each unix socket being checked (i.e. there are no free displays), the calling function never progresses. The client receives the following message: "ThinLinc login failed. (No agent server was available)", and the following is logged:
> 2023-11-14 10:14:31 WARNING vsmagent.session: Unable to find free display number when starting session for cendio. Reporting to VSM Server
The agent also receives five penalty points.

4.15.0
We can test the same behaviour here by making CheckUnixSocketOpen always pass SOCKET_IN_USE to its callback. (I ran into some recursion depth issues here, so I manually set the starting display to 1950 to test this).
The same message is logged:
> 2023-11-14 10:28:11 WARNING vsmagent.session: Unable to find free display number when starting session for cendio. Reporting to VSM Server

In both 4.15.0 and 4.15post, the client is given the same message: "ThinLinc login failed. (No agent server was available)", and the agent service keeps running.
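For reference, the free-display search that these tests exercise has roughly the following shape. This is a hypothetical sketch: display_in_use() stands in for the real future-based check_unix_port_open() machinery, and the display range is made up:
> import os
>
> # Stand-in for the real check; the real code asks check_unix_port_open()
> # and gets the answer via a future/callback.
> def display_in_use(display):
>     return os.path.exists("/tmp/.X11-unix/X%d" % display)
>
> free_display = None
> for display in range(10, 2000):
>     if not display_in_use(display):
>         free_display = display
>         break
>
> if free_display is None:
>     # the branch forced in the tests above
>     print("Unable to find free display number. Reporting to VSM Server")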

When it comes to logging, there is an improvement in 4.15post, where logging is managed by the default exception handler in asyncio.
Comment 42 Linn cendio 2023-11-16 16:48:26 CET
We also want to check a caller of xmlrpc to see how errors are handled in those handlers. I chose handler_createusersocket. Tested on RHEL 9 with Python 3.9.16.

The log messages below are triggered when logging in to a session. For all scenarios, the client fails to log in and reports back the following:
> ThinLinc login failed.
> (The SSH connection succeeded, but the ThinLinc server connection failed.
> Perhaps this server doesn't run a ThinLinc server?)
4.15.0
Logging as below when raising a KeyError in CreateUserSocketHandler.handle.
> 2023-11-16 09:09:54 WARNING vsmserver.session: create_user_socket: User with uid 1000 does not exist
Logging when raising a ValueError in CreateUserSocketHandler.handle (outside the try/except block):
> 2023-11-14 16:23:43 ERROR vsmserver: Unhandled XMLRPC exception: <class 'ValueError'> LINN custom error Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 104, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.9/asyncore.py", line 420, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.9/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 555, in found_terminator
>     self . handle_request ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 569, in handle_request
>     self . handle_method ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserverchannel.py", line 120, in handle_method
>     O0o0OOoOOOO ( self . vsmserver , sock = self . socket ) . handle ( self . methodname ,
>   File "/opt/thinlinc/modules/thinlinc/vsm/handler_createusersocket.py", line 24, in handle
>     raise ValueError("LINN custom error")
> ValueError: LINN custom error
When raising a KeyError we are unable to find the user, as indicated by the logging. We also see that an uncaught exception gets no special handling, and that the traceback we get is missing the logging prefix.


4.15post
Logging when raising a KeyError in CreateUserSocketHandler.handle:
> 2023-11-14 16:55:03 WARNING vsmserver.session: create_user_socket: User with uid 1000 does not exist
Logging when raising a ValueError in CreateUserSocketHandler.handle (outside the try/except block):
> 2023-11-14 17:08:23 ERROR vsmserver: Exception in callback ServerProtocol._call_callback('create_user_socket', 0)
> 2023-11-14 17:08:23 ERROR vsmserver: handle: <Handle ServerProtocol._call_callback('create_user_socket', 0) created at /opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py:332>
> 2023-11-14 17:08:23 ERROR vsmserver: ----------------------------------------
> 2023-11-14 17:08:23 ERROR vsmserver: Object created at (most recent call last):
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 218, in loop
> 2023-11-14 17:08:23 ERROR vsmserver:     self . run_forever ( )
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 601, in run_forever
> 2023-11-14 17:08:23 ERROR vsmserver:     self._run_once()
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 1897, in _run_once
> 2023-11-14 17:08:23 ERROR vsmserver:     handle._run()
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-14 17:08:23 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/selector_events.py", line 811, in _read_ready
> 2023-11-14 17:08:23 ERROR vsmserver:     self._read_ready_cb()
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/selector_events.py", line 868, in _read_ready__data_received
> 2023-11-14 17:08:23 ERROR vsmserver:     self._protocol.data_received(data)
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 316, in data_received
> 2023-11-14 17:08:23 ERROR vsmserver:     self . _handle_request ( )
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 332, in _handle_request
> 2023-11-14 17:08:23 ERROR vsmserver:     get_event_loop ( ) . call_soon ( self . _call_callback , oOoOO0O0 , * O00 )
> 2023-11-14 17:08:23 ERROR vsmserver: ----------------------------------------
> 2023-11-14 17:08:23 ERROR vsmserver: Traceback (most recent call last):
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-11-14 17:08:23 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 340, in _call_callback
> 2023-11-14 17:08:23 ERROR vsmserver:     self . _response = self . _callback ( methodname , * params )
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 308, in _handle_method
> 2023-11-14 17:08:23 ERROR vsmserver:     ooO0oOO0 = oOOoO0o . handle ( method , params )
> 2023-11-14 17:08:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_createusersocket.py", line 29, in handle
> 2023-11-14 17:08:23 ERROR vsmserver:     raise ValueError("LINN custom error")
> 2023-11-14 17:08:23 ERROR vsmserver: ValueError: LINN custom error
> 2023-11-14 17:08:23 ERROR vsmserver: ----------------------------------------
In 4.15post, we still do the same logging when unable to find the user. We still get a traceback for uncaught exceptions, but they now have the correct logging format. The behaviour for exceptions in xmlrpc handlers still seems to be the same as before the switch to asyncio.
Comment 43 Linn cendio 2023-11-17 10:31:03 CET
As a last test to see how errors are handled in handlers calling xmlrpc, I briefly tested handler_reqsession. Tested on RHEL 9 with Python 3.9.16.

The log messages below are triggered when logging in to a session. For all scenarios, the client fails to log in and reports back the following:
> ThinLinc login failed.
> (No agent server was available)
4.15.0
Logging when raising a general exception in ReqSessionHandler.start_session:
> 2023-11-16 16:16:48 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-11-16 16:16:48 WARNING vsmserver.session: Connection to 127.0.0.1:904 closed before receiving response in request for new session. 
> 2023-11-16 16:16:48 WARNING vsmserver.loadinfo: Agent 127.0.0.1 failed to start session, Increasing penalty points by 5 to 5
> 2023-11-16 16:16:48 WARNING vsmserver: No working agents found trying to start new session for cendio
Logging when raising an OSError in ReqSessionHandler.check_homedir. This will happen when unable to create a home directory for the user:
> 2023-11-16 16:35:57 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-11-16 16:35:57 WARNING vsmserver: VSM Agent 127.0.0.1 reported failure creating session for cendio
> 2023-11-16 16:35:57 WARNING vsmserver.loadinfo: Agent 127.0.0.1 failed to start session, Increasing penalty points by 5 to 5
> 2023-11-16 16:35:57 WARNING vsmserver: No working agents found trying to start new session for cendio
Logging contains no tracebacks, and in both cases the penalty points for the agent are increased by 5.

4.15post
Logging when raising a general exception in ReqSessionHandler._start_session:
> 2023-11-17 09:24:42 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-11-17 09:24:42 WARNING vsmserver: Error requesting a new session from agent 127.0.0.1: Request rejected by server: '500' 'Internal error\r\n'
> 2023-11-17 09:24:42 WARNING vsmserver.loadinfo: Agent 127.0.0.1 failed to start session, Increasing penalty points by 5 to 5
> 2023-11-17 09:24:42 WARNING vsmserver: No working agents found trying to start new session for cendio
Logging when raising an OSError in ReqSessionHandler._check_homedir:
> 2023-11-17 09:26:39 INFO vsmserver.session: User with uid 1000 (cendio) requested a new session
> 2023-11-17 09:26:39 WARNING vsmserver: VSM Agent 127.0.0.1 reported failure creating session for cendio
> 2023-11-17 09:26:39 WARNING vsmserver.loadinfo: Agent 127.0.0.1 failed to start session, Increasing penalty points by 5 to 5
> 2023-11-17 09:26:39 WARNING vsmserver: No working agents found trying to start new session for cendio
We see that the logging has been slightly altered when raising a general exception compared to 4.15.0. In this case, the agent now reports "500 Internal error" while in 4.15.0 it had the more generic "Connection [...] closed before receiving response [...]".

Otherwise, the logging is the same as in 4.15.0 and the agent still gets penalty points when failing to create a session, so the error handling is comparable to before.
Comment 44 Adam Halim cendio 2023-11-20 13:22:04 CET
(In reply to Linn from comment #32)

> A timeout when calling sleep in ServerProtocol.connection_made is logged as:
> > 2023-10-31 16:38:34 WARNING vsmserver.loadinfo: Error getting load from agent 127.0.0.1: Operation timed out
> > 2023-10-31 16:38:34 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
> > Executing <Handle ServerProtocol.connection_made(<_SelectorSoc...e, bufsize=0>>) created at /usr/lib64/python3.9/asyncio/selector_events.py:774> took 50.032 seconds
> A timeout when calling sleep in ServerProtocol._write_response is logged as:
> > 2023-10-31 17:09:41 WARNING vsmserver.loadinfo: Marking agent 127.0.0.1 as down
> > Executing <Handle ServerProtocol._callback_done(<Future finis...server.py:309>) created at /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:391> took 50.051 seconds
> The last line in both cases looks very similar to the one as found in
> comment 31. I did not dig deeper into why/when this kind of logging occurs,
> or if the logging only is shown in devmode or not.
These lines only get logged when running in debug mode and happen if a callback takes >0.1s to finish. This is 0.1s of actual execution, not any asynchronous waiting. I tested this by adding a time.sleep(0.1) in a callback, which triggered the line to be logged:
> Executing <TimerHandle when=623115.182962302 ReqSessionHandler._wait_on_vnc_port() created at /opt/thinlinc/modules/thinlinc/vsm/handler_reqsession.py:377> took 0.101 seconds
Replacing time.sleep(0.1) with await asyncio.sleep(0.1) does not trigger the line (I tested with 1 second as well). Disabling developer mode does not produce the line in the log (tested with time.sleep(1)).
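This matches asyncio's debug-mode slow-callback detection, which can be reproduced in isolation. The 0.1 s threshold is the documented default of loop.slow_callback_duration:
> import asyncio
> import time
>
> loop = asyncio.new_event_loop()
> loop.set_debug(True)               # same effect as PYTHONASYNCIODEBUG=1
> loop.slow_callback_duration = 0.1  # default threshold, in seconds
>
> def blocking_callback():
>     time.sleep(0.2)  # blocks the loop itself, unlike asyncio.sleep()
>     loop.stop()
>
> # asyncio logs "Executing <Handle blocking_callback() ...> took 0.2xx
> # seconds" through the "asyncio" logger at WARNING level.
> loop.call_soon(blocking_callback)
> loop.run_forever()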
Comment 51 Linn cendio 2023-11-22 09:23:38 CET
Regarding the testing for error handling/logging, we now consider this covered. Below is a summary of what was tested.

In general, we wanted to test both the "core parts" of the error handling, as well as the callers of those core parts. In 4.15.0, the core parts were the files containing a handle_error() function, listed below:
  - asyncbase
  - extproc
  - portopencheck
  - xmlrpc

The callers of these files were our different handler classes. Asyncbase has been removed as part of this bug, but extproc, portopencheck and xmlrpc have all been tested, both by triggering errors in those files as well as in the callers of those files.

Xmlrpc is a bit special, since it has more than one layer to get to its caller:
  xmlrpc <--> vsmxmlrpccall <--> handler

Testing related to xmlrpc was done in all three layers.
Comment 52 Linn cendio 2023-11-22 09:32:59 CET
The result of the error handling testing was that we wanted to examine two scenarios a bit closer.

The first one has already been examined in comment 44, and was regarding the following line in the log:
> Executing <Handle ServerProtocol._callback_done(<Future finis...server.py:309>) created at /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:391> took 50.051 seconds
Since it is not triggered outside of dev mode and seems to exist to bring attention to callbacks that take longer than 0.1 sec to execute, I don't believe this logging is an issue for us as it currently is.


The second scenario to look closer at in the error handling was a traceback in comment 36, seen in ServerProtocol.data_received when incoming data is too long. Note that, to be exact, rather than sending too much data, I simulated too much data by setting the allowed max_datalength to -1 in the following code.

> class ServerProtocol():
>     ....
>     def data_received(self, data):
>         ...
>         self.max_datalength = -1
>         if self._datalength > self.max_datalength:
>             return
It is correct that we get a timeout in this scenario, but we should not get a traceback. It seems to be related to how our xmlrpc calls are handled, so I'm moving this to bug 8224.

With that, everything regarding error handling for this bug should be taken care of.
Comment 58 Samuel Mannehed cendio 2023-12-01 10:25:21 CET
Build 3430 has stopped working on Fedora 39. I once again see this warning in the log:

> /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:405: DeprecationWarning: There is no current event loop
>   asyncio . get_event_loop ( ) . call_later ( 3600 * 12 , self . _update_licenses ) 

And the master doesn't seem to work. Web Admin shows “Connection refused” on the pages talking to the vsmserver service, same with tlctl. Logging in with a ThinLinc client doesn't work either.

Build 3421 works well.
Comment 60 Samuel Mannehed cendio 2023-12-01 17:02:24 CET
(In reply to Samuel Mannehed from comment #58)
> Build 3430 has stopped working on Fedora 39. I once again see this warning
> in the log:
> 
> > /opt/thinlinc/modules/thinlinc/vsm/vsmserver.py:405: DeprecationWarning: There is no current event loop
> >   asyncio . get_event_loop ( ) . call_later ( 3600 * 12 , self . _update_licenses ) 
> 
> And the master doesn't seem to work. Web Admin shows “Connection refused” on
> the pages talking to the vsmserver service, same with tlctl. Logging in with
> a ThinLinc client doesn't work either.
> 
> Build 3421 works well.

It seems like Python changed how asyncio works regarding forked processes in Python 3.12:

https://github.com/python/cpython/issues/66285

In practice, this means we need to call “set_event_loop()” after forking (daemon_init).
Comment 61 Samuel Mannehed cendio 2023-12-01 18:10:09 CET
We simply get a “Connection refused”, and no other error during the startup of vsmserver. This is because, as of Python 3.12, when asyncio detects that it runs in a forked process, it will reset the loop and create a new one. This means the forked process runs on a different loop than the initial process. So, the 'loop' that we have initiated at the top is different from the one that the VSMServer object gets when it runs “get_event_loop()”.

The loop that has an exception handler and starts running is not the same loop that VSMServer used for “create_server()”. The effect is that things just quietly stop: no errors, no progress.
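A minimal sketch of the fix, with a bare os.fork() standing in for ThinLinc's daemon_init():
> import asyncio
> import os
>
> loop = asyncio.new_event_loop()
>
> # stand-in for daemon_init(): fork and let the parent exit
> if os.fork() > 0:
>     os._exit(0)
>
> # As of Python 3.12, the forked child gets a fresh notion of the current
> # loop, so re-register ours to keep later get_event_loop() calls
> # consistent with the loop we already used for create_server().
> asyncio.set_event_loop(loop)
>
> assert asyncio.get_event_loop() is loop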
Comment 63 Samuel Mannehed cendio 2023-12-06 09:43:08 CET
I looked through all the comments on this bug so far. It seems we have covered or have moved things to separate bugs for everything that was discovered.

The resulting functionality works well and is a step-up in terms of logging and error handling. This is ready for testing.
Comment 72 Linn cendio 2023-12-22 13:46:31 CET
Ran into this deprecation warning when testing server build 3455 on Ubuntu 22.04.

The warning was triggered when restarting the vsmagent service. The default Python version (before running any upgrades) is 3.10.4, and with this version I am able to see this warning in the log. However, after upgrading, the Python version is instead 3.10.12, and this version does not trigger any deprecation warnings.

> /opt/thinlinc/modules/thinlinc/vsm/vsmagent.py:116: DeprecationWarning: There is no current event loop
>   asyncio . get_event_loop ( ) . create_server ( self . _handle_connection , port = I11ii1IiIii )
> /opt/thinlinc/modules/thinlinc/vsm/vsmagent.py:115: DeprecationWarning: There is no current event loop
>   OoOOOOO = asyncio . ensure_future (
> /opt/thinlinc/modules/thinlinc/vsm/vsmagent.py:124: DeprecationWarning: There is no current event loop
>   asyncio . get_event_loop ( ) . call_soon ( self . housekeeping )
Since 3.10 is an older Python version, and we start our event loop before these calls, we can safely ignore these deprecation warnings, since they seem to be the result of bugs in asyncio.
Comment 73 Adam Halim cendio 2023-12-22 15:36:12 CET
Changed /vsmserver/listen_port to an invalid port (90000) and got past the
initial server creation. I only started getting errors when creating a new
session, as the client would get stuck on "Mounting local drives". Nothing out
of the ordinary is logged in either vsmserver.log or vsmagent.log. The first
signs of error could be seen from webadm, where the status page displayed
errors. tlctl also didn't work (session list/terminate) and returned
> [Errno 111] Connection refused.
This is different from the behaviour in 4.15.0, where the server would
immediately crash and the following would be visible in vsmserver.log:
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmserver", line 18, in <module>
>     OOoOoo000O00 ( )
>   File "/opt/thinlinc/sbin/vsmserver", line 15, in OOoOoo000O00
>     VSMServer ( sys . argv )
>   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 137, in __init__
>     self . bind_and_listen ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 499, in bind_and_listen
>     self . bind ( self . server_address )
>   File "/usr/lib64/python3.11/asyncore.py", line 331, in bind
>     return self.socket.bind(addr)
>            ^^^^^^^^^^^^^^^^^^^^^^
> OverflowError: bind(): port must be 0-65535
Ideally, we want to display an error message in the log and terminate early.

vsmagent is also affected by the same issue, but in this case, we at
least get the appropriate error messages on the server and client (the agent
stays silent):
> 2023-12-22 14:54:12 WARNING vsmserver: Error checking if agent 127.0.0.1 is alive: [Errno 111] Connect call failed ('127.0.0.1', 904)
> 2023-12-22 14:54:12 WARNING vsmserver: Marking agent 127.0.0.1 as down
> 2023-12-22 14:54:12 WARNING vsmserver: No working agents found trying to start new session for cendio
> 
> "Thinlinc login failed. (No agent server was availble)" message.
In 4.15.0, the agent receives the same traceback as the master if an invalid
port is configured.
Comment 74 Linn cendio 2023-12-22 15:41:40 CET
Tested vsmagent.py with server build 3455 on Ubuntu 22.04.

 * Housekeeping is called every 10 seconds
 * Successfully modifies sshd config if it includes the ThinLinc markers, or leaves it untouched if it does not
 * Errors encountered when trying to modify the sshd config file are still logged as before
 * If an error is raised in housekeeping(), the traceback has the correct logging format:
> 2023-12-22 14:42:42 ERROR vsmagent: Exception in callback VSMAgent.housekeeping()
> 2023-12-22 14:42:42 ERROR vsmagent: handle: <Handle VSMAgent.housekeeping() created at /opt/thinlinc/modules/thinlinc/vsm/vsmagent.py:124>
> 2023-12-22 14:42:42 ERROR vsmagent: ----------------------------------------
> 2023-12-22 14:42:42 ERROR vsmagent: Object created at (most recent call last):
> 2023-12-22 14:42:42 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 147, in <module>
> 2023-12-22 14:42:42 ERROR vsmagent:     o0OoOOo00OOO ( )
> 2023-12-22 14:42:42 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 105, in o0OoOOo00OOO
> 2023-12-22 14:42:42 ERROR vsmagent:     oOoO00 = VSMAgent ( o00O000o0 , OOO00oO0oOo0O )
> 2023-12-22 14:42:42 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 124, in __init__
> 2023-12-22 14:42:42 ERROR vsmagent:     asyncio . get_event_loop ( ) . call_soon ( self . housekeeping )
> 2023-12-22 14:42:42 ERROR vsmagent: ----------------------------------------
> 2023-12-22 14:42:42 ERROR vsmagent: Traceback (most recent call last):
> 2023-12-22 14:42:42 ERROR vsmagent:   File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
> 2023-12-22 14:42:42 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-12-22 14:42:42 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 132, in housekeeping
> 2023-12-22 14:42:42 ERROR vsmagent:     self . update_sshd_permitopen ( )
> 2023-12-22 14:42:42 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 349, in update_sshd_permitopen
> 2023-12-22 14:42:42 ERROR vsmagent:     raise ValueError("oh nooo")
> 2023-12-22 14:42:42 ERROR vsmagent: ValueError: oh nooo
> 2023-12-22 14:42:42 ERROR vsmagent: ----------------------------------------
Also looked through the code changes related to this bug for vsmagent.py. The biggest changes are removing the custom exception handler for handling the sshd config file and moving SIGTERM handling out of this file. Changes look good.
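The housekeeping scheduling seen in the log follows the usual self-rescheduling callback pattern. A sketch, where the 10-second interval is from the list above and the bodies are stubs:
> import asyncio
>
> loop = asyncio.new_event_loop()
>
> def update_sshd_permitopen():
>     pass  # stand-in for the real sshd config maintenance
>
> def housekeeping():
>     # If this raises, the exception ends up in asyncio's default
>     # exception handler (the traceback above) and, in this sketch, the
>     # next run is never scheduled.
>     update_sshd_permitopen()
>     loop.call_later(10, housekeeping)
>
> loop.call_soon(housekeeping)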
Comment 75 Adam Halim cendio 2023-12-22 15:54:01 CET
(In reply to Adam Halim from comment #73)

Looks like too large port numbers result in a wrap-around instead of a crash:

asyncio.open_connection('localhost', 65536 + 101) gives the following error:
> [Errno 111] Connect call failed ('127.0.0.1', 101)
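This is consistent with the port number simply being truncated to its low 16 bits somewhere below us:
> >>> port = 65536 + 101
> >>> port % 65536  # what actually reached the connect call above
> 101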
Comment 77 Adam Halim cendio 2023-12-22 17:18:50 CET
(In reply to Adam Halim from comment #73)

This issue has now been fixed, the server/agent crashes and logs an error if an invalid port is configured:
> 2023-12-22 17:10:25 ERROR vsmagent: Port 65536 must be within range 0-65535.
> 2023-12-22 17:10:25 ERROR vsmagent: Exiting
> 
> 2023-12-22 17:11:28 ERROR vsmserver: Port 65536 must be within range 0-65535.
> 2023-12-22 17:11:28 ERROR vsmserver: Exiting
It seems like there is an open issue on CPython about this: https://github.com/python/cpython/issues/68357
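A sketch of what such an early validation can look like. The messages match the log above; where the check actually lives in the real code is not shown here:
> import sys
>
> def validate_port(port, log):
>     if not 0 <= port <= 65535:
>         log.error("Port %d must be within range 0-65535." % port)
>         log.error("Exiting")
>         sys.exit(1)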
Comment 78 Alexander Zeijlon cendio 2023-12-27 16:54:51 CET
Testing of the VSM Server and VSM Agent scripts. Since these scripts are very similar, I have tested them both at the same time.

Function _handle_exception():
#############################

This function is called when unhandled exceptions are triggered. Make sure that it is called when expected.

vsmserver result:
> 2023-12-27 15:43:23 INFO vsmserver.session: Session 127.0.0.1:10 for cendio has terminated. Removing.
> 2023-12-27 15:43:23 ERROR vsmserver: Exception in callback ServerProtocol._call_callback(<str>, {...}, {...})
> 2023-12-27 15:43:23 ERROR vsmserver: handle: <Handle ServerProtocol._call_callback(<str>, {...}, {...}) created at /opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py:333>
> 2023-12-27 15:43:23 ERROR vsmserver: ----------------------------------------
> 2023-12-27 15:43:23 ERROR vsmserver: Object created at (most recent call last):
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/opt/thinlinc/sbin/vsmserver", line 112, in o0OoOOo00OOO
> 2023-12-27 15:43:23 ERROR vsmserver:     iIIiii1iI . run_forever ( )
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 601, in run_forever
> 2023-12-27 15:43:23 ERROR vsmserver:     self._run_once()
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 1897, in _run_once
> 2023-12-27 15:43:23 ERROR vsmserver:     handle._run()
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-12-27 15:43:23 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/selector_events.py", line 811, in _read_ready
> 2023-12-27 15:43:23 ERROR vsmserver:     self._read_ready_cb()
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/selector_events.py", line 868, in _read_ready__data_received
> 2023-12-27 15:43:23 ERROR vsmserver:     self._protocol.data_received(data)
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 317, in data_received
> 2023-12-27 15:43:23 ERROR vsmserver:     self . _handle_request ( )
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 333, in _handle_request
> 2023-12-27 15:43:23 ERROR vsmserver:     asyncio . get_event_loop ( ) . call_soon ( self . _call_callback , iIi1 , * i1i1iI1II )
> 2023-12-27 15:43:23 ERROR vsmserver: ----------------------------------------
> 2023-12-27 15:43:23 ERROR vsmserver: Traceback (most recent call last):
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-12-27 15:43:23 ERROR vsmserver:     self._context.run(self._callback, *self._args)
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 341, in _call_callback
> 2023-12-27 15:43:23 ERROR vsmserver:     self . _response = self . _callback ( methodname , * params )
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 390, in _handle_user_method
> 2023-12-27 15:43:23 ERROR vsmserver:     oOooOo0o = iII . handle ( method , params , ooOOO00oO0o0o )
> 2023-12-27 15:43:23 ERROR vsmserver:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_newsession.py", line 40, in handle
> 2023-12-27 15:43:23 ERROR vsmserver:     raise ValueError
> 2023-12-27 15:43:23 ERROR vsmserver: ValueError
> 2023-12-27 15:43:23 ERROR vsmserver: ----------------------------------------

vsmagent result:
> 2023-12-27 15:46:01 ERROR vsmagent: Exception in callback ServerProtocol._call_callback(<str>, <str>, <int>)
> 2023-12-27 15:46:01 ERROR vsmagent: handle: <Handle ServerProtocol._call_callback(<str>, <str>, <int>) created at /opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py:333>
> 2023-12-27 15:46:01 ERROR vsmagent: ----------------------------------------
> 2023-12-27 15:46:01 ERROR vsmagent: Object created at (most recent call last):
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/opt/thinlinc/sbin/vsmagent", line 111, in o0OoOOo00OOO
> 2023-12-27 15:46:01 ERROR vsmagent:     iIIiii1iI . run_forever ( )
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 601, in run_forever
> 2023-12-27 15:46:01 ERROR vsmagent:     self._run_once()
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/base_events.py", line 1897, in _run_once
> 2023-12-27 15:46:01 ERROR vsmagent:     handle._run()
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-12-27 15:46:01 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/selector_events.py", line 811, in _read_ready
> 2023-12-27 15:46:01 ERROR vsmagent:     self._read_ready_cb()
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/selector_events.py", line 868, in _read_ready__data_received
> 2023-12-27 15:46:01 ERROR vsmagent:     self._protocol.data_received(data)
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 317, in data_received
> 2023-12-27 15:46:01 ERROR vsmagent:     self . _handle_request ( )
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 333, in _handle_request
> 2023-12-27 15:46:01 ERROR vsmagent:     asyncio . get_event_loop ( ) . call_soon ( self . _call_callback , iIi1 , * i1i1iI1II )
> 2023-12-27 15:46:01 ERROR vsmagent: ----------------------------------------
> 2023-12-27 15:46:01 ERROR vsmagent: Traceback (most recent call last):
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/usr/lib64/python3.9/asyncio/events.py", line 80, in _run
> 2023-12-27 15:46:01 ERROR vsmagent:     self._context.run(self._callback, *self._args)
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 341, in _call_callback
> 2023-12-27 15:46:01 ERROR vsmagent:     self . _response = self . _callback ( methodname , * params )
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/vsmagent.py", line 259, in _handle_method
> 2023-12-27 15:46:01 ERROR vsmagent:     ooO0oOO0 = oO0oO . handle ( method , params )
> 2023-12-27 15:46:01 ERROR vsmagent:   File "/opt/thinlinc/modules/thinlinc/vsm/handler_vsmagent_terminatesession.py", line 23, in handle
> 2023-12-27 15:46:01 ERROR vsmagent:     raise ValueError
> 2023-12-27 15:46:01 ERROR vsmagent: ValueError
> 2023-12-27 15:46:01 ERROR vsmagent: ----------------------------------------

✅ Both vsmserver and vsmagent behave as expected.

Function _handle_sigterm():
###########################

This function is called when the server and agent receives SIGTERM.

vsmserver & vsmagent result:
> In a normal running scenario, the SIGTERM-handler is called.
> On the other hand, it is not called when a keyboard-interrupt is sent while running in foreground mode.
> - This is because keyboard-interrupts result in a SIGINT, which we do not have handling for right now.

❌ Both vsmserver and vsmagent behave as expected when SIGTERM is sent, but not when SIGINT is sent.
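For reference, the SIGTERM wiring presumably looks something like the sketch below; _handle_sigterm() is the real name from above, but the body is a guess. SIGINT support would be the commented-out one-liner:
> import asyncio
> import signal
>
> loop = asyncio.new_event_loop()
>
> def _handle_sigterm():
>     # matches the logged behaviour: announce and stop the loop
>     print("Got SIGTERM, signaling process to quit")
>     loop.stop()
>
> loop.add_signal_handler(signal.SIGTERM, _handle_sigterm)
> # SIGINT (Ctrl+C in foreground mode) currently has no handler; adding
> # one would be:
> # loop.add_signal_handler(signal.SIGINT, _handle_sigterm)
>
> loop.run_forever()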

Test for errors related to hiveconf:
####################################

Test behavior when /opt/thinlinc/etc/thinlinc.hconf does not exist:

vsmserver result:
> systemd[1]: Starting ThinLinc VSM server...
> bash[64876]: Traceback (most recent call last):
> bash[64876]:   File "/opt/thinlinc/sbin/vsmserver", line 148, in <module>
> bash[64876]:     o0OoOOo00OOO ( )
> bash[64876]:   File "/opt/thinlinc/sbin/vsmserver", line 74, in o0OoOOo00OOO
> bash[64876]:     iI1i1I = o00O000o0 . get_string ( "/vsmserver/pidfile" ,
> bash[64876]: AttributeError: 'NoneType' object has no attribute 'get_string'

vsmagent result:
> systemd[1]: Starting ThinLinc VSM agent...
> bash[73651]: Traceback (most recent call last):
> bash[73651]:   File "/opt/thinlinc/sbin/vsmagent", line 147, in <module>
> bash[73651]:     o0OoOOo00OOO ( )
> bash[73651]:   File "/opt/thinlinc/sbin/vsmagent", line 74, in o0OoOOo00OOO
> bash[73651]:     iI1i1I = o00O000o0 . get_string ( "/vsmagent/pidfile" ,
> bash[73651]: AttributeError: 'NoneType' object has no attribute 'get_string'

Test behavior when /opt/thinlinc/etc/thinlinc.hconf does not point to other hconf-files:

vsmserver result:
> systemd[1]: Starting ThinLinc VSM server...
> bash[65157]: The setup has not been completed - please run tl-setup

vsmagent result:
> systemd[1]: Starting ThinLinc VSM agent...
> bash[73760]: The setup has not been completed - please run tl-setup

Test behavior when /opt/thinlinc/etc/thinlinc.hconf points to files in a folder that does not exist:

vsmserver result:
> systemd[1]: Starting ThinLinc VSM server...
> bash[65110]: The setup has not been completed - please run tl-setup

vsmagent result:
> systemd[1]: Starting ThinLinc VSM agent...
> bash[73795]: The setup has not been completed - please run tl-setup

✅ Both vsmserver and vsmagent behave as expected: the services do not start unless the hive finds at least the vsm.hconf file.

Test for errors related to pidfile-creation:
############################################

Test pidfile is removed while the server is running:

vsmserver result:
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmserver", line 148, in <module>
>     o0OoOOo00OOO ( )
>   File "/opt/thinlinc/sbin/vsmserver", line 124, in o0OoOOo00OOO
>     os . unlink ( iI1i1I )
> FileNotFoundError: [Errno 2] No such file or directory: '/var/run/vsmserver.pid'

vsmagent result:
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/vsmagent", line 147, in <module>
>     o0OoOOo00OOO ( )
>   File "/opt/thinlinc/sbin/vsmagent", line 123, in o0OoOOo00OOO
>     os . unlink ( iI1i1I )
> FileNotFoundError: [Errno 2] No such file or directory: '/var/run/vsmagent.pid'

The services are shut down even if the pidfiles are missing.

Test pidfile location specified is a symbolic link:

> vsmserver and vsmagent OK!

Test not able to create pid-file:

vsmserver result:
> vsmserver.service: Can't open PID file /run/thinlinc/testing/vsmserver.pid (yet?) after start: Operation not permitted
When folder does not exist:
> FileNotFoundError: [Errno 2] No such file or directory: '/run/thinlinc/testing/vsmserver.pid'
When fs is read-only:
> OSError: [Errno 30] Read-only file system: '/run/thinlinc/testing/vsmserver.pid'

vsmagent result:
> vsmagent.service: Can't open PID file /run/thinlinc/testing/vsmagent.pid (yet?) after start: Operation not permitted
When folder does not exist:
> FileNotFoundError: [Errno 2] No such file or directory: '/run/thinlinc/testing/vsmagent.pid'
When fs is read-only:
> OSError: [Errno 30] Read-only file system: '/run/thinlinc/testing/vsmagent.pid'

Services cannot be started if the pidfile cannot be created.

✅ Both vsmserver and vsmagent behave as expected.

Test for errors related to redirect_logger:
###########################################

Test that asyncio logging messages are relayed to tllogger, tested by sleeping 0.2 seconds in _call_callback in xmlrpc.py.

vsmserver & vsmagent result:

With redirect_logger("asyncio") present:
> Logging from asyncio is redirected as expected with correct format to our log files.

With redirect_logger("asyncio") not present:
> Logging from asyncio is not redirected and has the wrong format.

With redirect_logger("asyncioz") present (wrong module name):
> Logging from asyncio is not redirected and has the wrong format, and this occurs without any warnings in our log files.

✅ For both vsmserver and vsmagent, the usage of redirect_logger() has the expected behavior.
❗ It might be worth evaluating whether an unsuccessful redirect should result in a logged warning.
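redirect_logger() is ThinLinc's own helper, so the following is only a guess at its shape, but it shows why a misspelled module name fails silently: logging.getLogger() happily creates a fresh, unused logger for any name it is given:
> import logging
>
> def redirect_logger(name, target="vsm"):
>     # Attach our handlers (and thereby our log format) to the named
>     # stdlib logger, so that e.g. asyncio's warnings end up in the
>     # ThinLinc log files with the usual prefix.
>     source = logging.getLogger(name)  # "asyncioz" matches nothing, silently
>     source.propagate = False
>     for handler in logging.getLogger(target).handlers:
>         source.addHandler(handler)
>
> redirect_logger("asyncio")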

Note that there are still some parts of vsmserver and vsmagent scripts left to test.
Comment 79 Adam Halim cendio 2023-12-28 10:28:19 CET
Tested vsmserver.py using SLES12 (Python 3.4.10) and build r40671.

I filled /var/lib/vsm/sessions with garbage characters and got the following error on startup:
> 2023-12-22 14:13:18 ERROR vsmserver: Error loading session database: Error unpacking /var/lib/vsm/sessions ()
> 2023-12-22 14:13:18 ERROR vsmserver: Session database needs manual recovery.
> 2023-12-22 14:13:18 ERROR vsmserver: Exiting
In sessionstore, we do str(e) when we could perhaps be doing repr(e) instead (not sure if this depends on the Python version). That would give us (UnpicklingError()) instead of just () in this example:
> 2023-12-22 14:15:04 ERROR vsmserver: Error loading session database: Error unpacking /var/lib/vsm/sessions (UnpicklingError())
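The difference is easy to demonstrate: str() of an argument-less exception is the empty string, while repr() keeps the exception type visible:
> >>> from pickle import UnpicklingError
> >>> str(UnpicklingError())
> ''
> >>> repr(UnpicklingError())
> 'UnpicklingError()'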
Tested occupying the port used by vsmserver:
> 2023-12-27 16:43:30 ERROR vsmserver: Unable to start server: [Errno 98] error while attempting to bind on address ('0.0.0.0', 9000): address already in use
> 2023-12-27 16:43:30 ERROR vsmserver: Exiting
I tested using nc -Ul /var/run/thinlinc/master/100 and called the function through a fake client with uid 100. Verified that the permissions were changed from:
> srwxr-xr-x 1 root root 0 Dec 27 16:57 100
to
> srw------- 1 root root 0 Dec 27 16:58 100
Verified that idle user sockets get cleaned up regularly, and re-using the
socket updates the last_used timer.

(User)-methods were tested in bug 8272, comment #5.
In addition to those tests, I tried sending an RPC call with incorrect parameter values, which results in either a traceback or an error message being logged. In both cases, the client gets a 500 Internal error response.

Tested listening on port 9000 before starting vsmserver:
> 2023-12-28 09:21:03 ERROR vsmserver: Unable to start server: [Errno 98] error while attempting to bind on address ('0.0.0.0', 9000): address already in use
> 2023-12-28 09:21:03 ERROR vsmserver: Exiting

Verified that sessions were dumped to disk when shutting down:
> 2023-12-28 09:55:02 INFO vsmserver: Got SIGTERM, signaling process to quit
> 2023-12-28 09:55:02 INFO vsmserver: Terminating. Have a nice day!
> 2023-12-28 09:55:02 DEBUG vsmserver.session: Writing active sessions to disk
> 2023-12-28 09:55:02 DEBUG vsmserver.session: Done writing sessions
Removed a usersocket and created a directory with the same name:
> 2023-12-28 09:53:15 ERROR vsmserver: Error removing user socket /var/run/thinlinc/master/1000: [Errno 21] Is a directory: '/var/run/thinlinc/master/1000'
Licence updating was tested in bug 8232.

Looked through the commits, and the code looks good.
Comment 80 Pierre Ossman cendio 2023-12-28 11:20:07 CET
If you shut down the service with an active connection, then you get a ResourceWarning for that socket and transport:

> 2023-12-28 10:05:41 INFO vsmserver: Got SIGTERM, signaling process to quit
> 2023-12-28 10:05:41 INFO vsmserver: Terminating. Have a nice day!
> sys:1: ResourceWarning: unclosed <socket.socket fd=6, family=10, type=1, proto=6, laddr=('::1', 9000, 0, 0), raddr=('::1', 60516, 0, 0)>
> ResourceWarning: Enable tracemalloc to get the object allocation traceback
> /usr/lib64/python3.12/asyncio/selector_events.py:875: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=6>
>   _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
> ResourceWarning: Enable tracemalloc to get the object allocation traceback

There doesn't seem to be any mechanism to terminate ongoing connections. I've reported this problem here:

https://github.com/python/cpython/issues/113538

For now, I'm not sure if it's worth doing any hacky workaround. These messages aren't seen in release versions, and it should happen rarely enough that it shouldn't be an issue for developers.

I added bug 8274 to track this issue.
Comment 82 Adam Halim cendio 2023-12-28 15:22:20 CET
Found an issue when running vsmagent/vsmserver in the foreground with -f where Ctrl+c would result in some ResourceWarnings and a traceback:
> ^CTraceback (most recent call last):
>   File "./vsmserver", line 148, in <module>
>     main()
>   File "./vsmserver", line 111, in main
>     loop.run_forever()
>   File "/usr/lib64/python3.4/asyncio/base_events.py", line 331, in run_forever
>     self._run_once()
>   File "/usr/lib64/python3.4/asyncio/base_events.py", line 1259, in _run_once
>     event_list = self._selector.select(timeout)
>   File "/usr/lib64/python3.4/selectors.py", line 436, in select
>     fd_event_list = self._epoll.poll(timeout, max_ev)
> KeyboardInterrupt
> sys:1: ResourceWarning: unclosed <socket object at 0x7fe9c04d2588>
> /usr/lib64/python3.4/importlib/_bootstrap.py:2150: ImportWarning: sys.meta_path is empty
> sys:1: ResourceWarning: unclosed <socket object at 0x7fe9c04d2648>

The ResourceWarnings came from not shutting down the server gracefully, and would for example also appear if executing sys.exit() somewhere after create_server() in vsmserver/vsmagent.py.

This has now been fixed as of r40672. The vsmserver/agent shuts the server down gracefully, and no ResourceWarnings appear. Running vsmagent -f and sending Ctrl-c now only gives a traceback:
> ^C2023-12-28 14:23:19 INFO vsmagent: Terminating. Have a nice day!
> Traceback (most recent call last):
>   File "./vsmagent", line 150, in <module>
>     o0OoOOo00OOO ( )
>   File "./vsmagent", line 107, in o0OoOOo00OOO
>     OO0o0O0o0 ( OOO00oO0oOo0O )
>   File "./vsmagent", line 119, in OO0o0O0o0
>     iIIiii1iI . run_forever ( )
>   File "/usr/lib64/python3.4/asyncio/base_events.py", line 331, in run_forever
>     self._run_once()
>   File "/usr/lib64/python3.4/asyncio/base_events.py", line 1259, in _run_once
>     event_list = self._selector.select(timeout)
>   File "/usr/lib64/python3.4/selectors.py", line 436, in select
>     fd_event_list = self._epoll.poll(timeout, max_ev)
> KeyboardInterrupt
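The graceful-shutdown shape that the fix presumably follows is the standard asyncio one, sketched here without coroutine syntax to match the rest of the code base (names and placement are illustrative):
> import asyncio
>
> loop = asyncio.new_event_loop()
> server = loop.run_until_complete(
>     loop.create_server(asyncio.Protocol, "127.0.0.1", 9000))
>
> try:
>     loop.run_forever()
> finally:
>     # On Ctrl+C the KeyboardInterrupt still propagates (hence the
>     # traceback above), but the listening socket is closed gracefully
>     # first, so no ResourceWarnings.
>     server.close()
>     loop.run_until_complete(server.wait_closed())
>     loop.close()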
Comment 83 Pierre Ossman cendio 2023-12-28 16:20:17 CET
Tested the task cancellation on shutdown on Fedora 39. Found an issue, which was resolved on bug 8224. After re-testing, everything now works as expected.
Comment 84 Pierre Ossman cendio 2023-12-28 16:41:14 CET
Everything that was changed should now be verified, either here or on one of the split-off bugs.

Release notes also look good.

> MUST:
> 
>  * ThinLinc no longer relies on the system asyncore module
> 

There is not a single mention of asyncore or asynchat in the entire tree.

>  * The main loop is asyncio based so that asyncio tasks and callbacks can be scheduled
> 

Yup. We use a plain asyncio event loop that we get from asyncio.new_event_loop().

> SHOULD:
> 
>  * There is no coroutine syntax in ThinLinc code (new or old syntax)
> 

There is no instance of 'async def', 'await', or 'coroutine' in the tree.

>  * Everything uses asyncio terminology rather than asyncore such
> 

As far as I can see, yes. We still use the term "channel" for one of the XML-RPC exceptions, but that is probably acceptable.

>  * There is no ThinLinc code that is redundant with asyncio functionality
> 

Not that I can find, no. The closest thing is that we sort-of duplicate asyncio.run(). But that is too new for us to use anyway, and it doesn't have everything we want.

>  * Everything is converted to asyncio models and abstractions (e.g. protocols)

Yes, we make heavy use of protocols and futures to follow asyncio's model.
