Bug 5556 - traceback in vsmserver when stressing Web Access
Summary: traceback in vsmserver when stressing Web Access
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server (show other bugs)
Version: pre-1.0
Hardware: PC Unknown
: P2 Normal
Target Milestone: 4.16.0
Assignee: Pierre Ossman
URL:
Keywords: aleze_tester, prosaic, samuel_tester
Depends on:
Blocks:
 
Reported: 2015-05-28 11:12 CEST by Pierre Ossman
Modified: 2023-11-03 11:09 CET (History)
2 users (show)

See Also:
Acceptance Criteria:
MUST: * Servers should ignore all requests but the first one on each connection


Attachments

Description Pierre Ossman cendio 2015-05-28 11:12:40 CEST
Whilst stressing the HTML client on bug 765, I kept getting these tracebacks:

2015-05-28 11:09:14 ERROR vsmserver: Unhandled XMLRPC exception: <type 'exceptions.NotImplementedError'> Subclass must implement handle_method Traceback (most recent call last):
  File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 112, in iii11
    obj . handle_read_event ( )
  File "/usr/lib64/python2.7/asyncore.py", line 449, in handle_read_event
    self.handle_read()
  File "/usr/lib64/python2.7/asynchat.py", line 140, in handle_read
    self.found_terminator()
  File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 530, in found_terminator
    self . handle_request ( )
  File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 544, in handle_request
    self . handle_method ( )
  File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 547, in handle_method
    raise NotImplementedError , "Subclass must implement handle_method"
NotImplementedError: Subclass must implement handle_method

We've also had this reported on issue 15346.
Comment 1 Samuel Mannehed cendio 2018-04-23 16:49:54 CEST
I saw this traceback in vsmserver.log now on Ubuntu 16.04 with 4.9.0rc2. I did not stress Web Access, I simply reconnected to an existing session once..

Can't reproduce by simply reconnecting again.
Comment 6 Pierre Ossman cendio 2023-10-19 16:21:05 CEST
I was able to reproduce this reliably:

1. Create a malicious client that sends a request, waits 1 second, then sends a new request. It does not wait for the response for the first request.

2. Add a 5 second (asynchronous) delay to the server's handler of the request, giving enough time for the second request to arrive.
Comment 8 Pierre Ossman cendio 2023-10-19 16:37:06 CEST
The new code avoids swapping ownership of the socket, so this problem is avoided. Test code can no longer trigger the issue.
Comment 11 Samuel Mannehed cendio 2023-11-03 11:01:32 CET

Reproduced the error in 4.15.0 using the diff in attachment 1167 [details]:

> ==> /var/log/vsmserver.log <==
> 2023-11-02 16:54:39 DEBUG2 vsmserver.session: Verifying sessions for 1 users
> 2023-11-02 16:54:39 DEBUG3 vsmserver.session: Verifying 1 session(s) on 127.0.0.1
> 2023-11-02 16:54:39 DEBUG3 vsmserver.session: Calling verify_sessions: 1
> 2023-11-02 16:54:39 DEBUG3 vsmserver.session: Delaying another verify_sessions call..
> 
> ==> /var/log/vsmagent.log <==
> 2023-11-02 16:54:39 DEBUG vsmagent: Handling connection from ('127.0.0.1', 1023)
> 2023-11-02 16:54:39 DEBUG2 vsmagent: Handling method verify_sessions
> 
> ==> /var/log/vsmserver.log <==
> 2023-11-02 16:54:40 DEBUG3 vsmserver.session: Calling verify_sessions: 2
> 
> ==> /var/log/vsmagent.log <==
> 2023-11-02 16:54:40 ERROR vsmagent: Unhandled XMLRPC exception: <class 'NotImplementedError'> Subclass must implement handle_method Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/asyncbase.py", line 104, in ooOOOoOO0
>     obj . handle_read_event ( )
>   File "/usr/lib64/python3.9/asyncore.py", line 420, in handle_read_event
>     self.handle_read()
>   File "/usr/lib64/python3.9/asynchat.py", line 151, in handle_read
>     self.found_terminator()
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 555, in found_terminator
>     self . handle_request ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 569, in handle_request
>     self . handle_method ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 572, in handle_method
>     raise NotImplementedError ( "Subclass must implement handle_method" )
> NotImplementedError: Subclass must implement handle_method
> 
> ==> /var/log/vsmserver.log <==
> 2023-11-02 16:54:40 WARNING vsmserver.session: Connection to 127.0.0.1:904 closed before receiving response verifying sessions. 

build 3414 with patch from attachment

> ==> /var/log/vsmserver.log <==
> 2023-11-03 10:16:05 DEBUG2 vsmserver.session: Verifying sessions for 1 users
> 2023-11-03 10:16:05 DEBUG3 vsmserver.session: Verifying 1 session(s) on 127.0.0.1
> 2023-11-03 10:16:05 DEBUG3 vsmserver.session: Calling verify_sessions: 1
> 2023-11-03 10:16:05 DEBUG3 vsmserver.session: Delaying another verify_sessions call..
> 
> ==> /var/log/vsmagent.log <==
> 2023-11-03 10:16:05 DEBUG2 vsmagent: Handling method verify_sessions from ('127.0.0.1', 1023)
> 
> ==> /var/log/vsmserver.log <==
> 2023-11-03 10:16:05.777588 Delaying response..
> 
> ==> /var/log/vsmserver.log <==
> 2023-11-03 10:16:06 DEBUG3 vsmserver.session: Calling verify_sessions: 2
> 
> ==> /var/log/vsmagent.log <==
> 2023-11-03 10:16:06 DEBUG2 vsmagent: Handling method verify_sessions from ('127.0.0.1', 1023)
> 
> ==> /var/log/vsmserver.log <==
> 2023-11-03 10:16:06.778801 Delaying response..
> 2023-11-03 10:16:10.778737 Handling response ..
> 2023-11-03 10:16:10 DEBUG2 vsmserver.session: Got session verification from agent 127.0.0.1
> 2023-11-03 10:16:10 DEBUG3 vsmserver.session: Session 127.0.0.1:10 for cendio successfully verified
> 
> ==> /var/log/vsmagent.log <==
> 2023-11-03 10:16:10 DEBUG2 vsmagent: Handling method verify_sessions from ('127.0.0.1', 1023)
> 
> ==> /var/log/vsmserver.log <==
> 2023-11-03 10:16:11.780042 Handling response ..
> 2023-11-03 10:16:11 DEBUG2 vsmserver.session: Got session verification from agent 127.0.0.1
> 2023-11-03 10:16:11 DEBUG3 vsmserver.session: Session 127.0.0.1:10 for cendio successfully verified
Comment 12 Samuel Mannehed cendio 2023-11-03 11:06:51 CET
The previous comment was meant to reference attachment 1168 [details] when testing build 3414.

We looked through the commit, at first sight it wasn't obvious why it fixed this race. Anyway, what we gathered was; since the old code re-used the socket for the second connection, we could end up in a broken state where the machinery required a response, but wasn't guaranteed to get one.

> MUST:
> 
>  * Servers should ignore all requests but the first one on each connection

They do, and we had to modify the server a bit to even be able to send a second one.
Comment 13 Samuel Mannehed cendio 2023-11-03 11:09:59 CET
The new code is built in a way where issues like this are virtually impossible. The architecture looks robust and there are way fewer subclass layers, which makes the data flow much easier to understand. Very nice.

Note You need to log in before you can comment on or make changes to this bug.