With VSM Server rev 14337 and earlier, at high load and with some bad luck, the following traceback was sometimes triggered: 2008-01-29 11:15:51 ERROR vsmserver.extcmd: Unhandled exception getting list of groups for user13: exceptions.NotImplementedError Unhandled write event on <thinlinc.vsm.extproc.GetUserGroups instance at 0x8e37b6c>:['/opt/thinlinc/libexec/tl-initgroups', u'user13']:35 Traceback (most recent call last): File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 115, in iii1i obj . handle_write_event ( fd ) File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 317, in handle_write_event raise NotImplementedError ( "Unhandled write event on %s:%s:%d" % ( repr ( self ) , repr ( self . args ) , fd ) ) NotImplementedError: Unhandled write event on <thinlinc.vsm.extproc.GetUserGroups instance at 0x8e37b6c>:['/opt/thinlinc/libexec/tl-initgroups', u'user13']:35 . 2008-01-29 11:15:51 ERROR vsmserver: Killing process with pid 3486 Traceback (most recent call last): File "/opt/thinlinc/sbin/vsmserver", line 20, in ? VSMServer ( sys . argv ) File "/opt/thinlinc/modules/thinlinc/vsm/vsmserver.py", line 141, in __init__ self . loop ( ) File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 381, in loop self . _select_loop ( oO000O0oo ) File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 245, in _select_loop iii1i ( oO0 , OOOo0OO000 ) File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 120, in iii1i obj . handle_error ( fd ) File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 351, in handle_error self . handle_close ( fd ) File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 282, in handle_close self . handle_exit ( ) File "/opt/thinlinc/modules/thinlinc/vsm/extproc.py", line 440, in handle_exit Ii11Ii = [ int ( i1iiIII ) for i1iiIII in self . stdout_data . strip ( ) . split ( " " ) ] ValueError: invalid literal for int(): The root cause for this problem was that the following could happen: 1) The list of readable fd's as returned by select.select was iterated over. 2) A readable object with fd X registered that was also in the writable list would find that it was no longer needed, and unregister itself. 3) Another readable object would start up a third object, that registered its FD, and that fd was by bad luck X. 4) When the list of writable fd's were iterated over, the wrong object would be found when looking up fd X in fd_map, and if that object was not equipped with a handle_write function, the traceback would occur. A good long-term solution to this is to make sure that the writable() function only returns True when absolutely neccesary. As a short term solution, I've made unregister_fd remove fd's that are unregistered from the list of writable objects.
We don't have our own select loop any more as of bug 7636. It's likely that this issue is resolved.
This issue is indeed solved now. The select loop is now handled by Python's asyncio, so the complexity is definitely out of our hands. The listed scenario is also safe under asyncio as it builds a queue of all callbacks after calling select(). That means any changes in callbacks for different fd:s won't take effect until the next iteration of the loop. In the scenario listed above, the original write callback will be called¹, not the new one. ¹ Which will likely crash as the fd is now closed. But that's a different issue and isn't necessarily the event loop's fault.
When testing, I decided to try and recreate the scenario described in comment 0 but in asyncio. Unfortunately, I did not succeed in recreating it. However, the code in asyncio looks appropriate for this scenario and we have fully swapped from using our own implementation to using asyncio. Because of this, I believe this bug can be considered done, even without recreating this edge case. Closing.