If there is a network error on the first poll of an agent, then we get this crash:

> 2020-07-08 12:03:05 WARNING vsmserver.loadinfo: [Errno 101] ENETUNREACH talking to VSM Agent tl.cendio.se:904 in request for loadinfo. Marking as down.
> 2020-07-08 12:03:05 ERROR vsmserver: Exception in error handler for <thinlinc.vsm.call_getload.GetLoadCall at 0x7fe3b0206e10>: <type 'exceptions.AttributeError'> loadbalancer
> Traceback (most recent call last):
>   File "/opt/thinlinc/modules/thinlinc/vsm/xmlrpc.py", line 240, in handle_error
>     O0ooO0Oo00o = self . handle_known_errors ( )
>   File "/opt/thinlinc/modules/thinlinc/vsm/call_getload.py", line 35, in handle_known_errors
>     self . parent . loadbalancer . update_loadinfo ( self . url , None )
>   File "/opt/thinlinc/modules/thinlinc/vsm/async.py", line 439, in __getattr__
>     raise AttributeError , attr
> AttributeError: loadbalancer

Unfortunately, because of bug 7530, this wedges that agent in a permanently downed state.
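The traceback shows the error handler for a failed poll dereferencing `self.parent.loadbalancer` before that attribute exists, so the parent's `__getattr__` raises `AttributeError`. Below is a simplified, hypothetical Python model of that situation. The `Parent` and `LoadBalancer` classes are stand-ins, and the guard in `handle_known_errors` is only one possible approach, not necessarily how the actual fix was done.

```python
class LoadBalancer:
    """Hypothetical stand-in that records per-agent load info (None = down)."""

    def __init__(self):
        self.loadinfo = {}

    def update_loadinfo(self, url, info):
        self.loadinfo[url] = info


class Parent:
    """Hypothetical stand-in for the object that later owns the load balancer."""

    def __getattr__(self, attr):
        # Mirrors the traceback: attributes that are not set yet raise AttributeError.
        raise AttributeError(attr)


class GetLoadCall:
    def __init__(self, parent, url):
        self.parent = parent
        self.url = url

    def handle_known_errors(self):
        # getattr() with a default returns None instead of propagating the
        # AttributeError raised by Parent.__getattr__.
        lb = getattr(self.parent, "loadbalancer", None)
        if lb is None:
            # Load balancer not set up yet: skip marking the agent as down
            # (assumption: a later poll cycle will update it).
            return False
        lb.update_loadinfo(self.url, None)
        return True
```

Used like this, a failure on the first poll no longer raises, and once the load balancer is attached the agent is marked as down as before:

```python
parent = Parent()
call = GetLoadCall(parent, "tl.cendio.se:904")
call.handle_known_errors()          # False, no AttributeError
parent.loadbalancer = LoadBalancer()
call.handle_known_errors()          # True, agent recorded with None
```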
Also see bug 4243, which is similar but not quite as severe. I suspect this is becoming more common because of bug 4290, as we are now likely starting before the network is up.
Fixed now. The tester needs to make sure there are no errors in vsmserver.log when starting vsmserver without a network, and that the load update cycle continues as expected. Also note that it is bug 7530 that ensures we don't lose the agent forever, even with the error.
Reproduced this issue on 4.12.0 using 'unshare'. Tested on a RHEL8 server with the nightly build (build 6718): no errors are shown in vsmserver.log, and the update cycle continues as expected. The release notes also look good.