The sessions database (/var/lib/vsm/sessions) and HA changes database (/var/lib/vsm/changes) are overwritten if they cannot be read or parsed by vsmserver, making manual recovery of these potentially valuable files practically impossible. The consequence is loss of user sessions and HA nodes getting out of sync. Ideally we want to handle this in a better way, preferably by terminating and letting the system administrator either remove or recover the file(s) manually.
Steps to reproduce:
> # systemctl stop vsmserver
> # echo "this file is not healthy" > /var/lib/vsm/sessions
> # systemctl start vsmserver
> <start a new session>
(the same applies to /var/lib/vsm/changes)
We currently get one of these error logs in /var/log/vsmserver.log in the above scenario:
> End of file reading /var/lib/vsm/sessions - was there an error writing sessions file to disk?
> Error unpacking session database /var/lib/vsm/sessions. No session database loaded
> Error decoding session database /var/lib/vsm/sessions. No session database loaded
> End of file reading /var/lib/vsm/changes - was there an error writing HA changes file to disk?
> Error unpacking HA changes database /var/lib/vsm/changes. No HA changes loaded
> Error decoding HA changes database /var/lib/vsm/changes. No HA changes loaded
Note that many errors cause a crash of vsmagent instead of starting with an empty database. See bug 5631.
We currently have three failure modes when loading the HA and session databases:
1. End of file was reached when parsing the database. This mode can be triggered by emptying the database file, for example: `> <database file>`
2. The database file cannot be parsed by pickle. This mode can be triggered by writing garbage to the database file, for example: `echo "ARGHHH!" > <database file>`
3. The database files contain Python 2 bytestrings with characters outside the ASCII range. This mode can be triggered by writing a pickled non-ASCII string to the database file, for example: `echo -e "S'\\xf6l'\np0\n." > <database file>`.
For these three failure modes we now expect `vsmserver` to terminate without overwriting the database and to provide helpful messages in the logs explaining the situation.
Note that the HA and sessions databases are separate and default to these two paths:
- Sessions: `/var/lib/vsm/sessions`
- HA: `/var/lib/vsm/changes`
Also note that the HA database is only loaded when ThinLinc is running with HA enabled.
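The expected behaviour for the three failure modes can be sketched in Python. This is a minimal sketch, not the actual vsmserver code: the names `load_database`, `load_or_exit` and `DatabaseLoadError` are hypothetical, and so is the choice to treat a missing file as an empty database. It relies on Python 3's `pickle.load` raising `EOFError` for an empty file, `UnicodeDecodeError` when a pickled Python 2 `str` holds non-ASCII bytes (pickle's default `encoding` is `"ASCII"`), and `UnpicklingError` or other exceptions for unparsable data. The log wording is modelled on the messages reported in the test runs in this bug.

```python
import logging
import pickle
import sys

log = logging.getLogger("vsmserver")


class DatabaseLoadError(Exception):
    """Hypothetical: the database file exists but could not be loaded."""


def load_database(path):
    """Load a pickled database without ever writing to `path`.

    Returns an empty dict if the file does not exist (an assumption for
    fresh installs) and raises DatabaseLoadError for the failure modes.
    """
    try:
        f = open(path, "rb")
    except FileNotFoundError:
        return {}
    with f:
        try:
            # "ascii" is pickle's default; Python 2 str objects containing
            # bytes outside the ASCII range then raise UnicodeDecodeError.
            return pickle.load(f, encoding="ascii")
        except EOFError as e:
            # Mode 1: empty file
            raise DatabaseLoadError("End of file reading %s" % path) from e
        except UnicodeDecodeError as e:
            # Mode 3: non-ASCII Python 2 bytestring
            raise DatabaseLoadError(
                "Error decoding string in %s (%s)" % (path, e)) from e
        except Exception as e:
            # Mode 2: unparsable data. pickle may raise UnpicklingError or
            # other exceptions (e.g. AttributeError, IndexError) here.
            raise DatabaseLoadError(
                "Error unpacking %s (%s)" % (path, e)) from e


def load_or_exit(path, description):
    """Load a database, or log the problem and exit leaving the file untouched."""
    try:
        return load_database(path)
    except DatabaseLoadError as e:
        log.error("Error loading %s: %s", description, e)
        log.error("%s needs manual recovery.",
                  description[0].upper() + description[1:])
        log.error("Exiting")
        sys.exit(1)
```

For example, `load_or_exit("/var/lib/vsm/sessions", "session database")` either returns the loaded sessions or exits with status 1. Exiting instead of falling back to an empty database means no later save can overwrite the damaged file before the administrator has looked at it.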
Tested on Fedora 33 and everything seems to work as expected. Marking as resolved.
Tested corrupting the databases in various ways on Ubuntu 20.04.
Screwed up data:
> 2021-07-20 09:03:15 ERROR vsmserver: Error loading session database: Error unpacking /var/lib/vsm/sessions (pickle data was truncated)
> 2021-07-20 09:03:15 ERROR vsmserver: Session database needs manual recovery.
> 2021-07-20 09:03:15 ERROR vsmserver: Exiting
Empty file:
> 2021-07-20 09:05:16 ERROR vsmserver: Error loading session database: End of file reading /var/lib/vsm/sessions
> 2021-07-20 09:05:16 ERROR vsmserver: Session database needs manual recovery.
> 2021-07-20 09:05:16 ERROR vsmserver: Exiting
Bad string data:
> 2021-07-20 09:14:51 ERROR vsmserver: Error loading session database: Error decoding string in /var/lib/vsm/sessions ('ascii' codec can't decode byte 0xe3 in position 5: ordinal not in range(128))
> 2021-07-20 09:14:51 ERROR vsmserver: Session database needs manual recovery.
> 2021-07-20 09:14:51 ERROR vsmserver: Exiting
Also tested a corrupt HA changes file on the same machine.
Screwed up data:
> 2021-07-20 10:05:29 ERROR vsmserver: Error loading HA changes database: Error unpacking /var/lib/vsm/changes (pickle data was truncated)
> 2021-07-20 10:05:29 ERROR vsmserver: HA changes database needs manual recovery.
> 2021-07-20 10:05:29 ERROR vsmserver: Exiting
Empty file:
> 2021-07-20 10:06:18 ERROR vsmserver: Error loading HA changes database: End of file reading /var/lib/vsm/changes
> 2021-07-20 10:06:18 ERROR vsmserver: HA changes database needs manual recovery.
> 2021-07-20 10:06:18 ERROR vsmserver: Exiting
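The shell commands in the failure-mode list produce empty, garbage and non-ASCII files, but the "pickle data was truncated" message in these test runs comes from a valid pickle that was cut short. A minimal Python sketch for producing all four kinds of test file follows; the helper names `corrupt_payloads` and `write_payload` and the sample dictionary are hypothetical and do not reflect the real database schema. Depending on where the cut falls, unpickling truncated data raises either `EOFError` or `pickle.UnpicklingError`.

```python
import pickle


def corrupt_payloads(sample_db):
    """Return the bytes for each corruption case, keyed by a short name.

    `sample_db` is any picklable stand-in for a real database.
    """
    # Protocol 2 keeps the sample readable by Python 2 as well.
    data = pickle.dumps(sample_db, protocol=2)
    return {
        # Mode 1: empty file
        "empty": b"",
        # Mode 2: a valid pickle cut in half, so the final STOP opcode is lost
        "truncated": data[: len(data) // 2],
        # Mode 2: data that is not a pickle at all
        "garbage": b"ARGHHH!\n",
        # Mode 3: the same bytes that echo -e "S'\\xf6l'\np0\n." writes,
        # i.e. a protocol-0 Python 2 str containing the byte 0xf6
        "non_ascii": b"S'\xf6l'\np0\n.\n",
    }


def write_payload(path, payload):
    """Overwrite `path` with `payload` (only while vsmserver is stopped)."""
    with open(path, "wb") as f:
        f.write(payload)
```

For example, after stopping vsmserver and backing up the real file, `write_payload("/var/lib/vsm/sessions", corrupt_payloads({"example-session": {"user": "alice"}})["truncated"])` sets up the truncated case.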
(In reply to William Sjöblom from comment #0)
Hi, what is the solution for "End of file reading /var/lib/vsm/sessions - was there an error writing sessions file to disk?" when vsmserver fails to start because of this and the /var/lib/vsm/sessions file is empty?
Hi Alvin, sorry that your question got forgotten and that we have not given you an answer. This Bugzilla isn't the ideal place for discussing how to recover from errors. I realize that more than a year has passed, so if you're still having problems, please create a post in our community: https://community.thinlinc.com/