Whilst investigating bug 5098, we realised that we have a much broader problem with the encoding of user names.

The first step of a ThinLinc connection is to connect to SSH. Although the SSH protocol mandates UTF-8 for the user name, OpenSSH completely ignores this and just treats it as a binary blob. So, no matter how much fancy handling we do in ThinLinc, OpenSSH will never respect the locale. And even if it did, LANG is not properly set for sshd on Debian-based systems.

What this all means is that we have to make an assumption about what character encoding the user names are in. The client currently uses UTF-8 no matter what the client-side locale is.

One upside of such a restriction is that we can get rid of all the locale_encode()/locale_decode() handling we have everywhere we use a user name. With some luck we can also get rid of the few other places where they are used, which means that even server processes are no longer dependent on a proper locale. This is extra beneficial on Debian systems, where the locale is normally not set for system daemons (see bug 5098).
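To illustrate the assumption above, here is a minimal sketch of handling user names as UTF-8 regardless of the process locale. The helper names `username_to_bytes` and `username_from_bytes` are hypothetical and are not existing ThinLinc functions; they only show the idea of replacing locale-dependent conversion with a fixed encoding.

```python
def username_to_bytes(username: str) -> bytes:
    """Encode a user name as UTF-8, ignoring the current locale."""
    return username.encode("utf-8")


def username_from_bytes(raw: bytes) -> str:
    """Decode a user name that is assumed to be UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8, so a
    name in another encoding is rejected instead of being misread.
    """
    return raw.decode("utf-8")
```

Rejecting invalid input is a design choice for this sketch; a real implementation might prefer to log and fall back instead.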
One data point is that Gnome now requires a UTF-8 locale. See their FAQ on Gnome Terminal under Exit status 8: https://wiki.gnome.org/Apps/Terminal/FAQ#Exit_status_8
Python 3 is also affecting this issue in that it has a bunch of implicit conversions to UTF-8 in some cases and the current locale in other cases. Python 2 generally didn't do any implicit conversions, except for file names which use the current locale. For some odd reason Python 3 uses UTF-8 there instead, even though it also is aware of the current locale.
Also note that we now start our services through systemd on all distributions (bug 4290), so we don't know what the current behaviour really is. Given the comments on bug 5098 it seems like the fix in r13738 might not be needed anymore.
(In reply to Pierre Ossman from comment #2)
>
> Python 2 generally didn't do any implicit conversions, except for file names
> which use the current locale. For some odd reason Python 3 uses UTF-8 there
> instead, even though it also is aware of the current locale.

Apparently this isn't true. Python 3 uses the locale for file names (and most system calls, it seems). However, for a bad locale or the default (C) locale, it falls back to UTF-8 rather than ASCII. Test cases:

> $ LANG=C python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 utf-8
>
> $ LANG=sv_SE.ISO8859-1 python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 iso8859-1
>
> $ LANG=sv_SE.UTF-8 python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 utf-8
>
> $ LANG=fofofo python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 utf-8

Compared to Python 2:

> $ LANG=C python2 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> ('ascii', 'ANSI_X3.4-1968')
>
> $ LANG=sv_SE.ISO8859-1 python2 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> ('ascii', 'ISO-8859-1')
>
> $ LANG=sv_SE.UTF-8 python2 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> ('ascii', 'UTF-8')
>
> $ LANG=fofofo python2 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> ('ascii', 'ANSI_X3.4-1968')

This change came with PEP 540 (Python 3.7):

https://www.python.org/dev/peps/pep-0540/

However, 3.6 and older don't have this change, so they are more similar to Python 2:

> $ LANG=C python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 ascii
>
> $ LANG=sv_SE.ISO8859-1 python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 iso8859-1
>
> $ LANG=sv_SE.UTF-8 python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 utf-8
>
> $ LANG=fofofo python3 -c 'import sys; print(sys.getdefaultencoding(), sys.getfilesystemencoding())'
> utf-8 ascii
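For checking the same thing from inside a running service rather than from a one-liner, something like the following sketch could be used. The function name `describe_encodings` is made up for illustration; the `sys` and `locale` calls are standard library, and `sys.flags.utf8_mode` only exists on Python 3.7 and newer, hence the `getattr` fallback.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings Python has picked for this process."""
    return {
        # Encoding used for implicit str <-> bytes conversions.
        "default": sys.getdefaultencoding(),
        # Encoding used for file names and most system calls.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from the locale (False: do not call setlocale).
        "locale_preferred": locale.getpreferredencoding(False),
        # Non-zero when UTF-8 mode (PEP 540) is active; absent before 3.7.
        "utf8_mode": getattr(sys.flags, "utf8_mode", 0),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(key, value)
```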
Python 3 also doesn't accept "bytes" everywhere. E.g. pwd.getpwnam() requires "str", so we have no choice but to rely on Python's own encoding/decoding.
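A small sketch of what relying on Python's own conversion could look like when the user name arrives as bytes. The wrapper `lookup_user` is hypothetical. `os.fsdecode()` uses the filesystem encoding together with the "surrogateescape" error handler on POSIX systems, and as far as I can tell that matches how CPython converts the name back to bytes internally, but treat that as an assumption.

```python
import os
import pwd


def lookup_user(raw_name: bytes) -> pwd.struct_passwd:
    """Look up a passwd entry for a user name given as bytes.

    pwd.getpwnam() only accepts str, so the bytes are first decoded
    with os.fsdecode(), i.e. the filesystem encoding of this process.
    Raises KeyError if no such user exists.
    """
    return pwd.getpwnam(os.fsdecode(raw_name))
```

The consequence is that the result depends on the filesystem encoding of the process, which is why the locale the services are started with matters.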
We'll be gradually removing the need for locale_encode()/locale_decode() as part of bug 4586. Once that is done all that is left here is to verify that services are indeed started with the correct locale, and poke OpenSSH about their handling.
Reported to OpenSSH: https://bugzilla.mindrot.org/show_bug.cgi?id=3225
With hiveconf now supporting Python 3, we have had to make a decision about which encoding it should use. Since the .hconf files shipped with ThinLinc are always encoded in UTF-8, hiveconf has to always use UTF-8, regardless of what the system's locale is. For more information, see bug 7557.
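In Python 3 terms, that decision amounts to always passing an explicit encoding when opening the file instead of relying on the locale default. A minimal sketch follows; the function `read_hconf_text` is hypothetical and is not hiveconf's actual API.

```python
def read_hconf_text(path: str) -> str:
    """Read a configuration file as UTF-8, independent of the locale.

    Without the explicit encoding argument, open() would use the
    locale's preferred encoding, which may differ between systems.
    """
    with open(path, encoding="utf-8") as hconf_file:
        return hconf_file.read()
```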
systemd (and dbus) also seem to refuse to fully work with non-UTF-8 systems:

> POSIX does not specify the encoding of non-ASCII environment variable names or
> values and allows them to contain any non-zero byte, but neither dbus-daemon
> nor systemd supports environment variables with non-UTF-8 names or values.
> Accordingly, dbus-update-activation-environment assumes that any name or value
> that appears to be valid UTF-8 is intended to be UTF-8, and ignores other names
> or values with a warning.

https://dbus.freedesktop.org/doc/dbus-update-activation-environment.1.html

The username is unfortunately common in environment variables (e.g. the homedir, and the ThinLinc session dir).
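To find out which variables would be affected, a check along the following lines could be used. This is only an illustration of the "valid UTF-8 or ignored" rule quoted above, not how dbus or systemd implement it, and the function name `non_utf8_environment` is made up. It reads `os.environb`, which is available on POSIX systems.

```python
import os


def non_utf8_environment(environ=None) -> list:
    """Return the names (as bytes) of variables that are not valid UTF-8.

    A variable is reported if either its name or its value fails to
    decode as UTF-8. By default the current process environment is used.
    """
    if environ is None:
        environ = os.environb
    bad = []
    for name, value in environ.items():
        try:
            name.decode("utf-8")
            value.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad
```

For example, a home directory containing an ISO 8859-1 encoded user name, such as `b"/home/\xe5sa"`, would be reported, while the same path encoded as UTF-8 would not.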