I've been having problems at home with ThinLinc losing sessions because user lookups fail. Today I discovered that this happens because Python (and possibly glibc) doesn't properly handle a lost connection to the user database. The first lookup done after the connection is lost returns "no such user", but any requests after that proceed normally. This could explain some of the weird behaviours we've seen on some installations. A careful implementation of bug 2754 could solve this. A quick and dirty solution is to write a wrapper that does each lookup twice. We should also investigate whether the problem is in Python or glibc, and whether we can work around it in a sane manner.
This bug might be related to Issue 6892. A quick inspection of Python's Modules/pwdmodule.c indicates that it doesn't check errno, and thus cannot distinguish between "no more users" and error conditions.
Confirmed on Midi:

[astrand@midi ~]$ python
Python 2.4.3 (#1, Jan 14 2008, 18:32:40)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pwd
>>> pwd.getpwnam("astrand")
('astrand', '*', 164, 20164, 'Peter Astrand', '/home/astrand', '/bin/bash')
>>>
>>> pwd.getpwnam("astrand")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'getpwnam(): name not found: astrand'
>>>
>>> pwd.getpwnam("astrand")
('astrand', '*', 164, 20164, 'Peter Astrand', '/home/astrand', '/bin/bash')
>>>

I was restarting slapd between the first successful call and the traceback one.
The bug in Python is that it doesn't distinguish between a missing user and a lookup error. nss_ldap is also buggy, though; see bug 2956. Need to report upstream. We should probably also change our Python code to retry getpwnam an extra time if it raises KeyError.
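A rough sketch of what such a retry wrapper could look like. This is not the code used in ThinLinc; the name getpwnam_retry and the attempts, delay and lookup parameters are made up for illustration. It only papers over the symptom: a user that really doesn't exist still ends in KeyError, just after one extra lookup.

```python
import pwd
import time


def getpwnam_retry(name, attempts=2, delay=0.0, lookup=pwd.getpwnam):
    """Look up a user, retrying when the lookup raises KeyError.

    Hypothetical helper: pwd.getpwnam() raises KeyError both for a
    missing user and for a failed backend lookup, so one extra attempt
    gives the backend a chance to reconnect.
    The 'lookup' parameter exists only so the function can be tested
    without a real user database.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return lookup(name)
        except KeyError as exc:
            last_error = exc
            # Optionally wait before the next attempt, but not after the last.
            if delay and attempt < attempts - 1:
                time.sleep(delay)
    # Every attempt failed; report it the same way pwd.getpwnam() would.
    raise last_error
```

Callers would use getpwnam_retry("astrand") where they currently call pwd.getpwnam("astrand"). The default of two attempts matches the behaviour seen above, where only the first lookup after the lost connection fails.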
Reported upstream: http://bugs.python.org/issue4261
We haven't had reports in ages, and this isn't really a bug in ThinLinc but rather in Python. Closing this.