We have gotten a report that the Windows client will lock up if the user has too many sessions running. The last line of the log is:

> 2017-12-14T16:06:55: Calling XML-RPC method 'get_user_sessions'

The problem does not occur on other platforms.

The issue seems to be with the IPC between ssh and tlclient. Some more debug logging from XML-RPC shows this on Windows:

> 2017-12-14T16:06:55: XmlRpcSocket::nbRead: read/recv returned 4095.
> 2017-12-14T16:06:55: XmlRpcClient::readHeader: client has read 4095 bytes
> 2017-12-14T16:06:55: client read content length: 11205

Whilst on Linux it doesn't hang there:

> 2017-12-14T15:20:24: XmlRpcSocket::nbRead: read/recv returned 4095.
> 2017-12-14T15:20:24: XmlRpcClient::readHeader: client has read 4095 bytes
> 2017-12-14T15:20:24: client read content length: 11205
> 2017-12-14T15:20:24: XmlRpcSocket::nbRead: read/recv returned 4095.
> 2017-12-14T15:20:24: XmlRpcSocket::nbRead: read/recv returned 3057.
> 2017-12-14T15:20:24: XmlRpcClient::readResponse (read 11205 bytes)

The IPC consists of pipes, and increasing the buffer size of the pipes makes everything start working. So the issue seems to be that we aren't handling full pipe buffers properly.

The tlclient side of things is very simple, so I don't think the issue is there. So it's either in ssh, or the data gets dropped by Windows somewhere.
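For reference, the logs above show a reader that learns the content length (11205) from the first 4095-byte read and then has to keep reading until the whole body has arrived. On Windows the log stops after that first read, so the client is left waiting for bytes that never come. A minimal Python sketch of that read loop follows. It is not the XmlRpc++ code, and `read_response` and `read_chunk` are hypothetical names used only for illustration:

```python
def read_response(read_chunk):
    """Read one HTTP-style response: header, then exactly Content-Length body bytes.

    read_chunk is a hypothetical callable that returns the next chunk of bytes
    (b"" at end of stream); it stands in for a read on the pipe or socket.
    """
    buf = b""
    # Keep reading until the blank line that terminates the header is buffered.
    while b"\r\n\r\n" not in buf:
        chunk = read_chunk()
        if not chunk:
            raise EOFError("stream closed before end of header")
        buf += chunk
    header, body = buf.split(b"\r\n\r\n", 1)

    # Skip the status line and look for the Content-Length field.
    length = None
    for line in header.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header")

    # One read rarely delivers the whole body, so loop until it is complete.
    while len(body) < length:
        chunk = read_chunk()
        if not chunk:
            raise EOFError(f"stream closed after {len(body)} of {length} body bytes")
        body += chunk
    return body[:length]
```

If the sender drops data when the pipe is full, the final loop never sees the missing bytes, which is consistent with the hang in the Windows log.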
After more debugging, the issue is in ssh. There is no way to check if a pipe is writable, so we simply claim it always is. Microsoft's documentation claims that a pipe should be blocking by default, so the expected behaviour is intermittent hangs in ssh until tlclient empties the pipe buffer. However, in practice it is non-blocking and write() returns ENOSPC. Need to check if the documentation is wrong or if write() is misbehaving.
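To illustrate what handling a full pipe buffer means on the writing side, here is a small Python sketch of a write loop that treats "buffer full" as a retryable condition instead of losing data. It is not the ssh code. The names `write_all`, `on_full` and `RETRYABLE` are hypothetical, and the assumption is that a full non-blocking pipe reports EAGAIN on POSIX and ENOSPC on Windows (the latter as observed above):

```python
import errno
import os

# errno values treated as "pipe buffer full, try again later".
# EAGAIN/EWOULDBLOCK is the POSIX behaviour; ENOSPC is what was seen on Windows.
RETRYABLE = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOSPC}


def write_all(fd, data, on_full):
    """Write all of data to fd, calling on_full() whenever the pipe is full.

    on_full is a hypothetical hook: it should wait, or let the reader make
    progress, before the write is retried.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            # os.write may accept only part of the data; keep the count.
            sent += os.write(fd, view[sent:])
        except OSError as exc:
            if exc.errno not in RETRYABLE:
                raise
            on_full()
    return sent
```

A possible use on a POSIX system, where the hook simply drains the read end of the same pipe:

```python
r, w = os.pipe()
os.set_blocking(w, False)
received = bytearray()
write_all(w, b"x" * 200000, lambda: received.extend(os.read(r, 65536)))
```

The important part is that a short or failed write never discards the unsent remainder.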
The documentation was wrong. The pipes are non-blocking by default, and setting them to blocking solves the issue.
Or maybe not... I found some code in ssh that sets things to non-blocking (haven't checked if it is called yet). However that code also figured out some way to check the outgoing buffer. There might be room to improve things.
Seems to work well now. Tester should check that the Windows client can connect to a server where the user already has many (5+) sessions.
I could reproduce the issue on Windows 10 with client build 5621 and can verify that it is fixed in build 5656. I could start 10 sessions with the same user and the same server without any problem.