Under some circumstances the ThinLinc client fails to connect to the agent server leaving following line in the log file: ssh[E]: select: An operation was attempted on something that is not a socket. and then ssh client is terminating. Here follows a debug log for a successful connection: ssh[E]: debug1: Sending command: thinlinc-login dummy ssh[E]: THINLINC-LOGIN: HELLO 4.9.0 ssh[E]: debug1: client_input_channel_req: channel 2 rtype exit-status reply 0 ssh[E]: debug1: client_input_channel_req: channel 2 rtype eow@openssh.com reply 0 ssh[E]: ssh[E]: COMMAND_EXITSTATUS: 0 And when it fails the following debug log is retreived: ssh[E]: debug1: Sending command: thinlinc-login dummy ssh[E]: debug1: client_input_channel_req: channel 2 rtype exit-status reply 0 ssh[E]: debug1: client_input_channel_req: channel 2 rtype eow@openssh.com reply 0 ssh[E]: debug1: channel 0: free: port listener, nchannels 3 ssh[E]: debug1: channel 1: free: port listener, nchannels 2 ssh[E]: debug1: channel 2: free: client-session, nchannels 1 ssh[E]: debug1: fd 0 clearing O_NONBLOCK ssh[E]: debug1: fd 1 clearing O_NONBLOCK ssh[E]: debug1: fd 2 clearing O_NONBLOCK ssh[E]: select: An operation was attempted on something that is not a socket. ssh[E]: Transferred: sent 2516, received 3124 bytes, in f seconds ssh[E]: Bytes per second: sent f, received f ssh[E]: debug1: Exit status 0 WinProcess: Terminating process 6684 (ssh.exe)...
Gathered information from related support issues: - Issue only affects Windows client platform - Problem only occurs while client is connected to WiFi network, if connected to physical network the problem goes away - Issue does not affect all users with windows client platform - Users with problems retries the connection for 30 minutes and will eventually do a successful connection - Using ThinLinc client 4.8.1 solves the issue
I haven't been able to reproduce the issue for debugging, so I have looked into code. Found commit r32964 which seems related (rewrite of Win32 select() emualtion) that was a change for 4.9.0 release.
(In reply to comment #3) > Found commit r32964 which seems related (rewrite of Win32 select() > emualtion) that was a change for 4.9.0 release. > I reverted this commit and rebuilt ssh.exe to sent it to the two customers for testing. One replied with this: "I tried out the modified ssh.exe and as far as I can tell it does appear to fix the problem. It's a little hard to be 100% sure because the conditions to create the failure are not precisely known and so the failure is not always reproducible. In my case however, I did find the first time I tried to connect with the original unmodified ssh.exe, I had the failure occur. Then I tried 10 times with the modified ssh.exe and the thinlinc client connected correctly all 10 times. Hopefully your other customer will also get the same promising results." Which sounds good enough to dive into the code change in r32964 for bugs that could trigger the issue.
Commit r32964 removes a check for if a fd is a FIFO in the write set. This means that if write fd set only have a FIFO fd and that the read set is empty or only contains non tty/fifo fd(s), a select() call will be made on the FIFO which is invalid.
(In reply to comment #5) > Commit r32964 removes a check for if a fd is a FIFO in the write set. This > means that if write fd set only have a FIFO fd and that the read set is empty > or only contains non tty/fifo fd(s), a select() call will be made on the FIFO > which is invalid. > I added a debug log for 'client-session' creation and dumping write and read fd sets that produces the following log for when the issue appears: 2018-08-22T15:26:54: ssh[E]: debug1: channel 2: new [client-session] 2018-08-22T15:26:54: ssh[E]: debug1: ssh_session2_open: 'client-session' channel using fds in(3), out(4), err(5) 2018-08-22T15:26:54: ssh[E]: debug1: ssh_session2_open: channel_new: 2 . . . 2018-08-22T15:26:54: ssh[E]: debug1: client_input_channel_req: channel 2 rtype exit-status reply 0 2018-08-22T15:26:54: ssh[E]: debug1: client_input_channel_req: channel 2 rtype eow@openssh.com reply 0 2018-08-22T15:26:54: ssh[E]: debug1: win32_select writefds(5) 2018-08-22T15:26:54: ssh[E]: debug1: win32_select readfds(436, 440, 424) 2018-08-22T15:26:54: ssh[E]: debug1: channel 0: free: port listener, nchannels 3 2018-08-22T15:26:54: ssh[E]: debug1: channel 1: free: port listener, nchannels 2 2018-08-22T15:26:54: ssh[E]: debug1: channel 2: free: client-session, nchannels 1 2018-08-22T15:26:54: ssh[E]: debug1: fd 0 clearing O_NONBLOCK 2018-08-22T15:26:54: ssh[E]: debug1: fd 1 clearing O_NONBLOCK 2018-08-22T15:26:54: ssh[E]: debug1: fd 2 clearing O_NONBLOCK 2018-08-22T15:26:54: ssh[E]: select: En åtgärd försökte utföras på något som inte är en socket. 2018-08-22T15:26:54: ssh[E]: Transferred: sent 1804, received 2916 bytes, in f seconds 2018-08-22T15:26:54: ssh[E]: Bytes per second: sent f, received f 2018-08-22T15:26:54: ssh[E]: debug1: Exit status 0 fd 5 is connected to a FIFO and my conclusion is that all fds over 400 are sockets. This means that this bug as described in comment #5 is the reason for the failed select()
Created attachment 884 [details] Fix issue with select for fifos
Peter have tested the fix on his system which had a very how rating of reproducing the issue, one out of two tests fail. He verified that he couldn't reproduce the issue with the fix (attached patached).
We've gotten confirmation from one customer that the patch solves the issue.
Customers have reported seeing this issue even for active connections and not only for new ones.
(In reply to comment #12) > Customers have reported seeing this issue even for active connections and not > only for new ones. Disregard this comment, it seems like another bug.
I saw this issue on Windows 10 when connecting from our internal network out to eudemo.thinlinc.com with a regular wired connection.
The triggering condition seems to be that the final output from thinlinc-login and the command terminating happens at the same time. I.e. the ssh client will see both the output and the exit in the protocol stream without having to call select() to get more data. I have not figured out a good way to cause this buffering with an unmodified client. But patching ssh easily provokes. Simply put a "Sleep(500);" at the top of win32_select() and the bug will appear.
Should be fixed now. Tester probably needs to use a modified ssh client, or figure out some other way of provoking the bug.
With 4.9.0, I can reproduce the problem on my laptop with Windows 7, if connected via mobile phone tethering and downloading web pages during login. 2 out of 3 login fails. With 4.9.0post build 6045, it works correctly.