Bug 7285 - ssh hangs on connect on Debian 10 (Buster)
Summary: ssh hangs on connect on Debian 10 (Buster)
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Client
Version: 1.3.1
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.10.0
Assignee: Pierre Ossman
URL:
Keywords: derfian_tester, relnotes
Depends on:
Blocks:
 
Reported: 2018-11-28 15:30 CET by Pierre Ossman
Modified: 2018-12-18 16:31 CET



Attachments
test program (228 bytes, text/x-python)
2018-11-29 09:56 CET, Pierre Ossman

Description Pierre Ossman cendio 2018-11-28 15:30:09 CET
We got a report that the ThinLinc client hangs on connect when used on Debian 10 (Buster). Specifically, it is ssh that locks up early in the process, and these are the last lines in the log:

> 2018-11-16T21:58:52: ssh[E]: CONFIRM HOST KEY: xxx
> 2018-11-16T21:58:53: User accepted the new host key.
> 2018-11-16T21:58:53: Storing new host key for xxx.
Comment 1 Pierre Ossman cendio 2018-11-28 15:34:31 CET
After some digging, it turns out that ssh tries to connect to a UNIX socket whose address consists solely of "\0" bytes. This is apparently a valid address, but on most systems nothing is listening on it. On the customer system, however, "irqbalance" is listening on this address.

We don't know why ssh tries to connect here yet, but it has something to do with SSH agent support.

We also tried reproducing it here but initially failed. It turned out that irqbalance was not installed by default on our system. As soon as we installed it, things broke for us as well. We don't know why irqbalance is only installed in some cases.
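
To check whether a given machine is affected, one can probe that address directly. The following is only a minimal sketch, assuming Linux, where a socket path starting with "\0" lives in the abstract namespace, and assuming that the address in question is a full 108-byte sun_path of NUL bytes. The names NUL_ADDRESS and nul_address_has_listener are made up for illustration.

```python
import socket

# Assumption: Linux, where sun_path is 108 bytes and a path starting with
# "\0" names a socket in the abstract namespace.
NUL_ADDRESS = b"\0" * 108


def nul_address_has_listener() -> bool:
    """Return True if a connection to the all-NUL address succeeds."""
    if not hasattr(socket, "AF_UNIX"):
        return False
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.settimeout(2.0)
        sock.connect(NUL_ADDRESS)
        return True
    except (OSError, ValueError):
        # Connection refused, timed out, or the platform does not
        # support this kind of address.
        return False
    finally:
        sock.close()


print("listener on all-NUL address:", nul_address_has_listener())
```

If this reports a listener, something on the system (irqbalance in the reported case) is accepting connections there, and an ssh that tries to use this address as its agent socket can be expected to get stuck.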
Comment 3 Pierre Ossman cendio 2018-11-28 15:38:08 CET
It was also reported on our mailing list:

http://lists.cendio.se/pipermail/thinlinc-technical/2018-November/005887.html
Comment 4 Pierre Ossman cendio 2018-11-28 15:39:37 CET
It seems like a bug in irqbalance that was introduced here:

https://github.com/Irqbalance/irqbalance/commit/19c25ddc5a13cf0b993cdb0edac0eee80143be34

So any distribution that ships irqbalance 1.5.0 or newer will be affected.

I've filed a bug with them here:

https://github.com/Irqbalance/irqbalance/issues/85

We still need to fix ssh so it doesn't attempt to connect to that bogus address though.
Comment 5 Pierre Ossman cendio 2018-11-29 09:55:07 CET
It looks like this was caused by r23835 for bug 3430. In that commit we tried to make sure our ssh wouldn't use any random ssh agent, and we did so by setting $SSH_AUTH_SOCK to "". Apparently ssh does not interpret "" as "no agent" and never has. Instead it tries to connect to "", which ends up as an address consisting solely of "\0" bytes.

It looks like what we need to do is completely remove $SSH_AUTH_SOCK, not just empty it.
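
The difference between emptying and removing the variable can be illustrated as follows. This is a hedged sketch and not the actual client code: env_without_agent is a hypothetical helper, and the example dictionary stands in for the real environment.

```python
import os


def env_without_agent(base_env=None):
    """Return a copy of the environment with SSH_AUTH_SOCK removed entirely."""
    env = dict(os.environ if base_env is None else base_env)
    # pop() with a default does nothing if the variable is not set.
    env.pop("SSH_AUTH_SOCK", None)
    return env


# Emptying the variable still leaves it defined, which is what ssh trips on.
emptied = {"PATH": "/usr/bin", "SSH_AUTH_SOCK": ""}
print("SSH_AUTH_SOCK" in emptied)                     # True
print("SSH_AUTH_SOCK" in env_without_agent(emptied))  # False
```

The resulting dictionary could then be passed as the env argument of subprocess.Popen when starting ssh, so that the child process never sees the variable at all.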
Comment 6 Pierre Ossman cendio 2018-11-29 09:56:01 CET
Created attachment 899 [details]
test program

Test program that provokes the bug. Simply run this on your client and tlclient will hang on connect.
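
The attachment itself is not reproduced here. As a rough idea of what such a program could look like, the sketch below binds a listening socket to the all-NUL address and never answers. It is an assumption-laden illustration, not the attached file: it assumes Linux abstract sockets and a full 108-byte sun_path of NUL bytes, and listen_on_nul_address is a made-up name.

```python
import socket

# Assumption: the address is a full Linux sun_path (108 bytes) of NUL bytes.
NUL_ADDRESS = b"\0" * 108


def listen_on_nul_address() -> socket.socket:
    """Bind and listen on the all-NUL abstract address; never accept or reply."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(NUL_ADDRESS)
        sock.listen(1)
    except OSError:
        # Address already in use, or the platform lacks abstract sockets.
        sock.close()
        raise
    return sock
```

To use it, call listen_on_nul_address(), keep the returned socket and the process alive (for example with time.sleep in a loop), and start the client in the meantime. An affected ssh would presumably connect and then wait for an agent reply that never arrives.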
Comment 7 Pierre Ossman cendio 2018-11-29 13:51:22 CET
The behaviour is still present in the latest OpenSSH, so I reported a bug here:

https://bugzilla.mindrot.org/show_bug.cgi?id=2936
Comment 10 Pierre Ossman cendio 2018-11-30 14:19:03 CET
The changed code affects passwords and public keys, so I tested password and both our public key methods ("regular" and smart card) on Linux, Windows and macOS.

I could no longer provoke the bug on Linux, and I could not see any regressions on any platform.
Comment 11 Karl Mikaelsson cendio 2018-12-18 16:31:35 CET
Verified that thinlinc-client_4.9.0post-5988_amd64 does work where thinlinc-client_4.9.0-5775_amd64 failed to start on Debian Buster.

The release notes are fine.
