Bug 5902 - audio sometimes refuses to work
Summary: audio sometimes refuses to work
Status: NEW
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Sound
Version: pre-1.0
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: MediumPrio
Assignee: Pierre Ossman
URL:
Keywords: upstream
Depends on:
Blocks:
 
Reported: 2016-05-19 16:01 CEST by Pierre Ossman
Modified: 2016-06-08 12:21 CEST

See Also:
Acceptance Criteria:



Description Pierre Ossman cendio 2016-05-19 16:01:02 CEST
For some time I have been seeing an intermittent problem where audio refuses to work in a session. It always happens just as audio should start to play, and only on my Fedora 23 workstation. Sometimes it resolves itself, and sometimes I have to kill the session. There are no errors in any log.

The problem is also somehow related to the client's PulseAudio server, as bypassing the session server does not help. Local sound on the workstation still works, though.

Unfortunately it has been very difficult to debug, as it only happens about once every two weeks. This week, however, I found a test case that reproduces it:

 - Ubuntu 14.04
 - Unity
 - SuperTuxKart

This seems to trigger the bug nine times out of ten. I have not tried to recreate the system from scratch to see how fragile the setup is.
Comment 1 Pierre Ossman cendio 2016-05-19 16:01:50 CEST
Discussion started on the upstream mailing list:

https://lists.freedesktop.org/archives/pulseaudio-discuss/2016-May/026240.html
Comment 2 Pierre Ossman cendio 2016-05-19 16:24:58 CEST
Problem identified. It is caused by a reduction in latency (buffer size) and all related parameters. The scenario is this:

 1. Large latency, large buffer, large target fill, large minimum request. Silence in queue (i.e. buffer is full).

 2. Buffer drains slightly, making it fall below target fill. The shortfall is, however, still below the minimum request, so no request is sent to the client.

 3. The client requests a reduced latency, buffer is reduced, target fill is reduced, minimum request is reduced. The buffer now greatly exceeds target fill as it was almost up to the previous target fill level. This means that the server will not be asking the client for more data for a while.

 4. Some time later we've drained most of the excess and are almost back down to the target fill level. However, the amount counted as requested in step 2 is large enough that we never fall back below target fill. Hence we never start requesting more data. And we already decided in step 2 not to send a request for the first portion.

So the fundamental problem here is that requesting data from the client can be triggered not only by the buffer emptying, but also by parameters changing. Specifically, changes to the minimum request size are not handled properly.
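
The four steps above can be sketched as a toy simulation. This is a simplified model written for illustration, not PulseAudio's actual code: the class `StreamModel`, its fields (`fill`, `accounted`, `pending`) and the byte values are all assumptions chosen to mirror the description.

```python
class StreamModel:
    """Toy model of a playback stream's request logic (hypothetical names)."""

    def __init__(self, tlength, minreq):
        self.tlength = tlength   # target fill level
        self.minreq = minreq     # minimum request size
        self.fill = tlength      # step 1: buffer starts full (of silence)
        self.accounted = 0       # bytes the queue counts as already requested
        self.pending = 0         # bytes accumulated but not yet sent as a request
        self.sent = []           # requests actually sent to the client

    def _pop_missing(self):
        # Bytes below target fill that are not yet counted as requested.
        missing = self.tlength - self.fill - self.accounted
        if missing <= 0:
            return 0
        self.accounted += missing
        return missing

    def drain(self, nbytes):
        # Playback consumes data, then the request logic runs.
        self.fill = max(0, self.fill - nbytes)
        missing = self._pop_missing()
        if missing == 0:
            return  # nothing new is missing, so pending is never re-checked
        self.pending += missing
        if self.pending >= self.minreq:
            self.sent.append(self.pending)
            self.pending = 0

    def set_params(self, tlength, minreq):
        # Modelled flaw: parameters change without re-evaluating pending.
        self.tlength = tlength
        self.minreq = minreq


def simulate():
    s = StreamModel(tlength=1000, minreq=200)  # step 1: large parameters
    s.drain(100)                               # step 2: shortfall below minreq
    s.set_params(tlength=100, minreq=20)       # step 3: latency reduced
    for _ in range(9):                         # step 4: buffer drains to empty
        s.drain(100)
    return s


if __name__ == "__main__":
    s = simulate()
    print("fill:", s.fill, "pending:", s.pending, "requests sent:", s.sent)
```

In this model the buffer runs completely dry while 100 bytes remain pending, well above the new minimum request of 20, yet no request is ever sent. Re-checking `pending` against `minreq` inside `set_params` would unblock the model; whether that matches the real fix is not something this sketch can confirm.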

In theory this can be caused by any program that triggers a massive reduction in buffer latency.
Comment 3 Pierre Ossman cendio 2016-05-19 16:49:05 CEST
Sent suggested patches to upstream:

https://lists.freedesktop.org/archives/pulseaudio-discuss/2016-May/026248.html

However, this only fixes the problem long term, as the bug is in the system's PulseAudio, not ours. It's not obvious whether we can do a workaround until then.
Comment 4 Pierre Ossman cendio 2016-05-19 16:58:04 CEST
The fix seems to provoke some glitches in the audio though. Not sure if it means the patch is bad, or if it simply exposes bugs in the tunnel module. I can see some chatter about buffer sizes in the log, but no underruns.
Comment 5 Pierre Ossman cendio 2016-05-20 11:14:46 CEST
I turned up logging on the other two servers (system and session), and unfortunately nothing was logged by those either when the sound is crackling.

A large glitch was however noticed by the system server, which promptly increased its minimum latency to 4.0 ms. However our tunnel module fought back a bit and it took a few turns until it got the latency up high enough.

There is definitely more that can be done here, but I'm moving it to a separate bug. Opened bug 5903 for improving the latency handling.

The initial crackling is still a mystery though. Perhaps we should just start at a few ms minimum rather than zero?
