5648 – multi-threaded VNC rect encoding (Xvnc side)

Summary: multi-threaded VNC rect encoding (Xvnc side)

Status:	NEW

Alias:	None

Product:	ThinLinc
Classification:	Unclassified
Component:	VNC
Version:	trunk
Hardware:	PC Unknown

Importance:	P2 Normal
Target Milestone:	MediumPrio
Assignee:	Pierre Ossman

URL:
Keywords:

Depends on:
Blocks:	performance

Reported:	2015-09-23 11:00 CEST by Peter Åstrand
Modified:	2022-03-29 08:51 CEST
CC List:	1 user

See Also:
Acceptance Criteria:

Description Peter Åstrand cendio

2015-09-23 11:00:35 CEST

We have bug 5618 for supporting multi-threaded decoding on the client side. This bug is for multi-threaded encoding on the server side. TurboVNC has this:

http://www.turbovnc.org/About/TigerVNC

TurboVNC:
Multi-threaded Tight encoding
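For readers unfamiliar with the idea, here is a minimal C++ sketch of multi-threaded rect encoding. It is not TigerVNC's or TurboVNC's actual code. It assumes a hypothetical `Rect` type and a caller-supplied `encodeRect` callback that touches no shared mutable state. Real Tight encoding does not meet that assumption on its own because of its zlib streams. The function name `encodeRectsParallel` is also made up for illustration. Worker threads claim rects through an atomic counter and store each result by index, so the encoded rects can still be written to the client in their original order.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical rectangle type; not TigerVNC's rfb::Rect.
struct Rect { int x, y, w, h; };

// Hypothetical helper: encode every rect on a small pool of worker threads.
// Each worker repeatedly claims the next unclaimed index from a shared atomic
// counter, which balances the load when rects differ in size. Results are
// stored by index, so the output order matches the input order no matter
// which thread finishes first.
std::vector<std::string> encodeRectsParallel(
    const std::vector<Rect>& rects,
    const std::function<std::string(const Rect&)>& encodeRect,
    unsigned numThreads)
{
    assert(numThreads > 0);
    std::vector<std::string> out(rects.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (std::size_t i = next.fetch_add(1); i < rects.size();
             i = next.fetch_add(1))
            out[i] = encodeRect(rects[i]);  // assumes encodeRect is thread-safe
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; t++)
        pool.emplace_back(worker);
    for (auto& th : pool)
        th.join();
    return out;
}
```

Per-index result slots are the simplest way to preserve ordering. The trade-off is that this sketch sends nothing until every rect is done. A real implementation would probably want to start writing as soon as the next rect in order is ready.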

Comment 1 Pierre Ossman cendio

2015-12-08 16:56:34 CET

This was a lot easier to do by reusing much of the work done for bug 5618. I was able to get this up and running in a few hours:

https://github.com/TigerVNC/tigervnc/tree/master/tests/results/multicore

Results are more mixed here, however. My i7 and one of our Xeon servers show improvements of around 50%, but an Opteron server regresses by around 5% whilst burning through 50% more CPU. Need to investigate further what's happening.

Comment 2 Pierre Ossman cendio

2015-12-09 09:12:53 CET

Wrong URL. This is the proper one:

https://github.com/CendioOssman/tigervnc/tree/multicore

Comment 3 Pierre Ossman cendio

2015-12-22 16:08:22 CET

Urgh. This is turning out to be extremely complex to measure. The good news is that it seems like it is a win on all systems. But it is very difficult to get good numbers stating so.

 a) perf is broken on RHEL 6 (which the Opteron machine runs). It fails to count threads in many cases, giving absurdly low values.

 b) I am having serious doubts that rusage/task_clock is being counted correctly. It is much higher in the multi-core cases, but no other measurement is. So it seems like the extra CPU time is not actually spent doing anything, and some kind of idle time is being included in that figure. In other words, the CPU should still be available for other things. Looking at cycles and instructions is probably better, but a) is causing issues there.

 c) The tests have trouble ramping up the CPU speed. This is the primary reason the Opteron looks so bad in the tests. Forcing maximum speed makes the multi-core tests surpass the single-core ones every time. So it seems like we keep ending up on cores that are clocked down and it takes a while for them to ramp up, whilst in the single-core case we stay on the same core and get it up to a nice, fast speed. This explains why the Opteron is having so many problems, as it is a 32-core machine and it is very likely that we end up on unused cores there.
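Point b) depends on what the CPU-time figure actually counts. The sketch below is for Linux/POSIX, and the helper names `measure` and `processCpuSeconds` are hypothetical. It records two numbers for a single run: wall-clock time, and the process CPU time reported by `getrusage(RUSAGE_SELF)`. On Linux that CPU time is the sum of user and system time across all threads of the process. A thread that is blocked in a sleep or on a condition variable adds wall time but no CPU time. A thread that busy-waits for work, by contrast, is charged CPU time even though it does nothing useful.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <sys/resource.h>
#include <thread>

struct Timing { double wallSec; double cpuSec; };

// User + system CPU time consumed so far by all threads of this process.
static double processCpuSeconds()
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is set
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
           ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

// Run fn once and report wall-clock time and process CPU time side by side.
Timing measure(const std::function<void()>& fn)
{
    auto wall0 = std::chrono::steady_clock::now();
    double cpu0 = processCpuSeconds();
    fn();
    double cpu1 = processCpuSeconds();
    auto wall1 = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(wall1 - wall0).count(), cpu1 - cpu0 };
}
```

If the CPU time rises sharply in the multi-core case while the wall time, cycles and instructions do not, threads that spin instead of blocking are one plausible explanation worth ruling out.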

Comment 4 Pierre Ossman cendio

2015-12-23 15:29:21 CET

I restructured the queueing a bit to avoid stalls and it is better now, but not completely fixed. It is however at the point where there are no regressions compared to the old, single-core code.

The github branch has been updated with the new code.

Comment 5 Pierre Ossman cendio

2022-03-29 08:51:03 CEST

KasmVNC has added OpenMP to TigerVNC for this:

https://github.com/kasmtech/KasmVNC/blob/ce78879132e679df898b05de491e3c14a52d8ad8/common/rfb/EncodeManager.cxx#L1201

Can't see much in the way of locking though, so I wonder how they handle shared resources like Tight's zlib state.
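The shared-state concern can be made concrete. Tight keeps four zlib streams whose compression state persists across rectangles for the whole connection, and the client inflates each stream in the order the data arrives. Compression into a given stream therefore has to happen in output order. A plain mutex would prevent corruption but would not guarantee that order. Below is one possible OpenMP pattern, not a description of what KasmVNC does. Independent per-rect work runs in parallel, and the stateful step sits in an `ordered` region. `HypotheticalStream` is a made-up stand-in for a zlib stream, and `prepareRect` and `encodeAll` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a zlib stream: as with deflate's persistent
// dictionary, the output for one chunk depends on every chunk fed before it.
class HypotheticalStream {
public:
    std::string compress(const std::string& in) {
        std::string out = in;
        for (char& c : out) {
            state_ = state_ * 1103515245u + 12345u;     // evolve shared state
            c = static_cast<char>(c ^ (state_ >> 24));  // output depends on history
        }
        return out;
    }
private:
    std::uint32_t state_ = 1;
};

// Hypothetical per-rect work that touches no shared state (pixel conversion,
// palette analysis, ...) and is therefore safe to run concurrently.
static std::string prepareRect(int rectId) {
    return "rect-" + std::to_string(rectId);
}

// Prepare rects in parallel, but feed the shared stream inside an `ordered`
// region so compression runs one rect at a time, in loop order.
// Built without -fopenmp, the pragmas are ignored and the loop runs serially.
std::vector<std::string> encodeAll(int numRects, HypotheticalStream& stream) {
    assert(numRects >= 0);
    std::vector<std::string> out(numRects);
    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = 0; i < numRects; i++) {
        std::string prepared = prepareRect(i);   // may run concurrently
        #pragma omp ordered
        {
            out[i] = stream.compress(prepared);  // serialized: rect 0, 1, 2, ...
        }
    }
    return out;
}
```

Only the work before the `ordered` region scales with thread count in this pattern. If compression dominates, giving rects their own independent streams would parallelize better, at the cost of compression ratio.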
