We have bug 5618 for supporting multi-threaded decoding on the client side. This bug is for multi-threaded encoding on the server side. TurboVNC already has this: http://www.turbovnc.org/About/TigerVNC (TurboVNC: Multi-threaded Tight encoding)
This was much easier to do by reusing a lot of the work done for bug 5618; I was able to get it up and running in a few hours: https://github.com/TigerVNC/tigervnc/tree/master/tests/results/multicore The results, however, are more mixed. My i7 and a Xeon server we have show improvements of around 50%, but an Opteron server regresses around 5% whilst burning through 50% more CPU. I need to investigate further what's happening.
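For reference, the basic idea behind this kind of multi-threaded encoding is to split each update into independent pieces and hand them to worker threads. A minimal sketch of the splitting step, where Rect and encodeBand() are illustrative stand-ins rather than TigerVNC's actual classes:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Illustrative stand-in for an update rectangle.
struct Rect { int x, y, w, h; };

// Placeholder for per-band Tight encoding; here it just records the
// height of the band it was given.
static void encodeBand(const Rect& band, std::vector<int>& out, int idx) {
  out[idx] = band.h;
}

// Split an update rectangle into horizontal bands, encode each band on
// its own thread, and wait for all of them.
std::vector<int> encodeParallel(const Rect& update, unsigned nThreads) {
  nThreads = std::max(1u, nThreads);
  std::vector<int> results(nThreads, 0);
  std::vector<std::thread> workers;
  int bandH = (update.h + nThreads - 1) / nThreads;  // ceiling division
  for (unsigned i = 0; i < nThreads; i++) {
    int y = update.y + i * bandH;
    int h = std::min(bandH, update.y + update.h - y);
    if (h <= 0)
      break;  // fewer bands than threads for short updates
    workers.emplace_back(encodeBand, Rect{update.x, y, update.w, h},
                         std::ref(results), i);
  }
  for (auto& t : workers)
    t.join();
  return results;
}
```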
Wrong URL. This is the proper one: https://github.com/CendioOssman/tigervnc/tree/multicore
Urgh. This is turning out to be extremely complex to measure. The good news is that it seems to be a win on all systems, but it is very difficult to get good numbers showing that:

a) perf is broken on RHEL 6 (which the Opteron machine runs). It fails to count threads in many cases, giving absurdly low values.

b) I have serious doubts that rusage/task_clock is being counted correctly. It is much higher in the multi-core cases, but no other measurement is, so it seems the CPU is not actually doing more work and some kind of idle time is being included in that figure. IOW, the CPU should be available for other things. Looking at cycles and instructions is probably better, but a) causes issues there.

c) The tests have problems ramping up the CPU speed. This is the primary reason the Opteron looks so bad in the tests: forcing maximum speed makes the multi-core tests surpass the single-core ones every time. It seems we keep ending up on cores that are clocked down, and it takes a while for them to ramp up, whilst in the single-core case we stay on the same core and get it up to a nice, fast speed. This explains why the Opteron has so many problems, as it is a 32-core machine and we very likely end up on unused cores there.
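One way to test the ramp-up theory in c) would be to pin each worker thread to a fixed CPU so the scheduler cannot keep migrating it to idle, clocked-down cores. A Linux-specific sketch of that diagnostic (just an idea for measurement, not something the branch currently does):

```cpp
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to a single CPU so the scheduler cannot move it
// to an idle, clocked-down core. Linux-specific; returns 0 on success.
int pinToCpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
}
```

Calling this once per worker at startup keeps each thread on one core for the whole benchmark, so the frequency governor gets a chance to ramp that core up, mimicking the single-core behaviour.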
I restructured the queueing a bit to avoid stalls and it is better now, though not completely fixed. It is however at the point where there are no regressions compared to the old, single-core code. The GitHub branch has been updated with the new code.
KasmVNC has added OpenMP to TigerVNC for this: https://github.com/kasmtech/KasmVNC/blob/ce78879132e679df898b05de491e3c14a52d8ad8/common/rfb/EncodeManager.cxx#L1201 I can't see much in the way of locking though, so I wonder how they handle shared resources like Tight's zlib state.
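For the record, zlib streams are stateful, so a parallel encode loop needs either locking or one stream per worker. A sketch of the per-thread ownership pattern (using std::thread instead of OpenMP to keep it self-contained, and with ZlibState as a placeholder for a real z_stream; encodeRects is illustrative, not KasmVNC's code):

```cpp
#include <thread>
#include <vector>

// Stand-in for a real z_stream plus its buffers; the point is that each
// worker owns one of these and never touches another worker's state.
struct ZlibState {
  int compressedBytes = 0;  // placeholder for accumulated deflate() output
};

// Encode nRects rects across nThreads workers, round-robin, with one
// ZlibState per worker so no locking is needed for the compression state.
int encodeRects(int nRects, unsigned nThreads) {
  std::vector<ZlibState> perThread(nThreads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; t++) {
    workers.emplace_back([&, t] {
      // This worker only ever writes perThread[t].
      for (int r = t; r < nRects; r += nThreads)
        perThread[t].compressedBytes += 1;  // placeholder for deflate()
    });
  }
  for (auto& w : workers)
    w.join();
  int total = 0;
  for (auto& s : perThread)
    total += s.compressedBytes;
  return total;
}
```

An OpenMP version would do the same thing by indexing the state array with omp_get_thread_num(); either way, the key is that the streams are never shared between threads.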