This is a tracker bug to document ideas and plans for how we improve the performance-related aspects of the VNC/RFB portion of ThinLinc. This includes:

- Bandwidth usage
- Latency sensitivity
- CPU usage
- Image quality, perceived or otherwise
Historical things:

- Evaluate ZRLE (bug 1215)
- Evaluate deferred updates (bug 2931)
- Evaluate Comparing Update Tracker (bug 4020)
- Handle high latency networks (bug 2566)
- SIMD accelerated JPEG (bug 2926)
- Optimise JPEG entropy coding (bug 3009)
- Double buffering (bug 2930)
Future plans:

- Clean up and move magic out of the Tight encoder (bug 4915 and bug 5026)

- Record sessions
  Needed so we can properly evaluate parts of the system with real world data. Such recordings need to losslessly record every change, with timing information.

- Evaluate existing components more thoroughly/systematically
  Are we getting a good trade-off between CPU and bandwidth for these:
  * Comparing Update Tracker
  * Deferred updates
  * Solid area detection
  * Sub-rect analysis

- Evaluate CODECs
  Write a framework that allows us to get reliable numbers for the trade-off between CPU, bandwidth and quality for our CODECs. We need to consider the effects the compression and quality settings have, as well as what type of data might fit each CODEC best. A suitable metric for quality might be SSIM.

- Evaluate sub-rect classes
  Connected to the above, we need to investigate whether we are classifying sub-rects in a way that makes sense with regard to real world data and allows each CODEC to play to its strengths. We should also look at the optimal size for sub-rects. Perhaps we should have a fixed grid of e.g. 64x64 pixels? That could also simplify update tracking and save CPU time there, at the cost of sending a bit more data.

- Server side compression selection
  The server is in a much better position than the client to select which CODEC to use and how much compression to apply.

- Automatic lossless refresh (bug 2928)

- Partial updates (bug 4735)
  Allows us to multiplex other packets when we have a large update and limited bandwidth. Primarily it allows us to send out more congestion control packets; the congestion control normally stops working as the update size approaches and exceeds the BDP (bandwidth-delay product). A protocol extension will be needed to fix this.

- Socket buffer (bug 4734)
  Currently we'll hang the server if we fill the outgoing socket. We should have a memory buffer for when this happens to avoid screwing up timing sensitive things (like the congestion control).

- High level compression improvements:
  * Track multiple copy operations (limited to one right now)
  * More aggressive search for solid rects
  * Reduce colours before encoding
  * Image caching (bug 1814)
  * Multi-threaded encoding (OpenMP?)

- New CODECs:
  * Video CODECs (H.263/264/265, VP7/VP8/VP9, Theora, Dirac, ...)
  * PNG
  * Delta compared to previous frame
  * JPEG encoding separate from Tight, to encourage use

- Hardware acceleration (GPU, more SIMD) (bug 4982)

- Tweak deflateInit
  You can give it hints about what kind of data to expect, and we might not be using that fully today (see the sketch after this list).

- Dithering (bug 3893)

- Lossless hint
  The server might do lots of lossy stuff to the data, not just JPEG, so we need a general way for the client to say that it wants pixel perfect data at all times.

- Indicate stalled/slow connection (bug 4956)
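As a quick illustration of the deflateInit idea above (the level and strategy values below are just examples, not measured recommendations, and initEncoderStream() is a made-up name):

    #include <cstring>
    #include <zlib.h>

    // Sketch only: zlib lets the caller hint what kind of data to expect.
    // Z_FILTERED is tuned for data with small, noisy values (e.g. the output
    // of a pixel-difference filter), Z_RLE for data dominated by runs, which
    // might suit palette/solid rects. We might not be making full use of
    // these hints today.
    static bool initEncoderStream(z_stream* zs)
    {
      memset(zs, 0, sizeof(*zs));          // zalloc/zfree/opaque = Z_NULL
      return deflateInit2(zs,
                          2,               // compression level
                          Z_DEFLATED,      // only supported method
                          15,              // windowBits (32 KB window)
                          8,               // memLevel
                          Z_FILTERED)      // the actual hint
             == Z_OK;
    }

The strategy (and level) can also be changed per rect type on the fly with deflateParams(zs, level, strategy).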
DRC has done some tests with the x264 codec: http://www.turbovnc.org/About/H264
* Check the overhead of the JPEG JFIF header. There is a lot of data there which might make JPEG a bad choice for small blocks. Perhaps there are pre-defined quantisation and Huffman tables that can be used in such cases?
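One thing worth trying here is libjpeg's "abbreviated datastream" support, which exists for exactly this: emit the quantisation/Huffman tables once, then leave them out of the individual images. Rough sketch (setup, destination manager and error handling omitted; getting the tables-only blob to the client would need some protocol work):

    #include <cstdio>
    #include <jpeglib.h>

    // Assumes cinfo has already been set up (jpeg_create_compress(),
    // image dimensions, jpeg_set_defaults(), destination manager).
    void encodeRectsWithSharedTables(jpeg_compress_struct* cinfo)
    {
      jpeg_set_quality(cinfo, 80, TRUE);   // 80 is just an example quality

      // 1. Tables-only datastream, emitted once (per connection or per
      //    quality change). This also marks the tables as "sent".
      jpeg_write_tables(cinfo);

      // 2. Each rect becomes an abbreviated image: passing FALSE means
      //    tables already marked as sent are not repeated in every rect,
      //    which shrinks the per-rect header considerably.
      jpeg_start_compress(cinfo, FALSE);
      /* ... jpeg_write_scanlines() for the rect's rows ... */
      jpeg_finish_compress(cinfo);
    }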
http://bellard.org/bpg/
- Finish SIMD on ARM (it apparently still lacks significant portions)
- The palette code doesn't mask away irrelevant bits. This means that a single colour might get multiple entries and reduce the efficiency of the compression.
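A sketch of the kind of masking that could help (the Format struct below just mirrors how RFB describes pixels; none of these names are the actual palette code):

    #include <cstdint>
    #include <map>

    // Stand-in for the pixel format (max value + shift per channel).
    struct Format {
      uint32_t redMax, greenMax, blueMax;
      uint32_t redShift, greenShift, blueShift;
    };

    // Only these bits carry colour information; anything outside them is
    // padding or garbage and should not produce extra palette entries.
    static uint32_t relevantBits(const Format& f)
    {
      return (f.redMax   << f.redShift) |
             (f.greenMax << f.greenShift) |
             (f.blueMax  << f.blueShift);
    }

    static void insertPixel(std::map<uint32_t, int>& palette,
                            const Format& f, uint32_t pixel)
    {
      pixel &= relevantBits(f);            // one entry per real colour
      if (palette.find(pixel) == palette.end())
        palette[pixel] = (int)palette.size();
    }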
- Look at using ORC from GStreamer. It can probably optimise and generalise many heavy loops. Might also be useful in libjpeg-turbo.
- Look at Google's Snappy as an alternative to zlib based encodings.
(In reply to comment #6)
> - Finish SIMD on ARM (it apparently still lacks significant portions)

The important bits have been fixed now. There are still some missing pieces, but nothing that is normally used. Coverage for SIMD optimisation in libjpeg-turbo can be seen here:

http://www.libjpeg-turbo.org/About/SIMDCoverage
More Google compression algorithms to look at:

- Zopfli: https://github.com/google/zopfli
- Brotli: http://google-opensource.blogspot.se/2015/09/introducing-brotli-new-compression.html
- FLIF (http://flif.info/)
Both Intel and Cloudflare have implemented optimised versions of zlib that are drop-in replacements:

https://github.com/jtkukunas/zlib
https://github.com/cloudflare/zlib

Benchmarks:

https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html
https://blog.cloudflare.com/cloudflare-fights-cancer/
- The HTML client uses compression level 9 even though we've seen little benefit above level 3. The native client and the server use 2 by default.
We might want to see if pixman has some routines that can be used to speed up parts of the pipeline.
More benchmarking of modern compression routines: http://richg42.blogspot.se/2016/01/zlib-in-serious-danger-of-becoming.html
Apple joins the fray with their own compression algorithm: https://github.com/lzfse/lzfse
Dropbox Lepton: https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/
Facebook Zstandard: https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
(In reply to comment #2)
> > Are we getting a good trade off for CPU/bandwidth for these:
> > * Comparing Update Tracker

I did a quick test running YouTube under various desktop environments and measured how much compression the CUT achieved:

Gnome 3: 1:1.6
KDE, Xfce (without composite), MATE (with composite): 1:5

So it seems to do a lot for this use case in most environments. However, Gnome 3 maintained a much lower frame rate, which could also be a large factor.
> Guetzli is a JPEG encoder that aims for excellent compression density at
> high visual quality. Guetzli-generated images are typically 20-30% smaller
> than images of equivalent quality generated by libjpeg. Guetzli generates
> only sequential (nonprogressive) JPEGs due to faster decompression speeds
> they offer.

https://arstechnica.com/information-technology/2017/03/google-jpeg-guetzli-encoder-file-size/
https://github.com/google/guetzli/
And another compression format: https://github.com/inikep/lizard
(bug 4333, downgraded from its own bug to just an idea comment:)

Most SIMD instructions require a specific memory alignment to work optimally. As discovered on bug 4328, we're not always doing this properly. It could be worth investigating if we can improve alignment and thereby improve performance.
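For reference, roughly what aligned buffers could look like (the 32-byte figure and the helper names are assumptions for illustration; posix_memalign() is POSIX-only):

    #include <cstdint>
    #include <cstdlib>

    // Allocate a buffer whose start is 32-byte aligned, which covers SSE (16)
    // as well as AVX/AVX2 (32) aligned loads and stores. Release with free().
    static uint8_t* allocAlignedBuffer(size_t bytes)
    {
      void* ptr = nullptr;
      if (posix_memalign(&ptr, 32, bytes) != 0)
        return nullptr;
      return static_cast<uint8_t*>(ptr);
    }

    // Pad the stride to a multiple of 32 bytes so that every row, not just
    // the first one, starts on an aligned address.
    static size_t alignedStride(size_t width, size_t bytesPerPixel)
    {
      return (width * bytesPerPixel + 31) & ~static_cast<size_t>(31);
    }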
Microsoft developed a new lossless CODEC as part of RemoteFX which could be interesting to have in VNC as well. Some overview here: https://msdn.microsoft.com/en-us/library/ff635792.aspx
Some alternatives to pako (used for zlib handling in Web Access) have popped up that claim to be faster:

https://github.com/photopea/UZIP.js
https://github.com/101arrowz/fflate
The client bandwidth estimation got a tweak upstream, which should hopefully make it a bit more accurate: https://github.com/TigerVNC/tigervnc/commit/2354ce7404b8bacced3249e9c9787a12de307d2a
The comparing update tracker got tweaked upstream here:

https://github.com/TigerVNC/tigervnc/pull/1031

It now spends a bit more CPU in order to reduce the changed area.
(In reply to Pierre Ossman from comment #4)
> * Check the overhead of the JPEG JFIF header. There is a lot of data there
> which might make JPEG a bad choice for small blocks. Perhaps there are
> pre-defined quantisation and huffman tables that can be used in such cases?

Apparently RealVNC's JPEG encoding has some hacks in this area where some sections of the header are reused between rects. At least according to this PR to noVNC:

https://github.com/novnc/noVNC/pull/1609

The downside to the approach used there, though, is that it makes it more difficult to decode several JPEG rects concurrently.
Most of the new image codec groups seem to have joined forces in favour of a new standard called JPEG XL. The main contributors are Google, with their various formats, and the FLIF people. Performance is claimed to be equivalent to libjpeg-turbo, but with massive gains in quality per bit. Browser support is experimental in both Chrome and Firefox, but many of the big players (Google, Facebook, Cloudflare) are backing this format. It is in the later stages of ISO standardisation and the format seems to be fixed at this point.
(In reply to Pierre Ossman from comment #13)
> Both Intel and Cloudflare have implemented optimised versions of zlib that
> are a drop in replacement:
>
> https://github.com/jtkukunas/zlib
> https://github.com/cloudflare/zlib

There is now a third option, which seems to be some fork/merge of the above:

https://github.com/zlib-ng/zlib-ng

Fedora has apparently decided to switch to this library:

https://fedoraproject.org/wiki/Changes/ZlibNGTransition
Google is apparently rolling out a new JPEG library that they claim is as fast as libjpeg-turbo but with better results:

https://opensource.googleblog.com/2024/04/introducing-jpegli-new-jpeg-coding-library.html

DRC is sceptical of this claim:

https://groups.google.com/d/msgid/libjpeg-turbo-users/CF1DAF1F-93FB-4742-B97B-0D1CAC6D6D0D%40virtualgl.org