Bug 5106 (performance) - improve VNC performance (latency, bandwidth, CPU usage, quality, ...)
Summary: improve VNC performance (latency, bandwidth, CPU usage, quality, ...)
Status: NEW
Alias: performance
Product: ThinLinc
Classification: Unclassified
Component: VNC
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: MediumPrio
Assignee: Pierre Ossman
URL:
Keywords:
Depends on: 1814 3455 3770 3893 4956 4982 5648 5752 7255 7800 7958 45 1215 2566 2926 2928 2930 2931 3009 4020 4333 4734 4735 4915 5026 5242 5618 5719 5748 5812 7139 7463
Blocks:
Reported: 2014-04-22 12:45 CEST by Pierre Ossman
Modified: 2024-04-04 16:12 CEST
2 users

See Also:
Acceptance Criteria:


Attachments

Description Pierre Ossman cendio 2014-04-22 12:45:40 CEST
This is a tracker bug to document ideas and plans for how we improve performance-related aspects of the VNC/RFB portion of ThinLinc. This includes:

 - Bandwidth usage

 - Latency sensitivity

 - CPU usage

 - Image quality, perceived or otherwise.
Comment 1 Pierre Ossman cendio 2014-04-22 13:00:32 CEST
Historical things:

 - Evaluate ZRLE (bug 1215)

 - Evaluate deferred updates (bug 2931)

 - Evaluate Comparing Update Tracker (bug 4020)

 - Handle high latency networks (bug 2566)

 - SIMD accelerated JPEG (bug 2926)

 - Optimise JPEG entropy coding (bug 3009)

 - Double buffering (bug 2930)
Comment 2 Pierre Ossman cendio 2014-04-22 14:01:10 CEST
Future plans:

- Clean up and move magic out of the Tight encoder (bug 4915 and bug 5026)

- Record sessions:

Needed so we can properly evaluate parts of the system with real-world data. Such recordings need to losslessly record every change, with timing information.
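
A minimal sketch of what one such record could look like (the format and names are made up for illustration, nothing here is a decided design):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct UpdateRecord {
        uint64_t timestampUs;   // microseconds since start of recording
        uint16_t x, y, w, h;    // changed rectangle, raw pixels follow
    };

    void writeRecord(FILE* f, const UpdateRecord& hdr,
                     const std::vector<uint8_t>& pixels)
    {
        // One record per change: fixed header, then w*h*4 bytes of
        // raw BGRX pixels, so nothing is lost and timing can be
        // replayed exactly.
        fwrite(&hdr, sizeof(hdr), 1, f);
        fwrite(pixels.data(), 1, pixels.size(), f);
    }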

- Evaluate existing components more thoroughly/systematically

Are we getting a good trade-off between CPU and bandwidth for these:

 * Comparing Update Tracker

 * Deferred updates

 * Solid area detection

 * Sub-rect analysis

- Evaluate CODECs

Write some framework that allows us to get reliable numbers for the trade-off between CPU, bandwidth and quality for our CODECs. We need to consider the effects the compression and quality settings have, as well as what type of data might fit each CODEC best.

A suitable metric for quality might be SSIM.
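
For reference, a single-window sketch of SSIM over two grayscale buffers. Real SSIM is computed over small sliding windows and averaged; this global variant only illustrates the formula:

    #include <cstddef>
    #include <cstdint>

    double globalSSIM(const uint8_t* a, const uint8_t* b, size_t n)
    {
        const double C1 = 6.5025, C2 = 58.5225;  // (0.01*255)^2, (0.03*255)^2
        double muA = 0.0, muB = 0.0;
        for (size_t i = 0; i < n; i++) { muA += a[i]; muB += b[i]; }
        muA /= n; muB /= n;
        double varA = 0.0, varB = 0.0, cov = 0.0;
        for (size_t i = 0; i < n; i++) {
            varA += (a[i] - muA) * (a[i] - muA);
            varB += (b[i] - muB) * (b[i] - muB);
            cov  += (a[i] - muA) * (b[i] - muB);
        }
        varA /= n; varB /= n; cov /= n;
        return ((2.0 * muA * muB + C1) * (2.0 * cov + C2)) /
               ((muA * muA + muB * muB + C1) * (varA + varB + C2));
    }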

- Evaluate sub-rect classes

Connected to the above, we need to investigate whether we are classifying sub-rects in a way that makes sense with regard to real-world data, and allows each CODEC to play to its strengths.

We should also look at the optimal size for sub-rects. Perhaps we should have a fixed grid of e.g. 64x64 pixels? That could also simplify update tracking and save CPU time there, at the cost of sending a bit more data.
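
A rough sketch of what fixed-grid tracking could look like (the class and the 64x64 size are just the speculation above, not existing code):

    #include <vector>

    class TileGrid {
        static const int TILE = 64;
        int cols, rows;
        std::vector<bool> dirty;
    public:
        TileGrid(int width, int height)
            : cols((width + TILE - 1) / TILE),
              rows((height + TILE - 1) / TILE),
              dirty(cols * rows, false) {}

        // Arbitrary damage rects collapse to a handful of tile bits,
        // making region arithmetic trivial compared to rect lists
        void markRect(int x, int y, int w, int h)
        {
            for (int ty = y / TILE; ty <= (y + h - 1) / TILE; ty++)
                for (int tx = x / TILE; tx <= (x + w - 1) / TILE; tx++)
                    dirty[ty * cols + tx] = true;
        }
    };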

- Server side compression selection

The server is in a much better position than the client to select which CODEC to use and how much compression to apply.

- Automatic lossless refresh (bug 2928)

- Partial updates (bug 4735)

Allows us to multiplex other packets when we have a large update and limited bandwidth. Primarily it allows us to send out more congestion control packets, which normally stops working as the update size approaches and exceeds the bandwidth-delay product (BDP). A protocol extension will be needed to fix this.

- Socket buffer (bug 4734)

Currently we'll hang the server if we fill the outgoing socket. We should have a memory buffer for when this happens, to avoid disturbing timing-sensitive things (like the congestion control).
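
A sketch of such a buffer, assuming a non-blocking POSIX socket; the class and names are hypothetical:

    #include <cerrno>
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <sys/socket.h>

    class BufferedSocket {
        int fd;
        std::deque<uint8_t> pending;
    public:
        explicit BufferedSocket(int fd_) : fd(fd_) {}

        void write(const uint8_t* data, size_t len)
        {
            if (pending.empty()) {
                // Try to push directly to the kernel without blocking
                ssize_t n = send(fd, data, len, MSG_DONTWAIT);
                if (n < 0) {
                    if (errno != EAGAIN && errno != EWOULDBLOCK)
                        return;  // real code would report the error
                    n = 0;
                }
                data += n;
                len -= (size_t)n;
            }
            // Queue whatever didn't fit; a real implementation would
            // flush this when poll() reports POLLOUT
            pending.insert(pending.end(), data, data + len);
        }
    };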

- High level compression improvements:

 * Track multiple copy operations (limited to one right now)

 * More aggressive search for solid rects.

 * Reduce colours before encoding

 * Image caching (bug 1814)

 * Multi-threaded encoding (OpenMP?); see the sketch after this list
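
A sketch of what the multi-threaded variant could look like; Rect and encodeRect() are placeholders rather than existing code:

    #include <cstdint>
    #include <vector>

    struct Rect { int x, y, w, h; };

    // Placeholder for any of the existing encoders
    std::vector<uint8_t> encodeRect(const Rect& r);

    void encodeAll(const std::vector<Rect>& rects,
                   std::vector<std::vector<uint8_t> >& out)
    {
        out.resize(rects.size());
        // Rects are independent, so they can be compressed on all
        // cores; the results are still sent in protocol order
        #pragma omp parallel for
        for (int i = 0; i < (int)rects.size(); i++)
            out[i] = encodeRect(rects[i]);
    }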

- New CODECs:

 * Video CODECs (H.263/264/265, VP7/VP8/VP9, Theora, Dirac, ...)

 * PNG

 * Delta compared to previous frame

 * JPEG encoding separate from Tight, to encourage use

- Hardware acceleration (GPU, more SIMD) (bug 4982)

- Tweak deflateInit

You can give it hints about what kind of data to expect, and we might not be using that fully today.
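
For example, zlib's deflateInit2() takes a strategy parameter; Z_FILTERED (or Z_RLE) could be a better hint for pixel data than the default. A minimal sketch, where the level and window values are just examples:

    #include <zlib.h>

    z_stream zs = {};
    /* level 2, deflate method, 32 KB window, default memory usage;
       Z_FILTERED hints that the input is mostly small values with a
       fairly random distribution, as pixel deltas tend to be */
    int rc = deflateInit2(&zs, 2, Z_DEFLATED, 15, 8, Z_FILTERED);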

- Dithering (bug 3893)

- Lossless hint

The server might do lots of lossy stuff to the data, not just JPEG, so we need a general way for the client to say that it wants pixel perfect data at all times.

- Indicate stalled/slow connection (bug 4956)
Comment 3 Pierre Ossman cendio 2014-07-10 13:10:04 CEST
DRC has done some tests with the x264 codec:

http://www.turbovnc.org/About/H264
Comment 4 Pierre Ossman cendio 2014-12-03 15:04:04 CET
* Check the overhead of the JPEG JFIF header. There is a lot of data there which might make JPEG a bad choice for small blocks. Perhaps there are pre-defined quantisation and Huffman tables that can be used in such cases?
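
libjpeg does support "abbreviated datastreams", where the tables are omitted from the image data; the tables would then have to be sent once up front or be well-known defaults on both sides. A sketch of how that could be set up (whether this fits the protocol is exactly what needs checking):

    #include <cstdio>
    #include <jpeglib.h>

    void startAbbreviatedCompress(j_compress_ptr cinfo)
    {
        /* Mark the tables as already sent so DQT/DHT markers are
           left out of each rect */
        jpeg_suppress_tables(cinfo, TRUE);
        jpeg_start_compress(cinfo, FALSE);  /* FALSE = don't write all tables */
        /* ... jpeg_write_scanlines() as usual ... */
    }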
Comment 5 Aaron Sowry cendio 2014-12-14 10:21:25 CET
http://bellard.org/bpg/
Comment 6 Pierre Ossman cendio 2015-01-22 14:27:59 CET
- Finish SIMD on ARM (it apparently still lacks significant portions)
Comment 7 Pierre Ossman cendio 2015-02-12 14:18:48 CET
 - The palette code doesn't mask away irrelevant bits. This means that a single colour might get multiple entries and reduce the efficiency of the compression.
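
A sketch of the kind of masking that is missing; the BGRX layout and mask here are just an example:

    #include <cstdint>

    // Only bits covered by the pixel format should contribute to the
    // palette key; otherwise two pixels that differ only in the unused
    // X byte get separate palette entries
    static inline uint32_t paletteKey(uint32_t pixel)
    {
        const uint32_t relevantMask = 0x00ffffff;  // BGRX: X is irrelevant
        return pixel & relevantMask;
    }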
Comment 8 Pierre Ossman cendio 2015-02-12 15:16:42 CET
 - Look at using ORC from GStreamer. It can probably optimise and generalise many heavy loops. Might also be useful in libjpeg-turbo.
Comment 9 Pierre Ossman cendio 2015-04-24 09:36:52 CEST
 - Look at Google's Snappy as an alternative to zlib based encodings.
Comment 10 Pierre Ossman cendio 2015-07-30 10:22:47 CEST
(In reply to comment #6)
> - Finish SIMD on ARM (it apparently still lacks significant portions)

The important bits have been fixed now. There are still some missing pieces, but nothing that is normally used. Coverage for SIMD optimisation in libjpeg-turbo can be seen here:

http://www.libjpeg-turbo.org/About/SIMDCoverage
Comment 11 Pierre Ossman cendio 2015-09-23 12:42:05 CEST
More Google compression algorithms to look at:

 - Zopfli: https://github.com/google/zopfli

 - Brotli: http://google-opensource.blogspot.se/2015/09/introducing-brotli-new-compression.html
Comment 12 Pierre Ossman cendio 2015-10-05 11:46:59 CEST
 - FLIF (http://flif.info/)
Comment 13 Pierre Ossman cendio 2015-12-01 08:58:58 CET
Both Intel and Cloudflare have implemented optimised versions of zlib that are drop-in replacements:

https://github.com/jtkukunas/zlib
https://github.com/cloudflare/zlib

Benchmarks:

https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html
https://blog.cloudflare.com/cloudflare-fights-cancer/
Comment 14 Pierre Ossman cendio 2015-12-03 14:36:39 CET
 - The HTML client uses compression level 9 even though we've seen little benefit above level 3. The native client and the server use 2 by default.
Comment 15 Pierre Ossman cendio 2015-12-11 13:08:10 CET
We might want to see if pixman has some routines that can be used to speed up parts of the pipeline.
Comment 16 Pierre Ossman cendio 2016-01-19 12:50:42 CET
More benchmarking of modern compression routines:

http://richg42.blogspot.se/2016/01/zlib-in-serious-danger-of-becoming.html
Comment 17 Pierre Ossman cendio 2016-07-07 15:25:25 CEST
Apple joins the fray with their own compression algorithm:

https://github.com/lzfse/lzfse
Comment 20 Pierre Ossman cendio 2016-11-08 16:40:19 CET
(In reply to comment #2)
> 
> Are we getting a good trade off for CPU/bandwidth for these:
> 
>  * Comparing Update Tracker
> 

I did a quick test running YouTube under various desktop environments and measured how much compression the CUT achieved:

 Gnome 3: 1:1.6
 KDE, Xfce (without composite), MATE (with composite): 1:5

So it seems to do a lot for this use case in most environments. However, Gnome 3 maintained a much lower frame rate, which could also be a large factor.
Comment 21 Karl Mikaelsson cendio 2017-03-26 11:51:39 CEST
> Guetzli is a JPEG encoder that aims for excellent compression density at high visual quality.
> Guetzli-generated images are typically 20-30% smaller than images of equivalent quality 
> generated by libjpeg. Guetzli generates only sequential (nonprogressive) JPEGs due to faster 
> decompression speeds they offer.

https://arstechnica.com/information-technology/2017/03/google-jpeg-guetzli-encoder-file-size/

https://github.com/google/guetzli/
Comment 22 Pierre Ossman cendio 2017-11-13 12:04:36 CET
And another compression format:

https://github.com/inikep/lizard
Comment 23 Pierre Ossman cendio 2017-12-05 14:06:24 CET
(bug 4333, demoted from its own bug to just an idea comment:)

Most SIMD instructions require a specific memory alignment to work optimally.
As discovered on bug 4328, we're not always doing this properly. It could be
worth investigating if we can improve alignment and thereby improve
performance.
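
A sketch of one piece of this: allocating buffers with explicit alignment. The 32-byte figure is an assumption matching AVX2; the actual requirement depends on the SIMD code in question:

    #include <cstdint>
    #include <cstdlib>

    uint8_t* allocAligned(size_t bytes)
    {
        void* p = NULL;
        // 32-byte alignment covers SSE2 (16) as well as AVX2 (32) loads
        if (posix_memalign(&p, 32, bytes) != 0)
            return NULL;
        return static_cast<uint8_t*>(p);
    }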
Comment 25 Pierre Ossman cendio 2018-11-19 13:31:17 CET
Microsoft developed a new lossless CODEC as part of RemoteFX which could be interesting to have in VNC as well. Some overview here:

https://msdn.microsoft.com/en-us/library/ff635792.aspx
Comment 27 Pierre Ossman cendio 2021-07-21 09:28:13 CEST
Some alternatives to pako (used for zlib handling in Web Access) have popped up that claim to be faster:

https://github.com/photopea/UZIP.js
https://github.com/101arrowz/fflate
Comment 28 Pierre Ossman cendio 2021-11-17 12:38:34 CET
The client bandwidth estimation got a tweak upstream, which should hopefully make it a bit more accurate:

https://github.com/TigerVNC/tigervnc/commit/2354ce7404b8bacced3249e9c9787a12de307d2a
Comment 29 Pierre Ossman cendio 2021-11-17 12:48:33 CET
The comparing update tracker got tweaked upstream here:

https://github.com/TigerVNC/tigervnc/pull/1031

It now spends a bit more CPU in order to reduce the changed area.
Comment 30 Pierre Ossman cendio 2021-11-22 09:58:09 CET
(In reply to Pierre Ossman from comment #4)
> * Check the overhead of the JPEG JFIF header. There is a lot of data there
> which might make JPEG a bad choice for small blocks. Perhaps there are
> pre-defined quantisation and huffman tables that can be used in such cases?

Apparently RealVNC's JPEG encoding has some hacks in this area where some sections of the header are reused between rects. At least according to this PR to noVNC:

https://github.com/novnc/noVNC/pull/1609

The downside to the approach used there, though, is that it makes it more difficult to decode several JPEG rects concurrently.
Comment 32 Pierre Ossman cendio 2022-01-19 19:19:43 CET
Most of the new image codec groups seem to have joined forces in favour of a new standard called JPEG XL. The main contributors are Google, with their various formats, and the FLIF people. Performance is claimed to be equivalent to libjpeg-turbo, but with massive gains in quality per bit.

Browser support is experimental in both Chrome and Firefox, but many of the big players (Google, Facebook, Cloudflare) are backing this format.

It is in the later stages of ISO standardisation and the format seems to be fixed at this point.
Comment 34 Pierre Ossman cendio 2024-03-01 10:01:38 CET
(In reply to Pierre Ossman from comment #13)
> Both Intel and Cloudflare have implemented optimised versions of zlib that
> are a drop in replacement:
> 
> https://github.com/jtkukunas/zlib
> https://github.com/cloudflare/zlib
> 

There is now a third option, which seems to be a fork/merge of the above:

https://github.com/zlib-ng/zlib-ng

Fedora has apparently decided to switch to this library:

https://fedoraproject.org/wiki/Changes/ZlibNGTransition
Comment 35 Pierre Ossman cendio 2024-04-04 16:12:09 CEST
Google is apparently rolling out a new JPEG library that they claim is as fast as libjpeg-turbo but with better results:

https://opensource.googleblog.com/2024/04/introducing-jpegli-new-jpeg-coding-library.html

DRC is sceptical of this claim:

https://groups.google.com/d/msgid/libjpeg-turbo-users/CF1DAF1F-93FB-4742-B97B-0D1CAC6D6D0D%40virtualgl.org
