I was playing around with bug 5648 and noticed that most of the time was not spent encoding, but rather in a function called bits_image_fetch_bilinear_affine_pad_x8r8g8b8(). This is part of pixman. Our pixman is a few years behind, so I tried upgrading it to the latest version, and there were massive improvements. The relevant function fell from 50% of the CPU usage to 15%. Encoding is now back on top as the main CPU bottleneck. With both this upgrade and bug 5648 in place there were no problems playing YouTube in fullscreen at 1080p, whereas before there were noticeable frame drops.
For reference, the test case was Firefox 42.0 on my Fedora 23 workstation (i7-3770), playing YouTube in full screen using Firefox's video player.
I did some benchmarking using this tool I found here: http://cgit.freedesktop.org/~aplattner/xrenderbenchmark/ Unfortunately it showed no significant changes between the old and new code. Since we saw noticeable improvements with Firefox, we must conclude that this test is insufficient. So I found this nice little tirade on why all synthetic benchmarks suck: https://cworth.org/intel/performance_measurement/ It also points to a tracing tool in cairo that can replay graphical operations from real application use. So let's see what that gives us.
Urgh. That didn't really show much either. Need to make sure I'm not doing the tests incorrectly. Could also try getting a trace from the firefox usage we saw was improved.
No dice. Firefox' rendering of video is not showing up in cairo traces.
Did some more digging using perf and gdb. Firefox is doing two CPU-heavy operations: upscaling the video to the target size, and compositing it into the browser's offscreen pixmap. The second of these is handled by sse2_composite_src_x888_8888 and was already present in the old version of pixman. The first step however was only partially accelerated in the old pixman, and Firefox was using things in a way that was not accelerated. The new modes that have been added are bilinear scaling with repeat modes active. The existing code could only handle scaling with no repeat active. There has also been some acceleration for adding a constant to all pixels of a buffer, and bilinear scaling with a simple mask. So the quick summary is that many more forms of scaling are now faster. I will try to get a test of exactly how much faster.
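To make the terms concrete, here is a rough Python sketch of what a bilinear fetch with PAD repeat from an x8r8g8b8 source into an a8r8g8b8 result involves. This is only an illustration of the operation, not pixman's code: all function names are made up, and it ignores details such as pixel-centre offsets and fixed-point coordinates.

```python
import math

def pad(i, n):
    """PAD repeat: clamp an out-of-range coordinate to the nearest edge pixel."""
    return min(max(i, 0), n - 1)

def x888_to_8888(p):
    """Treat the unused top byte of an x8r8g8b8 pixel as fully opaque alpha."""
    return (p & 0x00FFFFFF) | 0xFF000000

def lerp_channels(a, b, t):
    """Blend two packed 32-bit pixels channel by channel, with t in [0, 1]."""
    out = 0
    for shift in (0, 8, 16, 24):
        ca = (a >> shift) & 0xFF
        cb = (b >> shift) & 0xFF
        out |= int(round(ca + (cb - ca) * t)) << shift
    return out

def fetch_bilinear_pad(img, width, height, x, y):
    """Sample a row-major list of x8r8g8b8 pixels at fractional (x, y)."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(ix, iy):
        # Coordinates outside the image are clamped to the edge (PAD).
        return x888_to_8888(img[pad(iy, height) * width + pad(ix, width)])

    top = lerp_channels(px(x0, y0), px(x0 + 1, y0), fx)
    bottom = lerp_channels(px(x0, y0 + 1), px(x0 + 1, y0 + 1), fx)
    return lerp_channels(top, bottom, fy)
```

Doing this per pixel in a generic fetcher is the slow path; a fast path fuses the scaling, the repeat handling and the format conversion into one SIMD loop.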
(In reply to comment #5)
>
> The first step however was only partially accelerated in the old pixman, and
> Firefox was using things in a way that was not accelerated. The new modes that
> have been added are bilinear scaling with repeat modes active. The existing
> code could only handle scaling with no repeat active.
>

Only partially correct. The old code handled different repeat modes as well. What it didn't handle was format conversion from x888 to 8888. So it's becoming more and more of a corner case (although Firefox is a pretty common use case).

I've modified xrenderbenchmark to replicate these conditions. With the old pixman I get:

> $ DISPLAY=:2 ./xrenderbenchmark -ops SRC -tests filter -time 20 -argb
> xrenderbenchmark version 1.0.2-agp1
> X Server from: The X.Org Foundation, Release: 11400000
> Xrender version: 0.11
> ---------------------------------------------
> Test: Src
> Transformation/Bilinear filter...................96600 frames in 20.002 seconds = 4829.511 FPS

And perf shows this function being used:

bits_image_fetch_bilinear_affine_pad_x8r8g8b8

An upgraded pixman shows:

> $ DISPLAY=:2 ./xrenderbenchmark -ops SRC -tests filter -time 20 -argb
> xrenderbenchmark version 1.0.2-agp1
> X Server from: The X.Org Foundation, Release: 11400000
> Xrender version: 0.11
> ---------------------------------------------
> Test: Src
> Transformation/Bilinear filter...................595650 frames in 20.0009 seconds = 29781.218 FPS

And perf now shows this function instead:

fast_composite_scaled_bilinear_sse2_x888_8888_pad_SRC

So about a 500% increase. Not shabby. :)
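For anyone who wants to check the "about 500%" figure, it follows directly from the two FPS numbers reported by xrenderbenchmark. A quick sanity check in Python (the variable names are just for illustration):

```python
# FPS figures copied from the two xrenderbenchmark runs above.
old_fps = 4829.511    # old pixman
new_fps = 29781.218   # upgraded pixman

speedup = new_fps / old_fps              # how many times faster
increase_pct = (speedup - 1.0) * 100.0   # relative increase in percent

print(f"speedup: {speedup:.2f}x, increase: {increase_pct:.0f}%")
```

That works out to a bit more than six times the old frame rate, i.e. an increase of somewhat over 500%.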
There might also be more, smaller improvements in pixman. There have been 208 commits since our last update.
I can't find any regressions. I have tested build 4996 on Fedora 23 using a variety of different media players and browsers in the session. I have also compared the performance to the old code playing a video in Firefox and verified the performance improvements. Closing.