Bug 4417 - Upgrade our X server
Summary: Upgrade our X server
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VNC
Version: 3.4.0
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.1.0
Assignee: Pierre Ossman
URL:
Keywords: hean01_tester
Duplicates: 3616 4649
Depends on: 3074
Blocks: 2968 3836 4262 4416
 
Reported: 2012-10-05 14:18 CEST by Aaron Sowry
Modified: 2013-05-31 13:01 CEST

Description Aaron Sowry cendio 2012-10-05 14:18:13 CEST
I've managed to get Gnome Shell working under the TigerVNC shipped in Fedora 17:

$ rpm -q xorg-x11-server-Xorg 
xorg-x11-server-Xorg-1.12.3-2.fc17.x86_64

It's slow, but it's a start - Ubuntu doesn't seem to build their VNC server with the composite extension enabled, so that's a lost cause. Upgrading our X server should allow us to run at least Gnome Shell, and possibly Unity, via ThinLinc.
Comment 1 Aaron Sowry cendio 2012-10-05 14:45:22 CEST
Perhaps it's worth noting that I lied a little bit; Xvnc segfaults after a
short time running Gnome Shell. Posting the traceback here for reference.

Backtrace:
0: /usr/bin/Xvnc (xorg_backtrace+0x36) [0x591236]
1: /usr/bin/Xvnc (0x400000+0x194c99) [0x594c99]
2: /lib64/libpthread.so.0 (0x3083c00000+0xefe0) [0x3083c0efe0]
3: /usr/bin/Xvnc (_ZN11InputDevice8keyEventEjb+0x46) [0x506ca6]
4: /usr/bin/Xvnc (_ZN3rfb16VNCSConnectionST8keyEventEjb+0x100) [0x525f70]
5: /usr/bin/Xvnc (_ZN3rfb16VNCSConnectionST15processMessagesEv+0x38) [0x5252d8]
6: /usr/bin/Xvnc (_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x177) [0x505b07]
7: /usr/bin/Xvnc (0x400000+0xfc49c) [0x4fc49c]
8: /usr/bin/Xvnc (WakeupHandler+0x6b) [0x54842b]
9: /usr/bin/Xvnc (WaitForSomething+0x1a6) [0x58ed46]
10: /usr/bin/Xvnc (Dispatch+0xa1) [0x544211]
11: /usr/bin/Xvnc (main+0x35a) [0x441c5a]
12: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3083821735]
13: /usr/bin/Xvnc (0x400000+0x4309d) [0x44309d]

Segmentation fault at address 0xb8

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting
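
For reference, a minimal sketch of how the frames in a backtrace like the one above could be picked apart when comparing crashes across builds. The helper names are made up for illustration. The rule "the frame right after the libpthread frame is the faulting one" is an assumption that happens to fit this trace (frame 2 is the signal trampoline), not a general guarantee.

```python
import re

# Matches Xorg-style backtrace frames such as:
#   3: /usr/bin/Xvnc (_ZN11InputDevice8keyEventEjb+0x46) [0x506ca6]
FRAME_RE = re.compile(
    r"^(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<symbol>[^+)]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "index": int(m.group("index")),
        "module": m.group("module"),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "address": int(m.group("address"), 16),
    }


def faulting_frame(lines):
    """Return the first frame after the libpthread frame (assumed heuristic)."""
    frames = [f for f in map(parse_frame, lines) if f is not None]
    for i, frame in enumerate(frames):
        if "libpthread" in frame["module"] and i + 1 < len(frames):
            return frames[i + 1]
    return None
```

Applied to the trace above, this points at frame 3. Its mangled symbol _ZN11InputDevice8keyEventEjb demangles to InputDevice::keyEvent(unsigned int, bool).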
Comment 2 Aaron Sowry cendio 2012-10-05 15:14:17 CEST
(In reply to comment #1)
> Perhaps it's worth noting that I lied a little bit; Xvnc segfaults after a
> short time running Gnome Shell.

https://bugzilla.redhat.com/show_bug.cgi?id=863431
Comment 3 Pierre Ossman cendio 2012-11-20 16:21:17 CET
Should be easy since Fedora has already made sure Xvnc works with the latest Xorg. It's still a lot of menial work upgrading all the dependencies though.

We cannot move all the dependencies into the build system because we have an ancient libX11 on Solaris (which messes with the builds of other packages). But maybe we can move some of the packages in there.
Comment 4 Pierre Ossman cendio 2013-03-22 16:10:53 CET
I've got a working 1.14 in my working copy now. Going back to look at XKB for a while.
Comment 5 Pierre Ossman cendio 2013-04-04 16:44:50 CEST
Vendor drop done in r26951. A lot of cleanup and integration work remains though.
Comment 6 Pierre Ossman cendio 2013-04-05 23:56:43 CEST
*** Bug 3616 has been marked as a duplicate of this bug. ***
Comment 7 Pierre Ossman cendio 2013-04-09 18:39:27 CEST
As part of this I discovered that we weren't getting proper log output. So now I've added -verbose to our Xvnc (stolen from Xorg) and set xserver_args to "-verbose 3", which is Xorg's default.

Tester should verify that we get some X log lines (e.g. "(II)...") in xinit.log.
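
A rough sketch of how the check above could be automated, assuming the log has already been read into a list of lines. The marker set is the usual Xorg one ((II) informational, (WW) warning, (EE) error and so on). The optional timestamp prefix and the function names are assumptions, not something taken from our scripts.

```python
import re
from collections import Counter

# Standard Xorg log markers, optionally preceded by a "[  12.345]" timestamp.
MARKER_RE = re.compile(
    r"^(?:\[\s*[\d.]+\]\s*)?\((II|WW|EE|\*\*|==|--|\+\+|!!|NI|\?\?)\)"
)


def count_x_log_markers(lines):
    """Count how many lines start with each X log marker."""
    counts = Counter()
    for line in lines:
        m = MARKER_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def has_x_log_output(lines):
    """True if at least one informational "(II)" line is present."""
    return count_x_log_markers(lines)["II"] > 0
```

A tester could feed it the contents of xinit.log and expect has_x_log_output() to return True when "-verbose 3" is in effect.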
Comment 8 Pierre Ossman cendio 2013-04-10 17:01:25 CEST
The nightly build now has Xorg 1.14 and Mesa 9.1.1. New scripts and integration for XKB will be handled on bug 3074.

Tester should verify as much X11 functionality as possible. GLX also needs to be tested, both direct and indirect. glean/piglit/etc. are probably suitable.
Comment 9 Pierre Ossman cendio 2013-04-23 14:40:52 CEST
Broken on Solaris:

ld.so.1: Xvnc: fatal: libXau.so.6: open failed: No such file or directory
ld.so.1: Xvnc: fatal: libfreetype.so.6: open failed: No such file or directory
Comment 10 Pierre Ossman cendio 2013-04-23 15:03:32 CEST
(In reply to comment #9)
> Broken on Solaris:
> 
> ld.so.1: Xvnc: fatal: libXau.so.6: open failed: No such file or directory
> ld.so.1: Xvnc: fatal: libfreetype.so.6: open failed: No such file or
> directory

Fixed in r27149.
Comment 11 Peter Åstrand cendio 2013-04-30 14:22:37 CEST
With the latest nightly build, I do not get any mouse pointer when starting a new session, i.e. in the profile selection dialog. After downgrading to http://www.cendio.com/downloads/updates/b4547/ it works.

Perhaps this has something to do with the fact that I'm using some fancy high color cursors. I activated this a long time ago. I have no idea where this setting is stored, but it's probably in my home dir somewhere.
Comment 12 Pierre Ossman cendio 2013-05-02 09:57:52 CEST
Libreoffice is exhibiting some redraw problem with its GTK labels (menus, dialogs, etc.) on at least Fedora 18 and Ubuntu 12.04.
Comment 13 Pierre Ossman cendio 2013-05-07 13:03:35 CEST
(In reply to comment #12)
> Libreoffice is exhibiting some redraw problem with its GTK labels (menus,
> dialogs, etc.) on at least Fedora 18 and Ubuntu 12.04.

This seems to be a somewhat common error:

http://en.libreofficeforum.org/node/3319
http://aptosid.com/index.php?name=PNphpBB2&file=viewtopic&t=2254
http://ask.libreoffice.org/en/question/1429/libreoffice-3512-menu-problem/
http://www.oooforum.org/forum/viewtopic.phtml?t=124957
Comment 14 Pierre Ossman cendio 2013-05-07 13:16:19 CEST
(In reply to comment #12)
> Libreoffice is exhibiting some redraw problem with its GTK labels (menus,
> dialogs, etc.) on at least Fedora 18 and Ubuntu 12.04.

Upstream report:

https://bugs.freedesktop.org/show_bug.cgi?id=57814
Comment 15 Pierre Ossman cendio 2013-05-07 13:27:08 CEST
Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
Comment 16 Pierre Ossman cendio 2013-05-07 16:28:26 CEST
(In reply to comment #15)
> Seems to be a VNC thing. Taking a screen shot shows all the entries in place.

Fixed in r27336.
Comment 17 Pierre Ossman cendio 2013-05-08 13:28:09 CEST
(In reply to comment #11)
> With the latest nightly build, I do not get any mouse pointer when starting a
> new session, i.e. in the profile selection dialog. After downgrading to
> http://www.cendio.com/downloads/updates/b4547/ it works. 
> 
> Perhaps this has something to do with the fact that I'm using some fancy high
> color cursors. I activated this a long time ago. I have no idea where this
> setting is stored, but it's probably in my home dir somewhere.

The problem was animated cursors. It has now been fixed in r27349.
Comment 18 Henrik Andersson cendio 2013-05-16 12:12:43 CEST
(In reply to comment #16)
> (In reply to comment #15)
> > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> 
> Fixed in r27336.

Verified on SLED 11 using build 3945
Comment 19 Henrik Andersson cendio 2013-05-17 09:19:13 CEST
Ran the freedesktop XTS test suite using ThinLinc build 3949 on Ubuntu 12.04.

Xvnc didn't crash and xts completed successfully with the following result:

========================
145 of 1007 tests failed
(275 tests were not run)
========================
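
To compare runs between builds, the summary above can be turned into numbers. This is only a sketch. The function name is made up, and since the summary does not say whether the 1007 figure includes the 275 tests that were not run, only the plain failed/total ratio is computed.

```python
import re

# Matches the two-line xts summary shown above.
SUMMARY_RE = re.compile(
    r"(?P<failed>\d+) of (?P<total>\d+) tests failed\s*"
    r"\((?P<not_run>\d+) tests were not run\)"
)


def parse_xts_summary(text):
    """Extract the counts from an xts summary and compute failed/total."""
    m = SUMMARY_RE.search(text)
    if m is None:
        raise ValueError("no xts summary found")
    failed = int(m.group("failed"))
    total = int(m.group("total"))
    return {
        "failed": failed,
        "total": total,
        "not_run": int(m.group("not_run")),
        "failure_rate": failed / total,
    }
```

For the result above this gives a failure rate of roughly 14.4% (145 / 1007).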
Comment 20 Peter Åstrand cendio 2013-05-20 13:41:52 CEST
Found when testing Bug 2968. 

On RHEL6, the xserver crashes when trying to run:

LIBGL_ALWAYS_INDIRECT=1 compiz --replace

Core file here: 
/home/astrand/tmp/core.5559.gz
Comment 21 Pierre Ossman cendio 2013-05-21 08:59:03 CEST
(In reply to comment #20)
> Found when testing Bug 2968. 
> 
> On RHEL6, the xserver crashes when trying to run:
> 
> LIBGL_ALWAYS_INDIRECT=1 compiz --replace
> 
> Core file here: 
> /home/astrand/tmp/core.5559.gz

Fixed in r27415 and reported upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=64791
Comment 22 Peter Åstrand cendio 2013-05-21 09:11:38 CEST
*** Bug 4649 has been marked as a duplicate of this bug. ***
Comment 23 Peter Åstrand cendio 2013-05-21 09:12:42 CEST
Handling bug 4649 on this bug, since it is likely a regression caused by the Xserver upgrade:

Verified on RHEL6, nightly build, all updates. To reproduce:

* Start a new session in window mode

* Switch to full screen. I have two monitors and "all monitors" activated. 

* Switch back to window mode. 

Note, the desktop environment will show a window "Starting up file manager...".
Then there's another one, and another one, etc...  dmesg shows:

__ratelimit: 6 callbacks suppressed
nautilus[15932]: segfault at 8 ip 00007fd519e05c84 sp 00007fffaf239ef0 error 4
in libgnome-desktop-2.so.11.4.2[7fd519dee000+28000]
Comment 24 Henrik Andersson cendio 2013-05-21 12:38:08 CEST
naev (a simple 3D space game), glxgears and Google Earth were tested with both indirect and direct rendering without any problems.
Comment 25 Pierre Ossman cendio 2013-05-22 13:34:40 CEST
(In reply to comment #23)
> Handling bug 4649 on this bug, since it is likely a regression caused by the
> Xserver upgrade:
> 
> Verified on RHEL6, nightly build, all updates. To reproduce:
> 
> * Start a new session in window mode
> 
> * Switch to full screen. I have two monitors and "all monitors" activated. 
> 
> * Switch back to window mode. 
> 
> Note, the desktop environment will show a window "Starting up file manager...".
> Then there's another one, and another one, etc...  dmesg shows:
> 
> __ratelimit: 6 callbacks suppressed
> nautilus[15932]: segfault at 8 ip 00007fd519e05c84 sp 00007fffaf239ef0 error 4
> in libgnome-desktop-2.so.11.4.2[7fd519dee000+28000]

This is messy. First off, the reason it happens now is because of GTK+ requiring RandR 1.3, even though 1.2 would be sufficient for what it wants to do. That's why we didn't see this in ThinLinc 4.0.0.

Nautilus crashes because it is trying to display a background on the disabled output/crtc. But since it has no mode it gets 0x0 as the dimensions. This screws up its internal logic, which results in NULL for the background pixmap, and the subsequent code always expects a valid pointer.

There are two things that our RandR code does differently compared to a "real" server, and fixing either would make nautilus behave properly:

 a) We always pretend that an output is connected. Nautilus ignores disconnected outputs (even if they might still be in use).

 b) We clear the mode of the CRTC when "disabling" an output. A "real" server also disassociates the output from the CRTC.


Fixing a) could be done by automatically toggling the connection state depending on if an output has a CRTC and/or mode set. This is not quite the same as what would happen with a "real" server, but we have no external events to map connection state to, so it might have to be good enough.

Fixing b) is a bit trickier as we need to update the code that is also shared with libvnc.so. That code would need to become more clever in figuring out a free CRTC to connect to. Doable, but might not be trivial.
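
To make the two options concrete, here is a toy model of the state involved. It is not the Xvnc or RandR code. The class and function names are invented, the "client" is a hypothetical stand-in for what nautilus is described as doing above, and reading "has a CRTC and/or mode set" as "has a CRTC with a mode" is one possible interpretation of a).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Crtc:
    # (width, height) of the active mode, or None when no mode is set.
    mode: Optional[Tuple[int, int]] = None


@dataclass
class Output:
    name: str
    crtc: Optional[Crtc] = None
    connected: bool = True  # the "always connected" behaviour described in a)


def disable_output(output: Output, detach_crtc: bool) -> None:
    """Disable an output by clearing its mode.

    With detach_crtc=True the output is also disassociated from its CRTC,
    which is what b) says a "real" server does.
    """
    if output.crtc is None:
        return
    output.crtc.mode = None
    if detach_crtc:
        output.crtc = None


def derive_connected(output: Output) -> bool:
    """Option a): connected only while the output has a CRTC with a mode."""
    return output.crtc is not None and output.crtc.mode is not None


def background_sizes(outputs: List[Output]) -> List[Tuple[int, int]]:
    """Hypothetical client: one background per connected output with a CRTC."""
    sizes = []
    for out in outputs:
        if not out.connected or out.crtc is None:
            continue
        # A CRTC without a mode yields the problematic 0x0 dimensions.
        sizes.append(out.crtc.mode if out.crtc.mode is not None else (0, 0))
    return sizes
```

In this model, disabling an output without detaching its CRTC leaves a (0, 0) entry for the client to trip over. Either updating connected from derive_connected() or detaching the CRTC makes the entry disappear.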
Comment 26 Pierre Ossman cendio 2013-05-22 13:38:19 CEST
(In reply to comment #17)
> 
> The problem was animated cursors. It has now been fixed in r27349.

Broken for older X servers:

r5090 breaks when compiled against Xorg 7.5:

Making all in vnc
  CXX   xf86vncModule.o
  CXX   vncExtInit.o
  CXX   vncHooks.o
  CXX   XserverDesktop.o
vncExtInit.cc: In function 'void vncExtensionInit()':
vncExtInit.cc:197: warning: deprecated conversion from string constant to 'char*'
vncExtInit.cc: In function 'int ProcVncExtListParams(_Client*)':
vncExtInit.cc:762: warning: unused variable 'stuff'
vncExtInit.cc: In function 'int ProcVncExtGetClientCutText(_Client*)':
vncExtInit.cc:857: warning: unused variable 'stuff'
vncExtInit.cc: In function 'int ProcVncExtGetQueryConnect(_Client*)':
vncExtInit.cc:1010: warning: unused variable 'stuff'
vncHooks.cc: In function 'void GlyphRegion(int, _GlyphList*, _Glyph**, pixman_region16*)':
vncHooks.cc:675: error: 'RegionUninit' was not declared in this scope
vncHooks.cc:697: error: 'RegionInitBoxes' was not declared in this scope
vncHooks.cc: In function 'void vncHooksGlyphs(CARD8, _Picture*, _Picture*, _PictFormat*, INT16, INT16, int, _GlyphList*, _Glyph**)':
vncHooks.cc:721: error: 'RegionTranslate' was not declared in this scope
vncHooks.cc:728: error: 'RegionInit' was not declared in this scope
vncHooks.cc:730: error: 'RegionIntersect' was not declared in this scope
vncHooks.cc:732: error: 'RegionUninit' was not declared in this scope
vncHooks.cc: In function 'void vncPostScreenResize(_Screen*, Bool)':
vncHooks.cc:784: warning: the address of 'box' will never be NULL
make[3]: *** [libvnccommon_la-vncHooks.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
XserverDesktop.cc: In member function 'virtual unsigned int XserverDesktop::setScreenLayout(int, int, const rfb::ScreenSet&)':
XserverDesktop.cc:981: warning: 'crtc' may be used uninitialized in this function
make[2]: *** [all] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
Comment 27 Pierre Ossman cendio 2013-05-22 14:49:53 CEST
(In reply to comment #16)
> (In reply to comment #15)
> > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> 
> Fixed in r27336.

Actually it was this that broke things for older Xorg.
Comment 28 Pierre Ossman cendio 2013-05-22 15:05:59 CEST
(In reply to comment #25)
> 
> There are two things that our RandR code does differently compared to a "real"
> server, and fixing either would make nautilus behave properly:
> 

Apparently I had been a bit proactive when I originally did this so it was easier to fix than expected. Sorted out in r27434.
Comment 29 Pierre Ossman cendio 2013-05-22 15:06:45 CEST
(In reply to comment #27)
> (In reply to comment #16)
> > (In reply to comment #15)
> > > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> > 
> > Fixed in r27336.
> 
> Actually it was this that broke things for older Xorg.

Also fixed in r27434. Tester needs to recheck that Libreoffice works.
Comment 30 Henrik Andersson cendio 2013-05-24 07:02:43 CEST
(In reply to comment #29)
> (In reply to comment #27)
> > (In reply to comment #16)
> > > (In reply to comment #15)
> > > > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> > > 
> > > Fixed in r27336.
> > 
> > Actually it was this that broke things for older Xorg.
> 
> Also fixed in r27434. Tester needs to recheck that Libreoffice works.

Libreoffice verified
Comment 31 Henrik Andersson cendio 2013-05-24 08:01:05 CEST
The first pass, running the piglit test suite using direct rendering, got stuck on a test after a few hours. It didn't crash Xvnc, but there were a few crashes in the piglit tools along the way.

In the second pass, running piglit using indirect rendering, Xvnc crashed almost immediately with the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037", need_swap=0 '\000')
    at single2.c:349
349	    string = (const char *) CALL_GetString(GET_DISPATCH(), (name));
(gdb) bt
#0  0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037", need_swap=0 '\000')
    at single2.c:349
#1  0x00000000004a7c54 in __glXDispatch (client=<optimized out>) at glxext.c:581
#2  0x000000000056394e in Dispatch () at dispatch.c:432
#3  0x000000000055106a in main (argc=<optimized out>, argv=0x7fffffffe2c8, envp=<optimized out>)
    at main.c:295
(gdb)
Comment 32 Pierre Ossman cendio 2013-05-24 14:10:56 CEST
(In reply to comment #31)
> 
> In the second pass, running piglit using indirect rendering, Xvnc crashed
> almost immediately with the following backtrace:
> 

The problem is this test:

[Fri May 24 14:07:42 2013] ::  running :: spec/EXT_packed_depth_stencil/depthstencil-render-miplevels 1024 d=z24_s8_s=z24_s8

And the crash is caused by the GLX function dispatch table being set to NULL. Probably because of a bug in the test case, but it still shouldn't be able to crash the server.
Comment 33 Pierre Ossman cendio 2013-05-27 09:03:20 CEST
(In reply to comment #32)
> (In reply to comment #31)
> > 
> > In the second pass, running piglit using indirect rendering, Xvnc crashed
> > almost immediately with the following backtrace:
> > 
> 
> The problem is this test:
> 
> [Fri May 24 14:07:42 2013] ::  running ::
> spec/EXT_packed_depth_stencil/depthstencil-render-miplevels 1024
> d=z24_s8_s=z24_s8
> 
> And the crash is caused by the GLX function dispatch table being set to NULL.
> Probably because of a bug in the test case, but it still shouldn't be able to
> crash the server.

Scratch that. The problem only occurs when you are running multiple OpenGL programs in parallel (which piglit does by default). Something must be broken with how the GLX code switches between client contexts.
Comment 34 Pierre Ossman cendio 2013-05-27 11:27:42 CEST
(In reply to comment #33)
> 
> Scratch that. The problem only occurs when you are running multiple OpenGL
> programs in parallel (which piglit does by default). Something must be broken
> with how the GLX code switches between client contexts.

The X server actually had an extra safety net to protect against this scenario. It was removed some time ago though because "it wasn't needed". Bug filed upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=65030

Patched back the safety net in r27449. Going to have one more look to see if I can fix the underlying problem.
Comment 35 Pierre Ossman cendio 2013-05-28 16:15:36 CEST
(In reply to comment #34)
> 
> Patched back the safety net in r27449. Going to have one more look to see if I
> can fix the underlying problem.

Should hopefully be fixed in r27457.
Comment 36 Henrik Andersson cendio 2013-05-30 10:40:04 CEST
(In reply to comment #31)
> The first pass, running the piglit test suite using direct rendering, got
> stuck on a test after a few hours. It didn't crash Xvnc, but there were a few
> crashes in the piglit tools along the way.
> 
> In the second pass, running piglit using indirect rendering, Xvnc crashed
> almost immediately with the following backtrace:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037",
> need_swap=0 '\000')
>     at single2.c:349
> 349        string = (const char *) CALL_GetString(GET_DISPATCH(), (name));
> (gdb) bt
> #0  0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037",
> need_swap=0 '\000')
>     at single2.c:349
> #1  0x00000000004a7c54 in __glXDispatch (client=<optimized out>) at
> glxext.c:581
> #2  0x000000000056394e in Dispatch () at dispatch.c:432
> #3  0x000000000055106a in main (argc=<optimized out>, argv=0x7fffffffe2c8,
> envp=<optimized out>)
>     at main.c:295
> (gdb)

Updated my Ubuntu 12.04 machine to build 3966 and restarted a piglit test using indirect rendering. The test completed without crashing Xvnc.
Comment 37 Henrik Andersson cendio 2013-05-31 13:01:50 CEST
Closing this bug since no more issues have been found and I'm out of ideas for testing this further.
