Bug 8533 - Autoreconf fails when building glib2
Summary: Autoreconf fails when building glib2
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Build system
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.19.0
Assignee: Samuel Mannehed
URL: https://git.cendio.se/thinlinc/cenbui...
Keywords: adaha_tester, prosaic
Depends on:
Blocks: 8368
 
Reported: 2025-02-25 13:53 CET by Samuel Mannehed
Modified: 2025-04-01 13:41 CEST
CC List: 1 user

See Also:
Acceptance Criteria:
MUST
* Running autoreconf for glib2 must work.
SHOULD
* Autoconf should be upgraded to the latest version.
* The full reason for the autoreconf failure should be documented and understood.



Description Samuel Mannehed cendio 2025-02-25 13:53:56 CET
As part of the work to upgrade glibc in cenbuild, we encountered an issue with glib2.

The glibc upgrade means that some macros used by our version of glib2 have moved. We use glib2 version 2.34.3, and the issue was fixed in 2.39. However, upgrading glib2 is a larger step that will require more work.

Fortunately, the patch from v2.39 applies to our 2.34.3 as well. It simply requires us to run autoreconf since the patch modifies glib2's configure.ac.

Unfortunately, running autoreconf for glib2 fails:

> gtk-doc.make:302: error: HAVE_GTK_DOC does not appear in AM_CONDITIONAL
> docs/reference/gio/Makefile.am:140:   'gtk-doc.make' included from here
> ...
> autoreconf: error: automake failed with exit status: 1
Comment 1 Samuel Mannehed cendio 2025-02-25 13:54:46 CET
The notes from the research done so far can be found here:

https://git.cendio.se/thinlinc/cenbuild/-/merge_requests/35#note_2647
https://git.cendio.se/thinlinc/cenbuild/-/merge_requests/35#note_2700
Comment 2 Samuel Mannehed cendio 2025-03-03 16:44:36 CET
It turns out that the problem came from autom4te's cache.

When comparing the build directories of a failed and a successful build, we saw that the ONLY difference was in the "autom4te.cache" directory. After breaking autoreconf apart into its subcommands and adding a lot of debugging output, we found a key difference in the file "autom4te.cache/output.1". One line in particular stood out. A failed build had this line:
> if test ${ac_cv_path_GTKDOC_CHECK+y}
A successful build had this line instead:
> if test ${ac_cv_prog_GTKDOC_CHECK+y}

The above code comes from an expanded m4 macro in gtk-doc. The newer version of that macro uses AC_CHECK_PROG, as seen in this commit:
https://gitlab.gnome.org/GNOME/gtk-doc/-/commit/7ae93808d823ad10c4a62b2495a170a461ec60ec

As documented here, the cache variable names above correspond to AC_CHECK_PROG (ac_cv_prog_*, the new code) and AC_PATH_PROG (ac_cv_path_*, the old code):
https://www.gnu.org/software/autoconf/manual/autoconf-2.68/html_node/Generic-Programs.html

When the 2.34.3 release of glib2 was packaged (in 2012), the old AC_PATH_PROG was used. When we later (in 2025) regenerated the m4 macros, we got the new AC_CHECK_PROG. The difference seen above confirmed that the cache did indeed contain old data when the glib2 build failed.

This clearly indicates that the failed builds worked with stale cached data, while the successful builds had an up-to-date cache. Everything in the build directories was correct, aside from the cache. The automake failure came from a stale autom4te cache.
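A minimal sketch of the comparison described above, assuming the cache layout seen in this bug (autom4te.cache/output.N files). The function `classify_cache` and its labels are hypothetical; it only reports which of the two GTKDOC_CHECK cache variable names each output file contains, based on the Autoconf naming convention (AC_CHECK_PROG caches in ac_cv_prog_VAR, AC_PATH_PROG in ac_cv_path_VAR).

```python
import re
from pathlib import Path

# Autoconf naming convention for cached results:
#   AC_CHECK_PROG(VAR, ...) caches in ac_cv_prog_VAR
#   AC_PATH_PROG(VAR, ...)  caches in ac_cv_path_VAR
MARKERS = {
    "ac_cv_prog_GTKDOC_CHECK": "new gtk-doc.m4 (AC_CHECK_PROG)",
    "ac_cv_path_GTKDOC_CHECK": "old gtk-doc.m4 (AC_PATH_PROG)",
}


def classify_cache(cache_dir):
    """Map each autom4te output.N file to the gtk-doc macro variants it contains."""
    results = {}
    for path in sorted(Path(cache_dir).glob("output.*")):
        text = path.read_text(errors="replace")
        # Word boundaries keep longer names that share the prefix
        # (a name like ac_cv_path_GTKDOC_CHECK_PATH) from matching.
        results[path.name] = [
            label
            for marker, label in MARKERS.items()
            if re.search(r"\b" + re.escape(marker) + r"\b", text)
        ]
    return results
```

Pointed at a build directory's autom4te.cache, an output file that only contains the old variant while aclocal.m4 contains the new macro is a sign of a stale cache.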
Comment 3 Samuel Mannehed cendio 2025-03-03 16:46:42 CET
The m4 macros mentioned in comment 2 are built into glib2's "aclocal.m4" file by the `aclocal` command, which is called by `autoreconf`. This happened correctly in both successful and failed builds; the aclocal.m4 file was correct in both cases. Only the cached information about these m4 macros differed. The cache is supposed to reflect the current contents of aclocal.m4. So why did the cache sometimes differ, when the circumstances were the same?

The answer was timing. For a failed build, the "Modify" timestamps that `stat` reported for aclocal.m4 and the cache files were the following:

> $ stat autom4te.cache/output.1
>      Size: 1068438   	Blocks: 2088       IO Block: 1048576 regular file
>    Modify: 2025-02-28 15:16:29.085434010 +0000
> $ stat autom4te.cache/traces.1
>      Size: 198693    	Blocks: 392        IO Block: 4096   regular file
>    Modify: 2025-02-28 15:16:29.085434010 +0000
> $ stat aclocal.m4
>      Size: 76621     	Blocks: 152        IO Block: 4096   regular file
>    Modify: 2025-02-28 15:16:29.833428215 +0000
As can be seen above, the "aclocal.m4" file was slightly newer than both "autom4te.cache/output.1" and "autom4te.cache/traces.1". This was bad since it meant the cache was outdated.

When looking at the same timestamps from `stat`, but for a successful build, things looked more sane:

> $ stat autom4te.cache/output.1
>      Size: 1068438   	Blocks: 2088       IO Block: 1048576 regular file
>    Modify: 2025-02-28 15:17:12.383098528 +0000
> $ stat autom4te.cache/traces.1
>      Size: 199734    	Blocks: 392        IO Block: 4096   regular file
>    Modify: 2025-02-28 15:17:12.383098528 +0000
> $ stat aclocal.m4
>      Size: 76621     	Blocks: 152        IO Block: 4096   regular file
>    Modify: 2025-02-28 15:17:11.547105006 +0000
Here, for a successful build, the cache was newer than the aclocal.m4 file it was supposed to be built from, which made sense.
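The timestamp check above can be automated. A minimal sketch, assuming the file names from this bug (aclocal.m4, autom4te.cache/output.N and traces.N); `stale_cache_files` is a hypothetical helper. It compares nanosecond modification times (`st_mtime_ns`), so it catches sub-second differences like the roughly 0.75 second one in the failed build, provided the filesystem stores sub-second timestamps.

```python
import os
from pathlib import Path


def stale_cache_files(project_dir):
    """List autom4te cache files whose mtime is older than aclocal.m4.

    Uses nanosecond mtimes, so differences within the same second
    are still detected.
    """
    project = Path(project_dir)
    aclocal_ns = os.stat(project / "aclocal.m4").st_mtime_ns
    stale = []
    for pattern in ("output.*", "traces.*"):
        for cache_file in sorted((project / "autom4te.cache").glob(pattern)):
            if cache_file.stat().st_mtime_ns < aclocal_ns:
                stale.append(cache_file.name)
    return stale
```

An empty list means every cache file is at least as new as aclocal.m4, as in the successful build.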
Comment 4 Samuel Mannehed cendio 2025-03-03 16:50:31 CET
With the conclusions in comment 3 established, we moved on to investigate at which point things went wrong. Since we had already broken autoreconf down into its subcommands, we added --verbose to each of them and ran `stat` after each step. Things went awry after autoreconf's second `aclocal` call. In a failed build we saw these timestamps:

> $ stat autom4te.cache/output.1
>     Size: 1068438        Blocks: 2088       IO Block: 4096   regular file
>   Modify: 2025-02-28 15:16:29.086434002 +0000
> $ stat autom4te.cache/traces.1
>     Size: 198693         Blocks: 392        IO Block: 4096   regular file
>   Modify: 2025-02-28 15:16:29.085434010 +0000
> $ stat aclocal.m4
>     Size: 76621          Blocks: 152        IO Block: 4096   regular file
>   Modify: 2025-02-28 15:16:29.833428215 +0000
And this verbose logline from `autom4te`:
> autom4te: up_to_date (autom4te.cache/traces.1): up to date
Here, for the failed build, autom4te considered the cache to be up to date, even though "aclocal.m4" had been modified about 0.75 seconds after the cache.

---

In a successful build we saw these timestamps:

> $ stat autom4te.cache/output.1
>      Size: 1068438        Blocks: 2088       IO Block: 4096   regular file
>    Modify: 2025-02-28 15:17:10.786110902 +0000
> $ stat autom4te.cache/traces.1
>      Size: 198693         Blocks: 392        IO Block: 4096   regular file
>    Modify: 2025-02-28 15:17:10.785110910 +0000
> $ stat aclocal.m4
>      Size: 76621          Blocks: 152        IO Block: 4096   regular file
>    Modify: 2025-02-28 15:17:11.547105006 +0000
And this verbose logline from `autom4te`:
> autom4te: up_to_date (autom4te.cache/traces.1): outdated: aclocal.m4
For the successful build, we were lucky that "aclocal.m4" was modified in a later second (15:17:11) than the cache files (15:17:10). This correctly caused autom4te to consider the cache outdated.
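The step-by-step procedure used in this comment (run each autoreconf subcommand separately with --verbose, then record timestamps) can be sketched as below. This is a minimal sketch: `run_steps_with_stat` is hypothetical, and the step list in the usage comment is illustrative only, not the exact sequence autoreconf ran for glib2.

```python
import subprocess
from pathlib import Path


def run_steps_with_stat(steps, watched, cwd="."):
    """Run each command in `steps` in order and snapshot mtimes after each one.

    Returns a list of (command, {file: st_mtime_ns or None}) tuples, where
    None means the file did not exist after that step.
    """
    history = []
    for cmd in steps:
        # check=True raises CalledProcessError at the first failing step.
        subprocess.run(cmd, cwd=cwd, check=True)
        snapshot = {}
        for name in watched:
            path = Path(cwd) / name
            snapshot[name] = path.stat().st_mtime_ns if path.exists() else None
        history.append((cmd, snapshot))
    return history


# Illustrative usage (hypothetical step list):
# steps = [["aclocal", "--verbose"], ["autoconf", "--verbose"],
#          ["automake", "--verbose"]]
# watched = ["aclocal.m4", "autom4te.cache/output.1",
#            "autom4te.cache/traces.1"]
# for cmd, snap in run_steps_with_stat(steps, watched):
#     print(" ".join(cmd), snap)
```

Snapshotting after every step, rather than only at the end, is what makes it possible to pin the problem on a specific call such as the second `aclocal`.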

The `autom4te` tool is part of the "autoconf" package. Inspecting autom4te's code showed that it compares file timestamps with only one-second resolution, so a dependency modified later within the same second as the cache goes unnoticed. Newer versions of autom4te instead consider the cache out of date when the timestamps share the same second:

https://git.savannah.gnu.org/cgit/autoconf.git/commit/bin/autom4te.in?id=713d9822bbfb2923115065efaefed34a0113f8a1

As it happens, we had exactly the last version of autoconf that DOESN'T include this fix: we had 2.71, and the fix was included in 2.72.

I tested upgrading autoconf, and it fixed the problem!
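The difference between the two autom4te versions can be modelled in a few lines. This is not autom4te's Perl code; it is a simplified model of the comparison as described in this comment (whole-second timestamps, with 2.72 also treating an equal second as outdated), fed with the traces.1 and aclocal.m4 timestamps from the `stat` output above. The helper names are hypothetical.

```python
NS_PER_S = 10**9


def to_ns(clock):
    """Convert a `stat` time of day like '15:16:29.085434010' to ns since midnight."""
    hms, frac = clock.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * NS_PER_S + int(frac.ljust(9, "0"))


def outdated_2_71(cache_ns, dep_ns):
    # Whole seconds only; stale only if the dependency's second is later.
    return dep_ns // NS_PER_S > cache_ns // NS_PER_S


def outdated_2_72(cache_ns, dep_ns):
    # Whole seconds only; a dependency in the same second also counts as stale.
    return dep_ns // NS_PER_S >= cache_ns // NS_PER_S


# (traces.1, aclocal.m4) timestamps from the failed and successful builds.
cases = {
    "failed": ("15:16:29.085434010", "15:16:29.833428215"),
    "successful": ("15:17:10.785110910", "15:17:11.547105006"),
}
for name, (cache, dep) in cases.items():
    c, d = to_ns(cache), to_ns(dep)
    print(f"{name}: 2.71 outdated={outdated_2_71(c, d)}, 2.72 outdated={outdated_2_72(c, d)}")
# failed: 2.71 outdated=False, 2.72 outdated=True
# successful: 2.71 outdated=True, 2.72 outdated=True
```

The 2.72 rule can trigger an unnecessary rerun when the cache really was written after its dependency within the same second, but that only costs time; it never lets stale data through.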
Comment 5 Samuel Mannehed cendio 2025-03-05 16:52:55 CET
In my branch where autoconf is upgraded, I tested a full `repo/rebuild --no-repo cendio-build-everything` and got this error when building cendio-build-gtk2-x86_64:
> configure: error: 
> *** Checks for JPEG loader failed. You can build without it by passing
> *** --without-libjpeg to configure but some programs using GTK+ may
> *** not work properly
And in config.log:
> configure:25207: checking for jpeg_destroy_decompress in -ljpeg
> configure:25236: x86_64-unknown-linux-gnu-gcc -o conftest -DGDK_PIXBUF_DISABLE_DEPRECATED -O2 -g -Wall  -DG_DISABLE_SINGLE_INCLUDES -DATK_DISABLE_SINGLE_INCLUDES -DGDK_PIXBUF_DISABLE_SINGLE_INCLUDES -DGTK_DISABLE_SINGLE_INCLUDES -g conftest.c -ljpeg    >&5
> configure:25236: $? = 0
> configure:25248: result: yes
> configure:25261: checking for jpeglib.h
> configure:25271:   -DG_DISABLE_SINGLE_INCLUDES -DATK_DISABLE_SINGLE_INCLUDES -DGDK_PIXBUF_DISABLE_SINGLE_INCLUDES -DGTK_DISABLE_SINGLE_INCLUDES conftest.c
> /opt/docker-cenbuild/cenbuild/repo/rpmbuild/BUILD/gtk+-2.16.5/configure: line 2314: -DG_DISABLE_SINGLE_INCLUDES: command not found
Part of the command seems to have disappeared: -DG_DISABLE_SINGLE_INCLUDES is supposed to be a flag to gcc, but the shell tried to run it as a command. This is likely a quoting issue. The code in question in configure.in uses the macro AC_TRY_CPP, which is obsolete:

https://www.gnu.org/software/autoconf/manual/autoconf-2.60/html_node/Obsolete-Macros.html

The comment at the top of that obsolete macros page, "typically they failed to quote properly", supports our gut feeling about AC_TRY_CPP.
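Whatever the exact quoting problem is, the "command not found" symptom is what happens when the command word of the preprocessor invocation expands to nothing, so the first flag becomes the program name. A minimal sketch, assuming the check runs a command of the form `$CPP $CPPFLAGS conftest.c` (generated configure scripts define `ac_cpp='$CPP $CPPFLAGS'`); shell word splitting is approximated with Python's `shlex.split`, `preprocessor_argv` is a hypothetical helper, and the non-empty CPP value is an assumed example.

```python
import shlex


def preprocessor_argv(cpp, cppflags, source="conftest.c"):
    """Approximate the argv the shell builds from "$CPP $CPPFLAGS conftest.c"."""
    return shlex.split(f"{cpp} {cppflags} {source}")


flags = "-DG_DISABLE_SINGLE_INCLUDES -DATK_DISABLE_SINGLE_INCLUDES"

# With a preprocessor set, argv[0] is the compiler driver.
print(preprocessor_argv("x86_64-unknown-linux-gnu-gcc -E", flags)[0])
# -> x86_64-unknown-linux-gnu-gcc

# With an empty $CPP, the first flag becomes the "command", matching
# "-DG_DISABLE_SINGLE_INCLUDES: command not found" in the log above.
print(preprocessor_argv("", flags)[0])
# -> -DG_DISABLE_SINGLE_INCLUDES
```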

The sane way of checking for a header is to use AC_CHECK_HEADER. The jpeglib.h check is 24 years old, and many things were different back then.

gtk2 stops using this macro for jpeglib.h here:

https://gitlab.gnome.org/GNOME/gtk/-/commit/cb29d2770714943af7b488a6a94f1f37b7466c8f
	
A quick test patch that replaces AC_TRY_CPP with AC_CHECK_HEADER for jpeglib.h works well; I'll go with that.
Comment 8 Samuel Mannehed cendio 2025-03-06 13:06:15 CET
> MUST
> 
> * Running autoreconf for glib2 must work.
Yes, this works now.

> SHOULD
> 
> * Autoconf should be upgraded to the latest version.
Yep, a newer version that included an essential fix was available. We're now on the latest version, 2.72.
> * The full reason for the autoreconf failure should be documented and understood.
Yes, the problem was that automake operated on outdated macros in autom4te's cache. See the full explanation in comment 2, comment 3 and comment 4.
Comment 9 Samuel Mannehed cendio 2025-03-06 13:07:36 CET
Note that the fix included in autoconf 2.72 might have fixed other intermittent issues in cenbuild. It is unclear how long the autom4te cache mechanism has had this flaw; it may have been there since the beginning of cenbuild.
Comment 10 Samuel Mannehed cendio 2025-03-07 12:58:14 CET
(In reply to Samuel Mannehed from comment #5)
> gtk2 stops using this macro for jpeglib.h here:
> 
> https://gitlab.gnome.org/GNOME/gtk/-/commit/cb29d2770714943af7b488a6a94f1f37b7466c8f

Correction: gtk just moved this code out to a separate project, "gdk-pixbuf", where the macro survived until the migration to meson. The last version of gdk-pixbuf that had a configure.ac file was 2.36.12:

https://gitlab.gnome.org/GNOME/gdk-pixbuf/-/blob/2.36.12/configure.ac?ref_type=tags
Comment 11 Samuel Mannehed cendio 2025-03-19 16:52:52 CET
I forgot to run repo/rsync-to-repo here.
Comment 12 Samuel Mannehed cendio 2025-03-19 17:20:58 CET
Sync done.
Comment 13 Adam Halim cendio 2025-03-25 16:29:17 CET
Tested building glib2 by just bumping the release number, and did not see any issues.

Also tested server build 3957 on RHEL 9 and client build 3847 on macOS 15, Windows 10 and Fedora 41. Did some light testing with smart cards (pcsctun depends on glib2) and did not see any issues.

> MUST
> * Running autoreconf for glib2 must work.
Indeed, we haven't seen any build issues since upgrading autoconf.
> SHOULD
> * Autoconf should be upgraded to the latest version.
Indeed, we have the latest.
> * The full reason for the autoreconf failure should be documented and understood.
The issues seen have been documented very thoroughly.

Commits look good, closing.
