7842 – rpm2debpkg doesn't handle deconfigure events

Bug 7842 - rpm2debpkg doesn't handle deconfigure events

Summary: rpm2debpkg doesn't handle deconfigure events

Status:	CLOSED FIXED

Alias:	None

Product:	ThinLinc
Classification:	Unclassified
Component:	Build system (show other bugs)
Version:	trunk
Hardware:	PC Unknown

Importance:	P2 Normal
Target Milestone:	4.15.0
Assignee:	William Sjöblom

URL:
Keywords:	ossman_tester, prosaic

Depends on:
Blocks:

Reported:	2022-02-23 14:57 CET by Pierre Ossman
Modified:	2022-04-22 07:59 CEST (History)
CC List:	2 users (show)

See Also:	4885
Acceptance Criteria:	- rpm2debpkg should produce DEBs that closely replicate the behavior of the source RPMs

Attachments
Add an attachment (proposed patch, testcase, etc.)

Description Pierre Ossman cendio

2022-02-23 14:57:06 CET

Unfortunately there are more cases we need to consider when it comes to the scripts in our RPMs when they are converted to .deb. We looked at this in bug 5062 and bug 3744 but still missed things.

From the deb-prerm man page we can see there are more events that trigger this script than are currently handled. Specifically "prerm deconfigure in-favour new-package new-version".

When we encounter this we do "exit 1", which aborts the entire transaction.

To make things worse, we also don't handle the postinst call that happens when the prerm fails.

This happened in practice on bug 4885 where we are replacing multiple thinlinc packages with a single one. We then tried using --auto-deconfigure as a workaround of dpkg not being clever enough resolving dependencies.


The output seen in the log:

> dpkg: error processing archive /home/cendio/tl-4.14.0post-server/packages/thinlinc-server_4.14.0post-0.unknown_amd64.deb (--install):
>  installed thinlinc-webaccess package pre-removal script subprocess returned error exit status 1
> dpkg: error while cleaning up:
>  installed thinlinc-webaccess package post-installation script subprocess returned error exit status 1

Comment 1 Frida Flodin cendio

2022-03-11 13:25:57 CET

So all our deb script helpers in rpm2debpkg exits with 1 when called in a way that we didn't expect. I could not find why we do this, it seems to have been that way since the code was first written. 

It feels more reasonable to just do nothing ("exit 0") when a script is called in a way that does not translate to an rpm script call.

Comment 2 Frida Flodin cendio

2022-03-11 14:46:49 CET

I did a test run with rpm2debpkg changed to add "exit 0" instead of "exit 1" in all four script helpers. I then built our old packages before bug 4885 and installed them on Ubuntu 20.04.

This is the result I got when trying to upgrade to the nightly server build using the --auto-deconfigure flag:

> cendio@ubuntu2004:~$ sudo dpkg --install --auto-deconfigure --force-confmiss --force-confold --force-overwrite nightly-server/packages/thinlinc-server_4.14.0post-2502_amd64.deb

> Selecting previously unselected package thinlinc-server.
> dpkg: considering removing thinlinc-tladm in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-tladm in favour of thinlinc-server
> dpkg: considering removing thinlinc-tlmisc in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-tlmisc in favour of thinlinc-server
> dpkg: considering removing thinlinc-tlmisc-libs in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-tlmisc-libs in favour of thinlinc-server
> dpkg: considering removing thinlinc-tlmisc-libs32 in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-tlmisc-libs32 in favour of thinlinc-server
> dpkg: considering removing thinlinc-tlprinter in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-tlprinter in favour of thinlinc-server
> dpkg: considering removing thinlinc-vnc-server in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-vnc-server in favour of thinlinc-server
> dpkg: considering removing thinlinc-vsm in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-vsm in favour of thinlinc-server
> dpkg: considering removing thinlinc-webaccess in favour of thinlinc-server ...
> dpkg: yes, will remove thinlinc-webaccess in favour of thinlinc-server
> (Reading database ... 182989 files and directories currently installed.)
> Preparing to unpack .../thinlinc-server_4.14.0post-2502_amd64.deb ...
> De-configuring thinlinc-vnc-server (4.14.0post-0.unknown), to allow removal of thinlinc-tlmisc (4.14.0post-0.unknown) ...
> De-configuring thinlinc-vsm (4.14.0post-0.unknown), to allow removal of thinlinc-tlmisc (4.14.0post-0.unknown) ...
> De-configuring thinlinc-webaccess (4.14.0post-0.unknown), to allow removal of thinlinc-tlmisc (4.14.0post-0.unknown) ...
> Removed /etc/systemd/system/multi-user.target.wants/tlwebadm.service.
> Removed /etc/systemd/system/multi-user.target.wants/vsmagent.service.
> Removed /etc/systemd/system/multi-user.target.wants/vsmserver.service.
> Removed /etc/systemd/system/multi-user.target.wants/tlwebaccess.service.
> Unpacking thinlinc-server (4.14.0post-2502) ...
> Setting up thinlinc-server (4.14.0post-2502) ...
> dpkg: error processing package thinlinc-webaccess (--install):
>  package thinlinc-webaccess is not ready for configuration
>  cannot configure (current status 'config-files')
> dpkg: error processing package thinlinc-vsm (--install):
>  package thinlinc-vsm is not ready for configuration
>  cannot configure (current status 'config-files')
> Processing triggers for hicolor-icon-theme (0.17-2) ...
> Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
> Processing triggers for desktop-file-utils (0.24-1ubuntu3) ...
> Processing triggers for mime-support (3.64ubuntu1) ...
> Errors were encountered while processing:
>  thinlinc-webaccess
>  thinlinc-vsm


The deconfiguration of thinlinc-webaccess, thinlinc-vsm and thinlinc-vnc workes. But it seems like dpkg tries to configure them again once thinlinc-server is installed. I'm guessing this fails since those packages have been removed? I'm not sure if we could do something to avoid these errors.

Comment 3 William Sjöblom cendio

2022-03-17 16:20:14 CET

Here are a couple of links providing a better description of the different trigger script events:

* Pretty flowcharts for common scenarios: https://wiki.debian.org/MaintainerScripts 
* Detailed documentation: https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html

These two should provide a solid ground for making our RPM to DEB conversion trigger scripts a little saner.

Comment 4 William Sjöblom cendio

2022-03-18 09:17:50 CET

I've had a look at the seemingly promising alien (https://wiki.debian.org/Alien). Sadly, alien does is even more naive than we currently are when converting RPMs to DEBs regarding script conversions.

Comment 5 William Sjöblom cendio

2022-03-18 14:43:55 CET

According to the Fedora guidelines, trigger scripts MUST exit with code 0
[1]. This means that assuming our RPM trigger scripts are well behaved, we have
no possibility of reaching the =abort-*= states in the DEB trigger script state
machine from the scripts/states defined in our source RPM file.

DEB has a lot of error handling in the case that its trigger scripts return with
a non-zero exit code [2]. RPM does not have any such features. Additionally that
we cannot reasonably generate sane handlers, neither based on the RPM spec nor
any general ones. So, for the =failed-*= and =abort-*= calls to the DEB trigger
scripts, the most promising choice is to hard-code exit codes such that we get
out of the error handling states and back to the actual install states as soon
as possible. Worth noting is that since none of our RPM trigger scripts can
return a non-zero exit code in practice, the scenario of us ending up in error handling states is currently very hypothetical.

So, the most reasonable approach to get out of the "error unwind" states seems
to be to let both our =failed-*= and =abort-*= handlers always exit with
code 0. For the common install/upgrade/remove/purge operations this can be seen
visually in [3]. Note that exiting with 0 for both fails and aborts does not
always get us out of the "error unwind" states, but it seems to be our best bet
as of now.


[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax
[2] See the "Error unwinds" under
    https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html
[3] https://wiki.debian.org/MaintainerScripts

Comment 6 William Sjöblom cendio

2022-03-18 14:59:37 CET

Other than the currently unhandled `failed-*' and `abort-*' calls, we have the following trigger script calls to decide reasonable return codes for:
 
- `postinst triggered'
- `prerm deconfigure'
- `postrm disappear'. 

The documentation for these calls is pretty scarce, so some more digging is needed. Though it is likely that a simple `exit 0' will do for these too. If that proves to be the case, all trigger script calls that are not handled by our RPM trigger scripts will simply exit with code 0 as was suspected in comment 1.

We also likely want to change the default exit code in the DEB trigger script prelude from `1' to `0' for future-proofing.

Comment 7 William Sjöblom cendio

2022-03-18 16:30:42 CET

Note that the above use (comment 3 and comment 5) of "trigger scripts"
actually refer to "scriptlets". My bad!

Since we don't use dpkg triggers ( this time I mean actual triggers ;) ) 
we can simply `exit 0' for these. Similarly, we cannot do much
other than exiting with 0 for the remaining `postinst deconfigure' and
`postrm disappear'.

So, all in all, Fridas approach in comment 1 of just running with `exit 0' 
seem more than reasonable.

Comment 12 William Sjöblom cendio

2022-03-21 13:56:12 CET

Based on the reasoning and conclusion in comment 5, comment 6 and
comment 7, I have now implemented a fix. Please note that the issue with
`--auto-deconfigure' mentioned in comment 2 still persists as it is
unrelated to this bug and a consequence of the mediocre handling of
dependencies in dpkg.

I have tested the following on Ubuntu 18.04 (where 4.14.0post indicates
a build with the updated rpm2debpkg):
• ☑ Fresh install of 4.14.0post (with new DEBs)
  • ☑ Server (install of build #2519)
  • ☑ Client (install of build #2424)
• ☑ Upgrade from 4.14.0 to 4.14.0post (with new DEBs)
  • ☑ Server (upgrade to build #2519)
  • ☑ Client (upgrade to build #2424)
• ☑ Upgrade from 4.14.0post to locally built 4.15.0
  • ☑ Server (upgrade from build #2519)
  • ☑ Client (upgrade from build #2424)


Note that this bug is pretty hypothetical in nature and, from my point
of view, has two primary purposes (if anyone else has anything to add
here, please do):

1. Minimizing the risk of the dpkg package database getting into a
   broken state that is non-trivial to fix
2. As an upgrade precaution in case we want to do more exotic in our
   package scriptlets or tlinstaller in the future

Thus, the testing efforts required for this bug is pretty fuzzy and is
likely very exploratory in nature.

Marking as resolved.

Comment 13 Pierre Ossman cendio

2022-03-23 13:10:01 CET

Commits look good, as does the reasoning here in bugzilla.

Tested on Ubuntu 20.04:

 * Upgrade from 4.14.0
 * Downgrade to older 4.14.0post
 * Upgrade from older 4.14.0post
 * Fresh install
 * Reinstall of same version
 * Remove
 * Purge

Things I looked for to see that the scripts were run when intended:

 * .upgrade.stamp created
 * /usr/share/icons/hicolor touched
 * chmod a+x /opt/thinlinc/etc/xsession
 * /bin/systemctl daemon-reload
 * Services stopped

Could not see any issues for these standard cases.

Comment 14 Pierre Ossman cendio

2022-03-23 13:23:19 CET

Also tried re-converting 4.9.0 and adapting an upgrade to 4.14.0post, which got the same results as in comment 2. So not ideal, but further than before indicating that the scripts no longer fail at least.

Can't see anything more realistic to test, and the unit tests should cover the theoretical cases.

Good enough for me. :)

Note You need to log in before you can comment on or make changes to this bug.