Python has signalled that it will likely change the default encoding for files and subprocesses to UTF-8 in the future, regardless of the configured locale:

https://peps.python.org/pep-0597/

The rationale is that many files are meant to be shared, so they should use a common encoding rather than a system-specific one.

The vast majority of systems already use UTF-8, so in practice this will change very little. It can affect the few users that use a non-standard encoding, though.

In ThinLinc, we've been very careful about specifying the correct encoding when needed, so everything is designed with the assumption that Python will respect the system encoding if none is explicitly specified. If Python changes this default behaviour, then many of our calls need to be adjusted.

For convenience, they've added the special "locale" encoding for these cases¹ so you don't have to look up the actual encoding in every place. However, that support is not available until Python 3.10, which means we will likely have a period where we need to support both newer Pythons with the changed default, and older Pythons without encoding="locale" support.

¹ It's not a standard codec though, so it cannot be used everywhere.
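One possible way to bridge that period is to resolve the locale's encoding to a real codec name ourselves and always pass it explicitly. The following is only a minimal sketch: locale_encoding_name() and open_text() are hypothetical names, not existing ThinLinc functions, and the fallback for older Pythons is an assumption about what is "close enough" to encoding="locale".

```python
import locale


def locale_encoding_name():
    """Return a real codec name for the current locale's encoding.

    Hypothetical helper. Unlike the special "locale" value, a real codec
    name is accepted by every Python version and by every API that takes
    an encoding.
    """
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        # Python 3.11 and newer: ignores UTF-8 mode.
        return getencoding()
    # Older Pythons: closest available call. Note that on Python 3.7+
    # this reports UTF-8 when UTF-8 mode is enabled.
    return locale.getpreferredencoding(False)


def open_text(path, mode="r"):
    """Open a text file with the locale's encoding stated explicitly."""
    return open(path, mode, encoding=locale_encoding_name())
```

Because the encoding is always given explicitly, a call like open_text("/some/file") should neither depend on Python's default nor trigger the default-encoding warning.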
To complicate things even further, they've slightly changed what "current locale" means in Python 3.11. Previously, it meant "locale.getpreferredencoding(False)", but they've now changed it to "locale.getencoding()".

The difference between the two isn't terribly clear, and the documentation just states that they are basically the same, except that UTF-8 mode is ignored in the new one. That seems like an odd, minor difference.

To make things extra confusing, this is how the new handling is implemented in subprocess:

> if sys.flags.utf8_mode:
>     return "utf-8"
> else:
>     return locale.getencoding()

That makes it look like it is basically doing the same thing as before, i.e. "locale.getpreferredencoding(False)".
I checked the code¹, and locale.getpreferredencoding() is now only a wrapper around locale.getencoding() that first checks UTF-8 mode. That makes it very confusing what subprocess is doing, since it ends up with the same result as the old call.

¹ https://github.com/python/cpython/blob/main/Lib/locale.py
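The difference between the two calls can also be checked empirically rather than by reading the code. A rough sketch, assuming Python 3.7 or newer so that the "-X utf8" option exists; probe() is a hypothetical name used only for this illustration:

```python
import subprocess
import sys

# Child script: print what each call reports. locale.getencoding() only
# exists on Python 3.11 and newer, so fall back to a placeholder.
CHILD = r"""
import locale
getencoding = getattr(locale, "getencoding", None)
print(locale.getpreferredencoding(False))
print(getencoding() if getencoding else "n/a")
"""


def probe(utf8_mode):
    """Run a child interpreter with UTF-8 mode forced on (1) or off (0).

    Returns (getpreferredencoding(False), getencoding()) as strings.
    """
    out = subprocess.check_output(
        [sys.executable, "-X", "utf8=%d" % utf8_mode, "-c", CHILD]
    )
    preferred, current = out.decode("ascii").split()
    return preferred, current


if __name__ == "__main__":
    for mode in (0, 1):
        print("utf8=%d:" % mode, probe(mode))
```

Running this under a non-UTF-8 LC_CTYPE should show the two values diverging only when UTF-8 mode is on.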
Python has added a new warning for this transition, but left that warning disabled by default. They indicate that they'll upgrade that warning to a deprecation warning at some point.

One thing to note is that the warning isn't just present in the file and subprocess handling, but also in locale.getpreferredencoding(). So I guess that function is meant to become deprecated as well?
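To find the calls that rely on the default, the warning can be switched on with "-X warn_default_encoding" (or the PYTHONWARNDEFAULTENCODING environment variable) on Python 3.10 and newer. A small sketch of doing that for a child interpreter; default_encoding_warnings() is a hypothetical name, and older Pythons are assumed to simply ignore the unknown -X option:

```python
import subprocess
import sys

# Child script: open a file without specifying encoding=.
CHILD = "import sys; open(sys.argv[1]).close()"


def default_encoding_warnings(path):
    """Return the stderr of a child that opens `path` with the warning on."""
    result = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", CHILD, path],
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stderr.decode("utf-8", "replace")


if __name__ == "__main__":
    # Any readable file works as the argument.
    print(default_encoding_warnings(sys.argv[1]))
```

On Python 3.10 and newer the output is expected to contain an EncodingWarning pointing at the open() call; on older versions it should be empty.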
I found the PEP for this change:

https://peps.python.org/pep-0686/

They've apparently decided to change the default in Python 3.15, which should be released in 2026. Unfortunately for us, we won't be able to raise our requirements to Python 3.10 until 2032 when RHEL 9 is EOL.

So the situation is that doing open() on a system with LC_CTYPE=latin1 will give you:

Python < 3.15: latin1
Python >= 3.15: UTF-8

With LC_CTYPE=C it gets more complex because of UTF-8 mode:

Python < 3.7: ASCII
Python >= 3.7, < 3.11: UTF-8
Python >= 3.11, < 3.15: ASCII
Python >= 3.15: UTF-8
Note that we mimic Python's subprocess.Popen handling of encodings in our extproc.subprocess_run(). We might want to mimic whatever Python is doing here as well, when it comes to chosen encoding and warnings.