gh-152905: Decode LC_TIME items in nl_langinfo() from glibc wide data by serhiy-storchaka · Pull Request #152911 · python/cpython

serhiy-storchaka · 2026-07-02T18:12:25Z

On glibc, decode the LC_TIME items from the wide (_NL_W*) locale data, so the result no longer depends on the LC_CTYPE encoding.

The wide constant is always _NL_W + the narrow name, so it is filled into langinfo_constants[] by token pasting — one scan yields both the item and its wide form. ERA has no wide counterpart and keeps the narrow path.
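The naming rule described above can be sketched in Python. This is only an illustration of the "_NL_W + narrow name" convention; `wide_name` and `NO_WIDE_COUNTERPART` are hypothetical names, and the real change does this in C with preprocessor token pasting.

```python
# Hypothetical helper illustrating the naming rule; not part of CPython or glibc.
NO_WIDE_COUNTERPART = {"ERA"}  # per the PR description, ERA keeps the narrow path


def wide_name(narrow_name):
    """Return the wide item name for a narrow LC_TIME item name, or None."""
    if narrow_name in NO_WIDE_COUNTERPART:
        return None
    return "_NL_W" + narrow_name


for name in ("MON_1", "D_T_FMT", "ERA"):
    print(name, "->", wide_name(name))
```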

🤖 Generated with Claude Code

Issue: locale.nl_langinfo(): decode LC_TIME items from glibc wide locale data #152905

…e data

On glibc, locale.nl_langinfo() now decodes the LC_TIME text items from the wide (_NL_W*) locale data, independently of the LC_CTYPE encoding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

read-the-docs-community · 2026-07-02T18:16:35Z

Documentation build overview

📚 cpython-previews | 🛠️ Build #33431572 | 📁 Comparing 85d729d against main (31864bd)

🔍 Preview build

3 files changed

± library/concurrent.futures.html
± library/locale.html
± whatsnew/changelog.html

The encoding-independence guaranteed by the wide (_NL_W*) decode is glibc-specific, so gate test_nl_langinfo_encoding_independent on glibc (which also covers the previously skipped musl case).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vstinner · 2026-07-03T15:40:36Z

On my Fedora 44, this change is not only technical, it does actually change nl_langinfo() output on multiple locales.

Some examples.

Locale ast_ES.iso885915: MON_10: "d'ochobre" => MON_10: 'd’ochobre' (different quote: U+0027 => U+2019)
Locale br_FR.iso885915@euro: D_T_FMT: "D'ar %A %d a viz %B %Y %T" => 'Dʼar %A %d a viz %B %Y %T' (different quote)
Locale es_ES.iso885915@euro: AM_STR: 'a.\xa0m.' => 'a.\u202fm.'
Locale oc_FR.iso88591: MON_4: "d'abril" => 'd’abril' (different quote)
Locale ro_RO.iso88592: DAY_3: 'marţi' => DAY_3: 'marți' (U+0163 => U+021b)
Locale yi_US: ABDAY_2: "מאָנ'" => "מאָנ'" (U+05d0 U+05b8 => U+fb2f)
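To see exactly which code points differ in examples like these, a small helper can list them by position. This is a sketch for illustration; `describe` and `char_changes` are hypothetical names, and the helper only handles strings of equal length (the yi_US case, where two code points become one, would need a manual comparison).

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def char_changes(old, new):
    """Return (index, old_char, new_char) descriptions where two strings differ."""
    if len(old) != len(new):
        raise ValueError("strings differ in length; compare code points manually")
    return [
        (index, describe(a), describe(b))
        for index, (a, b) in enumerate(zip(old, new))
        if a != b
    ]


# Pairs taken from the examples above (narrow result, wide result).
pairs = [
    ("d'ochobre", "d\u2019ochobre"),
    ("mar\u0163i", "mar\u021bi"),
    ("a.\xa0m.", "a.\u202fm."),
]
for old, new in pairs:
    for index, before, after in char_changes(old, new):
        print(f"{old!r}[{index}]: {before} => {after}")
```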

I wrote this script to dump all nl_langinfo() values of all locales on Linux:

import locale
import subprocess

def get_all_locales():
    cmd = ['locale', '-a']
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    stdout = proc.stdout
    return stdout.splitlines()

langinfo_constants = [
    "DAY_1",
    "DAY_2",
    "DAY_3",
    "DAY_4",
    "DAY_5",
    "DAY_6",
    "DAY_7",
    "ABDAY_1",
    "ABDAY_2",
    "ABDAY_3",
    "ABDAY_4",
    "ABDAY_5",
    "ABDAY_6",
    "ABDAY_7",
    "MON_1",
    "MON_2",
    "MON_3",
    "MON_4",
    "MON_5",
    "MON_6",
    "MON_7",
    "MON_8",
    "MON_9",
    "MON_10",
    "MON_11",
    "MON_12",
    "ABMON_1",
    "ABMON_2",
    "ABMON_3",
    "ABMON_4",
    "ABMON_5",
    "ABMON_6",
    "ABMON_7",
    "ABMON_8",
    "ABMON_9",
    "ABMON_10",
    "ABMON_11",
    "ABMON_12",
    "RADIXCHAR",
    "THOUSEP",
    "CRNCYSTR",
    "D_T_FMT",
    "D_FMT",
    "T_FMT",
    "AM_STR",
    "PM_STR",
    "CODESET",
    "T_FMT_AMPM",
    "ERA",
    "ERA_D_FMT",
    "ERA_D_T_FMT",
    "ERA_T_FMT",
    "ALT_DIGITS",
    "YESEXPR",
    "NOEXPR",
    "_DATE_FMT",
]
langinfo_constants = [
    name
    for name in langinfo_constants
    if hasattr(locale, name)
]
langinfo_constants.sort()

all_locales = get_all_locales()
all_locales.sort()

for loc in all_locales:
    title = f"Locale {loc}"
    print(title)
    print("=" * len(title))
    print()

    locale.setlocale(locale.LC_ALL, loc)

    for name in langinfo_constants:
        key = getattr(locale, name)
        value = locale.nl_langinfo(key)
        print(f'{name}: {value!r}')
    print()

print(f"Total: nl_langinfo() values: {len(langinfo_constants)} per locale")
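One way to use the script is to save its output before and after the change and compare the two dumps item by item. The sketch below assumes the output format printed by the script above ("Locale NAME" headers followed by "ITEM: repr" lines); `parse_dump` and `diff_dumps` are hypothetical helper names, not part of the PR.

```python
def parse_dump(text):
    """Parse the script's output into {locale: {item_name: repr_string}}."""
    dump = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Locale "):
            current = dump.setdefault(line[len("Locale "):], {})
        elif line.startswith("Total:"):
            current = None  # summary line, not an item
        elif current is not None and ": " in line:
            name, _, value = line.partition(": ")
            current[name] = value
    return dump


def diff_dumps(before, after):
    """List (locale, item, old, new) for items whose repr changed."""
    old, new = parse_dump(before), parse_dump(after)
    changes = []
    for loc in sorted(old.keys() & new.keys()):
        for name in sorted(old[loc].keys() & new[loc].keys()):
            if old[loc][name] != new[loc][name]:
                changes.append((loc, name, old[loc][name], new[loc][name]))
    return changes


# Tiny hand-written sample in the script's format, based on the ro_RO example.
before = "Locale ro_RO.iso88592\n=====================\n\nDAY_3: 'mar\u0163i'\n\n"
after = "Locale ro_RO.iso88592\n=====================\n\nDAY_3: 'mar\u021bi'\n\n"
for loc, name, old_value, new_value in diff_dumps(before, after):
    print(f"Locale {loc}: {name}: {old_value} => {new_value}")
```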

vstinner · 2026-07-03T15:49:50Z

+                values.append([nl_langinfo(item) for item in items])
+            if len(values) < 2:
+                continue
+            with self.subTest(locales=avail):


I don't see the purpose of avail. Here, it's always equal to locs (except that it's a list instead of a tuple). I suggest removing avail.

vstinner · 2026-07-03T15:52:23Z

+                for other in values[1:]:
+                    self.assertEqual(values[0], other)


This loop would be needed if the entries in variants had more than 2 locales. But currently each entry has exactly 2 locales, so the loop seems complicated just to do:

self.assertEqual(values[0], values[1])

vstinner · 2026-07-03T16:05:38Z

+                    self.assertEqual(values[0], other)
+                tested = True
+        if not tested:
+            self.skipTest('no suitable locale pairs')


Currently, when the test fails, it generates a long output which can be hard to debug (I modified the code to inject a bug on purpose):

======================================================================
FAIL: test_nl_langinfo_encoding_independent (test.test__locale._LocaleTests.test_nl_langinfo_encoding_independent) (locales=['el_GR.UTF-8', 'el_GR.ISO8859-7'])
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vstinner/python/main/Lib/test/test__locale.py", line 320, in test_nl_langinfo_encoding_independent
    self.assertEqual(values[0], values[1])
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Lists differ: ['Ιαν[365 chars]μμ', '%a %d %b %Y %T %Z', '%d/%m/%Y', '%T', '%I:%M:%S %p', ''] != ['Ιαν[365 chars]μμ', '%a %d %b %Y %T %Z', '%d/%m/%Y', '%T', '%I:%M:%S %p', 'x']

First differing element 44:
''
'x'

  ['Ιανουαρίου', 'Φεβρουαρίου', 'Μαρτίου', 'Απριλίου', 'Μαΐου', 'Ιουνίου', 'Ιουλίου', 'Αυγούστου', 'Σεπτεμβρίου', 'Οκτωβρίου', 'Νοεμβρίου', 'Δεκεμβρίου', 'Ιαν', 'Φεβ', 'Μαρ', 'Απρ', 'Μαΐ', 'Ιουν', 'Ιουλ', 'Αυγ', 'Σεπ', 'Οκτ', 'Νοε', 'Δεκ', 'Κυριακή', 'Δευτέρα', 'Τρίτη', 'Τετάρτη', 'Πέμπτη', 'Παρασκευή', 'Σάββατο', 'Κυρ', 'Δευ', 'Τρι', 'Τετ', 'Πεμ', 'Παρ', 'Σαβ', 'πμ', 'μμ', '%a %d %b %Y %T %Z', '%d/%m/%Y', '%T', '%I:%M:%S %p',
-  '']
+  'x']
?   +

An alternative is to compare a single value rather than comparing two arrays:

@unittest.skipUnless(nl_langinfo, "nl_langinfo is not available")
@unittest.skipUnless(libc_ver()[0] == 'glibc',
                     "wide nl_langinfo variants are glibc-specific")
def test_nl_langinfo_encoding_independent(self):
    # gh-152905: The LC_TIME text items are decoded independently of the
    # LC_CTYPE encoding (on glibc via the wide nl_langinfo variants), so
    # the same locale in different encodings yields identical strings.
    self.addCleanup(setlocale, LC_TIME, setlocale(LC_TIME))
    names = [f'MON_{i}' for i in range(1, 13)]
    names += [f'ABMON_{i}' for i in range(1, 13)]
    names += [f'DAY_{i}' for i in range(1, 8)]
    names += [f'ABDAY_{i}' for i in range(1, 8)]
    names += ['AM_STR', 'PM_STR', 'D_T_FMT', 'D_FMT', 'T_FMT']
    if hasattr(locale, 'T_FMT_AMPM'):
        names.append('T_FMT_AMPM')
    if hasattr(locale, 'ALT_DIGITS'):
        names.append('ALT_DIGITS')
    items = [(name, getattr(locale, name)) for name in names]
    # The same language in a Unicode and a legacy encoding.
    variants = [
        ('ja_JP.UTF-8', 'ja_JP.EUC-JP'),
        ('fr_FR.UTF-8', 'fr_FR.ISO8859-1'),
        ('el_GR.UTF-8', 'el_GR.ISO8859-7'),
    ]
    tested = False
    for locs in variants:
        values = []
        for loc in locs:
            try:
                setlocale(LC_TIME, loc)
            except Error:
                continue
            values.append({name: nl_langinfo(item) for name, item in items})
        if len(values) < 2:
            continue
        tested = True
        for name, item in items:
            with self.subTest(locales=locs, name=name):
                self.assertEqual(values[0][name], values[1][name])
    if not tested:
        self.skipTest('no suitable locale pairs')

vstinner · 2026-07-03T16:07:15Z

+        if hasattr(locale, 'T_FMT_AMPM'):
+            items.append(locale.T_FMT_AMPM)
+        if hasattr(locale, 'ALT_DIGITS'):
+            items.append(locale.ALT_DIGITS)


You should also test _DATE_FMT, no?

Why not test ERA_D_FMT, ERA_D_T_FMT and ERA_T_FMT as well?

vstinner · 2026-07-03T16:09:10Z

+    def test_nl_langinfo_encoding_independent(self):
+        # gh-152905: The LC_TIME text items are decoded independently of the
+        # LC_CTYPE encoding (on glibc via the wide nl_langinfo variants), so
+        # the same locale in different encodings yields identical strings.


Please mention that ERA has no wide character variant and so is not tested.

vstinner · 2026-07-03T16:14:39Z

-#endif
-#ifdef ERA
            if (item == ERA && *result) {
                pyresult = decode_strings(result, SIZE_MAX);


You might remove max_count of decode_strings() since it's no longer needed.

vstinner · 2026-07-03T16:15:58Z


+   .. versionchanged:: next
+      On glibc, the ``LC_TIME`` items are now decoded
+      independently of the ``LC_CTYPE`` encoding.


Except for ERA, no?

vstinner · 2026-07-03T16:17:09Z

On my Fedora 44, this change is not only technical, it does actually change nl_langinfo() output on multiple locales.

I'm fine with the change anyway. But since the nl_langinfo() output changes on some locales, I would prefer to not backport this change.

serhiy-storchaka · 2026-07-03T16:48:48Z

On my Fedora 44, this change is not only technical, it does actually change nl_langinfo() output on multiple locales.

I know, this is the point. Our strftime wraps wcsftime instead of strftime if possible. On glibc it produces results consistent with wide nl_langinfo. If we re-implemented it in Python (there are such plans), we would need a wide nl_langinfo.

On the other hand, I found these discrepancies when I tried to use nl_langinfo in strptime. strptime should be permissive in any case: it should accept the output of both wcsftime and strftime, normalize apostrophes, etc. The current code already does this (or much of it; I need to check my non-merged patches).
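The kind of normalization mentioned here could look roughly like the sketch below. It is not the actual strptime code; the mapping and the `normalize` name are assumptions for illustration, covering only the apostrophe and space variants that appear in the examples earlier in this thread.

```python
# Assumed mapping, for illustration only; the real strptime code may differ.
_NORMALIZE = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u00a0": " ",  # NO-BREAK SPACE
})


def normalize(text):
    """Fold apostrophe and space variants so narrow and wide results compare equal."""
    return text.translate(_NORMALIZE)


# The narrow and wide forms of the same item normalize to the same string.
print(normalize("d'ochobre") == normalize("d\u2019ochobre"))
print(normalize("a.\xa0m.") == normalize("a.\u202fm."))
```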

* Rewrite test_nl_langinfo_encoding_independent to compare each item individually (clearer failures), listing only the legacy locales and deriving the UTF-8 variant; broaden coverage to 20 locales across 17 legacy encodings. Also test ERA_D_FMT, ERA_D_T_FMT, ERA_T_FMT and _DATE_FMT; note that ERA has no wide variant and is not tested.
* Drop the now-unused max_count parameter of decode_strings().
* Mention in the docs that ERA is not affected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
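Deriving the UTF-8 variant from a legacy locale name might be done along these lines. This is a guess at the idea, not the code in the PR: `utf8_variant` is a hypothetical helper, and keeping an "@modifier" suffix unchanged is an assumption.

```python
def utf8_variant(legacy_locale):
    """Replace the encoding part of 'lang_TERRITORY.ENCODING[@modifier]' with UTF-8."""
    lang, dot, rest = legacy_locale.partition(".")
    if not dot:
        raise ValueError(f"locale name has no encoding part: {legacy_locale!r}")
    # Assumption: any '@modifier' suffix is carried over unchanged.
    _encoding, at, modifier = rest.partition("@")
    return f"{lang}.UTF-8{at}{modifier}"


for legacy in ("ja_JP.EUC-JP", "fr_FR.ISO8859-1", "el_GR.ISO8859-7"):
    print(legacy, "->", utf8_variant(legacy))
```

A test built this way would still need to skip pairs where either locale is not installed, as the setlocale() error handling in the suggested test does.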

bedevere-app Bot mentioned this pull request Jul 2, 2026

locale.nl_langinfo(): decode LC_TIME items from glibc wide locale data #152905

Open

bedevere-app Bot added the awaiting core review label Jul 2, 2026

serhiy-storchaka requested a review from vstinner July 2, 2026 18:18

vstinner reviewed Jul 3, 2026

serhiy-storchaka wants to merge 3 commits into python:main from serhiy-storchaka:gh-152905-nl-langinfo-wide

2 participants
