gh-152248: Reject a POSIX TZ abbreviation with non-ASCII-letter characters in pure-Python zoneinfo by tonghuaroot · Pull Request #152249 · python/cpython

tonghuaroot · 2026-06-26T05:39:30Z

The pure-Python zoneinfo parser accepts a POSIX TZ string whose unquoted std/dst abbreviation contains characters other than ASCII letters (for example an embedded space or a non-ASCII letter), while the C implementation rejects it. The unquoted alternative in the parser regex is a negated class ([^<0-9:.+-]+) that admits anything except a few delimiters, whereas the C parse_abbr walks the unquoted form with Py_ISALPHA (ASCII letters only), as POSIX (via RFC 8536) requires for the unquoted form.

This tightens the unquoted alternative to [a-zA-Z]+, matching the C accelerator and POSIX, and leaves the quoted <...> form untouched. Every well-formed TZ string and all bundled IANA zones still parse unchanged; only the malformed strings that the pure-Python parser previously accepted now raise ValueError.
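To make the difference concrete, here is a minimal sketch that isolates only the unquoted alternative from the old and new patterns quoted in this PR. It is not the full parser regex; the `accepts` helper and the sample abbreviations are illustrative assumptions.

```python
import re

# Unquoted-abbreviation alternatives copied from the diff in this PR.
# Only this one alternative is isolated here, not the whole parser regex.
OLD_UNQUOTED = re.compile(r"[^<0-9:.+-]+")
NEW_UNQUOTED = re.compile(r"[a-zA-Z]+")


def accepts(pattern, abbr):
    """Return True if the whole abbreviation matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(abbr) is not None


# A well-formed abbreviation, one with an embedded space, one with a non-ASCII letter.
for abbr in ["EST", "AB C", "AB\u00c0C"]:
    print(repr(abbr), accepts(OLD_UNQUOTED, abbr), accepts(NEW_UNQUOTED, abbr))
```

Both patterns accept "EST"; the negated class also accepts the other two, while [a-zA-Z]+ rejects them.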

The non-ASCII case is reachable through the public from_file path, which UTF-8-decodes the footer, so it is covered by a dedicated regression test in addition to the whitespace cases added to the shared invalid_tzstrs list.

Issue: zoneinfo: pure-Python POSIX TZ unquoted abbreviation regex accepts whitespace or non-ASCII letters (C rejects) #152248


StanFromIreland · 2026-06-26T10:02:01Z

+        tzstr = "ABÀC3"
+        footer = tzstr.encode("utf-8")
+
+        def from_footer():


We can give zone_from_tzstr a new parameter for encoding rather than duplicating.

Done, added an encoding parameter to zone_from_tzstr and reused it. I kept this a separate method only because the C and pure errors differ (bytes repr vs decoded text), so each is matched against its own message.
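A small sketch of why one expected message cannot cover both implementations, assuming (as the reply says) that one error reports the bytes repr and the other the decoded text. The exact error messages are not shown in this thread, so only the two renderings of the same footer are compared.

```python
# "ABÀC3", the TZ string used in the regression test.
tzstr = "AB\u00c0C3"
footer = tzstr.encode("utf-8")

# What a message built from the raw footer bytes would contain.
bytes_form = repr(footer)

# What a message built from the decoded footer would contain.
text_form = footer.decode("utf-8")

print(bytes_form)  # b'AB\xc3\x80C3'
print(text_form)   # ABÀC3
```

The non-ASCII letter appears as two escaped bytes in one rendering and as a single character in the other, so a test that matches on the message text needs a separate expectation for each.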

StanFromIreland · 2026-06-26T10:04:06Z

    parser_re = re.compile(
        r"""
-        (?P<std>[^<0-9:.+-]+|<[a-zA-Z0-9+-]+>)
+        (?P<std>[a-zA-Z]+|<[a-zA-Z0-9+-]+>)


And I see another divergence, C accepts an empty <>. :'-(

Good catch. The direction is the reverse of this PR though: here C is the lenient side. Its parse_abbr quoted branch has no empty check, while its own unquoted branch rejects an empty run (if (str_end == str_start) return -1;), so the pure parser is correct. Want me to fold a small C fix in here, or open a separate issue?

Please add it here, it's in the scope of POSIX TZ strings. This is actually spelled out by recent versions of the standard:

the quoting characters do not contribute to the three byte minimum length and {TZNAME_MAX} maximum length.

Done. The C parser now rejects an empty <>, mirroring its unquoted branch.
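The rule both parsers should now agree on can be modelled roughly as follows. This is a hypothetical Python sketch, not the C parse_abbr itself: the function name is reused only for readability, and the error messages and return shape are assumptions.

```python
import string

ASCII_LETTERS = frozenset(string.ascii_letters)
# Characters allowed inside <...>, per the quoted alternative in the regex.
QUOTED_CHARS = ASCII_LETTERS | frozenset(string.digits) | frozenset("+-")


def parse_abbr(tzstr):
    """Split a leading abbreviation off a TZ string; return (abbr, rest)."""
    if tzstr.startswith("<"):
        end = tzstr.find(">")
        if end == -1:
            raise ValueError("unterminated quoted abbreviation")
        abbr, rest = tzstr[1:end], tzstr[end + 1:]
        if not set(abbr) <= QUOTED_CHARS:
            raise ValueError("invalid character in quoted abbreviation")
    else:
        # Walk ASCII letters only, like the Py_ISALPHA loop described above.
        end = 0
        while end < len(tzstr) and tzstr[end] in ASCII_LETTERS:
            end += 1
        abbr, rest = tzstr[:end], tzstr[end:]
    # The same empty check applies to both branches.
    if not abbr:
        raise ValueError("empty abbreviation")
    return abbr, rest


print(parse_abbr("EST5EDT"))  # ('EST', '5EDT')
print(parse_abbr("<+03>-3"))  # ('+03', '-3')
```

In this model an unquoted run simply stops at the first character that is not an ASCII letter, so a string such as "AB C3" would be rejected later, when the remainder fails to parse as an offset.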

pythongh-152248: Reject a POSIX TZ abbreviation with non-ASCII-letter…

8c164ba

… characters in pure-Python zoneinfo

tonghuaroot requested review from StanFromIreland and pganssle as code owners June 26, 2026 05:39

bedevere-app Bot added the awaiting review label Jun 26, 2026

bedevere-app Bot mentioned this pull request Jun 26, 2026

zoneinfo: pure-Python POSIX TZ unquoted abbreviation regex accepts whitespace or non-ASCII letters (C rejects) #152248

Open

tonghuaroot added 2 commits June 26, 2026 13:42

Trim test comments

8a6bcd2

Wrap test comment to 79 columns

6dd23ff

StanFromIreland reviewed Jun 26, 2026


tonghuaroot added 2 commits June 26, 2026 18:34

Reuse zone_from_tzstr with an encoding parameter in the non-ASCII test

bf26921

Reject an empty quoted abbreviation in the C zoneinfo parser

1931021

tonghuaroot wants to merge 5 commits into python:main from tonghuaroot:gh-152248-zoneinfo-abbr-charset
