
gh-152997: Support system locale encodings via an iconv-based codec #153001

Open
serhiy-storchaka wants to merge 3 commits into python:main from serhiy-storchaka:iconv-codec


@serhiy-storchaka serhiy-storchaka commented Jul 4, 2026

Member

On platforms providing the C library's iconv(), the codecs module now exposes every encoding known to iconv for which Python has no built-in codec — the POSIX counterpart of the Windows code-page support.

An encoding is used by its name (for example cp1133), or with an iconv: prefix (for example iconv:latin1) to force the iconv engine even when a built-in codec of the same name exists.
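A minimal sketch of how calling code might probe for such an encoding, assuming only what the description above states. The helper name `encode_if_available` is hypothetical and not part of this PR; on a build without this change (or without iconv) the last two calls simply return `None`.

```python
import codecs


def encode_if_available(text, encoding):
    """Encode text, or return None when no codec is registered for the name.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    try:
        codecs.lookup(encoding)
    except LookupError:
        return None
    return text.encode(encoding)


# A built-in codec resolves on every build.
print(encode_if_available("café", "latin1"))

# Per the description, these would resolve through iconv on a build that
# includes this change and whose iconv knows the names; otherwise None.
print(encode_if_available("abc", "cp1133"))
print(encode_if_available("café", "iconv:latin1"))
```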

The engine is a last-resort search function and never shadows a built-in codec. Both directions pivot through native-endian UTF-32, so one input unit is one code point, giving error handlers the exact string position.
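To illustrate why a UTF-32 pivot gives exact positions, here is a small sketch using only Python's built-in codecs (not the iconv codec itself). The helper `byte_offset_to_index` is a hypothetical name; the point is that a byte offset in a UTF-32 buffer divides evenly into a string index, which is not true for UTF-16.

```python
import sys

# Native-endian UTF-32 without a BOM, matching the host byte order.
NATIVE_UTF32 = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def byte_offset_to_index(offset):
    """Map a byte offset in a UTF-32 buffer to a str index (hypothetical helper)."""
    return offset // 4


text = "a\U0001F600b"  # three code points, one of them outside the BMP
utf32 = text.encode(NATIVE_UTF32)
utf16 = text.encode("utf-16-le")

print(len(utf32) // 4)  # 3: one 4-byte unit per code point
print(len(utf16) // 2)  # 4: the astral character needs a surrogate pair

# The byte offset of the third unit maps straight back to the str index of "b".
print(text[byte_offset_to_index(8)])
```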

🤖 Generated with Claude Code

…odec

Where the C library provides iconv(), the codecs module now exposes every
encoding iconv() knows but Python has no built-in codec for -- the POSIX
counterpart of the Windows code-page support.  Use it by name (e.g.
"cp1133"), or with an "iconv:" prefix to force it over a built-in codec.

The codec is a last-resort search function and never shadows a built-in.
Both directions pivot through native-endian UTF-32, keeping one input unit
per code point so error handlers get the exact string position.
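The "last-resort" behaviour can be sketched with the public `codecs.register()` API: search functions are consulted in registration order, so one registered after the standard `encodings` search function only sees names the built-ins did not resolve. This is an illustration under that assumption, not the PR's implementation; `fallback_search` and the codec name `demo_fallback_xyz` are made up.

```python
import codecs

asked = []  # names this fallback search function was consulted for


def fallback_search(name):
    """Hypothetical last-resort search function (not the PR's code)."""
    asked.append(name)
    if name == "demo_fallback_xyz":
        # Reuse the UTF-8 codec under a made-up name, purely for illustration.
        utf8 = codecs.lookup("utf-8")
        return codecs.CodecInfo(
            name="demo_fallback_xyz",
            encode=utf8.encode,
            decode=utf8.decode,
        )
    return None


codecs.register(fallback_search)

codecs.lookup("latin-1")  # resolved by the built-in search function
result = "hé".encode("demo_fallback_xyz")  # only the fallback knows this name

print(asked)
print(result)
```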

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

read-the-docs-community Bot commented Jul 4, 2026


Documentation build overview

📚 cpython-previews | 🛠️ Build #33438415 | 📁 Comparing b0ec052 against main (8b1dbb1)

  🔍 Preview build  

3 files changed
± library/codecs.html
± whatsnew/3.16.html
± whatsnew/changelog.html

…r test

Freeze encodings._iconv_codecs, like encodings._win_cp_codecs, so the
frozen encodings package can import it during startup.  Without this,
builds where iconv is available fail to import the encodings module at
bootstrap (the module-level import runs before the filesystem stdlib is
available).

Also drop the non-portable UTF-32LE assertion from
test_encode_surrogate_pair: GNU libiconv accepts lone surrogates in
UTF-32 where glibc rejects them.  The backslashreplace check remains.
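For context, the sketch below shows how CPython's own built-in codecs treat a lone surrogate. It does not exercise the iconv codec or the test named above; it only illustrates why a strict UTF-32 expectation can vary between converters while a `backslashreplace` result stays stable.

```python
lone = "\ud800"  # a lone high surrogate, not a valid Unicode scalar value

# CPython's built-in UTF-32 codec refuses it under the default "strict" handler.
try:
    lone.encode("utf-32-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# It passes the surrogate through only when asked explicitly.
passed = lone.encode("utf-32-le", "surrogatepass")

# backslashreplace yields an escape sequence instead of raising.
escaped = lone.encode("ascii", "backslashreplace")

print(strict_ok, passed, escaped)
```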

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The wasm platforms ship a non-conforming iconv() that links but silently
fails to report unencodable characters, which would make the iconv codec
lossy.  A link-only configure test cannot detect this, and these targets
are cross-compiled so a runtime probe would not run either.  Treat iconv
as unavailable there; encodings without a built-in codec simply remain
unavailable, as on any platform whose iconv lacks them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>