gh-152997: Support system locale encodings via an iconv-based codec#153001
Open
serhiy-storchaka wants to merge 3 commits into
Open
gh-152997: Support system locale encodings via an iconv-based codec#153001serhiy-storchaka wants to merge 3 commits into
serhiy-storchaka wants to merge 3 commits into
Conversation
…odec Where the C library provides iconv(), the codecs module now exposes every encoding iconv() knows but Python has no built-in codec for -- the POSIX counterpart of the Windows code-page support. Use it by name (e.g. "cp1133"), or with an "iconv:" prefix to force it over a built-in codec. The codec is a last-resort search function and never shadows a built-in. Both directions pivot through native-endian UTF-32, keeping one input unit per code point so error handlers get the exact string position. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Documentation build overview
|
…r test Freeze encodings._iconv_codecs, like encodings._win_cp_codecs, so the frozen encodings package can import it during startup. Without this, builds where iconv is available fail to import the encodings module at bootstrap (the module-level import runs before the filesystem stdlib is available). Also drop the non-portable UTF-32LE assertion from test_encode_surrogate_pair: GNU libiconv accepts lone surrogates in UTF-32 where glibc rejects them. The backslashreplace check remains. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The wasm platforms ship a non-conforming iconv() that links but silently fails to report unencodable characters, which would make the iconv codec lossy. A link-only configure test cannot detect this, and these targets are cross-compiled so a runtime probe would not run either. Treat iconv as unavailable there; encodings without a built-in codec simply remain unavailable, as on any platform whose iconv lacks them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
On platforms providing the C library's
iconv(), thecodecsmodule now exposes every encoding known toiconvfor which Python has no built-in codec — the POSIX counterpart of the Windows code-page support.An encoding is used by its name (for example
cp1133), or with aniconv:prefix (for exampleiconv:latin1) to force the iconv engine even when a built-in codec of the same name exists.The engine is a last-resort search function and never shadows a built-in codec. Both directions pivot through native-endian UTF-32, so one input unit is one code point, giving error handlers the exact string position.
🤖 Generated with Claude Code