gh-95555: Allow a negated property as a character set member #152245

Merged
serhiy-storchaka merged 1 commit into python:main from serhiy-storchaka:re-properties-negset
Jun 26, 2026
@serhiy-storchaka


Follow-up to gh-95555 (#151969).

A negated multi-range property such as \P{ASCII} or \P{Pattern_Syntax} was rejected inside a character class. Such a property compiles to a complemented charset, so it could not simply be flattened into the member union. It is now set aside and alternated in with the other members:

  • [\P{ASCII}] alone is just the negated charset;
  • [\P{ASCII}abc] becomes [abc] | [^ASCII].
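
The rewrite in the two bullets above can be checked with syntax that `re` already has. This is a minimal sketch, not the patch itself: it assumes `\P{ASCII}` is equivalent to `[^\x00-\x7f]`, and the names `NOT_ASCII`, `rewritten` and `expected` are hypothetical. It does not exercise the new `\p{...}` syntax.

```python
import re

# Stand-in for \P{ASCII}: everything outside U+0000..U+007F (assumption).
NOT_ASCII = r'[^\x00-\x7f]'

# [\P{ASCII}abc] written out as the alternation [abc] | [^ASCII].
rewritten = re.compile(rf'(?:[abc]|{NOT_ASCII})')

def expected(ch):
    # Reference semantics: a member of {a, b, c}, or any non-ASCII character.
    return ch in 'abc' or ord(ch) > 0x7F

samples = ['a', 'b', 'c', 'd', 'Z', '0', '\n', '\x7f', '\x80', 'é', 'я', '\U0001F600']
for ch in samples:
    assert bool(rewritten.fullmatch(ch)) == expected(ch)
```

The alternation is needed because a complemented charset cannot be listed as one more range inside the positive set `[abc]`.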

Leading ^ (De Morgan) and double negation ([^\P{ASCII}] = ASCII) are handled by the existing set-complement code. A negated property also composes with the set operations: [\w--\P{ASCII}] matches the ASCII word characters.
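
The identities above can be modelled with plain predicates. This is a sketch of the intended semantics only, assuming `\P{ASCII}` means "code point above U+007F"; all function names are hypothetical and none of this is the compiler code in the patch.

```python
import re

def is_ascii(ch):
    return ord(ch) <= 0x7F

def not_ascii(ch):
    # Models \P{ASCII}.
    return not is_ascii(ch)

def double_neg(ch):
    # Models [^\P{ASCII}]: the complement of the complement.
    return not not_ascii(ch)

def neg_set(ch):
    # Models [^\P{ASCII}abc]: the complement of (non-ASCII or a/b/c).
    return not (not_ascii(ch) or ch in 'abc')

WORD = re.compile(r'\w')                  # Unicode word characters
ASCII_WORD = re.compile(r'\w', re.ASCII)  # [a-zA-Z0-9_] only

def word_minus_not_ascii(ch):
    # Models [\w--\P{ASCII}]: word characters minus the non-ASCII ones.
    return bool(WORD.fullmatch(ch)) and not not_ascii(ch)

for cp in range(0x300):
    ch = chr(cp)
    assert double_neg(ch) == is_ascii(ch)
    # De Morgan: not (A or B) == (not A) and (not B).
    assert neg_set(ch) == (is_ascii(ch) and ch not in 'abc')
    assert word_minus_not_ascii(ch) == bool(ASCII_WORD.fullmatch(ch))
```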

Category-backed \P{...} (e.g. \P{Lu}, \P{digit}), which is a single CATEGORY opcode, always composed fine; only the multi-range properties were affected.

No news entry: \p{...} is unreleased (.. versionadded:: next), so the lifted restriction never shipped.

A negated multi-range property such as \P{ASCII} or \P{Pattern_Syntax} was
rejected inside a character class.  Such members are now alternated in with
the other members: [\P{ASCII}abc] becomes [abc] | [^ASCII], and [\P{ASCII}]
alone is just the negated charset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@serhiy-storchaka serhiy-storchaka merged commit 8eb6fb0 into python:main Jun 26, 2026
55 checks passed
@serhiy-storchaka serhiy-storchaka deleted the re-properties-negset branch June 26, 2026 11:15