gh-152100: Fuse set intersection into a single charset via INVERT #153022

Open
serhiy-storchaka wants to merge 1 commit into python:main from serhiy-storchaka:re-set-ops-invert

@serhiy-storchaka serhiy-storchaka commented Jul 4, 2026

Member

Extend the set-operation charset fusion to intersection (&&), not just difference (--). A new charset-body opcode INVERT is the dual of NEGATE: NEGATE flips the running verdict, INVERT flips each following membership test, so [A&&B] — and chained or nested set operations — compile to a single charset instead of a positive lookbehind assertion.
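
One way to picture the NEGATE/INVERT duality is a toy interpreter for a charset body. This is a sketch under stated assumptions, not the code in this PR: the tuple-based program format, the `IN_SET` item, and the `charset_match` helper are hypothetical, and the opcode ordering used for `[A&&B]` is only one encoding consistent with the description above.

```python
import string

# Hypothetical opcode names for a toy charset body. Only NEGATE and INVERT
# are named in the PR description; IN_SET stands in for any membership item.
NEGATE, INVERT, IN_SET = "NEGATE", "INVERT", "IN_SET"

def charset_match(program, ch):
    """Evaluate a toy charset body against a single character."""
    ok = True      # running verdict, returned on the first hit
    flip = False   # True while membership tests are inverted
    for op, arg in program:
        if op == NEGATE:
            ok = not ok            # flips the running verdict
        elif op == INVERT:
            flip = not flip        # flips each following membership test
        elif op == IN_SET:
            if (ch in arg) != flip:
                return ok
    return not ok  # no item hit: the opposite of the running verdict

A = set(string.ascii_lowercase)
B = set("aeiou0123")

# Assumed encoding of [A&&B]: "not in B" acts as a fail item, then A is tested.
intersection = [
    (NEGATE, None), (INVERT, None), (IN_SET, B),
    (INVERT, None), (NEGATE, None), (IN_SET, A),
]

matched = {ch for ch in string.printable if charset_match(intersection, ch)}
assert matched == A & B
```

In this model a character outside `B` hits the inverted item while the verdict is negated, so the charset fails immediately, with no separate assertion needed.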

A --/&& chain is a pure conjunction (the character must be in the head charset, not in any subtrahend, and in every intersection operand), so _fuse_difference generalizes to _fuse_setops by appending one more kind of fail item to the fused charset. An intersection operand must reduce to a single member and is not fused under IGNORECASE, where case folding could split it; the optimizer walk threads the flag context through groups, so a scoped (?i:...) is skipped too.
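
The conjunction reading can be written down directly as a per-character predicate. The sketch below is illustrative only: `in_chain` and its parameters are hypothetical names, plain Python sets stand in for charsets, and nothing here touches the `re` compiler.

```python
import string

def in_chain(ch, head, subtrahends=(), intersections=()):
    """True if ch satisfies every condition of a --/&& chain."""
    return (
        ch in head                                     # in the head charset
        and all(ch not in s for s in subtrahends)      # in no subtrahend
        and all(ch in s for s in intersections)        # in every && operand
    )

letters = set(string.ascii_lowercase)
vowels = set("aeiou")
first_half = set("abcdefghijklm")

# Head a-z, minus the vowels, intersected with a-m.
result = "".join(
    sorted(ch for ch in letters if in_chain(ch, letters, [vowels], [first_half]))
)
assert result == "bcdfghjklm"
```

Because every condition is joined by `and`, each operand only ever adds a reason to fail, which is why one additional kind of fail item is enough to cover intersection.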

Intersection-bearing set operations match 4–6× faster (no per-character recursive lookbehind sub-match); pure difference compiles bit-for-bit as before.

Verified with a 231k-check differential over random --/&& chains (plain, (?i), (?i:...) and (?-i:...) contexts) against per-operand membership as the oracle, with no mismatches.
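
A differential check of this shape can be sketched in a few lines. This is not the harness used for the PR: here a left-to-right fold over Python sets is a stand-in for the compiled pattern, and `random_set`, `oracle`, `fold`, and `run_differential` are hypothetical helpers. A real harness would compile each chain with the patched `re` module and would also cover the `(?i)`, `(?i:...)` and `(?-i:...)` contexts.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def random_set(rng):
    """Pick a random non-empty subset of the alphabet."""
    return set(rng.sample(ALPHABET, rng.randint(1, len(ALPHABET) // 2)))

def oracle(ch, head, ops):
    """Per-operand membership: in head, in every && operand, in no -- operand."""
    return ch in head and all((ch in s) == (op == "&&") for op, s in ops)

def fold(head, ops):
    """Stand-in for the implementation under test: apply operators in order."""
    result = set(head)
    for op, s in ops:
        result = result & s if op == "&&" else result - s
    return result

def run_differential(trials=1000, seed=0):
    rng = random.Random(seed)
    checks = 0
    for _ in range(trials):
        head = random_set(rng)
        ops = [(rng.choice(["--", "&&"]), random_set(rng))
               for _ in range(rng.randint(1, 4))]
        fused = fold(head, ops)
        for ch in ALPHABET:
            # Any disagreement reports the offending character and chain.
            assert (ch in fused) == oracle(ch, head, ops), (ch, head, ops)
            checks += 1
    return checks

total = run_differential()
```

Seeding the generator keeps a failing chain reproducible, and the assertion message carries the character and operands needed to replay it.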

Extend the set-operation charset fusion to intersection (&&), not just
difference (--).  A new charset-body opcode INVERT is the dual of NEGATE:
NEGATE flips the running verdict, INVERT flips each following membership
test, so [A&&B] (and chained or nested set operations) compile to a
single charset instead of a positive lookbehind assertion.

A --/&& chain is a pure conjunction, so _fuse_difference generalizes to
_fuse_setops by appending one more kind of fail item.  An intersection
operand must reduce to a single member and is not fused under IGNORECASE
(case folding could split it); the optimizer walk threads the flag context
through groups, so a scoped (?i:...) is skipped too.

Intersection-bearing set operations match 4-6x faster; pure difference
compiles bit-for-bit as before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>