gh-152100: Fuse set intersection into a single charset via INVERT #153022
Open
serhiy-storchaka wants to merge 1 commit into
Extend the set-operation charset fusion to intersection (&&), not just difference (--). A new charset-body opcode INVERT is the dual of NEGATE: NEGATE flips the running verdict, INVERT flips each following membership test, so [A&&B] (and chained or nested set operations) compiles to a single charset instead of a positive lookbehind assertion. A --/&& chain is a pure conjunction, so _fuse_difference generalizes to _fuse_setops by appending one more kind of fail item. An intersection operand must reduce to a single member and is not fused under IGNORECASE (case folding could split it); the optimizer walk threads the flag context through groups, so a scoped (?i:...) is skipped too. Intersection-bearing set operations match 4-6x faster; pure difference compiles bit-for-bit as before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
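Here is a quick way to see why an intersection operand has to reduce to a single member. If `INVERT` flips each following membership test independently, then an operand spread over two members (say `[a-c]` plus `x`) rejects any character that misses *either* member. That computes "not in B1 or not in B2" rather than "not in (B1 ∪ B2)". The plain-set sketch below shows the mismatch. `B_members` and both helper functions are hypothetical and model only the reasoning, not the patch's code. Case folding under IGNORECASE could split one member into several in the same way, which is what the IGNORECASE restriction guards against.

```python
# Hypothetical intersection operand B, written as two charset members: [a-c] and x.
B_members = [frozenset("abc"), frozenset("x")]
B = frozenset().union(*B_members)


def rejected_per_member(ch):
    """Inverted test applied to each member separately: reject if ch misses ANY member."""
    return any(ch not in m for m in B_members)


def rejected_correctly(ch):
    """What intersecting with B requires: reject only if ch is outside the whole operand."""
    return ch not in B


for ch in "abxz":
    print(ch, rejected_per_member(ch), rejected_correctly(ch))
# a True False
# b True False
# x True False
# z True True
```

Because the two members are disjoint, the per-member form rejects every character, so a split operand would silently empty the intersection.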
Extend the set-operation charset fusion to intersection (`&&`), not just difference (`--`). A new charset-body opcode `INVERT` is the dual of `NEGATE`: `NEGATE` flips the running verdict, `INVERT` flips each following membership test, so `[A&&B]` (and chained or nested set operations) compiles to a single charset instead of a positive lookbehind assertion.

A `--`/`&&` chain is a pure conjunction (the character must be in the head charset, not in any subtrahend, and in every intersection operand), so `_fuse_difference` generalizes to `_fuse_setops` by appending one more kind of fail item to the fused charset. An intersection operand must reduce to a single member and is not fused under `IGNORECASE`, where case folding could split it; the optimizer walk threads the flag context through groups, so a scoped `(?i:...)` is skipped too.

Intersection-bearing set operations match 4–6× faster (no per-character recursive lookbehind sub-match); pure difference compiles bit-for-bit as before.
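To make the conjunction layout concrete, here is a toy model in Python. `charset_match` interprets a charset body roughly the way `_sre`'s charset check does: a member that matches returns the current verdict, `NEGATE` flips that verdict, and `FAILURE` returns its opposite. `INVERT` is modeled from the description above as a toggle that flips every following membership test. The opcode tuples, `fuse_setops`, and the item order (fail items under a flipped verdict, then the head) are assumptions chosen for illustration. They are not taken from the patch.

```python
import string


def charset_match(program, ch):
    """Interpret a toy charset body; return True if ch is accepted."""
    ok = True        # verdict returned when a member test succeeds
    invert = False   # when True, a member test succeeds if ch is NOT in the member
    for op, *args in program:
        if op == "FAILURE":
            return not ok            # no member test succeeded
        if op == "NEGATE":
            ok = not ok              # flip the running verdict
        elif op == "INVERT":
            invert = not invert      # flip every following membership test
        elif op == "IN":
            if (ch in args[0]) != invert:
                return ok
        else:
            raise ValueError(f"unknown op {op!r}")
    raise ValueError("charset body must end with FAILURE")


def fuse_setops(head, subtrahends=(), intersections=()):
    """Fuse head -- s1 -- ... && i1 && ... into one body (assumed layout)."""
    program = [("NEGATE",)]                             # fail items now return False
    program += [("IN", s) for s in subtrahends]         # ch in a subtrahend -> reject
    if intersections:
        program.append(("INVERT",))
        program += [("IN", i) for i in intersections]   # ch outside an operand -> reject
        program.append(("INVERT",))
    program.append(("NEGATE",))                         # restore the normal verdict
    program.append(("IN", head))                        # ch in the head -> accept
    program.append(("FAILURE",))                        # anything else -> reject
    return program


def oracle(ch, head, subtrahends=(), intersections=()):
    """Per-operand membership: the conjunction the fused body must reproduce."""
    return (ch in head
            and not any(ch in s for s in subtrahends)
            and all(ch in i for i in intersections))


# Head a-z, minus the vowels, intersected with a-m, modeled with plain sets.
lower = frozenset(string.ascii_lowercase)
vowels = frozenset("aeiou")
a_to_m = frozenset("abcdefghijklm")
prog = fuse_setops(lower, [vowels], [a_to_m])
matched = "".join(c for c in string.ascii_lowercase if charset_match(prog, c))
print(matched)  # bcdfghjklm
```

In this layout each intersection operand is a single `IN` item, so rejecting on "not in this item" is exactly "not in the operand". A difference-only chain never emits `INVERT` at all.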
Verified with a 231k-check differential over random `--`/`&&` chains (plain, `(?i)`, `(?i:...)` and `(?-i:...)` contexts) against per-operand membership as the oracle, with no mismatches.
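The four contexts in that check line up with the flag threading described above. A global `(?i)` applies to the whole pattern, while `(?i:...)` and `(?-i:...)` turn case-insensitivity on or off only inside their group. The toy walk below threads that flag through nested groups and declines to fuse an intersection-bearing set operation wherever IGNORECASE is in effect. `SetOp`, `Group` and `plan_fusion` are hypothetical names. The real optimizer works on the parser's output, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SetOp:
    """Toy node: an intersection-bearing set operation such as [A&&B]."""
    name: str


@dataclass
class Group:
    """Toy node: ignorecase is True for (?i:...), False for (?-i:...), None if unscoped."""
    ignorecase: Optional[bool]
    children: list = field(default_factory=list)


def plan_fusion(nodes, ignorecase=False, decisions=None):
    """Return {set-op name: fuse?}, threading the IGNORECASE flag into nested groups."""
    if decisions is None:
        decisions = {}
    for node in nodes:
        if isinstance(node, Group):
            # A scoped flag overrides the inherited one only inside this group.
            inner = ignorecase if node.ignorecase is None else node.ignorecase
            plan_fusion(node.children, inner, decisions)
        elif isinstance(node, SetOp):
            # Case folding could split an intersection operand, so skip fusion here.
            decisions[node.name] = not ignorecase
    return decisions


tree = [
    SetOp("top"),
    Group(True, [SetOp("in_i"), Group(False, [SetOp("in_minus_i")])]),
    Group(None, [SetOp("plain_group")]),
]
print(plan_fusion(tree))
# {'top': True, 'in_i': False, 'in_minus_i': True, 'plain_group': True}
print(plan_fusion(tree, ignorecase=True))  # as if the pattern began with (?i)
# {'top': False, 'in_i': False, 'in_minus_i': True, 'plain_group': False}
```

Passing the flag down as an argument, rather than reading a single pattern-wide setting, is what lets a `(?-i:...)` inside `(?i:...)` be fused again.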