Skip to content

Don't extract suffix-shaped parenthesized/quoted content as nicknames#189

Merged
derek73 merged 11 commits into
masterfrom
worktree-suffix-in-parens
Jul 1, 2026
Merged

Don't extract suffix-shaped parenthesized/quoted content as nicknames#189
derek73 merged 11 commits into
masterfrom
worktree-suffix-in-parens

Conversation

@derek73

@derek73 derek73 commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Summary

  • parse_nicknames() used to extract everything inside parens/quotes as a nickname unconditionally, so suffix-like content such as (Ret), (Jr.), (MBA) was wrongly classified as a nickname instead of a suffix (Check for known suffixes when processing nicknames #111).
  • Added SUFFIX_ACRONYMS_AMBIGUOUS (ed, jd) as a small standalone exception list for acronym suffixes that also plausibly collide with a common given-name nickname, wired into Constants as a plain SetManager attribute. Existing SUFFIX_ACRONYMS/suffix_acronyms are otherwise unchanged (no API break).
  • Moved '(ret)'/'(vet)' out of SUFFIX_ACRONYMS into SUFFIX_NOT_ACRONYMS as bare 'ret'/'vet' — those literal-paren entries were a Wikipedia-import artifact and effectively dead code under the old unconditional-extraction behavior.
  • Rewrote parse_nicknames() to use a per-match callback: suffix-shaped or period-terminated content is left in place (undelimited) for normal downstream parsing instead of being routed to nickname_list. JEFFREY (JD) BRICKEN still parses JD as a nickname by default (regression guard), since jd is in the ambiguous set.
  • Also fixed an unrelated, pre-existing bug caught during review: a missing comma between 'msc' and 'mscmsm' in SUFFIX_ACRONYMS caused Python's implicit string-literal concatenation to silently merge them into a bogus 'mscmscmsm' entry.

Known, documented limitation (not fixed, out of scope): parse_full_name's no-comma suffix detection only recognizes a trailing run of suffix-shaped pieces. A suffix-shaped word freed from parens by this fix only lands in suffix when it's already in a trailing/comma position in the source string — e.g. "Andrew Perkins (MBA)" → suffix MBA, but "Lon (Jr.) Williams"Jr. lands in middle, identical to how the bare string "Lon Jr. Williams" already parses on master. This fix's guarantee is that the delimited and undelimited forms of a name now parse identically — not that a freed suffix always reaches suffix_list regardless of position.

Test plan

  • pytest -q — 978 passed, 4 skipped, 22 xfailed, no failures
  • New regression/behavior tests for suffix-shaped content in parens, single quotes, and double quotes
  • Regression test confirming JD stays a nickname by default
  • Behavior tests confirming suffix_acronyms_ambiguous is customizable (adding/removing entries flips classification)

Closes #111

derek73 added 8 commits June 30, 2026 23:32
Includes the corrected scope-boundary note discovered during Task 2
implementation: parse_full_name's suffix detection only recognizes a
trailing run of suffix-shaped pieces, so mid-name content freed from
parens/quotes doesn't always land in suffix -- documented as a known,
pre-existing limitation rather than worked around.
Addresses code-review minor findings on the parse_nicknames rewrite:
comments explaining why is_suffix() isn't reused directly and why
handle_match is shared across all three delimiter regexes, plus test
coverage for suffix-shaped content in single/double quotes (previously
only parenthesis was covered).
Python's implicit string-literal concatenation silently merged
'msc' and 'mscmsm' into a single bogus 'mscmscmsm' entry in
SUFFIX_ACRONYMS, dropping both real entries. Caught during code
review of an unrelated change; unrelated to issue #111 but small
and low-risk enough to fix alongside it.
A parse-behavior test on two specific acronyms doesn't generalize to
catch other missing-comma bugs in SUFFIX_ACRONYMS; it only documents
this one instance. Not the right way to guard against this class of bug.
Design specs and implementation plans are working notes, not
published documentation. Keep them on disk locally but stop
tracking them in git.
…ACRONYMS

Easier to find right after SUFFIX_NOT_ACRONYMS instead of scrolling
past the large acronym list.
@derek73 derek73 self-assigned this Jul 1, 2026
@derek73 derek73 added this to the v1.3.0 milestone Jul 1, 2026
derek73 added 3 commits July 1, 2026 00:13
…knames

handle_match() checked stripped (lc()-only, leading/trailing periods
stripped) against suffix_acronyms, unlike is_suffix() which also
strips internal periods. This meant an acronym suffix like "M.D"
(periods between letters, no trailing period) fell through to
nickname_list instead of being recognized as a suffix. Found during
PR review by the pr-test-analyzer subagent.

Also adds the release_log.rst entries for this fix, per CLAUDE.md's
maintenance requirement, flagged by the code-reviewer subagent.
…ACRONYMS

Without this, a future maintainer diffing suffixes.py in isolation
could read the (ret)/(vet) removal as an accidental regression and
re-add the parenthesized form, silently reintroducing the #111 bug.
Flagged by the comment-analyzer subagent during PR review.
nameparser/config/__init__.py's suffix_acronyms_ambiguous attribute
(import, docstring, type hint, constructor param, and __init__
assignment -- all from Task 1, commit 135395b) was accidentally
reverted by commit c2efbd8. Root cause: a review subagent was
running concurrently in this same shared worktree directory and
appears to have checked out an older commit in place (rather than in
an isolated copy) to compare pre/post-PR behavior for its "verified
against unmodified master" claims; that checkout's staged state got
swept into c2efbd8's commit alongside my own staged changes, since
`git commit` includes everything staged, not just newly `git add`ed
paths.

The working tree itself was never wrong -- parser.py's handle_match()
has referenced self.C.suffix_acronyms_ambiguous correctly this whole
time, and pytest passed after every commit since the corruption,
because pytest reads the filesystem, not git history. Only the
*committed* nameparser/config/__init__.py (and, transiently, the
already-pushed PR branch) lacked the attribute -- a fresh clone of
c2efbd8..4e91b7a would have raised AttributeError on any name
containing parens/quotes. Verified via `git show HEAD:...` and
`git diff 135395b HEAD -- nameparser/config/__init__.py` that this
commit is the only remaining gap; suffixes.py, parser.py, and the
tests were independently re-added correctly in later commits.

Also includes docs/usage.rst and docs/customize.rst updates
documenting the new suffix_acronyms_ambiguous behavior and
SUFFIX_ACRONYMS_AMBIGUOUS constant.
@derek73 derek73 merged commit bb02739 into master Jul 1, 2026
8 checks passed
@derek73 derek73 deleted the worktree-suffix-in-parens branch July 1, 2026 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check for known suffixes when processing nicknames

1 participant