Don't extract suffix-shaped parenthesized/quoted content as nicknames#189
Merged
Conversation
Includes the corrected scope-boundary note discovered during Task 2 implementation: parse_full_name's suffix detection only recognizes a trailing run of suffix-shaped pieces, so mid-name content freed from parens/quotes doesn't always land in suffix -- documented as a known, pre-existing limitation rather than worked around.
Addresses code-review minor findings on the parse_nicknames rewrite: comments explaining why is_suffix() isn't reused directly and why handle_match is shared across all three delimiter regexes, plus test coverage for suffix-shaped content in single/double quotes (previously only parenthesis was covered).
Python's implicit string-literal concatenation silently merged 'msc' and 'mscmsm' into a single bogus 'mscmscmsm' entry in SUFFIX_ACRONYMS, dropping both real entries. Caught during code review of an unrelated change; unrelated to issue #111 but small and low-risk enough to fix alongside it.
A parse-behavior test on two specific acronyms doesn't generalize to catch other missing-comma bugs in SUFFIX_ACRONYMS; it only documents this one instance. Not the right way to guard against this class of bug.
Design specs and implementation plans are working notes, not published documentation. Keep them on disk locally but stop tracking them in git.
…ACRONYMS Easier to find right after SUFFIX_NOT_ACRONYMS instead of scrolling past the large acronym list.
…knames handle_match() checked stripped (lc()-only, leading/trailing periods stripped) against suffix_acronyms, unlike is_suffix() which also strips internal periods. This meant an acronym suffix like "M.D" (periods between letters, no trailing period) fell through to nickname_list instead of being recognized as a suffix. Found during PR review by the pr-test-analyzer subagent. Also adds the release_log.rst entries for this fix, per CLAUDE.md's maintenance requirement, flagged by the code-reviewer subagent.
…ACRONYMS Without this, a future maintainer diffing suffixes.py in isolation could read the (ret)/(vet) removal as an accidental regression and re-add the parenthesized form, silently reintroducing the #111 bug. Flagged by the comment-analyzer subagent during PR review.
nameparser/config/__init__.py's suffix_acronyms_ambiguous attribute (import, docstring, type hint, constructor param, and __init__ assignment -- all from Task 1, commit 135395b) was accidentally reverted by commit c2efbd8. Root cause: a review subagent was running concurrently in this same shared worktree directory and appears to have checked out an older commit in place (rather than in an isolated copy) to compare pre/post-PR behavior for its "verified against unmodified master" claims; that checkout's staged state got swept into c2efbd8's commit alongside my own staged changes, since `git commit` includes everything staged, not just newly `git add`ed paths. The working tree itself was never wrong -- parser.py's handle_match() has referenced self.C.suffix_acronyms_ambiguous correctly this whole time, and pytest passed after every commit since the corruption, because pytest reads the filesystem, not git history. Only the *committed* nameparser/config/__init__.py (and, transiently, the already-pushed PR branch) lacked the attribute -- a fresh clone of c2efbd8..4e91b7a would have raised AttributeError on any name containing parens/quotes. Verified via `git show HEAD:...` and `git diff 135395b HEAD -- nameparser/config/__init__.py` that this commit is the only remaining gap; suffixes.py, parser.py, and the tests were independently re-added correctly in later commits. Also includes docs/usage.rst and docs/customize.rst updates documenting the new suffix_acronyms_ambiguous behavior and SUFFIX_ACRONYMS_AMBIGUOUS constant.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
parse_nicknames()used to extract everything inside parens/quotes as a nickname unconditionally, so suffix-like content such as(Ret),(Jr.),(MBA)was wrongly classified as a nickname instead of a suffix (Check for known suffixes when processing nicknames #111).SUFFIX_ACRONYMS_AMBIGUOUS(ed,jd) as a small standalone exception list for acronym suffixes that also plausibly collide with a common given-name nickname, wired intoConstantsas a plainSetManagerattribute. ExistingSUFFIX_ACRONYMS/suffix_acronymsare otherwise unchanged (no API break).'(ret)'/'(vet)'out ofSUFFIX_ACRONYMSintoSUFFIX_NOT_ACRONYMSas bare'ret'/'vet'— those literal-paren entries were a Wikipedia-import artifact and effectively dead code under the old unconditional-extraction behavior.parse_nicknames()to use a per-match callback: suffix-shaped or period-terminated content is left in place (undelimited) for normal downstream parsing instead of being routed tonickname_list.JEFFREY (JD) BRICKENstill parsesJDas a nickname by default (regression guard), sincejdis in the ambiguous set.'msc'and'mscmsm'inSUFFIX_ACRONYMScaused Python's implicit string-literal concatenation to silently merge them into a bogus'mscmscmsm'entry.Known, documented limitation (not fixed, out of scope):
parse_full_name's no-comma suffix detection only recognizes a trailing run of suffix-shaped pieces. A suffix-shaped word freed from parens by this fix only lands insuffixwhen it's already in a trailing/comma position in the source string — e.g."Andrew Perkins (MBA)"→ suffixMBA, but"Lon (Jr.) Williams"→Jr.lands inmiddle, identical to how the bare string"Lon Jr. Williams"already parses onmaster. This fix's guarantee is that the delimited and undelimited forms of a name now parse identically — not that a freed suffix always reachessuffix_listregardless of position.Test plan
pytest -q— 978 passed, 4 skipped, 22 xfailed, no failuresJDstays a nickname by defaultsuffix_acronyms_ambiguousis customizable (adding/removing entries flips classification)Closes #111