load_csv: skip blank lines, surface line number on row-length and key-column errors#49
Open
HrachShah wants to merge 2 commits into
Open
Conversation
added 2 commits
June 15, 2026 19:36
…r runs csv-diff against an empty file, csv.reader returns no rows and the previous code let StopIteration bubble out of next(fp), producing a confusing traceback at the top of the call stack with no indication that the input was empty; the new try/except translates StopIteration into a typed ValueError with a descriptive message so the CLI shows 'CSV input is empty (no header row found)' and downstream loaders / Click error handling can react to it explicitly
…-column errors
A trailing blank line in a CSV file (the kind GitHub and most editors emit
by default) crashed the diff with a bare KeyError on the key column, far
from the actual problem. csv.reader yields an empty list for a fully-blank
line, and dict(zip(headings, [])) returned {}, so the next line raised
KeyError('a') inside the keyfn lambda. Issue simonw#29.
While reading the file, track the 1-based source line number alongside
each row and use it to surface clear, line-numbered ValueError messages
for the other two error paths that previously leaked as tracebacks:
- rows with fewer fields than the header (the previous list-comprehension
silently accepted them and produced dicts with missing keys, then
crashed inside the keyfn lambda or in compare()'s next(iter(...)));
- a --key column that isn't in the header (was also a KeyError, this
time on r[key] inside keyfn).
Trailing and interior blank lines (csv.reader yields [] for them) are
now silently skipped, matching the POSIX text-file convention and the
behaviour of most other CSV tools. Rows with more fields than the header
are still accepted, since the trailing-comma pattern is a real-world way
to express an empty last column; the FIVE test fixture exercises that.
Added five tests in tests/test_csv_diff.py:
- test_trailing_blank_line_ignored (issue simonw#29 reproducer)
- test_interior_blank_line_ignored
- test_trailing_blank_line_with_no_key
- test_mismatched_row_length_raises_clear_error
- test_missing_key_column_raises_clear_error
All 29 tests pass (5 new + 24 existing).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #29.
A trailing blank line in a CSV file (the kind GitHub and most editors emit by
default) crashed the diff with a bare
KeyErroron the key column, far from theactual problem.
csv.readeryields an empty list for a fully-blank line, anddict(zip(headings, []))returned{}, so the next line raisedKeyError('a')inside the keyfn lambda. Issue #29.
While reading the file, track the 1-based source line number alongside each row
and use it to surface clear, line-numbered
ValueErrormessages for the othertwo error paths that previously leaked as tracebacks:
silently accepted them and produced dicts with missing keys, then crashed
inside the keyfn lambda or in
compare()'snext(iter(...)));--keycolumn that isn't in the header (was also aKeyError, this timeon
r[key]inside keyfn).Trailing and interior blank lines (
csv.readeryields[]for them) are nowsilently skipped, matching the POSIX text-file convention and the behaviour of
most other CSV tools. Rows with more fields than the header are still accepted,
since the trailing-comma pattern is a real-world way to express an empty last
column; the
FIVEtest fixture exercises that.Added five tests in
tests/test_csv_diff.py:test_trailing_blank_line_ignored(issue BUG: new line at end of file causes crash #29 reproducer)test_interior_blank_line_ignoredtest_trailing_blank_line_with_no_keytest_mismatched_row_length_raises_clear_errortest_missing_key_column_raises_clear_errorAll 29 tests pass (5 new + 24 existing).