Skip to content

load_csv: skip blank lines, surface line number on row-length and key-column errors#49

Open
HrachShah wants to merge 2 commits into
simonw:mainfrom
HrachShah:fix/empty-trailing-line-and-mismatched-row-length
Open

load_csv: skip blank lines, surface line number on row-length and key-column errors#49
HrachShah wants to merge 2 commits into
simonw:mainfrom
HrachShah:fix/empty-trailing-line-and-mismatched-row-length

Conversation

@HrachShah

Copy link
Copy Markdown

Fixes #29.

A trailing blank line in a CSV file (the kind GitHub and most editors emit by
default) crashed the diff with a bare KeyError on the key column, far from the
actual problem. csv.reader yields an empty list for a fully-blank line, and
dict(zip(headings, [])) returned {}, so the next line raised KeyError('a')
inside the keyfn lambda. Issue #29.

While reading the file, track the 1-based source line number alongside each row
and use it to surface clear, line-numbered ValueError messages for the other
two error paths that previously leaked as tracebacks:

  • rows with fewer fields than the header (the previous list-comprehension
    silently accepted them and produced dicts with missing keys, then crashed
    inside the keyfn lambda or in compare()'s next(iter(...)));
  • a --key column that isn't in the header (was also a KeyError, this time
    on r[key] inside keyfn).

Trailing and interior blank lines (csv.reader yields [] for them) are now
silently skipped, matching the POSIX text-file convention and the behaviour of
most other CSV tools. Rows with more fields than the header are still accepted,
since the trailing-comma pattern is a real-world way to express an empty last
column; the FIVE test fixture exercises that.

Added five tests in tests/test_csv_diff.py:

  • test_trailing_blank_line_ignored (issue BUG: new line at end of file causes crash #29 reproducer)
  • test_interior_blank_line_ignored
  • test_trailing_blank_line_with_no_key
  • test_mismatched_row_length_raises_clear_error
  • test_missing_key_column_raises_clear_error

All 29 tests pass (5 new + 24 existing).

Zo Bot added 2 commits June 15, 2026 19:36
…r runs csv-diff against an empty file, csv.reader returns no rows and the previous code let StopIteration bubble out of next(fp), producing a confusing traceback at the top of the call stack with no indication that the input was empty; the new try/except translates StopIteration into a typed ValueError with a descriptive message so the CLI shows 'CSV input is empty (no header row found)' and downstream loaders / Click error handling can react to it explicitly
…-column errors

A trailing blank line in a CSV file (the kind GitHub and most editors emit
by default) crashed the diff with a bare KeyError on the key column, far
from the actual problem. csv.reader yields an empty list for a fully-blank
line, and dict(zip(headings, [])) returned {}, so the next line raised
KeyError('a') inside the keyfn lambda. Issue simonw#29.

While reading the file, track the 1-based source line number alongside
each row and use it to surface clear, line-numbered ValueError messages
for the other two error paths that previously leaked as tracebacks:

  - rows with fewer fields than the header (the previous list-comprehension
    silently accepted them and produced dicts with missing keys, then
    crashed inside the keyfn lambda or in compare()'s next(iter(...)));
  - a --key column that isn't in the header (was also a KeyError, this
    time on r[key] inside keyfn).

Trailing and interior blank lines (csv.reader yields [] for them) are
now silently skipped, matching the POSIX text-file convention and the
behaviour of most other CSV tools. Rows with more fields than the header
are still accepted, since the trailing-comma pattern is a real-world way
to express an empty last column; the FIVE test fixture exercises that.

Added five tests in tests/test_csv_diff.py:
  - test_trailing_blank_line_ignored (issue simonw#29 reproducer)
  - test_interior_blank_line_ignored
  - test_trailing_blank_line_with_no_key
  - test_mismatched_row_length_raises_clear_error
  - test_missing_key_column_raises_clear_error

All 29 tests pass (5 new + 24 existing).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: new line at end of file causes crash

1 participant