fix(store): preserve corrupt DB instead of deleting (#557)#620
Open
yasku wants to merge 2 commits into
Open
Conversation
The startup/query integrity check unlink()'d a project's .db (plus its -wal/-shm sidecars) on any anomaly, causing total data loss with no backup, no recovery path, and no visible warning. The trigger is often a false positive: a WAL leftover after a SIGKILL, a binary variant swap, or a stale-lock break can all make a recoverable DB look corrupt. Replace the destructive unlink with a rename to <name>.db.corrupt.<ts> and emit a prominent stderr message so the user can inspect, recover, or attach the file to a bug report. Two call sites are fixed: - mcp.c resolve_store(): the reported path, hit on first query. - pipeline.c try_incremental_or_delete_db(): only preserves on real corruption; a healthy DB rebuilt due to a mode change is still deleted so .corrupt files don't accumulate on normal reindexes. Add a regression test (resolve_store_preserves_corrupt_db_issue557) that seeds a corrupt projects row, issues a query, and asserts the original .db is gone while a .corrupt sidecar remains. Signed-off-by: yasku <aguss.cba@gmail.com>
Author
|
Working on fixing this |
CodeQL flagged a time-of-check/time-of-use race in try_incremental_or_delete_db(): the file at db_path was probed with stat() and then operated on by name (rename/unlink), so the underlying file could change between the two. Drop the redundant stat() existence probe and derive existence from a query-mode open (cbm_store_open_path_query, which never creates the file): a NULL result means absent/unreadable, so we return without touching the path. This removes the stat()->rename() dataflow CodeQL reports and matches resolve_store() in mcp.c, which already uses this pattern. Behavior is unchanged (absent -> skip, healthy -> reindex, corrupt -> preserve as .corrupt). Signed-off-by: yasku <aguss.cba@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Fixes #557. The startup/query integrity check
unlink()'d a project's.db(plus its-wal/-shmsidecars) on any anomaly, causing total data loss with no backup, no recovery path, and no visible warning. The trigger is frequently a false positive: a WAL leftover after aSIGKILL, a binary variant swap (standard↔ui), or a stale-lock break can all make a recoverable DB look corrupt.This replaces the destructive
unlinkwith arenameto<name>.db.corrupt.<timestamp>and emits a prominentstderrmessage, matching the issue's suggested fix. The user can now inspect, recover, or attach the file to a bug report.Two call sites are fixed:
src/mcp/mcp.cresolve_store()— the reported path, hit on the first query against a project.src/pipeline/pipeline.ctry_incremental_or_delete_db()— only preserves on real corruption; a healthy DB rebuilt due to a mode change is still deleted, so.corruptfiles don't accumulate on normal reindexes.If
renamefails (permissions, cross-device), it falls back tounlinkso no orphan files are left behind.Checklist
git commit -s) — DCO verified viascripts/check-dco.shmake -f Makefile.cbm test) — 5684 passedresolve_store_preserves_corrupt_db_issue557seeds a corruptprojectsrow, issues a query, and asserts the original.dbis gone while a.corruptsidecar remains.Notes on the lint checklist item (transparency)
A few constraints worth making explicit so the maintainer isn't surprised:
make lint-cidoes not pass cleanly locally, but only on pre-existing code I did not touch. My local cppcheck is 2.21.0, which is newer/stricter than CI's and flags pre-existing findings insrc/cypher/cypher.candsrc/cli/cli.c(e.g.knownConditionTrueFalse,compareValueOutOfTypeRangeError). Running cppcheck scoped to just my three changed files exits clean (0). I deliberately did not touch those files to keep this PR focused (per CONTRIBUTING's "avoid unrelated reformatting").clang-format-20; I only haveclang-format-22locally. Both modified source files (mcp.c,pipeline.c) are whole-file clean under v22, and since the surrounding pre-existing code shows no diff under v22 either, v20 and v22 appear to agree on this file's style. If CI's v20 disagrees on any of my lines, I'm happy to reformat.strncmp(..., 19)/strncmp(..., 10)literals, matching the existing style intests/(e.g.test_cli.c,test_httpd.c,test_pipeline.c). This is fine because test sources are not inLINT_SRCS, and CI's lint step runs cppcheck + clang-format only (clang-tidy is enforced locally, not in CI).bug,stability/performance) to match cbm v0.8.1 silently deletes project DBs on "corrupt" detection — data loss with no recovery #557. Please add whatever is appropriate.