Skip to content

fix(pipeline): preserve ADRs across re-indexing (#516)#623

Open
yasku wants to merge 4 commits into
DeusData:mainfrom
yasku:fix/516-preserve-adr
Open

fix(pipeline): preserve ADRs across re-indexing (#516)#623
yasku wants to merge 4 commits into
DeusData:mainfrom
yasku:fix/516-preserve-adr

Conversation

@yasku

@yasku yasku commented Jun 25, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes #516manage_adr ADRs are silently lost on re-index.

manage_adr stores ADRs in the project_summaries SQLite table (the store backend unified in #256). The custom B-tree writer (internal/cbm/sqlite_writer.c) serializes a fresh database with an empty project_summaries btree on every dump, so any re-index that rewrites the DB silently wipes the stored ADR — manage_adr(mode="get") then returns no_adr.

Root cause — the loss happens at two dump sites, not one

The issue report (and its confirming comment) identified only the full-reindex path. Tracing the pipeline showed the incremental path loses the ADR too:

Path Function ADR?
Incremental, no file changes early-return, no dump ✅ survives
Incremental, a file changed/added dump_and_persist (pipeline_incremental.c) ❌ wiped
Full re-index dump_and_persist_hashes (pipeline.c) ❌ wiped

Both dump functions already re-persist file_hashes after the rewrite but had no equivalent step for project_summaries — that asymmetry is the bug. Fixing only the full path (as the report suggested) would have left the bug live on any single-file edit.

The fix — mirror the existing file_hashes preservation

  • Capture the ADR from the old DB before it is rewritten, in try_incremental_or_delete_db() — the single choke point that opens the old DB for both paths.
  • Carry it on the pipeline (carried_adr, exposed to the incremental translation unit via a cbm_pipeline_carried_adr() accessor, matching the project's existing opaque-struct accessor pattern e.g. cbm_pipeline_repo_path()).
  • Re-insert it with cbm_store_adr_store() after each dump, right next to the file_hashes restore.

Behavior is otherwise unchanged (absent DB → skip, healthy → reindex, corrupt → preserved as .corrupt). Memory: carried_adr is strdup'd, freed once in cbm_pipeline_free, NULL-initialized via the existing calloc.

How it was researched (provenance)

The root cause and fix shape were grounded in the repo's own history and source, not assumed:

  • manage_adr (MCP/CLI) and the UI /api/adr endpoints use separate storage backends #256 (manage_adr storage) — closed by commit 70e3ed7 "fix(adr): unify manage_adr onto the SQLite store shared with the UI". Confirms the current backend is the SQLite project_summaries table (cbm_store_adr_get/store), i.e. the right place to fix.
  • internal/cbm/sqlite_writer.c — the writer emits project_summaries as an empty btree by design (// Autoindex for project_summaries — empty (0 rows)), confirming the wipe is structural, not incidental.
  • file_hashes preservation (pipeline.c / pipeline_incremental.c) — the established precedent this fix mirrors: capture-before-rewrite, re-insert-after-reopen.
  • manage_adr get path: use-after-free causes MCP client to hang indefinitely #125 (prior manage_adr use-after-free, closed) — reviewed to confirm the ADR code path is sensitive; this PR does not touch mcp.c ADR handling.
  • Forward-compat note: Make the tool import existing ADR docs in a repository #504 / Support Markdown Architectural Decision Records #507 propose multiple/MADR ADRs. The current schema is one row per project (project TEXT PRIMARY KEY); this fix matches that. If multi-ADR lands later, the carry generalizes to all project_summaries rows.
  • SQLite architecture check: SQLite's native cross-DB copy mechanisms (Online Backup API, ATTACH + INSERT … SELECT) require both sides to be engine-managed databases. The dump here is a hand-serialized B-tree file (sqlite_writer.c), so those native mechanisms do not apply to it — the correct approach is the app-level capture/re-insert via the public store API, which is exactly the existing file_hashes pattern.

Checklist

  • Every commit is signed off (git commit -s) — DCO verified locally via scripts/check-dco.sh
  • Tests pass locally (make -f Makefile.cbm test) — 5687 passed, 0 failures
  • Lint passes — clang-format clean; cppcheck (CI suppress flags) exits 0 on all changed files
  • New behavior is covered by a test (reproduce-first) — see below

Tests added

Both regression tests live in tests/test_pipeline.c and assert the ADR is retrievable after a re-index (store an ADR, re-index, cbm_store_adr_get must still return it):

  • pipeline_full_reindex_preserves_adr_issue516 — adds files to force the full dump path.
  • pipeline_incremental_reindex_preserves_adr_issue516 — modifies one file (same file count) to force the incremental dump path.

Also verified end-to-end via the CLI: index → manage_adr update → change a source file → re-index → manage_adr get returns the stored ADR (returned no_adr before the fix).

Validation

All checks were run locally and pass:

Check Command Result
Full test suite make -f Makefile.cbm test ✅ 5687 passed
Build (prod) make -f Makefile.cbm cbm ✅ no warnings (-Werror)
Formatting clang-format (changed files) ✅ clean
Static analysis cppcheck w/ CI suppress flags (changed files) ✅ exit 0
No-skips policy scripts/check-no-test-skips.sh ✅ OK
Security audit scripts/security-audit.sh ✅ passed
DCO scripts/check-dco.sh ✅ signed

Test environment

Machine Apple M1, 8-core (4P + 4E), 8 GB RAM
OS macOS 26.5 (build 25F71), arm64 (Darwin 25.5.0)
Compiler Apple clang 21.0.0 (clang-2100.1.1.101)
clang-format 22.1.8
cppcheck 2.21.0

Note: the one unreadVariable finding cppcheck surfaced was a pre-existing unused rc assignment in the unrelated usages_creates_edges test that the larger file exposed; this PR drops that dead assignment so the changed file stays lint-clean. No other unrelated code is touched.

Scope

Four files, focused on #516:

  • src/pipeline/pipeline.c — struct field, accessor, capture, full-path restore.
  • src/pipeline/pipeline_incremental.c — incremental-path restore.
  • src/pipeline/pipeline_internal.h — accessor declaration.
  • tests/test_pipeline.c — two regression tests (+ the dead-assignment cleanup noted above).

yasku added 4 commits June 24, 2026 22:30
The startup/query integrity check unlink()'d a project's .db (plus its
-wal/-shm sidecars) on any anomaly, causing total data loss with no
backup, no recovery path, and no visible warning. The trigger is often a
false positive: a WAL leftover after a SIGKILL, a binary variant swap, or
a stale-lock break can all make a recoverable DB look corrupt.

Replace the destructive unlink with a rename to <name>.db.corrupt.<ts>
and emit a prominent stderr message so the user can inspect, recover, or
attach the file to a bug report. Two call sites are fixed:

- mcp.c resolve_store(): the reported path, hit on first query.
- pipeline.c try_incremental_or_delete_db(): only preserves on real
  corruption; a healthy DB rebuilt due to a mode change is still deleted
  so .corrupt files don't accumulate on normal reindexes.

Add a regression test (resolve_store_preserves_corrupt_db_issue557) that
seeds a corrupt projects row, issues a query, and asserts the original
.db is gone while a .corrupt sidecar remains.

Signed-off-by: yasku <aguss.cba@gmail.com>
CodeQL flagged a time-of-check/time-of-use race in
try_incremental_or_delete_db(): the file at db_path was probed with
stat() and then operated on by name (rename/unlink), so the underlying
file could change between the two.

Drop the redundant stat() existence probe and derive existence from a
query-mode open (cbm_store_open_path_query, which never creates the
file): a NULL result means absent/unreadable, so we return without
touching the path. This removes the stat()->rename() dataflow CodeQL
reports and matches resolve_store() in mcp.c, which already uses this
pattern. Behavior is unchanged (absent -> skip, healthy -> reindex,
corrupt -> preserve as .corrupt).

Signed-off-by: yasku <aguss.cba@gmail.com>
Adds pipeline_reindex_preserves_corrupt_db_issue557: indexes a temp repo,
corrupts the resulting DB (numeric root_path trips the integrity check),
re-indexes, and asserts the DB is preserved as <db>.corrupt.<ts> rather
than deleted. Guards both the DeusData#557 data-loss fix and the TOCTOU refactor
of try_incremental_or_delete_db (existence via query-mode open, not stat).

Signed-off-by: yasku <aguss.cba@gmail.com>
manage_adr stores ADRs in the project_summaries table (SQLite store,
unified in DeusData#256). Both dump paths serialize a fresh DB with an empty
project_summaries — the custom B-tree writer writes an empty summaries
btree — so any re-index that rewrites the DB silently wipes the ADR.

The data loss occurs at TWO sites, not one (the issue report identified
only the full path):
  - dump_and_persist_hashes (pipeline.c) — full re-index.
  - dump_and_persist (pipeline_incremental.c) — incremental re-index when
    a file changed/was added (the no-change incremental early-returns
    without a dump, so the ADR survives there).

Mirror the existing file_hashes preservation: capture the ADR from the
old DB before it is rewritten (in try_incremental_or_delete_db, the single
choke point that opens the old DB for both paths), carry it on the
pipeline, and re-insert it after each dump via cbm_store_adr_store.

Adds regression tests for both the full and incremental paths. Also drops
an unused rc assignment in the pre-existing usages_creates_edges test that
the larger file surfaced under cppcheck (unreadVariable).

Signed-off-by: yasku <aguss.cba@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

manage_adr data loss: ADRs in project_summaries are deleted during graph re-indexing

1 participant