
feat: add serial crash-safe mutation recovery #142

Open
gwokhou wants to merge 1 commit into VectifyAI:main from gwokhou:pr/serial-mutation-recovery


@gwokhou gwokhou commented Jun 26, 2026

Contributor

TL;DR

This PR makes document import fail more cleanly. If openkb add is interrupted or fails partway through, OpenKB should no longer leave half-written KB files behind; it can recover the previous stable state before accepting the next mutation.

Summary

Adds crash-safe mutation recovery for serial KB mutations without bringing back the concurrent add pipeline from #104.

This PR keeps the scope focused on the first split-out piece from the abandoned PR: journaled snapshot/rollback for add and PageIndex Cloud import, plus the atomic writer prerequisites needed for hardlink-backed rollback.
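To illustrate why hardlink-backed rollback needs atomic writers, here is a minimal sketch; it is not the code in openkb/mutation.py. The helper names (`atomic_write_text`, `in_place_write`, `snapshot_after_write`) are hypothetical. A hardlink snapshot shares its inode with the live file, so an in-place write also changes the snapshot, while temp+replace gives the live path a new inode and leaves the snapshot untouched.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Hypothetical helper: write to a sibling temp file, then swap it in."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace points the name at a new inode; existing hardlinks
        # to the old inode keep the old bytes.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def in_place_write(path: Path, text: str) -> None:
    """Non-atomic writer: truncates and rewrites the existing inode."""
    path.write_text(text, encoding="utf-8")


def snapshot_after_write(root: Path, writer) -> str:
    """Hardlink-snapshot a file, overwrite it, return the snapshot's content."""
    live = root / "page.md"
    snapshot = root / "page.md.snapshot"
    live.write_text("old", encoding="utf-8")
    os.link(live, snapshot)  # snapshot and live file share one inode
    writer(live, "new")
    return snapshot.read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        print("in-place:", snapshot_after_write(Path(a), in_place_write))
        print("atomic:  ", snapshot_after_write(Path(b), atomic_write_text))
```

With the in-place writer the snapshot ends up holding the new content, so there is nothing left to roll back to; with the temp+replace writer the snapshot still holds the old content.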

What changed

  • Added openkb/mutation.py with:

    • mutation snapshots
    • active/committed/rolled_back journals
    • rollback and recovery of interrupted mutations
    • bounded rollback retry
    • malformed journal handling
    • staged artifact publishing with atomic rename + EXDEV copy fallback
  • Wired recovery into serial mutation paths:

    • openkb add <file> converts into staging, snapshots final KB paths, publishes, compiles, then commits via registry write + journal mark
    • openkb add --from-pageindex-cloud prepares cloud data read-only, snapshots doc-specific paths, writes/compiles/registers under the same recovery contract
    • exclusive KB lock acquisition drains pending journals before mutating
  • Kept conversion serial while adding staging support:

    • convert_document(..., staging_dir=...) writes raw/source artifacts into isolated staging before commit
    • no parallel prepare hook or doc-name override path is introduced
  • Made wiki writers atomic where hardlink snapshots depend on temp+replace semantics:

    • compiler summary/concept/entity/index writes
    • lint broken-link cleanup
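The "atomic rename + EXDEV copy fallback" item above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `publish_staged` and its `rename` parameter are hypothetical, and the sketch handles single files only. The idea is to try an atomic rename first; if staging and the KB sit on different filesystems, the rename fails with `EXDEV`, so the file is copied next to the destination and then renamed within that filesystem.

```python
import errno
import os
import shutil
import tempfile
from pathlib import Path


def publish_staged(staged: Path, final: Path, rename=os.replace) -> None:
    """Hypothetical helper: move a staged file to its final path.

    `rename` is injectable only so the cross-device branch can be
    exercised in tests; it defaults to os.replace.
    """
    final.parent.mkdir(parents=True, exist_ok=True)
    try:
        rename(staged, final)  # atomic when both paths share a filesystem
        return
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Cross-device case: copy into a temp file beside the destination,
    # then rename inside the destination filesystem so readers never
    # see a partially written file.
    fd, tmp_name = tempfile.mkstemp(
        dir=final.parent, prefix=final.name + ".", suffix=".tmp"
    )
    os.close(fd)
    try:
        shutil.copy2(staged, tmp_name)
        os.replace(tmp_name, final)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    staged.unlink()  # drop the staged copy only after the publish succeeded
```

Placing the temp file in the destination directory is the key design choice: the final `os.replace` never crosses a filesystem boundary, so it keeps the same all-or-nothing behavior as the fast path.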

Explicitly out of scope

This does not reintroduce the concurrency half of #104:

  • no ThreadPoolExecutor
  • no file_processing_jobs
  • no jobs>1 directory add path
  • no parallel prepare pipeline
  • no batch doc-name reservation logic

Validation

  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON=3.13 uv run --extra dev pytest -q

    • 898 passed, 5 warnings
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON=3.13 uv run --extra dev pytest tests/test_add_command.py::TestImportFromPageindexCloud tests/test_mutation.py -q

    • 25 passed
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON=3.13 uv run --extra dev pytest tests/test_converter.py tests/test_add_command.py tests/test_mutation.py -q

    • 72 passed
  • UV_CACHE_DIR=/tmp/uv-cache UV_PYTHON=3.13 uv run --extra dev ruff check openkb/converter.py openkb/mutation.py tests/test_mutation.py

    • passed

Notes

Full-project ruff and ty are not clean on current main; this PR fixes the new diagnostics introduced by the recovery changes but does not widen scope to existing lint/type debt.

@gwokhou force-pushed the pr/serial-mutation-recovery branch from 5346945 to 4ca33c7 on June 26, 2026 10:38
@gwokhou marked this pull request as ready for review on June 26, 2026 10:50
