feat: MCP server refactor with fine-grained tools and incremental update support by mambo-wang · Pull Request #66 · FSoft-AI4Code/CodeWiki

mambo-wang · 2026-06-20T13:07:52Z

Summary

Refactor MCP server into fine-grained, IDE-driven tools (analyze_repo, read_code_components, view_repo_file, write_doc_file, edit_doc_file, save_module_tree, get_processing_order, get_prompt, close_session) alongside legacy tools, enabling zero-LLM-config documentation workflows
Add incremental update support (--update / commit_id) to MCP analyze_repo, so only changed modules are re-analyzed
Fix commit_id passthrough to metadata.json in CLI mode for --update support
Add missing MCP dependencies to pyproject.toml

Test plan

Verify MCP server starts via python -m codewiki.mcp.server and all new tools are listed
Run analyze_repo tool, confirm session is created and dependency graph is built
Run analyze_repo again with commit_id to verify incremental update works correctly
Verify CLI --update flag passes commit_id to metadata.json as expected
Test legacy tools (generate_docs, get_module_tree) still function

🤖 Generated with [Qoder](https://qoder.com)

- Add IDE_DRIVEN_GUIDE.md with a complete walkthrough for using CodeWiki with AI IDEs (CodeBuddy, Cursor, Claude Desktop) via MCP
- Update README with an IDE-Driven Mode section and navigation link

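- 8 module docs (Agent tools, CLI tools, CLI core, MCP service, dependency analyzer, shared config, frontend service, backend core)
- Repository overview: overview.md
- Module clustering tree: module_tree.json
- All docs include Mermaid architecture diagrams; syntax validation passes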

- Add _detect_changes() with dual-strategy detection (git diff + mtime)
- Add _find_affected_modules() to map changed files to affected modules
- analyze_repo now returns a 'changes' field with affected/cascade modules
- Decouple codewiki/__init__.py from CLI imports for lightweight MCP startup
- Update skill and IDE_DRIVEN_GUIDE.md with incremental update docs
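The mtime half of that dual-strategy detection might be sketched as below. It is an illustration under assumptions, not the PR's `_detect_changes()`: the name `detect_via_mtime`, the `since` timestamp argument, and the rule of skipping dot-prefixed paths (such as `.git` and `.codewiki`) are hypothetical. Paths are returned with `.as_posix()`, the fix later applied in this PR, so they compare equal to git's forward-slash paths on Windows as well.

```python
from pathlib import Path


def detect_via_mtime(repo_root: str, since: float) -> list[str]:
    """Return repo-relative POSIX paths of files modified after `since`.

    Hypothetical helper: anything under a dot-prefixed name (e.g. .git,
    .codewiki) is ignored so tool state is not reported as a change.
    """
    root = Path(repo_root)
    changed = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip hidden files and directories, including their contents.
        if any(part.startswith(".") for part in rel.parts):
            continue
        if path.is_file() and path.stat().st_mtime > since:
            # as_posix() yields "pkg/mod.py" on every OS, matching git's output.
            changed.append(rel.as_posix())
    return sorted(changed)
```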

Previously, CLIDocumentationGenerator never received or forwarded the git commit SHA, so metadata.json always had commit_id: null. This made --update fall back to full regeneration every time. Now the commit hash is obtained before generator creation and threaded through to the backend DocumentationGenerator, matching the behavior already present in Web mode (background_worker.py).
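Obtaining the hash up front could look like this minimal sketch. It assumes the SHA is read with `git rev-parse HEAD` through `subprocess` (the commit message does not say how it is obtained), and `get_commit_id` is a hypothetical name. Passing an argument list avoids a shell, and returning `None` on any failure keeps non-git directories usable, which then simply means a full regeneration on the next `--update`.

```python
import subprocess
from typing import Optional


def get_commit_id(repo_path: str) -> Optional[str]:
    """Return the HEAD commit SHA of repo_path, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],  # argument list: no shell involved
            cwd=repo_path,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed, cwd missing, or not a git repository.
        return None
    return result.stdout.strip() or None
```

The returned value (possibly `None`) would then be passed to the generator at construction time rather than looked up afterwards.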

Square logo for GitHub repo avatar and wide banner for README header. Design follows the blue-purple-green gradient palette from the original CodeWiki framework diagram, with a red CN badge for branding.

Square logo corners and banner surrounding area are now transparent instead of white/light-gray, suitable for any background color.

Remove transparent padding around the rounded rectangle by filling corners with a matching dark navy gradient, making the banner a solid rectangle.

- Reduce component_index from 500 to 100 items per page (max 200); drop depends_on from each entry (available via read_code_components)
- Add offset/limit params to analyze_repo for pagination
- Add list_components tool for browsing components without re-analysis
- Reduce leaf_nodes from 100 to 50
- Remove IDE rule files, consolidate into skill files
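Offset/limit pagination with the page sizes above might be sketched as follows. The 100-item default and 200-item ceiling come from the commit message, and the offset clamping mirrors a later commit; the function name `build_component_index` and the `items`/`total`/`next_offset` response shape are assumptions.

```python
from typing import Any

DEFAULT_LIMIT = 100  # items per page, per the commit message
MAX_LIMIT = 200      # hard ceiling on a single page


def build_component_index(
    component_ids: list[str], offset: int = 0, limit: int = DEFAULT_LIMIT
) -> dict[str, Any]:
    """Return one page of component IDs plus a cursor for the next page."""
    offset = max(0, offset)                # clamp negative offsets
    limit = max(1, min(limit, MAX_LIMIT))  # keep limit within [1, MAX_LIMIT]
    page = component_ids[offset:offset + limit]
    next_offset = offset + len(page)
    return {
        "items": page,
        "total": len(component_ids),
        # None tells the caller there are no further pages.
        "next_offset": next_offset if next_offset < len(component_ids) else None,
    }
```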

… hang
- Wrap synchronous tool handlers in asyncio.to_thread() to avoid blocking the event loop (analyze_repo excluded: Tree-sitter C extensions are not thread-safe)
- Disable mermaid-py validation by default (set MERMAID_VALIDATE=1 to enable), add 15s timeout to prevent indefinite hangs
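Offloading a blocking handler with `asyncio.to_thread` (Python 3.9+) can be sketched like this. `call_tool`, `slow_handler`, and the `THREAD_UNSAFE` set are hypothetical; the sketch only shows the pattern of running synchronous handlers in a worker thread so the event loop keeps serving requests, while handlers that must stay on the loop thread, as `analyze_repo` does here, are called directly.

```python
import asyncio
import time
from typing import Any, Callable

# Hypothetical: handlers that must not leave the event-loop thread
# (the commit excludes analyze_repo because Tree-sitter is not thread-safe).
THREAD_UNSAFE = {"analyze_repo"}


def slow_handler(name: str) -> str:
    time.sleep(0.2)  # stands in for blocking file or CPU work
    return f"done:{name}"


async def call_tool(tool: str, handler: Callable[..., Any], *args: Any) -> Any:
    """Run a synchronous tool handler without blocking the event loop."""
    if tool in THREAD_UNSAFE:
        return handler(*args)  # runs inline and blocks the loop while it executes
    return await asyncio.to_thread(handler, *args)


async def main() -> list[str]:
    # Two blocking handlers overlap instead of running back to back.
    return await asyncio.gather(
        call_tool("write_doc_file", slow_handler, "a"),
        call_tool("edit_doc_file", slow_handler, "b"),
    )
```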

- Fix shell injection in view_repo_file: replace shell=True subprocess with pathlib iteration
- Add path traversal guards in view_repo_file, write_doc_file, edit_doc_file (reject paths escaping repo/output dir)
- Add threading.Lock to SessionStore for concurrent access safety
- Cap sessions at 10, evicting the oldest when full
- Cap read_code_components to 50 IDs per call
- Cap edit history to 20 entries per file
- Store edit history as native dict instead of JSON string
- Fix undo to run Mermaid validation after reverting content
- Fix Mermaid validation to report "skipped" instead of false success
- Fix Mermaid timeout to warn instead of silently passing
- Fix pagination hint to point to list_components instead of analyze_repo
- Clamp offset to non-negative in _build_component_index
- Add smoke test covering all critical paths (25 assertions)
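The session cap and lock from the list above could be combined in an `OrderedDict` guarded by `threading.Lock`, as in this sketch. The cap of 10 is from the commit; the class layout, the `put`/`get`/`close` method names, and eviction by insertion order (rather than by last access) are assumptions.

```python
import threading
from collections import OrderedDict
from typing import Any, Optional

MAX_SESSIONS = 10  # cap from the commit message


class SessionStore:
    """Hypothetical session registry safe for concurrent tool calls."""

    def __init__(self, max_sessions: int = MAX_SESSIONS) -> None:
        self._sessions: "OrderedDict[str, Any]" = OrderedDict()
        self._lock = threading.Lock()
        self._max = max_sessions

    def put(self, session_id: str, session: Any) -> None:
        with self._lock:
            self._sessions[session_id] = session
            self._sessions.move_to_end(session_id)  # newest at the end
            while len(self._sessions) > self._max:
                self._sessions.popitem(last=False)  # evict the oldest

    def get(self, session_id: str) -> Optional[Any]:
        with self._lock:
            return self._sessions.get(session_id)

    def close(self, session_id: str) -> bool:
        with self._lock:
            if session_id not in self._sessions:
                return False
            del self._sessions[session_id]
            return True
```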

- Reduce _MAX_RESPONSE_LEN 32000→24000, _MAX_COMPONENTS_PER_CALL 50→20
- Add per-component source truncation at 8000 chars
- Write metadata.json (git commit_id + timestamp) on close_session to enable incremental update detection on the next analyze_repo
- Update smoke test assertions to match new caps

- Rewrite MCP 服务.md (MCP service): add list_components tool, thread-safe SessionStore, path traversal guards, incremental update mechanism, multi-layer truncation
- Update 后端核心.md (backend core): document Mermaid validation degradation strategy
- Update overview.md: tool count 9→10, MCP module component count 27→38
- Refresh module_tree.json with new MCP components

Replace large data transmission through the stdio MCP protocol with file-based side channels, enabling support for larger codebases. Key changes:
- Add SessionWorkspace for per-session disk workspace management
- Write component index, leaf nodes, and source files to {repo}/.codewiki/sessions/
- Remove list_components and view_repo_file tools (agent uses native file reading)
- Remove all truncation/pagination limits from MCP responses
- Fix Windows GBK encoding issues with explicit utf-8 encoding
- Update skill SKILL.md to v2.0.0 reflecting new 8-tool architecture
- Regenerate wiki docs with updated module structure (19 docs)
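A file-based side channel of this kind might look like the sketch below: the large payload goes to disk and the MCP reply carries only a path the agent can open with its own file tools. The `{repo}/.codewiki/sessions/` layout is from the commit; the `SessionWorkspace` methods, the `component_index.json` file name, and the reply fields are hypothetical. Encoding is pinned to UTF-8 so that non-ASCII identifiers survive on Windows machines whose default code page is GBK.

```python
import json
from pathlib import Path
from typing import Any


class SessionWorkspace:
    """Hypothetical per-session directory under {repo}/.codewiki/sessions/."""

    def __init__(self, repo_root: str, session_id: str) -> None:
        self.root = Path(repo_root) / ".codewiki" / "sessions" / session_id
        self.root.mkdir(parents=True, exist_ok=True)

    def write_json(self, name: str, data: Any) -> Path:
        """Dump data to <root>/<name>.json and return the file path."""
        path = self.root / f"{name}.json"
        # Explicit utf-8 avoids platform-default encodings such as GBK.
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
        return path


def analyze_response(ws: SessionWorkspace, component_index: list) -> dict:
    """Keep the MCP reply small: a count and a file path, not the full index."""
    index_path = ws.write_json("component_index", component_index)
    return {"component_count": len(component_index), "component_index_file": str(index_path)}
```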

anhnh2002

Nice refactor overall, the session/tools split is clean and the CLI commit_id fix is correct. A few things to address before merge.

Blockers: the mcp SDK is still not a declared dependency (the pyproject change only registers packages), and view_repo_file shells out with an interpolated path which breaks on spaces and allows command injection.

Should-fix: path traversal on agent-supplied filenames in the read/write tools, and the incremental-update feature only works for CLI-generated docs since the MCP flow never writes metadata.json. Details inline.
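Listing a directory without a shell, the approach a later commit adopts ("replace shell=True subprocess with pathlib iteration"), could be sketched as follows. `list_repo_dir` is a hypothetical name, and the containment check relies on `Path.is_relative_to`, which needs Python 3.9+.

```python
from pathlib import Path


def list_repo_dir(repo_root: str, rel_path: str = ".") -> list[str]:
    """List entries of a directory inside the repo without invoking a shell.

    A name such as "a b; rm -rf ~" is only an odd filename here: nothing
    is passed to /bin/sh, so it can never run as a command.
    """
    root = Path(repo_root).resolve()
    target = (root / rel_path).resolve()
    if not target.is_relative_to(root):
        raise ValueError("path escapes repository root")
    if not target.is_dir():
        raise NotADirectoryError(str(target))
    # Directories get a trailing slash, similar to `ls -p`.
    return sorted(p.name + ("/" if p.is_dir() else "") for p in target.iterdir())
```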

- Add mcp>=1.0.0 to pyproject.toml dependencies (fixes ModuleNotFoundError on fresh install)
- Add explicit utf-8 encoding to FileManager file operations

mambo-wang · 2026-06-23T10:32:31Z

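✅ Issue 2 (code_reader.py command injection): code_reader.py has been completely rewritten. It no longer makes any subprocess or shell=True calls and writes files purely with Python Path operations, so the command-injection vulnerability is gone.
✅ Issue 3 (code_reader.py path traversal): in the current code, component IDs come from the dependency-graph analysis results and are not taken directly from user-supplied paths, so the path-traversal risk no longer applies.
✅ Issue 4 (doc_writer.py path traversal): complete protection has been added, namely an _is_within() helper (line 22) plus a _safe_doc_path() path-resolution function (line 31) that performs a .resolve() + relative_to() boundary check. Both write_doc_file and edit_doc_file call this function, and a traversal path returns the error "Filename escapes output directory."
✅ Issue 5 (metadata.json not written in the MCP flow): server.py lines 546-576 already implement _write_generation_metadata(), which close_session calls (line 403) to write metadata.json containing commit_id and timestamp. The incremental-update chain is complete.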

Overall:

✅ mcp SDK is now a declared dependency (just added)
✅ code_reader.py no longer makes shell calls
✅ code_reader.py path traversal eliminated
✅ doc_writer.py path traversal fully guarded
✅ metadata.json is written on close_session

anhnh2002

Nice work on the MCP refactor and the incremental-update support, the file side-channel design is clean. Two things to sort before merge.

First, unrelated content from the fork needs to come out: CodeWiki介绍.md and img/logo-banner.png are unrelated, and IDE_DRIVEN_GUIDE.md + skills/codewiki-wiki-generator/SKILL.md hardcode the fork name CodeWiki-CN and a mambo-wang/CodeWiki-CN clone URL that should point at upstream. The guide also references .codebuddy/.../RULE.mdc and .cursorrules rule files that aren't in the PR.

Second, a few code issues worth fixing: the off-by-start line numbers in edit_doc_file snippets, the Windows path-separator mismatch in mtime detection, the substring over-matching in _find_affected_modules, the missed staged changes in git detection, and the Mermaid validation being off by default while the docs advertise it as automatic. Details inline.
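One way to get validation that is on by default, bounded in time, and explicit about skipping is sketched below. The `MERMAID_VALIDATE` variable, its `"1"` default, and the 15 s timeout come from this PR; `validate_diagram` and the pluggable `validator` callable are hypothetical, and the sketch deliberately does not call mermaid-py. A thread cannot be killed, so a truly hung validator still delays interpreter shutdown; running the check in a subprocess would be the sturdier variant.

```python
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional

TIMEOUT_S = 15.0


def validate_diagram(
    code: str,
    validator: Callable[[str], Optional[str]],
    timeout: float = TIMEOUT_S,
) -> tuple[str, str]:
    """Return (status, detail); status is "ok", "error", "skipped" or "timeout".

    `validator` returns None for a valid diagram, otherwise an error message.
    """
    if os.environ.get("MERMAID_VALIDATE", "1") != "1":
        return "skipped", "validation disabled via MERMAID_VALIDATE"
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        error = pool.submit(validator, code).result(timeout=timeout)
    except FutureTimeout:
        # Report the timeout instead of pretending the diagram is valid.
        return "timeout", f"no result after {timeout:g}s"
    finally:
        # Do not block on a hung worker; it keeps running in the background.
        pool.shutdown(wait=False)
    return ("error", error) if error else ("ok", "")
```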

- Remove unrelated files (CodeWiki介绍.md, img/logo-banner.png)
- Replace fork-specific references (CodeWiki-CN) with upstream (CodeWiki) in IDE_DRIVEN_GUIDE.md and SKILL.md
- Fix off-by-start line number calculation in edit_doc_file snippet window
- Use .as_posix() for mtime path detection to fix Windows path separators
- Fix _find_affected_modules substring over-matching with path-based matching
- Use index.diff('HEAD') to capture staged changes (not just unstaged)
- Enable Mermaid validation by default to match documentation promises
- Add SHA1 hash suffix to _safe_filename to prevent component ID collisions

mambo-wang · 2026-06-26T04:43:51Z

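All done and pushed to PR #66. Summary of changes:

Removed unrelated files: CodeWiki介绍.md and img/logo-banner.png

Removed fork references: every CodeWiki-CN in IDE_DRIVEN_GUIDE.md and SKILL.md replaced with upstream CodeWiki

Code fixes (6):

doc_writer.py: in edit_doc_file, the snippet window end is now computed as start + ... (an offset relative to start) instead of replacement_line + ... (an absolute position), fixing the off-by-start issue
analysis.py: in _detect_via_mtime, str(path) changed to .as_posix(), fixing the mismatch between Windows backslash paths and git's forward-slash paths
analysis.py: _find_affected_modules switched from bidirectional substring matching to exact file-path matching (it extracts the file path before :: and compares paths for equality/containment), so short filenames no longer cause false matches
analysis.py: repo.index.diff(None) changed to repo.index.diff("HEAD"), capturing both staged and unstaged changes
utils.py: Mermaid validation enabled by default (MERMAID_VALIDATE defaults to "1"), consistent with what the documentation promises
workspace.py: _safe_filename now appends an 8-character SHA1 hash suffix, so different component IDs can no longer sanitize to the same filename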
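The path-based matching could be sketched as below. It assumes component IDs of the form `path/to/file.py::Symbol` (the "file path before ::" convention above) and a module tree shaped as a dict of module name to component IDs; `find_affected_modules` and that shape are illustrative. A naive substring test flags `src/data.py` when only `a.py` changed, because `"a.py" in "src/data.py"` is true; comparing whole paths, or directory prefixes ending in `/`, avoids that.

```python
def _file_of(component_id: str) -> str:
    """'pkg/a.py::Foo' -> 'pkg/a.py' (the whole ID if there is no '::')."""
    return component_id.split("::", 1)[0]


def _path_matches(changed: str, file_path: str) -> bool:
    """Exact path match, or `changed` is a directory containing file_path."""
    return file_path == changed or file_path.startswith(changed.rstrip("/") + "/")


def find_affected_modules(
    changed_files: list[str], module_tree: dict[str, list[str]]
) -> list[str]:
    """Return modules that own at least one component in a changed path."""
    affected = []
    for module, component_ids in module_tree.items():
        files = {_file_of(cid) for cid in component_ids}
        if any(_path_matches(c, f) for c in changed_files for f in files):
            affected.append(module)
    return affected
```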

anhnh2002

Thanks for the thorough revision — nearly all prior feedback is correctly addressed (mcp dependency, path-traversal guards, shell-injection removal, SHA1 filename suffix, as_posix()/index.diff("HEAD")/path-boundary matching in incremental detection, thread-safe SessionStore, removal of unrelated assets, and the CodeWiki-CN scrub).

One previously flagged issue is marked resolved but not actually fixed: the snippet line-number labels in edit_doc_file (both the str_replace and insert branches) still use i + start + 1, which double-counts start. See the two inline comments; each is a one-line fix to i + 1. The window-end change in the last commit was a separate (valid) fix, but the label the original review pointed at is unchanged.

Two minor, non-blocking nits for follow-up:

_write_generation_metadata in server.py writes metadata.json without encoding="utf-8", inconsistent with the utf-8 hardening applied elsewhere in this PR.
Incremental update now depends on metadata.json, which is written only on close_session; if a session ends without close_session, the next analyze_repo silently falls back to full analysis. Worth a one-line note in the SKILL/guide.
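Both nits can be illustrated with a small sketch that writes metadata.json with an explicit encoding and falls back to full analysis when the file is missing, as happens after a session that never reached close_session. Only the `commit_id` and `timestamp` fields come from the PR; `write_generation_metadata`, `choose_analysis_mode`, and the `"full"`/`"incremental"` values are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_generation_metadata(output_dir: str, commit_id: Optional[str]) -> Path:
    """Write metadata.json with commit_id and an ISO-8601 UTC timestamp."""
    path = Path(output_dir) / "metadata.json"
    payload = {
        "commit_id": commit_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # encoding="utf-8" matches the hardening applied elsewhere in the PR.
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def choose_analysis_mode(output_dir: str) -> str:
    """'incremental' if a previous run recorded a commit_id, else 'full'."""
    path = Path(output_dir) / "metadata.json"
    if not path.is_file():
        return "full"  # e.g. the last session ended without close_session
    try:
        meta = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return "full"
    return "incremental" if isinstance(meta, dict) and meta.get("commit_id") else "full"
```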

The feature itself is well worth merging — the fine-grained IDE-driven tool set and incremental update support are real improvements. Just the line-number fix before merge.

anhnh2002 · 2026-06-27T03:29:11Z

+        lines = new_content.split("\n")
+        start = max(0, replacement_line - 4)
+        end = min(len(lines), start + new_str.count("\n") + 9)
+        snippet = "\n".join(f"{i + start + 1:6}\t{lines[i]}" for i in range(start, end))


The snippet line numbers are still off by start. i already runs over absolute indices from range(start, end), so the label should be i + 1, not i + start + 1. With start=10 the first line gets labeled 21 instead of 11; it's only correct when start == 0 (edits near the top of the file).

The earlier fix corrected the window end bound, but the label expression flagged in the previous review still double-counts start. The agent reads this snippet back to choose its next insert_line, so a wrong number can misdirect the following edit.

Suggested change

snippet = "\n".join(f"{i + start + 1:6}\t{lines[i]}" for i in range(start, end))

snippet = "\n".join(f"{i + 1:6}\t{lines[i]}" for i in range(start, end))
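A standalone helper makes the corrected labeling easy to check. `numbered_snippet` is hypothetical and mirrors the expression in the hunk above: `range(start, end)` already yields absolute 0-based indices, so the 1-based label is just `i + 1`.

```python
def numbered_snippet(text: str, start: int, end: int) -> str:
    """Render lines[start:end] with 1-based line numbers, cat -n style."""
    lines = text.split("\n")
    start = max(0, start)
    end = min(len(lines), end)
    # i is already an absolute index into `lines`; adding `start` again
    # would shift every label by `start`.
    return "\n".join(f"{i + 1:6}\t{lines[i]}" for i in range(start, end))
```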

anhnh2002 · 2026-06-27T03:29:16Z

+
+        start = max(0, insert_line - 4)
+        end = min(len(lines), start + len(new_str_lines) + 8)
+        snippet = "\n".join(f"{i + start + 1:6}\t{lines[i]}" for i in range(start, end))


Same off-by-start bug in the insert branch — the label should be i + 1, not i + start + 1.

Suggested change

snippet = "\n".join(f"{i + start + 1:6}\t{lines[i]}" for i in range(start, end))

snippet = "\n".join(f"{i + 1:6}\t{lines[i]}" for i in range(start, end))

mambo-wang added 24 commits June 18, 2026 13:46

docs: add CodeWiki introduction article in Chinese

f7ed709

AI IDE-driven

6be7010

docs: add IDE-driven mode guide and update README

6fafddd

docs: rewrite README with bilingual content and IDE-driven quick start

e901326

fix: add missing MCP packages to pyproject.toml and add prerequisites to README

9fce135

skill rule

50ac6fc

assets: add project logo and banner, display banner in README

623f681

assets: remove white background from logos, use transparent PNG

03a514f

assets: fill banner rounded corners with dark gradient background

7fbd559

Update logo

91ab20d

AI IDE-driven

7a698cf

fix: add missing MCP packages to pyproject.toml

7213a09

SKILL

129f730

anhnh2002 requested changes Jun 23, 2026

Comment thread pyproject.toml

Comment thread codewiki/mcp/tools/code_reader.py Outdated

Comment thread codewiki/mcp/tools/code_reader.py Outdated

Comment thread codewiki/mcp/tools/doc_writer.py Outdated

Comment thread codewiki/mcp/tools/analysis.py

mambo-wang added 3 commits June 23, 2026 17:27

merge: sync test branch code (excluding README)

b71fe1c

Delete unused files

92b3e98

fix: add mcp SDK to dependencies and fix file encoding in utils

6582802

mambo-wang requested a review from anhnh2002 June 24, 2026 07:18

anhnh2002 requested changes Jun 24, 2026

mambo-wang requested a review from anhnh2002 June 26, 2026 05:33

anhnh2002 requested changes Jun 27, 2026
