fix(clone): clone agentic project documents + resolve exported agentic tools by chandrasekharan-zipstack · Pull Request #23 · Zipstack/unstract-python-client

chandrasekharan-zipstack · 2026-06-23T09:54:17Z

What

Completes org-to-org clone for cloud Agentic ("Agentic Prompt Studio") projects, which previously cloned only the project shell:

Documents — each project's uploaded docs are cloned (download raw bytes from source → skip names already on target → upload to the target project). Honours file_strategy="skip" like the files/lookups phases.
Verified data — the curated ground-truth rows are cloned, re-pointed to the cloned document by filename. Extracted/comparison data is regenerable and intentionally not cloned.
Exported agentic tools — a workflow tool_instance.tool_id now resolves against agentic_studio_registry in addition to prompt_studio_registry.
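
The document flow described above (download from source, skip names already on target, upload) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeOrg`, `clone_documents`, and the `"copy"` strategy value are hypothetical stand-ins, and only `file_strategy="skip"` and `max_file_size` are names taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class FakeOrg:
    """Hypothetical in-memory stand-in for one org's agentic document store."""

    docs: dict = field(default_factory=dict)  # filename -> raw bytes


def clone_documents(source, target, file_strategy="copy", max_file_size=None):
    """Copy documents by filename and return (copied, skipped) name lists.

    "copy" is an assumed placeholder for any non-skip strategy.
    """
    copied, skipped = [], []
    for name in sorted(source.docs):
        if file_strategy == "skip":
            # Listed and counted as skipped, never transferred.
            skipped.append(name)
            continue
        if name in target.docs:
            # Idempotent by filename: a name already on target is left alone.
            skipped.append(name)
            continue
        data = source.docs[name]
        if max_file_size is not None and len(data) > max_file_size:
            skipped.append(name)
            continue
        target.docs[name] = data
        copied.append(name)
    return copied, skipped


source = FakeOrg({"a.pdf": b"aaaa", "b.pdf": b"bb", "big.pdf": b"x" * 100})
target = FakeOrg({"b.pdf": b"bb"})
copied, skipped = clone_documents(source, target, max_file_size=10)
print(copied, skipped)
```

Keying on filename is what makes a re-run safe in this sketch: a second call copies nothing new.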

Why

Agentic projects keep uploads in a separate store (agentic/documents/), distinct from Prompt Studio prompt-documents. The files phase only iterates the custom_tool remap, so agentic docs were silently dropped — a clone landed the project with zero documents.
agentic_verified_data is "ground truth manually verified by user": curated input (the accuracy baseline), not regenerable output. Without it, the target org can't measure extraction accuracy unless a human re-verifies every doc.
A workflow tool_instance.tool_id is a registry id. Exported agentic projects register under agentic_studio_registry, but ToolInstancePhase resolved only via prompt_studio_registry, so an agentic-tool instance found no remap and was skipped (no registry remap for tool_id …). The dependent workflow (e.g. "Agentic tool API") then landed with no tool wired.
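
The tool_id gap can be illustrated with a small lookup sketch. The two remap dictionaries and the `resolve_tool_id` helper are assumptions made for illustration; the actual ToolInstancePhase internals are not shown in this PR description.

```python
def resolve_tool_id(tool_id, prompt_studio_remap, agentic_studio_remap):
    """Return the target registry id, or None if neither remap knows tool_id.

    The Prompt Studio remap is consulted first; the agentic remap is only a
    fallback, so ids that already resolved keep resolving the same way.
    """
    target_id = prompt_studio_remap.get(tool_id)
    if target_id is None:
        target_id = agentic_studio_remap.get(tool_id)
    return target_id


# Hypothetical source-id -> target-id remaps built by earlier clone phases.
prompt_studio_remap = {"ps-src-1": "ps-tgt-1"}
agentic_studio_remap = {"ag-src-1": "ag-tgt-1"}

print(resolve_tool_id("ps-src-1", prompt_studio_remap, agentic_studio_remap))
print(resolve_tool_id("ag-src-1", prompt_studio_remap, agentic_studio_remap))
print(resolve_tool_id("unknown", prompt_studio_remap, agentic_studio_remap))
```

Before the fix, the second lookup would have behaved like the third: no remap found, and the tool instance skipped.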

All three reproduced on a staging org→org run.

How

client.py: list_agentic_documents, download_agentic_document (raw binary, like download_lookup_file), upload_agentic_document (multipart to agentic/projects/{id}/documents/upload/ — the real upload route; the documents viewset upload action is a backend stub); list_agentic_verified_data, create_agentic_verified_data.
agentic_studio.py: _clone_documents / _clone_one_document and _clone_verified_data run after schemas, before registry republish; idempotent by filename; honour max_file_size and file_strategy; dry-run plan counts both.
tool_instance.py: tool_id resolves via prompt_studio_registry or agentic_studio_registry; corrected the misleading "custom tool unpublished" skip message.
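
Re-pointing verified data by filename might look like the sketch below. The row shape (`"document"`, `"data"`) and the helper name `plan_verified_rows` are hypothetical; the PR only states that rows are matched to the cloned document by filename, and that rows are skipped when the document is missing on target or the row is already present.

```python
def plan_verified_rows(source_rows, source_doc_names, target_doc_ids, already_verified):
    """Split source rows into rows to create on target and rows to skip.

    source_rows:       list of {"document": <source doc id>, "data": {...}}
    source_doc_names:  {source doc id: filename}
    target_doc_ids:    {filename: target doc id}
    already_verified:  set of target doc ids that already have a verified row
    """
    to_create, skipped = [], []
    for row in source_rows:
        filename = source_doc_names.get(row["document"])
        target_id = target_doc_ids.get(filename)
        if target_id is None:
            skipped.append((filename, "document missing on target"))
        elif target_id in already_verified:
            skipped.append((filename, "already present"))
        else:
            # Same payload, but pointing at the cloned target document.
            to_create.append({"document": target_id, "data": row["data"]})
    return to_create, skipped


source_rows = [
    {"document": "s1", "data": {"total": 10}},
    {"document": "s2", "data": {"total": 20}},
    {"document": "s3", "data": {"total": 30}},
]
source_doc_names = {"s1": "a.pdf", "s2": "b.pdf", "s3": "c.pdf"}
target_doc_ids = {"a.pdf": "t1", "b.pdf": "t2"}  # c.pdf never reached target
already_verified = {"t2"}

to_create, skipped = plan_verified_rows(
    source_rows, source_doc_names, target_doc_ids, already_verified
)
print(to_create)
print(skipped)
```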

Like Prompt Studio uploads, this clones the file + creates the document row; extraction/summary stays a UI step.

Can this PR break any existing features?

No. New client methods are additive. The doc/verified-data paths are gated to the cloud-only AgenticStudioPhase (probe-skipped on OSS) and add work that previously didn't happen at all. The tool_instance change only adds a fallback resolve when the primary lookup misses, so existing Prompt Studio tool instances are unaffected.

Database Migrations

None.

Env Config

None.

Relevant Docs

N/A

Related Issues or PRs

Extends the agentic-studio clone support in AgenticStudioPhase.

Dependencies Versions

None.

Notes on Testing

tests/clone/ green (192 passing). Added: agentic doc-clone (skip-existing, dry-run count, file_strategy=skip), verified-data clone (filename mapping, skip-existing, skip-when-doc-missing, dry-run count), and tool_instance resolve-via-agentic-registry.
Pre-existing ruff format/E501 drift in untouched files left alone; changed lines are lint + format clean.

Checklist

I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

…c tools

Two gaps surfaced cloning agentic ("Agentic Prompt Studio") projects:

1. Documents were dropped. Agentic projects keep their uploads in their own store (agentic/documents/), separate from Prompt Studio prompt-documents. The files phase only iterates the custom_tool remap, so agentic docs were never cloned. AgenticStudioPhase now clones them per project (download raw bytes from source, skip names already on target, upload to the target project), and counts them in the dry-run plan.
2. Exported agentic tools were skipped in workflows. A workflow tool_instance references a registry id; exported agentic projects register under agentic_studio_registry, but ToolInstancePhase resolved tool_id only via prompt_studio_registry. It now falls back to the agentic registry, so the "Agentic tool API" workflow lands with its tool wired.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS

greptile-apps · 2026-06-23T09:57:49Z

Greptile Summary

This PR completes the org-to-org clone for Agentic Prompt Studio projects by adding document upload/download, ground-truth verified-data cloning, and a fallback agentic_studio_registry resolver for workflow tool instances. The three previously flagged issues have all been resolved.

Document cloning: _clone_documents / _clone_one_document download raw bytes from source and upload to target, correctly short-circuiting on file_strategy="skip" and max_file_size.
Verified-data cloning: _clone_verified_data re-points ground-truth rows to the cloned target document via filename, skips rows whose document is missing, and returns early under file_strategy="skip".
Tool-instance resolution: ToolInstancePhase now falls back to agentic_studio_registry when the prompt_studio_registry lookup misses.

Confidence Score: 5/5

Safe to merge

All three previously flagged issues are resolved, new methods are additive, and the agentic-registry fallback is inside the existing lock.

No files require special attention.

Important Files Changed

Filename	Overview
src/unstract/clone/client.py	Adds five new client methods following the established request/error-handling pattern.
src/unstract/clone/phases/agentic_studio.py	Adds document and verified-data clone methods with correct file_strategy/dry-run guards.
src/unstract/clone/phases/tool_instance.py	Adds agentic_studio_registry fallback inside the existing lock — thread-safe and correct.
tests/clone/test_agentic_studio_phase.py	Eight new test cases covering all critical edge cases.
tests/clone/test_tool_instance_phase.py	Adds test verifying the agentic-registry fallback resolve path.

Reviews (4): Last reviewed commit: "fix(clone): honour skip strategy in _clo..."

Agentic verified-data ("ground truth manually verified by user") is curated input, not regenerable output, so it must be cloned. AgenticStudioPhase now re-points each source verified-data row to the cloned target document by filename and recreates it (skipping docs absent on target and rows already present). Extracted/comparison data stays uncloned: both regenerate on a re-run + re-verify.

Also honour file_strategy="skip" in the new document path, matching the files and lookups phases: under skip, agentic documents are listed and counted as skipped (operator re-uploads), not transferred.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS

…tegy

Verified data FKs a document; with file_strategy=skip no docs land on target, so a dry-run must predict the rows as skipped rather than as creates that the real run silently drops.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS

_plan_children already forecasts verified-data as skipped under file_strategy=skip, but the runtime path lacked the matching guard: on a re-run where documents reached the target by other means, it would create verified rows the plan said it would skip. Add the early-return guard, mirroring _clone_documents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS
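
The plan/runtime mismatch described in this commit can be illustrated with a small sketch. One way to keep the two paths aligned is to route both through the same guard; the function names here are hypothetical and the PR does not say the real code shares a helper, only that the runtime path gained a matching early return.

```python
def verified_data_skipped(file_strategy):
    """Single guard consulted by both the dry-run plan and the real run."""
    return file_strategy == "skip"


def plan_verified_data(rows, file_strategy):
    """Dry-run forecast: how many rows would be created vs skipped."""
    if verified_data_skipped(file_strategy):
        return {"create": 0, "skip": len(rows)}
    return {"create": len(rows), "skip": 0}


def run_verified_data(rows, file_strategy, create):
    """Real run: the early return mirrors the plan, so the counts cannot drift."""
    if verified_data_skipped(file_strategy):
        return {"create": 0, "skip": len(rows)}
    for row in rows:
        create(row)
    return {"create": len(rows), "skip": 0}


rows = [{"document": "t1"}, {"document": "t2"}]
created = []
plan = plan_verified_data(rows, "skip")
result = run_verified_data(rows, "skip", created.append)
print(plan, result, created)
```

Without the guard in `run_verified_data`, the run would create two rows while the plan reported two skips.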

greptile-apps Bot reviewed Jun 23, 2026

Comment thread src/unstract/clone/phases/agentic_studio.py

chandrasekharan-zipstack self-assigned this Jun 23, 2026

greptile-apps Bot reviewed Jun 23, 2026

Comment thread src/unstract/clone/phases/agentic_studio.py Outdated

greptile-apps Bot reviewed Jun 23, 2026

Comment thread src/unstract/clone/phases/agentic_studio.py

Deepak-Kesavan approved these changes Jun 23, 2026

chandrasekharan-zipstack merged commit d769942 into main Jun 23, 2026
3 checks passed

chandrasekharan-zipstack deleted the fix/clone-agentic-docs-and-tool branch June 23, 2026 12:22

chandrasekharan-zipstack merged 4 commits into main from fix/clone-agentic-docs-and-tool
