fix(run-ops): DNS-safe, sortable base32hex run id (replace base62 KSUID)#4154
Conversation
The run-ops split minted NEW-store run ids as 27-char base62 KSUIDs. The supervisor puts the run id in the k8s pod name (runner-<id>), and pod names must be DNS-1123 labels (lowercase [a-z0-9-]), so uppercase base62 ids made k8s reject the pod (422) and those runs never launched. .toLowerCase() can't fix it (base62 folds distinct symbols, colliding ids and destroying sort). Change the encoding, not the structure: mint a 26-char lowercase base32hex id (24-char core = 6-byte ms timestamp + 9 CSPRNG bytes, + region char + version char "1"). Lexicographically time-sortable at ms resolution, DNS-safe from birth, hyphen-free (so firekeeper's runner-<id>-attempt-N round-trip is unchanged). The residency discriminator switches from id length to the fixed version char, so the format can evolve without a hard migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
WalkthroughThis PR replaces KSUID-based run identifiers with run-ops v1 identifiers across core generation/parsing, residency classification, webapp minting/routing, run-engine behavior, run-store routing, and related tests. The core package adds run-ops ID helpers and updates residency kinds and classifiers. Webapp services, routes, and tests now mint, classify, and route run-ops IDs, including region-aware child IDs and the new 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/webapp/test/apiRunResultPresenter.readthrough.test.ts (1)
24-28: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
newFriendlyId()needs the 26-char v1 body
newFriendlyId()still returns a 27-char body, so the NEW-path cases are seeding an id that the residency rule treats as LEGACY. Update it to the 26-char v1 shape to match the classification logic.
🧹 Nitpick comments (2)
internal-packages/run-engine/src/engine/tests/lifecycleRouter.test.ts (1)
459-466: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueSanitizing regex could collapse distinct suffixes into the same id.
newRunIdreplaces any char outside[0-9a-v]with"0". Two suffixes differing only in letters outside that range (e.g.w-z, uppercase) would sanitize to identical strings, silently producing duplicate ids and masking test bugs. Current call sites (e.g."new_run_e"/"new_run_g") don't trigger this, but it's a latent footgun for future test authors.apps/webapp/test/engine/triggerTask.test.ts (1)
2396-2403: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRedundant dynamic
import()— use the already-importedgenerateRunOpsId.
generateRunOpsIdis statically imported in this file (line 33), so the dynamicimport("@trigger.dev/core/v3/isomorphic")here is unnecessary.♻️ Proposed fix
- (await import("`@trigger.dev/core/v3/isomorphic`")).generateRunOpsId() + generateRunOpsId()Based on coding guidelines: "Prefer static imports over dynamic
import(); only use dynamic imports when resolving circular dependencies, enabling real code splitting, or conditionally loading a module at runtime."Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4c495098-38e7-4d88-9358-5915b732f109
📒 Files selected for processing (66)
.changeset/olive-bees-repeat.mdapps/webapp/app/runEngine/concerns/idempotencyResidency.server.test.tsapps/webapp/app/runEngine/services/triggerFailedTask.server.tsapps/webapp/app/runEngine/services/triggerTask.server.tsapps/webapp/app/utils/friendlyId.test.tsapps/webapp/app/utils/friendlyId.tsapps/webapp/app/v3/runOpsMigration/crossSeamGuard.server.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.test.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.tsapps/webapp/app/v3/runOpsMigration/readThrough.server.test.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.test.tsapps/webapp/app/v3/runStore.server.test.tsapps/webapp/app/v3/services/batchTriggerV3.server.tsapps/webapp/app/v3/services/bulk/BulkActionV2.batchReadThrough.server.test.tsapps/webapp/app/v3/services/executeTasksWaitingForDeploy.tsapps/webapp/test/SpanPresenter.readthrough.test.tsapps/webapp/test/api.v1.waitpoints.tokens.test.tsapps/webapp/test/apiBatchResultsPresenter.readroute.test.tsapps/webapp/test/apiBatchResultsPresenter.readthrough.test.tsapps/webapp/test/apiRetrieveRunPresenter.readroute.test.tsapps/webapp/test/apiRunResultPresenter.readthrough.test.tsapps/webapp/test/apiWaitpointPresenter.readthrough.test.tsapps/webapp/test/batchTriggerV3ResidencyInheritance.test.tsapps/webapp/test/batchTriggerV3StoreRouting.test.tsapps/webapp/test/bulkActionV2ReadRouting.test.tsapps/webapp/test/cancelDevSessionRunsStoreRouting.test.tsapps/webapp/test/crossSeamGuard.proof.test.tsapps/webapp/test/engine/triggerFailedTask.test.tsapps/webapp/test/engine/triggerTask.test.tsapps/webapp/test/idempotencyDedupResidency.test.tsapps/webapp/test/idempotencyKeyConcernLegacyAuthority.test.tsapps/webapp/test/performTaskRunAlertsStoreRouting.test.tsapps/webapp/test/realtime/runReaderReadThrough.test.tsapps/webapp/test/resetIdempotencyKeyLegacyAuthority.test.tsapps/webapp/test/resolveWaitpointThroughReadThrough.readthrough.test.tsapps/webapp/test/runOpsCrossSeamGuard.test.tsapps/webapp/test/runOpsMintCutover.test.tsapps/webapp/test/runPresenterReadRoute.test.tsapps/webapp/test/runsRepository.readthrough.test.tsapps/webapp/test/sessions.readthrough.test.tsapps/webapp/test/updateMetadataStoreRoutingHetero.test.tsinternal-packages/run-engine/src/engine/systems/executionSnapshotSystem.test.tsinternal-packages/run-engine/src/engine/systems/waitpointSystem.test.tsinternal-packages/run-engine/src/engine/tests/blockEdgeResidency.test.tsinternal-packages/run-engine/src/engine/tests/completeWaitpointReadResidency.test.tsinternal-packages/run-engine/src/engine/tests/datetimeWaitpointColocation.test.tsinternal-packages/run-engine/src/engine/tests/lifecycleRouter.test.tsinternal-packages/run-engine/src/engine/tests/triggerCreateRouting.test.tsinternal-packages/run-store/src/PostgresRunStore.batchProbeReadClient.test.tsinternal-packages/run-store/src/PostgresRunStore.controlPlaneAlertFk.test.tsinternal-packages/run-store/src/PostgresRunStore.dedicatedRepro.test.tsinternal-packages/run-store/src/PostgresRunStore.writeAtomicity.test.tsinternal-packages/run-store/src/batchCompletionResidency.test.tsinternal-packages/run-store/src/runOpsStore.flipWindowDuplicate.test.tsinternal-packages/run-store/src/runOpsStore.forWaitpointCompletion.test.tsinternal-packages/run-store/src/runOpsStore.idempotencyDedup.test.tsinternal-packages/run-store/src/runOpsStore.mixedResidency.test.tsinternal-packages/run-store/src/runOpsStore.readAfterWrite.test.tsinternal-packages/run-store/src/runOpsStore.snapshots.test.tsinternal-packages/run-store/src/runOpsStore.test.tsinternal-packages/run-store/src/runOpsStore.tsinternal-packages/run-store/src/runOpsStore.waitpoints.test.tspackages/core/src/v3/isomorphic/friendlyId.test.tspackages/core/src/v3/isomorphic/friendlyId.tspackages/core/src/v3/isomorphic/runOpsResidency.test.tspackages/core/src/v3/isomorphic/runOpsResidency.ts
💤 Files with no reviewable changes (1)
- apps/webapp/app/v3/runOpsMigration/crossSeamGuard.server.ts
…vior change) The run-ops NEW-store run id is a 26-char base32hex id, not a KSUID, so code identifiers named "ksuid" were stale and misleading. Rename them to a consistent runOpsId/run-ops family (vars, test constants, helpers, fixture strings) and update comments/test descriptions to describe the current design. Also fixes 3 review findings in the touched test files: apiRunResultPresenter's newFriendlyId helper now mints a valid 26-char NEW body (was 27-char, silently LEGACY), lifecycleRouter's newRunId maps suffixes without the lossy outlier->"0" collision, and triggerTask uses the static generateRunOpsId import. Load-bearing wire values are untouched: the RUN_OPS_MINT_KSUID_ENABLED env var, the runOpsMintKsuid feature-flag key, and the persisted "ksuid"/"cuid" enum values. The one KSUID history note and the checksummed migration comment remain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # internal-packages/run-store/src/runOpsStore.test.ts # internal-packages/run-store/src/runOpsStore.ts
|
Addressed the CodeRabbit review (folded into this branch, latest commit):
Also in this branch: the stale |
…ng constraint Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
The routed-read primary test (from #4153) hardcoded a 27-char KSUID as the NEW-resident id, which the base32hex version-char discriminator now classifies LEGACY — so the NEW arm routed to the wrong store and failed. Seed it with a real generateRunOpsId() v1 body instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…psMintKind Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… left for checksum) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`@trigger.dev/database` and `@internal/run-ops-database` pin the same prisma version, so pnpm gives them one shared package instance in the store. When `turbo run generate` runs their generate scripts concurrently, both prisma processes race to write the query-engine binary into that shared directory and on Windows the loser fails with `EPERM: operation not permitted, rename` on the `.tmp -> .dll.node` step, killing CI before any test runs. Wrap `prisma generate` in a small cross-platform node script that retries on transient filesystem contention (EPERM/EBUSY/EACCES) with backoff. On non-Windows the first attempt always succeeds, so this is a zero-cost no-op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
scripts/retry-prisma-generate.mjs (2)
24-51: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winSurface
result.erroron spawn failure.If
spawnSyncitself fails to launchprisma(e.g., not onPATH),result.stdout/result.stderrare empty, the transient-error regex never matches, and the script exits with an unhelpful generic status. Printingresult.errorwould make that failure mode debuggable.♻️ Proposed diff
process.stdout.write(result.stdout ?? ""); process.stderr.write(result.stderr ?? ""); + + if (result.error) { + console.error(`prisma generate failed to spawn: ${result.error.message}`); + }
25-28: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueConsider scoping
shell: trueto Windows only.The header comment explains this wrapper exists to work around a Windows-only EPERM race. Unconditionally enabling
shell: trueon all platforms forwardspassthroughArgsthrough shell interpretation unnecessarily on POSIX systems.♻️ Proposed diff
const result = spawnSync("prisma", ["generate", ...passthroughArgs], { - shell: true, + shell: process.platform === "win32", encoding: "utf8", });
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 19b18649-cad8-47f7-92f3-9b336fc31984
📒 Files selected for processing (26)
.changeset/olive-bees-repeat.md.github/workflows/e2e.ymlapps/webapp/app/env.server.tsapps/webapp/app/runEngine/concerns/idempotencyResidency.server.test.tsapps/webapp/app/runEngine/concerns/idempotencyResidency.server.tsapps/webapp/app/runEngine/services/triggerFailedTask.server.tsapps/webapp/app/runEngine/services/triggerTask.server.tsapps/webapp/app/v3/featureFlags.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.test.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.test.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.flipLatency.test.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.test.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.tsapps/webapp/app/v3/services/batchTriggerV3.server.tsapps/webapp/test/batchTriggerV3ResidencyInheritance.test.tsapps/webapp/test/engine/triggerFailedTask.test.tsapps/webapp/test/engine/triggerTask.test.tsapps/webapp/test/runOpsMintCutover.test.tsinternal-packages/database/package.jsoninternal-packages/run-ops-database/package.jsoninternal-packages/run-store/src/runOpsStore.routedReadPrimary.test.tspackages/core/src/v3/isomorphic/runOpsResidency.test.tspackages/core/src/v3/isomorphic/runOpsResidency.tsscripts/retry-prisma-generate.mjs
✅ Files skipped from review due to trivial changes (4)
- .github/workflows/e2e.yml
- .changeset/olive-bees-repeat.md
- apps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.ts
- apps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.test.ts
🚧 Files skipped from review as they are similar to previous changes (8)
- apps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.test.ts
- apps/webapp/app/runEngine/concerns/idempotencyResidency.server.test.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- apps/webapp/test/engine/triggerTask.test.ts
- apps/webapp/app/v3/services/batchTriggerV3.server.ts
- apps/webapp/test/batchTriggerV3ResidencyInheritance.test.ts
- packages/core/src/v3/isomorphic/runOpsResidency.ts
- packages/core/src/v3/isomorphic/runOpsResidency.test.ts
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reaks publish) (#4156) ## Build-blocker hotfix `publish.yml` on `main` is failing to build the image after #4154 merged: ``` @trigger.dev/database:generate: Error: Cannot find module '/triggerdotdev/scripts/retry-prisma-generate.mjs' … ERROR: process "/bin/sh -c pnpm run generate" did not complete successfully: exit code: 1 ``` ## Cause #4154 (Windows-CI hardening) changed the `generate` scripts of `@trigger.dev/database` and `@internal/run-ops-database` to call `node ../../scripts/retry-prisma-generate.mjs`. But `docker/Dockerfile`'s `builder` stage does `COPY docker/scripts ./scripts` (replacing the scripts dir) and then copies back only the specific root scripts it needs (`updateVersion.ts`, `bundleSdkDocs.ts`) before `RUN pnpm run generate` — the new `retry-prisma-generate.mjs` wasn't copied, so `pnpm run generate` can't find it and the image build fails. `publish.yml` only runs on push to `main` (not on PRs), so #4154's PR CI never built the image and this slipped through. ## Fix One line — copy the retry script alongside the other root scripts before the generate step: ```dockerfile COPY --chown=node:node scripts/retry-prisma-generate.mjs scripts/retry-prisma-generate.mjs ``` ## Verification Built locally with `docker build --target builder` (the stage that runs `pnpm run generate`) to confirm the generate step now passes — result appended once it finishes. No changeset / `.server-changes` — Dockerfile/build-only change, no package or server-runtime change. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
The run-ops split mints NEW-store run ids as 27-char base62 KSUIDs. The supervisor writes the run id into the Kubernetes pod name (
runner-<id>), and pod names must be DNS-1123 labels (lowercase[a-z0-9-]) — so uppercase base62 ids make k8s reject the pod (422) and those runs never launch (they loop inPENDING_EXECUTINGuntil the heartbeat-stall handler nacks them, forever)..toLowerCase()can't fix it: base62 has bothA(10) anda(36) as distinct symbols, so folding collides distinct ids and destroys sort order.Fix: change the encoding, not the structure
Mint a 26-char lowercase base32hex run id:
0-9a-v): lowercase, order-preserving, DNS-safe; 15 bytes → exactly 24 chars, no padding. Hand-rolled encode/decode (no new dependency).charAtbefore decoding/routing), version ="1".DNS-safe from birth and hyphen-free, so firekeeper is unchanged —
runner-<id>-attempt-N→ striprunner-, cut at first hyphen still recovers the exact id incl. region+version.Residency discriminator: length → version char
classifyKind/classifyResidency(runOpsResidency.ts) previously distinguished NEW vs LEGACY by id length. That gets ambiguous with a third format. It now discriminates on the version char at a fixed position (isRunOpsIdBody: 26 chars,[25] === "1", base32hex alphabet) → NEW; everything else → LEGACY. Total, never throws. TheResidency(NEW/LEGACY) contract the routing store consumes is unchanged; the"ksuid"ResidencyKindlabel is retained only because it's the persistedrunOpsMintKsuidfeature-flag value.Scope / verification
@trigger.dev/coreisomorphic; mint path + all id-shape call sites swept (~40 webapp files); changeset added (@trigger.dev/corepatch).@trigger.dev/corebuilds; webapp typechecks; format/lint clean.Open decisions (flagged, not silently chosen)
TEXTwith default locale collation it's silently not honored. Confirm whetherCOLLATE "C"/BYTEAis needed on the run-ops schema.regionCharForRegion/REGION_CODES.This PR renames a persisted feature-flag key/value and an env var. These are not changed by the code alone and must be migrated when this deploys, or affected orgs silently fall back to
cuidminting (no crash —defaultValue: "cuid"):RUN_OPS_MINT_KSUID_ENABLED→RUN_OPS_MINT_ENABLED(carry the value over).organization.featureFlags: migrate both the key and value together:runOpsMintKsuid→runOpsMintKind"ksuid"→"runOpsId"Until an org's flag row is migrated, its
runOpsMintKindlookup misses and it mintscuid(legacy) — so no NEW-store ids for that org until the data lands.