Skip to content

fix(clerk-js): fail fast when the origin is slow at load#9065

Open
nikosdouvlis wants to merge 1 commit into
mainfrom
nikos/plat-2754-clerkjs-resiliency
Open

fix(clerk-js): fail fast when the origin is slow at load#9065
nikosdouvlis wants to merge 1 commit into
mainfrom
nikos/plat-2754-clerkjs-resiliency

Conversation

@nikosdouvlis

@nikosdouvlis nikosdouvlis commented Jul 1, 2026

Copy link
Copy Markdown
Member

Why
When Clerk's Frontend API is slow or unreachable, a cold Clerk.load() hangs for minutes. The /client request has no timeout, and on a 5xx or network error the recovery path awaits getToken({ skipCache: true }), which burns the full ~162s retry budget before Clerk is marked loaded. So during an origin outage the app sits there unresponsive instead of degrading and rendering.

What changed
Bound the browser load with a timeout in two spots: around the /client fetch and around the recovery token mint. On a slow or failed /client, recovery renders identity from the __session cookie, clears the session's token cache (via the existing clearCache()) so the mint bypasses the cache (no skipCache, so no force_origin), and keeps the cookie identity if the mint times out. Also adds a timeLimit util to @clerk/shared/utils.

One known gap: timeLimit stops waiting but doesn't cancel the request, so a timed-out recovery mint keeps retrying in the background and can briefly write a slightly stale token before the poller reconciles. Low severity and self-correcting, and the abortable-timeout follow-up plus the in-flight monotonic-session-token guards close it. This is phase 1, later phases cover fail-fast /environment and render-first init.

Summary by CodeRabbit

  • New Features

    • Added faster failover during sign-in initialization when the backend is slow or unavailable.
    • Cold loads now recover sooner by using cached session information instead of hanging.
  • Bug Fixes

    • Improved handling for slow or unreachable requests during app startup.
    • Reduced the chance of blocked loading screens by adding time limits to key startup steps.

When FAPI is slow or down, a cold clerk-js load hung for minutes: the /client fetch had no request timeout, and the load-failure recovery awaited getToken({ skipCache: true }), which runs a ~162s retry budget before Clerk marks itself loaded. During an outage the app sat unresponsive.

Bound the standard-browser load with a 5s timeout in two places: around the /client fetch, and around the recovery mint. On a slow or failed /client, recovery renders identity from the __session cookie, clears the session's token cache via the existing clearCache() so the mint bypasses the cache (no skipCache, so no force_origin), and awaits a fresh mint under the timeout, keeping the cookie identity if it times out. The on-timeout clear stops the poller from awaiting an abandoned in-flight mint.

Adds a timeLimit racer to @clerk/shared/utils.
@changeset-bot

changeset-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: d5ea434

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 23 packages
Name Type
@clerk/clerk-js Patch
@clerk/shared Patch
@clerk/chrome-extension Patch
@clerk/electron Patch
@clerk/expo Patch
@clerk/astro Patch
@clerk/backend Patch
@clerk/expo-passkeys Patch
@clerk/express Patch
@clerk/fastify Patch
@clerk/headless Patch
@clerk/hono Patch
@clerk/localizations Patch
@clerk/msw Patch
@clerk/nextjs Patch
@clerk/nuxt Patch
@clerk/react-router Patch
@clerk/react Patch
@clerk/tanstack-react-start Patch
@clerk/testing Patch
@clerk/ui Patch
@clerk/vue Patch
@clerk/swingset Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel

vercel Bot commented Jul 1, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
clerk-js-sandbox Ready Ready Preview, Comment Jul 1, 2026 7:12pm
swingset Ready Ready Preview, Comment Jul 1, 2026 7:12pm

Request Review

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a timeLimit utility to @clerk/shared that races a value/promise against a timeout, and applies it in clerk.ts to bound the initial client fetch and the recovery token mint during Clerk.load(), preventing indefinite hangs. Includes tests and a changeset.

Changes

Fail-fast initialization

Layer / File(s) Summary
timeLimit utility, export, and tests
packages/shared/src/utils/timeLimit.ts, packages/shared/src/utils/index.ts, packages/shared/src/utils/__tests__/timeLimit.test.ts
New timeLimit<T>(value, ms) races a value/promise against a timeout, rejecting with an Error if it elapses; exported from the utils barrel and covered by tests for resolution, timeout rejection, and timer cleanup.
Clerk.load() initialization timeout wiring
packages/clerk-js/src/core/clerk.ts
Imports timeLimit, adds INITIALIZATION_TIMEOUT_MS = 5_000, wraps the initial client fetch with timeLimit, and reworks token recovery to clear session cache and time-limit getToken(), restarting polling in a finally block.
Clerk.load() fail-fast test coverage
packages/clerk-js/src/core/__tests__/clerk.test.ts
Mocks createClientFromJwt and adds tests for hanging fetch, fetch failure with session, hanging recovery mint, no session cookie, and 4xx re-throw, asserting degraded status and poller call ordering.
Changeset entry
.changeset/clerkjs-fail-fast-slow-origin.md
Patch changeset documenting the fail-fast Clerk.load() behavior and the new timeLimit utility.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Clerk
  participant ClientAPI
  participant Session
  participant AuthCookieService

  Clerk->>ClientAPI: fetch() wrapped in timeLimit(5000ms)
  alt fetch hangs or times out
    ClientAPI-->>Clerk: timeout error
    Clerk->>AuthCookieService: stopPollingForToken
    Clerk->>Session: clear cache
    Clerk->>Session: getToken() wrapped in timeLimit
    alt getToken times out
      Session-->>Clerk: null
    else getToken resolves
      Session-->>Clerk: token
    end
    Clerk->>AuthCookieService: startPollingForToken (finally)
  else fetch succeeds
    ClientAPI-->>Clerk: client data
  end
Loading

Suggested reviewers: swolfand, seanperez29

Poem

A rabbit hops through cold-load fright,
No hanging fetch to steal the night,
With timeLimit's clock ticking fast,
Degraded now, but session's cast,
Five seconds max, then hop away! 🐇⏱️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: making Clerk load fail fast when the origin is slow.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

API Changes Report

Generated by Break Check on 2026-07-01T19:14:34.558Z

Summary

Metric Count
Packages analyzed 19
Packages with changes 1
🔴 Breaking changes 0
🟡 Non-breaking changes 0
🟢 Additions 1

@clerk/shared

Current version: 4.23.0
Recommended bump: MINOR → 4.24.0

Subpath ./utils

🟢 Additions (1)

Added: timeLimit
+ declare function timeLimit<T>(value: T | PromiseLike<T>, ms: number): Promise<T>;

Added function timeLimit


Report generated by Break Check

Last ran on d5ea434.

@pkg-pr-new

pkg-pr-new Bot commented Jul 1, 2026

Copy link
Copy Markdown

Open in StackBlitz

@clerk/astro

npm i https://pkg.pr.new/@clerk/astro@9065

@clerk/backend

npm i https://pkg.pr.new/@clerk/backend@9065

@clerk/chrome-extension

npm i https://pkg.pr.new/@clerk/chrome-extension@9065

@clerk/clerk-js

npm i https://pkg.pr.new/@clerk/clerk-js@9065

@clerk/electron

npm i https://pkg.pr.new/@clerk/electron@9065

@clerk/electron-passkeys

npm i https://pkg.pr.new/@clerk/electron-passkeys@9065

@clerk/eslint-plugin

npm i https://pkg.pr.new/@clerk/eslint-plugin@9065

@clerk/expo

npm i https://pkg.pr.new/@clerk/expo@9065

@clerk/expo-passkeys

npm i https://pkg.pr.new/@clerk/expo-passkeys@9065

@clerk/express

npm i https://pkg.pr.new/@clerk/express@9065

@clerk/fastify

npm i https://pkg.pr.new/@clerk/fastify@9065

@clerk/hono

npm i https://pkg.pr.new/@clerk/hono@9065

@clerk/localizations

npm i https://pkg.pr.new/@clerk/localizations@9065

@clerk/nextjs

npm i https://pkg.pr.new/@clerk/nextjs@9065

@clerk/nuxt

npm i https://pkg.pr.new/@clerk/nuxt@9065

@clerk/react

npm i https://pkg.pr.new/@clerk/react@9065

@clerk/react-router

npm i https://pkg.pr.new/@clerk/react-router@9065

@clerk/shared

npm i https://pkg.pr.new/@clerk/shared@9065

@clerk/tanstack-react-start

npm i https://pkg.pr.new/@clerk/tanstack-react-start@9065

@clerk/testing

npm i https://pkg.pr.new/@clerk/testing@9065

@clerk/ui

npm i https://pkg.pr.new/@clerk/ui@9065

@clerk/upgrade

npm i https://pkg.pr.new/@clerk/upgrade@9065

@clerk/vue

npm i https://pkg.pr.new/@clerk/vue@9065

commit: d5ea434

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/clerk-js/src/core/clerk.ts (2)

3184-3186: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Don't swallow auth failures from the recovery mint.

The outer /client path correctly rethrows 4xx at Lines 3184-3186, but the recovery session.getToken() branch converts every failure into null. If the cookie-backed session has already been revoked or expired, this preserves the cookie-derived identity and finishes load as signed-in degraded instead of clearing auth state. Only timeout/network failures should be tolerated here; explicit auth failures need to sign the user out or bubble up.

Also applies to: 3202-3214

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clerk-js/src/core/clerk.ts` around lines 3184 - 3186, The recovery
mint flow in clerk.ts is swallowing explicit auth failures by turning every
session.getToken() error into null, which leaves revoked/expired cookie sessions
looking signed-in. Update the recovery branch around session.getToken() so only
timeout/network failures are tolerated and mapped to null; for auth-related
failures, rethrow or sign the user out and clear the recovered identity. Use the
existing is4xxError(e) handling in the /client path as the pattern, and apply
the same distinction in the session.getToken() branch referenced by the affected
recovery logic.

3162-3175: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Wrap Environment.getInstance().fetch() with the initialization timeout
fetchMaxTries only limits retries here; it does not cap wall-clock time. Promise.allSettled([initEnvironmentPromise, initClient()]) can still stall on a slow /environment request, so this branch should use the same timeLimit(INITIALIZATION_TIMEOUT_MS) guard as the client fetch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clerk-js/src/core/clerk.ts` around lines 3162 - 3175, The
initialization flow in clerk.ts should apply the same wall-clock timeout to the
Environment.getInstance().fetch() path as initClient(), since fetchMaxTries only
limits retries and Promise.allSettled can still hang on a slow /environment
request. Update the initEnvironmentPromise chain to wrap the fetch call with
timeLimit(INITIALIZATION_TIMEOUT_MS) before the then/catch handling, keeping the
existing updateEnvironment and fallback-to-SafeLocalStorage logic intact.
🧹 Nitpick comments (1)
packages/clerk-js/src/core/__tests__/clerk.test.ts (1)

830-834: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Avoid any in the session double.

Line 830’s session: any removes the compiler checks around id, getToken(), and clearCache(), so this suite can drift away from the recovery-path contract without noticing. A tiny test-local interface keeps the helper honest.

As per coding guidelines, "Avoid any type - prefer unknown when type is uncertain, then narrow with type guards."

Suggested typing cleanup
+interface RecoverySessionDouble {
+  id: string;
+  status: 'active';
+  user: Record<string, unknown>;
+  getToken(): Promise<string | null>;
+  clearCache(): void;
+}
+
-const sessionClient = (session: any) => ({
+const sessionClient = (session: RecoverySessionDouble) => ({
   signedInSessions: [session],
   lastActiveSessionId: session.id,
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clerk-js/src/core/__tests__/clerk.test.ts` around lines 830 - 834,
The session test double uses an untyped `any`, which bypasses compile-time
checks for the recovery-path contract. Update the `sessionClient` helper in
`clerk.test.ts` to use a small test-local interface or a narrowed `unknown` type
that explicitly includes the `id`, `getToken()`, and `clearCache()` members, and
keep `sessionlessClient` unchanged.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/clerk-js/src/core/__tests__/clerk.test.ts`:
- Around line 838-848: The helper `pumpUntilSettled` in `clerk.test.ts` can
still hang because it always awaits `tracked` even after the timer budget is
exhausted. Update `pumpUntilSettled` so that if the `load()` promise has not
settled after the 20 `vi.advanceTimersByTimeAsync(5000)` iterations, it throws a
deterministic error instead of awaiting forever; keep the existing settled
tracking logic and only await `tracked` when the promise has already resolved or
rejected.

---

Outside diff comments:
In `@packages/clerk-js/src/core/clerk.ts`:
- Around line 3184-3186: The recovery mint flow in clerk.ts is swallowing
explicit auth failures by turning every session.getToken() error into null,
which leaves revoked/expired cookie sessions looking signed-in. Update the
recovery branch around session.getToken() so only timeout/network failures are
tolerated and mapped to null; for auth-related failures, rethrow or sign the
user out and clear the recovered identity. Use the existing is4xxError(e)
handling in the /client path as the pattern, and apply the same distinction in
the session.getToken() branch referenced by the affected recovery logic.
- Around line 3162-3175: The initialization flow in clerk.ts should apply the
same wall-clock timeout to the Environment.getInstance().fetch() path as
initClient(), since fetchMaxTries only limits retries and Promise.allSettled can
still hang on a slow /environment request. Update the initEnvironmentPromise
chain to wrap the fetch call with timeLimit(INITIALIZATION_TIMEOUT_MS) before
the then/catch handling, keeping the existing updateEnvironment and
fallback-to-SafeLocalStorage logic intact.

---

Nitpick comments:
In `@packages/clerk-js/src/core/__tests__/clerk.test.ts`:
- Around line 830-834: The session test double uses an untyped `any`, which
bypasses compile-time checks for the recovery-path contract. Update the
`sessionClient` helper in `clerk.test.ts` to use a small test-local interface or
a narrowed `unknown` type that explicitly includes the `id`, `getToken()`, and
`clearCache()` members, and keep `sessionlessClient` unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: adbc0a12-845a-4bcc-8b5e-c59355326470

📥 Commits

Reviewing files that changed from the base of the PR and between c01b937 and d5ea434.

📒 Files selected for processing (6)
  • .changeset/clerkjs-fail-fast-slow-origin.md
  • packages/clerk-js/src/core/__tests__/clerk.test.ts
  • packages/clerk-js/src/core/clerk.ts
  • packages/shared/src/utils/__tests__/timeLimit.test.ts
  • packages/shared/src/utils/index.ts
  • packages/shared/src/utils/timeLimit.ts

Comment on lines +838 to +848
const pumpUntilSettled = async (promise: Promise<unknown>) => {
let settled = false;
const tracked = promise.then(
v => ((settled = true), v),
e => ((settled = true), Promise.reject(e)),
);
for (let i = 0; i < 20 && !settled; i++) {
await vi.advanceTimersByTimeAsync(5000);
}
await tracked;
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Make pumpUntilSettled fail instead of hanging.

If load() never settles, the loop exits after 20 advances and the final await tracked still waits forever under fake timers. Throw once the budget is exhausted so CI gets a deterministic failure instead of a stuck test.

Suggested fail-fast guard
 const pumpUntilSettled = async (promise: Promise<unknown>) => {
   let settled = false;
   const tracked = promise.then(
     v => ((settled = true), v),
     e => ((settled = true), Promise.reject(e)),
   );
   for (let i = 0; i < 20 && !settled; i++) {
     await vi.advanceTimersByTimeAsync(5000);
   }
+  if (!settled) {
+    throw new Error('Timed out waiting for Clerk.load() to settle in test');
+  }
   await tracked;
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const pumpUntilSettled = async (promise: Promise<unknown>) => {
let settled = false;
const tracked = promise.then(
v => ((settled = true), v),
e => ((settled = true), Promise.reject(e)),
);
for (let i = 0; i < 20 && !settled; i++) {
await vi.advanceTimersByTimeAsync(5000);
}
await tracked;
};
const pumpUntilSettled = async (promise: Promise<unknown>) => {
let settled = false;
const tracked = promise.then(
v => ((settled = true), v),
e => ((settled = true), Promise.reject(e)),
);
for (let i = 0; i < 20 && !settled; i++) {
await vi.advanceTimersByTimeAsync(5000);
}
if (!settled) {
throw new Error('Timed out waiting for Clerk.load() to settle in test');
}
await tracked;
};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clerk-js/src/core/__tests__/clerk.test.ts` around lines 838 - 848,
The helper `pumpUntilSettled` in `clerk.test.ts` can still hang because it
always awaits `tracked` even after the timer budget is exhausted. Update
`pumpUntilSettled` so that if the `load()` promise has not settled after the 20
`vi.advanceTimersByTimeAsync(5000)` iterations, it throws a deterministic error
instead of awaiting forever; keep the existing settled tracking logic and only
await `tracked` when the promise has already resolved or rejected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant