fix(clerk-js): fail fast when the origin is slow at load#9065
fix(clerk-js): fail fast when the origin is slow at load#9065nikosdouvlis wants to merge 1 commit into
Conversation
When FAPI is slow or down, a cold clerk-js load hung for minutes: the /client fetch had no request timeout, and the load-failure recovery awaited getToken({ skipCache: true }), which runs a ~162s retry budget before Clerk marks itself loaded. During an outage the app sat unresponsive.
Bound the standard-browser load with a 5s timeout in two places: around the /client fetch, and around the recovery mint. On a slow or failed /client, recovery renders identity from the __session cookie, clears the session's token cache via the existing clearCache() so the mint bypasses the cache (no skipCache, so no force_origin), and awaits a fresh mint under the timeout, keeping the cookie identity if it times out. The on-timeout clear stops the poller from awaiting an abandoned in-flight mint.
Adds a timeLimit racer to @clerk/shared/utils.
🦋 Changeset detectedLatest commit: d5ea434 The changes in this PR will be included in the next version bump. This PR includes changesets to release 23 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughAdds a ChangesFail-fast initialization
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant Clerk
participant ClientAPI
participant Session
participant AuthCookieService
Clerk->>ClientAPI: fetch() wrapped in timeLimit(5000ms)
alt fetch hangs or times out
ClientAPI-->>Clerk: timeout error
Clerk->>AuthCookieService: stopPollingForToken
Clerk->>Session: clear cache
Clerk->>Session: getToken() wrapped in timeLimit
alt getToken times out
Session-->>Clerk: null
else getToken resolves
Session-->>Clerk: token
end
Clerk->>AuthCookieService: startPollingForToken (finally)
else fetch succeeds
ClientAPI-->>Clerk: client data
end
Suggested reviewers: Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
Comment |
API Changes Report
Summary
@clerk/sharedCurrent version: 4.23.0 Subpath
|
@clerk/astro
@clerk/backend
@clerk/chrome-extension
@clerk/clerk-js
@clerk/electron
@clerk/electron-passkeys
@clerk/eslint-plugin
@clerk/expo
@clerk/expo-passkeys
@clerk/express
@clerk/fastify
@clerk/hono
@clerk/localizations
@clerk/nextjs
@clerk/nuxt
@clerk/react
@clerk/react-router
@clerk/shared
@clerk/tanstack-react-start
@clerk/testing
@clerk/ui
@clerk/upgrade
@clerk/vue
commit: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/clerk-js/src/core/clerk.ts (2)
3184-3186: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDon't swallow auth failures from the recovery mint.
The outer
/clientpath correctly rethrows 4xx at Lines 3184-3186, but the recoverysession.getToken()branch converts every failure intonull. If the cookie-backed session has already been revoked or expired, this preserves the cookie-derived identity and finishes load as signed-indegradedinstead of clearing auth state. Only timeout/network failures should be tolerated here; explicit auth failures need to sign the user out or bubble up.Also applies to: 3202-3214
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/clerk-js/src/core/clerk.ts` around lines 3184 - 3186, The recovery mint flow in clerk.ts is swallowing explicit auth failures by turning every session.getToken() error into null, which leaves revoked/expired cookie sessions looking signed-in. Update the recovery branch around session.getToken() so only timeout/network failures are tolerated and mapped to null; for auth-related failures, rethrow or sign the user out and clear the recovered identity. Use the existing is4xxError(e) handling in the /client path as the pattern, and apply the same distinction in the session.getToken() branch referenced by the affected recovery logic.
3162-3175: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winWrap
Environment.getInstance().fetch()with the initialization timeout
fetchMaxTriesonly limits retries here; it does not cap wall-clock time.Promise.allSettled([initEnvironmentPromise, initClient()])can still stall on a slow/environmentrequest, so this branch should use the sametimeLimit(INITIALIZATION_TIMEOUT_MS)guard as the client fetch.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/clerk-js/src/core/clerk.ts` around lines 3162 - 3175, The initialization flow in clerk.ts should apply the same wall-clock timeout to the Environment.getInstance().fetch() path as initClient(), since fetchMaxTries only limits retries and Promise.allSettled can still hang on a slow /environment request. Update the initEnvironmentPromise chain to wrap the fetch call with timeLimit(INITIALIZATION_TIMEOUT_MS) before the then/catch handling, keeping the existing updateEnvironment and fallback-to-SafeLocalStorage logic intact.
🧹 Nitpick comments (1)
packages/clerk-js/src/core/__tests__/clerk.test.ts (1)
830-834: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAvoid
anyin the session double.Line 830’s
session: anyremoves the compiler checks aroundid,getToken(), andclearCache(), so this suite can drift away from the recovery-path contract without noticing. A tiny test-local interface keeps the helper honest.As per coding guidelines, "Avoid
anytype - preferunknownwhen type is uncertain, then narrow with type guards."Suggested typing cleanup
+interface RecoverySessionDouble { + id: string; + status: 'active'; + user: Record<string, unknown>; + getToken(): Promise<string | null>; + clearCache(): void; +} + -const sessionClient = (session: any) => ({ +const sessionClient = (session: RecoverySessionDouble) => ({ signedInSessions: [session], lastActiveSessionId: session.id, });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/clerk-js/src/core/__tests__/clerk.test.ts` around lines 830 - 834, The session test double uses an untyped `any`, which bypasses compile-time checks for the recovery-path contract. Update the `sessionClient` helper in `clerk.test.ts` to use a small test-local interface or a narrowed `unknown` type that explicitly includes the `id`, `getToken()`, and `clearCache()` members, and keep `sessionlessClient` unchanged.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/clerk-js/src/core/__tests__/clerk.test.ts`:
- Around line 838-848: The helper `pumpUntilSettled` in `clerk.test.ts` can
still hang because it always awaits `tracked` even after the timer budget is
exhausted. Update `pumpUntilSettled` so that if the `load()` promise has not
settled after the 20 `vi.advanceTimersByTimeAsync(5000)` iterations, it throws a
deterministic error instead of awaiting forever; keep the existing settled
tracking logic and only await `tracked` when the promise has already resolved or
rejected.
---
Outside diff comments:
In `@packages/clerk-js/src/core/clerk.ts`:
- Around line 3184-3186: The recovery mint flow in clerk.ts is swallowing
explicit auth failures by turning every session.getToken() error into null,
which leaves revoked/expired cookie sessions looking signed-in. Update the
recovery branch around session.getToken() so only timeout/network failures are
tolerated and mapped to null; for auth-related failures, rethrow or sign the
user out and clear the recovered identity. Use the existing is4xxError(e)
handling in the /client path as the pattern, and apply the same distinction in
the session.getToken() branch referenced by the affected recovery logic.
- Around line 3162-3175: The initialization flow in clerk.ts should apply the
same wall-clock timeout to the Environment.getInstance().fetch() path as
initClient(), since fetchMaxTries only limits retries and Promise.allSettled can
still hang on a slow /environment request. Update the initEnvironmentPromise
chain to wrap the fetch call with timeLimit(INITIALIZATION_TIMEOUT_MS) before
the then/catch handling, keeping the existing updateEnvironment and
fallback-to-SafeLocalStorage logic intact.
---
Nitpick comments:
In `@packages/clerk-js/src/core/__tests__/clerk.test.ts`:
- Around line 830-834: The session test double uses an untyped `any`, which
bypasses compile-time checks for the recovery-path contract. Update the
`sessionClient` helper in `clerk.test.ts` to use a small test-local interface or
a narrowed `unknown` type that explicitly includes the `id`, `getToken()`, and
`clearCache()` members, and keep `sessionlessClient` unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: adbc0a12-845a-4bcc-8b5e-c59355326470
📒 Files selected for processing (6)
.changeset/clerkjs-fail-fast-slow-origin.mdpackages/clerk-js/src/core/__tests__/clerk.test.tspackages/clerk-js/src/core/clerk.tspackages/shared/src/utils/__tests__/timeLimit.test.tspackages/shared/src/utils/index.tspackages/shared/src/utils/timeLimit.ts
| const pumpUntilSettled = async (promise: Promise<unknown>) => { | ||
| let settled = false; | ||
| const tracked = promise.then( | ||
| v => ((settled = true), v), | ||
| e => ((settled = true), Promise.reject(e)), | ||
| ); | ||
| for (let i = 0; i < 20 && !settled; i++) { | ||
| await vi.advanceTimersByTimeAsync(5000); | ||
| } | ||
| await tracked; | ||
| }; |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Make pumpUntilSettled fail instead of hanging.
If load() never settles, the loop exits after 20 advances and the final await tracked still waits forever under fake timers. Throw once the budget is exhausted so CI gets a deterministic failure instead of a stuck test.
Suggested fail-fast guard
const pumpUntilSettled = async (promise: Promise<unknown>) => {
let settled = false;
const tracked = promise.then(
v => ((settled = true), v),
e => ((settled = true), Promise.reject(e)),
);
for (let i = 0; i < 20 && !settled; i++) {
await vi.advanceTimersByTimeAsync(5000);
}
+ if (!settled) {
+ throw new Error('Timed out waiting for Clerk.load() to settle in test');
+ }
await tracked;
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const pumpUntilSettled = async (promise: Promise<unknown>) => { | |
| let settled = false; | |
| const tracked = promise.then( | |
| v => ((settled = true), v), | |
| e => ((settled = true), Promise.reject(e)), | |
| ); | |
| for (let i = 0; i < 20 && !settled; i++) { | |
| await vi.advanceTimersByTimeAsync(5000); | |
| } | |
| await tracked; | |
| }; | |
| const pumpUntilSettled = async (promise: Promise<unknown>) => { | |
| let settled = false; | |
| const tracked = promise.then( | |
| v => ((settled = true), v), | |
| e => ((settled = true), Promise.reject(e)), | |
| ); | |
| for (let i = 0; i < 20 && !settled; i++) { | |
| await vi.advanceTimersByTimeAsync(5000); | |
| } | |
| if (!settled) { | |
| throw new Error('Timed out waiting for Clerk.load() to settle in test'); | |
| } | |
| await tracked; | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/clerk-js/src/core/__tests__/clerk.test.ts` around lines 838 - 848,
The helper `pumpUntilSettled` in `clerk.test.ts` can still hang because it
always awaits `tracked` even after the timer budget is exhausted. Update
`pumpUntilSettled` so that if the `load()` promise has not settled after the 20
`vi.advanceTimersByTimeAsync(5000)` iterations, it throws a deterministic error
instead of awaiting forever; keep the existing settled tracking logic and only
await `tracked` when the promise has already resolved or rejected.
Why
When Clerk's Frontend API is slow or unreachable, a cold Clerk.load() hangs for minutes. The /client request has no timeout, and on a 5xx or network error the recovery path awaits getToken({ skipCache: true }), which burns the full ~162s retry budget before Clerk is marked loaded. So during an origin outage the app sits there unresponsive instead of degrading and rendering.
What changed
Bound the browser load with a timeout in two spots: around the /client fetch and around the recovery token mint. On a slow or failed /client, recovery renders identity from the __session cookie, clears the session's token cache (via the existing clearCache()) so the mint bypasses the cache (no skipCache, so no force_origin), and keeps the cookie identity if the mint times out. Also adds a timeLimit util to @clerk/shared/utils.
One known gap: timeLimit stops waiting but doesn't cancel the request, so a timed-out recovery mint keeps retrying in the background and can briefly write a slightly stale token before the poller reconciles. Low severity and self-correcting, and the abortable-timeout follow-up plus the in-flight monotonic-session-token guards close it. This is phase 1, later phases cover fail-fast /environment and render-first init.
Summary by CodeRabbit
New Features
Bug Fixes