fix(auth): retry token exchange on transient failures #129
_MAX_ATTEMPTS = 3  # one initial attempt + up to two retries
_BACKOFF_BASE = 0.1  # seconds -- first retry waits ~this, doubling thereafter
_BACKOFF_MAX = 2.0  # cap on a single backoff so a flapping host can't stall us
_BACKOFF_JITTER = 0.5  # +/- fraction of jitter added to spread retries out
super nit: the comment says "+/- fraction", but the jitter is purely additive: _backoff_delay computes base * (1 + _BACKOFF_JITTER * random.random()), giving a delay in [base, base * 1.5) that never drops below base. The _backoff_delay docstring describes it correctly as an additive term in [0, _BACKOFF_JITTER * base]; only this constant's inline comment is off. Consider "# +fraction of jitter added ..." to match. (not blocking)
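To make the range concrete, here is a minimal sketch of the computation as described in this comment. It is not the PR's actual _backoff_delay: the function name backoff_delay_sketch, the retry_index parameter, the injectable rng, and applying the cap before the jitter are all assumptions made for illustration.

```python
import random

_BACKOFF_BASE = 0.1    # seconds
_BACKOFF_MAX = 2.0     # cap on the pre-jitter backoff (assumed order of operations)
_BACKOFF_JITTER = 0.5  # additive jitter fraction


def backoff_delay_sketch(retry_index, rng=random.random):
    """Hypothetical helper: delay before retry number retry_index (0 = first retry)."""
    # Exponential growth, capped so one backoff cannot grow without bound.
    base = min(_BACKOFF_BASE * (2 ** retry_index), _BACKOFF_MAX)
    # rng() is in [0.0, 1.0), so the jitter only ever adds to the base.
    return base * (1 + _BACKOFF_JITTER * rng())


samples = [backoff_delay_sketch(0) for _ in range(10_000)]
print(min(samples) >= _BACKOFF_BASE)  # the delay never drops below base
```

Under these assumptions the first retry waits between 0.1 s and just under 0.15 s, which is why "+fraction" reads more accurately than "+/- fraction".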
Retry logic is correct and thoroughly tested: retries=False makes the explicit loop the sole arbiter of the attempt budget, 4xx stays fatal, 5xx + transport errors retry with bounded backoff+jitter, and the lock-free fast path correctly avoids serializing cache hits behind an in-flight mint. One super-nit comment inline.
Closes #113. The JWT token exchange (POST /v1/auth/jwt) now retries transient failures (5xx responses and transport errors) with a bounded budget (3 attempts) and exponential backoff plus jitter, while 4xx stays fatal. The cached-JWT fast path is served lock-free, so a retrying mint can't block callers that need no mint.
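For readers skimming the description, the retry policy can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: exchange_with_retry, TransportError, FatalAuthError, and the send callable (returning a (status_code, body) tuple) are hypothetical names, and the backoff formula is assumed from the constants and the review discussion.

```python
import random
import time

_MAX_ATTEMPTS = 3      # one initial attempt + up to two retries
_BACKOFF_BASE = 0.1
_BACKOFF_MAX = 2.0
_BACKOFF_JITTER = 0.5


class TransportError(Exception):
    """Hypothetical stand-in for a connection or timeout failure."""


class FatalAuthError(Exception):
    """Hypothetical error for 4xx responses, which are never retried."""


def exchange_with_retry(send, sleep=time.sleep):
    """Call send() up to _MAX_ATTEMPTS times; send returns (status, body)."""
    last_error = None
    for attempt in range(_MAX_ATTEMPTS):
        try:
            status, body = send()
        except TransportError as exc:
            last_error = exc  # transient: eligible for retry
        else:
            if status < 400:
                return body  # simplification: anything below 400 counts as success
            if status < 500:
                raise FatalAuthError(f"token exchange rejected: HTTP {status}")
            last_error = RuntimeError(f"token exchange failed: HTTP {status}")
        if attempt < _MAX_ATTEMPTS - 1:
            # Capped exponential backoff with additive jitter before the next try.
            base = min(_BACKOFF_BASE * 2 ** attempt, _BACKOFF_MAX)
            sleep(base * (1 + _BACKOFF_JITTER * random.random()))
    raise last_error
```

Keeping the loop as the only place that counts attempts means the budget stays at exactly three requests, provided any retry behaviour in the underlying HTTP client is switched off.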