
fix(server): treat database timeout errors as client-retryable (503) #22315

Draft
devin-ai-integration[bot] wants to merge 2 commits into main from devin/1781660358-fix-db-timeout-retryable



@devin-ai-integration

Contributor

closes #22314

When many deployments poll the server simultaneously, the connection pool can be exhausted and asyncpg.connect() times out. Previously, is_client_retryable_exception() did not recognize these TimeoutError exceptions, so the server returned HTTP 500. Clients don't retry 500s, so during transient load spikes users see noisy error logs and failed requests.
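To make the failure mode concrete, here is a minimal stdlib-only sketch, assuming a toy pool modeled as an asyncio.Semaphore with a checkout timeout. ToyPool and simulate are hypothetical names, not Prefect, SQLAlchemy, or asyncpg APIs. When more requests arrive than there are free slots, and each slot is held longer than the checkout timeout, the surplus requests fail with a timeout rather than a query error.

```python
import asyncio
from typing import List


class ToyPool:
    """Hypothetical stand-in for a fixed-size connection pool.

    Only models "wait for a free slot, give up after a timeout"; real pools
    (for example SQLAlchemy's pool_size + max_overflow) do much more.
    """

    def __init__(self, slots: int, checkout_timeout: float) -> None:
        self._slots = asyncio.Semaphore(slots)
        self._timeout = checkout_timeout

    async def checkout(self) -> None:
        # Wait for a free slot; raises a timeout error if none frees up in time.
        await asyncio.wait_for(self._slots.acquire(), timeout=self._timeout)

    def release(self) -> None:
        self._slots.release()


async def simulate(slots: int, requests: int, hold: float, timeout: float) -> List[str]:
    pool = ToyPool(slots, timeout)

    async def handle_request() -> str:
        try:
            await pool.checkout()
        except asyncio.TimeoutError:  # same class as builtin TimeoutError on 3.11+
            return "timeout"
        try:
            await asyncio.sleep(hold)  # a slow query holding its slot
            return "ok"
        finally:
            pool.release()

    return list(await asyncio.gather(*(handle_request() for _ in range(requests))))


if __name__ == "__main__":
    results = asyncio.run(simulate(slots=2, requests=5, hold=0.2, timeout=0.05))
    print(results.count("ok"), "ok,", results.count("timeout"), "timed out")
```

With two slots, five concurrent requests, and each request holding its slot for 0.2 s against a 0.05 s checkout timeout, this prints "2 ok, 3 timed out". None of the failed requests hit a real error; they only needed to wait longer, which is why a retryable status fits better than a 500.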

This PR adds three exception types to the retryable set so the server returns 503 instead, enabling automatic client retries with backoff:

 is_client_retryable_exception():
+  sqlalchemy.exc.TimeoutError      # pool checkout timeout (all pool_size + max_overflow slots busy)
+  asyncpg.TooManyConnectionsError  # PostgreSQL max_connections reached
+  TimeoutError                     # asyncpg connection creation timeout
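The three additions amount to an isinstance check against a tuple of exception classes. A minimal sketch, assuming a standalone helper: is_retryable_db_error and status_code_for are hypothetical names, and this is not the body of Prefect's is_client_retryable_exception(). The sqlalchemy and asyncpg classes are looked up lazily so the sketch still runs where those packages are not installed.

```python
import asyncio
import importlib
from typing import Optional, Tuple, Type


def _optional_exception(module: str, name: str) -> Optional[Type[BaseException]]:
    """Return an exception class if its package is installed, else None."""
    try:
        return getattr(importlib.import_module(module), name, None)
    except ImportError:
        return None


# Exception types this sketch treats as transient database pressure.
_CANDIDATES = (
    TimeoutError,          # builtin TimeoutError
    asyncio.TimeoutError,  # a distinct class before Python 3.11, an alias from 3.11 on
    _optional_exception("sqlalchemy.exc", "TimeoutError"),  # pool checkout timeout
    _optional_exception("asyncpg.exceptions", "TooManyConnectionsError"),
)
RETRYABLE_DB_ERRORS: Tuple[Type[BaseException], ...] = tuple(
    exc for exc in _CANDIDATES if exc is not None
)


def is_retryable_db_error(exc: BaseException) -> bool:
    """Hypothetical helper; not Prefect's is_client_retryable_exception()."""
    return isinstance(exc, RETRYABLE_DB_ERRORS)


def status_code_for(exc: BaseException) -> int:
    # 503 Service Unavailable tells clients "try again later"; 500 does not.
    return 503 if is_retryable_db_error(exc) else 500


if __name__ == "__main__":
    print(status_code_for(TimeoutError()))      # 503
    print(status_code_for(ValueError("bad")))   # 500
```

Two details a real implementation has to account for: before Python 3.11, asyncio.TimeoutError was a separate class from the builtin TimeoutError, so checking the builtin alone would miss asyncio timeouts on older interpreters; and sqlalchemy.exc.TimeoutError derives from SQLAlchemyError rather than the builtin TimeoutError, so it needs its own entry.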

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

Link to Devin session: https://app.devin.ai/sessions/68eca37c30704d60a68b1794aadf9a4c

devin-ai-integration Bot and others added 2 commits June 17, 2026 01:41
Under high concurrency (many deployments polling simultaneously), the
connection pool can be exhausted and asyncpg connection attempts time out.
Previously these TimeoutError exceptions returned HTTP 500, which clients
do not retry. Now they return HTTP 503, enabling automatic client retries
with backoff.

Added to retryable exceptions:
- TimeoutError (asyncpg connection timeout during pool overflow)
- sqlalchemy.exc.TimeoutError (pool checkout timeout)
- asyncpg.exceptions.TooManyConnectionsError (PostgreSQL max_connections)

closes #22314

Co-authored-by: Alexander Streed <alex.s@prefect.io>
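The commit message relies on clients retrying 503 responses with backoff while not retrying 500s. A minimal sketch of such a client loop with exponential backoff and full jitter, assuming a hypothetical send() callable that returns an HTTP status code and a retryable set containing only 503. This is not Prefect's client code; real clients typically also honor Retry-After and cap the maximum delay.

```python
import random
import time
from typing import Callable

# Assumption for this sketch: only 503 is treated as retryable.
RETRYABLE_STATUS_CODES = {503}


def call_with_retries(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status = send()
        if status not in RETRYABLE_STATUS_CODES or attempt >= max_attempts:
            return status
        # Full jitter: wait a random time in [0, base_delay * 2**(attempt - 1)].
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
        attempt += 1


if __name__ == "__main__":
    responses = iter([503, 503, 200])
    print(call_with_retries(lambda: next(responses), sleep=lambda _: None))  # 200
    print(call_with_retries(lambda: 500, sleep=lambda _: None))  # 500, no retry
```

Injecting sleep keeps tests fast; the jitter spreads retries out so that many deployments hitting the same overloaded server do not all come back at the same instant.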
@devin-ai-integration

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

github-actions Bot added the bug (Something isn't working) label Jun 17, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 17, 2026


Merging this PR will not alter performance

✅ 8 untouched benchmarks
⏩ 1 skipped benchmark [1]


Comparing devin/1781660358-fix-db-timeout-retryable (c0e2920) with main (7107daf)


Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.



Development

Successfully merging this pull request may close these issues.

Prefect Server got a lot of Timeouts
