
fix(testing): scale local durable timers#501

Open: zhongkechen wants to merge 5 commits into main from codex/scale-local-durable-timers

zhongkechen commented Jun 30, 2026

Contributor

Summary

  • Add a shared DURABLE_EXECUTION_TIME_SCALE helper for local durable test timers.
  • Apply the scale to wait timers, step retry timers, callback timeouts, and callback heartbeat timeouts.
  • Keep wait operation history faithful to the modeled durable duration: only the local scheduler delay is scaled, while WaitStartedDetails.ScheduledEndTimestamp remains unscaled.
  • Default local example tests to 0.05 scale so modeled durable delays do not spend unnecessary wall-clock time in CI.
  • Keep a 5s floor for long callback timeout/heartbeat timers so externally-driven callback tests do not race unrealistically short deadlines.
  • Let heartbeat-focused examples opt out with time_scale="1.0" because they intentionally test real heartbeat timing.
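For reviewers, the wait behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `WaitPlan` and `plan_wait` are hypothetical names, and the scale is passed in explicitly rather than read from DURABLE_EXECUTION_TIME_SCALE. The point is that the timestamp recorded in history is computed from the modeled duration, while only the local scheduler delay is multiplied by the scale.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class WaitPlan:
    """Hypothetical container, not an SDK type."""

    scheduled_end_timestamp: datetime  # recorded in history, never scaled
    scheduler_delay_seconds: float  # what the local scheduler actually waits


def plan_wait(
    wait_seconds: float,
    time_scale: float,
    now: Optional[datetime] = None,
) -> WaitPlan:
    """Split a modeled wait into an unscaled history timestamp and a scaled local delay."""
    start = now if now is not None else datetime.now(timezone.utc)
    return WaitPlan(
        # History stays faithful to the modeled durable duration.
        scheduled_end_timestamp=start + timedelta(seconds=wait_seconds),
        # Only the wall-clock delay spent in the local test run is scaled.
        scheduler_delay_seconds=wait_seconds * time_scale,
    )
```

Under these assumptions, a 600s wait at a scale of 0.05 would record an end timestamp 600s after the start while the local scheduler waits about 30s.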

Context

These commits were split out from PR #498. They are not part of the root-cause fix for the scheduler hang or the Python 3.13/3.14 callback race; those remain in PR #498.

This PR keeps the timer scaling change separate so it can be reviewed as a test-duration improvement for local durable timers.

Changes

  • Add aws_durable_execution_sdk_python_testing.time_scale.
  • Reuse the helper in wait and step retry processors instead of parsing DURABLE_EXECUTION_TIME_SCALE inline.
  • Scale wait scheduling without scaling the wait history event timestamp.
  • Scale callback timeout and heartbeat scheduling in the local test executor.
  • Preserve short callback deadlines by using max(scaled_delay, min(original_delay, 5s)) for long callback timers.
  • Add test coverage for scaled wait scheduling with unscaled history, step retry scaling, callback timeout scaling, callback heartbeat scaling, and short-deadline preservation.
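The scaling rules listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the contents of the new time_scale module: `get_time_scale` and `scale_callback_delay` are hypothetical names, the fallback to 1.0 for unset or invalid values is an assumption, and only `scale_delay`, the DURABLE_EXECUTION_TIME_SCALE variable, the 5s floor, and the max/min formula come from this PR.

```python
import os
from typing import Optional

ENV_VAR = "DURABLE_EXECUTION_TIME_SCALE"
CALLBACK_FLOOR_SECONDS = 5.0  # the 5s floor for long callback timers


def get_time_scale(default: float = 1.0) -> float:
    """Read the scale from the environment.

    Assumption: unset, unparsable, or non-positive values fall back to `default`.
    """
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        return default
    try:
        scale = float(raw)
    except ValueError:
        return default
    return scale if scale > 0 else default


def scale_delay(seconds: float, scale: Optional[float] = None) -> float:
    """Scale a modeled delay (wait timers, step retry timers)."""
    factor = get_time_scale() if scale is None else scale
    return seconds * factor


def scale_callback_delay(seconds: float, scale: Optional[float] = None) -> float:
    """Scale a callback timeout or heartbeat delay, keeping the floor.

    Implements max(scaled_delay, min(original_delay, 5s)): long timers never
    drop below 5s, and timers already shorter than 5s keep their original value.
    """
    scaled = scale_delay(seconds, scale)
    return max(scaled, min(seconds, CALLBACK_FLOOR_SECONDS))
```

With a scale of 0.05, a 300s callback timeout would run for 15s locally, a 60s timeout is held at the 5s floor, and a 2s timeout stays at 2s.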

Verification

  • hatch fmt --check
  • hatch run types:check
  • git diff --check
  • hatch run test:all packages/aws-durable-execution-sdk-python-testing/tests/checkpoint/processors/wait_test.py packages/aws-durable-execution-sdk-python-testing/tests/checkpoint/processors/step_test.py packages/aws-durable-execution-sdk-python-testing/tests/executor_test.py -q - 138 passed
  • Previous heartbeat example verification: hatch run test:all packages/aws-durable-execution-sdk-python-examples/test/callback/test_callback_heartbeat.py packages/aws-durable-execution-sdk-python-examples/test/wait_for_callback/test_wait_for_callback_heartbeat.py -q - 2 passed

@zhongkechen zhongkechen marked this pull request as draft June 30, 2026 22:37
@zhongkechen zhongkechen marked this pull request as ready for review June 30, 2026 22:51
SilanHe commented Jun 30, 2026

Contributor

I'm not against this change since you've already done the work, but barring more comments, I think it would be better to just port the "skip time" functionality from the other testing libraries.

zhongkechen commented Jul 1, 2026

Contributor Author

I'm not against this change since you've already done the work, but barring more comments, I think it would be better to just port the "skip time" functionality from the other testing libraries.

Time scaling is already used in waits. I just extended it to step retries and other operations that have delays. It is different from Java, I think, because the way Python test cases interact with the execution is different. In the Java local test runner, the runner stops execution after each invocation and waits for user interaction, while Python's runner keeps running the execution concurrently while the user interacts with it.

yaythomas commented Jul 2, 2026

Contributor

There's a pre-existing issue here (not your fault!) where Wait timers leak scaled time into the observable history, when arguably the event history should contain the un-scaled time.

scaled_wait_seconds = scale_delay(wait_seconds)
scheduled_end_timestamp = datetime.now(UTC) + timedelta(seconds=scaled_wait_seconds)
wait_details = WaitDetails(scheduled_end_timestamp=scheduled_end_timestamp)

The callback timeout and heartbeat timers you're adding here don't have this issue.

I have a big refactor coming that re-arranges how the test Executor works and that makes it compatible with all the existing JS example tests (featuring concurrency). I've split it into 2 PRs for easier reviewing, but each is still pretty big 😓.

I still have to rebase these coming PRs on the recent changes in this repo, which is not going to be a lot of fun.

If you'd like, I can pick up this PR and incorporate it after I get those 2 PRs in. It would make my re-basing life simpler, and I will happily pick up the great work you did here and do the merging legwork afterwards.

zhongkechen commented Jul 2, 2026

Contributor Author

It would make my re-basing life simpler

Create a draft PR for your changes and I'll be happy to help you merge.

@zhongkechen zhongkechen force-pushed the codex/scale-local-durable-timers branch from 48b3674 to 6f1fb2d Compare July 2, 2026 17:00
@zhongkechen

Contributor Author

Fixed the history event for the wait operation in a new commit: 6f1fb2d

@zhongkechen zhongkechen force-pushed the codex/scale-local-durable-timers branch from 6f1fb2d to fa3a480 Compare July 2, 2026 18:25
zhongkechen commented Jul 3, 2026

Contributor Author

Tests failed due to this issue: #405

The fix for the issue is in another PR: #505

@zhongkechen zhongkechen self-assigned this Jul 3, 2026
3 participants