gh-151646: Fix data race between gc.get_stats() and GC in free-threading builds by zangjiucheng · Pull Request #151766 · python/cpython

zangjiucheng · 2026-06-20T03:42:38Z

In free-threading builds, gc_get_stats_impl() reads per-generation stats
with plain struct copies while gc_collect_main() writes the same fields
after _Py_StartTheWorld() — no synchronisation on either side.

Fix: add PyMutex stats_mutex to _gc_runtime_state (under
Py_GIL_DISABLED) and hold it around both the seven field writes in the
writer and the three struct copies in the reader. The lock is acquired after
_Py_StartTheWorld(), so no deadlock with stop-the-world. GIL build is
unchanged.

Also adds Lib/test/test_free_threading/test_gc_get_stats_race.py.

Verified with TSAN on Linux: race confirmed on main, clean on the patch.

🤖 Generated with Claude Code

Issue: Data race between gc.get_stats() and a concurrent collection under free-threading #151646

…bage collection

LindaSummer · 2026-07-02T06:19:16Z

+       sees a consistent snapshot rather than racing on these fields
+       (gh-151646).  get_stats() reads buffer->index inside the lock so that
+       both writer and reader resolve the same slot under the same mutex. */
+    PyMutex_Lock(&gcstate->stats_mutex);


It seems the guard macro Py_GIL_DISABLED is missing here and below the unlock.

The whole gc_free_threading.c file is wrapped in #ifdef Py_GIL_DISABLED so I don't think that's needed.

nascheme · 2026-07-03T19:29:12Z

This looks okay to me. Some comments:

The unit test is too heavy, IMHO. It seems 50 collections is enough to trigger the TSAN warning, at least for me. I think you should just loop on the writer (gc.collect()) and exit the readers when the writer is done.
I suggest reducing the comments a bit. I would have a comment for the stats_mutex struct member, explaining what it protects. Where you acquire and release it you don't need a comment (people can look at the member comment to see the purpose of that mutex and when it needs to be held). The "so they can run concurrently under free-threading without a data race" part could be removed too. People generally know what a mutex is for.

Suggested unit test change:

diff --git a/Lib/test/test_free_threading/test_gc_get_stats_race.py b/Lib/test/test_free_threading/test_gc_get_stats_race.py
index a91bdce0304..dc609a98fdf 100644
--- a/Lib/test/test_free_threading/test_gc_get_stats_race.py
+++ b/Lib/test/test_free_threading/test_gc_get_stats_race.py
@@ -43,49 +43,41 @@
 
 
 # One thread hammers gc.collect() (the writer side); enough reader threads run
-# gc.get_stats() concurrently to make the race easy to observe.  The iteration
-# counts are deliberately fixed and finite so the script always terminates.
-NUM_COLLECTORS = 1
-NUM_READERS = 8
-ITERATIONS = 50_000
+# gc.get_stats() concurrently to make the race easy to observe.  Readers run
+# until the writer is done so the test duration is controlled by the collection
+# count rather than by reader loop counts.
+NUM_READERS = 4
+ITERATIONS = 50
 
 
-def _stress_get_stats_race(num_collectors=NUM_COLLECTORS,
-                           num_readers=NUM_READERS,
-                           iterations=ITERATIONS):
-    """Race gc.collect() against gc.get_stats() and return collected stats."""
+def _stress_get_stats_race(num_readers=NUM_READERS, iterations=ITERATIONS):
+    """Race gc.collect() against gc.get_stats()."""
 
-    # Synchronise the start so collectors and readers overlap for as long as
+    # Synchronise the start so the writer and readers overlap for as long as
     # possible, maximising the chance of the read and write landing on the
     # statistics struct at the same time.
-    barrier = threading.Barrier(num_collectors + num_readers)
+    done = threading.Event()
 
     def collector():
-        barrier.wait()
-        for _ in range(iterations):
-            # Writer: each full collection updates gcstate->generation_stats.
-            gc.collect()
+        try:
+            for _ in range(iterations):
+                # Writer: each full collection updates gcstate->generation_stats.
+                gc.collect()
+        finally:
+            done.set()
 
     def reader():
-        barrier.wait()
-        for _ in range(iterations):
+        while not done.is_set():
             # Reader: copies the per-generation stats structs with no lock.
             gc.get_stats()
 
-    threads = [threading.Thread(target=collector) for _ in range(num_collectors)]
-    threads += [threading.Thread(target=reader) for _ in range(num_readers)]
-
-    with threading_helper.start_threads(threads):
-        pass
+    threading_helper.run_concurrently([collector] + [reader] * num_readers)
 
 
 @threading_helper.requires_working_threading()
 class TestGCGetStatsRace(unittest.TestCase):
     def test_get_stats_collect_race(self):
-        # Use a reduced iteration count under the regular test suite so the test
-        # stays reasonably quick while still exercising the race; the standalone
-        # __main__ path below uses the full ITERATIONS for a reliable repro.
-        _stress_get_stats_race(iterations=2_000)
+        _stress_get_stats_race()
 
         # The race is benign at the Python level: gc.get_stats() must still
         # return well-formed data and the interpreter must not crash.
@@ -98,10 +90,10 @@ def test_get_stats_collect_race(self):
 
 
 if __name__ == "__main__":
-    # Standalone reproduction: run the full-size race and exit cleanly so the
-    # script can be reused as a regression check once the fix lands.
-    print(f"Racing {NUM_COLLECTORS} gc.collect() thread(s) against "
-          f"{NUM_READERS} gc.get_stats() thread(s), {ITERATIONS} iterations each...")
+    # Standalone reproduction: run the race and exit cleanly so the script can
+    # be reused as a regression check once the fix lands.
+    print(f"Racing 1 gc.collect() thread against "
+          f"{NUM_READERS} gc.get_stats() thread(s), {ITERATIONS} collections...")
     _stress_get_stats_race()
     print("Done (no Python-level crash). "
           "Run under a free-threading + TSAN build to observe the data race.")

zangjiucheng · 2026-07-03T19:31:31Z

I will take a look after work and address comments, thanks for reviewing :)

nascheme · 2026-07-03T19:41:02Z

I revised the unit test to use threading_helper.run_concurrently(), simplifies it a bit.

Address review feedback (nascheme) on pythonGH-151766: - The stress test looped 8 reader threads x 50,000 iterations each. Run one writer for a fixed number of gc.collect() calls and have the readers loop until the writer signals done, so the duration is driven by the collection count. Use threading_helper.run_concurrently() for start synchronisation. - Trim the stats_mutex comments: keep one comment on the struct member saying what it protects, and drop the now-redundant comments at the lock sites. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

zangjiucheng · 2026-07-04T01:30:06Z

Thanks, applied both suggestions.

Switched to one writer looping over a fixed number of gc.collect() calls with the readers running until it signals done (via threading_helper.run_concurrently), and dropped the counts to NUM_READERS = 4 / ITERATIONS = 50. On my machine (free-threading debug build) the in-suite test went from ~3.3 s to ~0.08 s, and the old standalone 50k-iteration repro dropped from ~80 s to ~0.08 s. It still races concurrent gc.collect() against gc.get_stats() in the same window; I'm relying on your observation that ~50 collections is enough to surface the TSan warning (I don't have a TSan build handy locally, but it passes cleanly and quickly on a regular free-threading build).

Fix data race in free-threading builds between gc.get_stats() and gar…

47cfd29

…bage collection

zangjiucheng requested a review from pablogsal as a code owner June 20, 2026 03:42

bedevere-app Bot mentioned this pull request Jun 20, 2026

Data race between gc.get_stats() and a concurrent collection under free-threading #151646

Closed

bedevere-app Bot added the awaiting review label Jun 20, 2026

LindaSummer reviewed Jul 2, 2026

View reviewed changes

nascheme merged commit 8b1dbb1 into python:main Jul 4, 2026
54 checks passed

bedevere-app Bot removed the awaiting review label Jul 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-151646: Fix data race between gc.get_stats() and GC in free-threading builds#151766

gh-151646: Fix data race between gc.get_stats() and GC in free-threading builds#151766
nascheme merged 2 commits into
python:mainfrom
zangjiucheng:gh-151646-fix

zangjiucheng commented Jun 20, 2026 •

edited by bedevere-app Bot

Loading

Uh oh!

LindaSummer Jul 2, 2026

Uh oh!

nascheme Jul 3, 2026

Uh oh!

nascheme commented Jul 3, 2026 •

edited

Loading

Uh oh!

zangjiucheng commented Jul 3, 2026

Uh oh!

nascheme commented Jul 3, 2026

Uh oh!

zangjiucheng commented Jul 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Uh oh!

Conversation

zangjiucheng commented Jun 20, 2026 • edited by bedevere-app Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LindaSummer Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

nascheme Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

nascheme commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zangjiucheng commented Jul 3, 2026

Uh oh!

nascheme commented Jul 3, 2026

Uh oh!

zangjiucheng commented Jul 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zangjiucheng commented Jun 20, 2026 •

edited by bedevere-app Bot

Loading

nascheme commented Jul 3, 2026 •

edited

Loading