Skip to content

Data race on the GC debug flag (gc.set_debug/get_debug) in free-threading builds #153014

Description

@tonghuaroot

Bug report

In a free-threading build (--disable-gil), the garbage collector debug flag
gcstate->debug is read and written without synchronisation:

  • Write: gc_set_debug_impl() (Modules/gcmodule.c) does a plain
    gcstate->debug = flags. Unlike gc_set_threshold_impl(), which runs its
    free-threading branch under _PyEval_StopTheWorld(), gc.set_debug() takes
    no lock and does not stop the world.
  • Read: gc_get_debug_impl() returns gcstate->debug directly, and the
    collector in Python/gc_free_threading.c reads gcstate->debug /
    interp->gc.debug in several places while walking the graph.

So one thread calling gc.set_debug() concurrently with another calling
gc.get_debug() or triggering a collection is an unsynchronised read/write of
the same int. The flag is only an int, so the effect stays benign at the
Python level, but it is undefined behaviour under C11 and ThreadSanitizer
reports it as a data race.

This is the same class of issue already fixed for sys dlopenflags
(gh-151644) and gc.get_stats() (gh-151646). gc.enable() / gc.disable()
in the same file already access gcstate->enabled atomically; the debug flag
was missed.

ThreadSanitizer output

Built with ./configure --with-thread-sanitizer --disable-gil and stressed
with concurrent gc.set_debug() / gc.get_debug() plus a thread churning
cyclic garbage so the collector runs:

WARNING: ThreadSanitizer: data race
  Write of size 4 at 0x...6c by thread T2:
    #0 gc_set_debug gcmodule.c.h:186
  Previous write of size 4 at 0x...6c by thread T1:
    #0 gc_set_debug gcmodule.c.h:186
  Location is global '_PyRuntime'

How to reproduce

  1. ./configure --with-thread-sanitizer --disable-gil && make
  2. Run a script that starts a few threads calling gc.set_debug(...) /
    gc.get_debug() in a loop, plus a thread that builds reference cycles and
    calls gc.collect().
  3. TSan reports the write/write race on gcstate->debug.

Suggested fix

Access the flag with FT_ATOMIC_STORE_INT_RELAXED / FT_ATOMIC_LOAD_INT_RELAXED
in gc_set_debug_impl() / gc_get_debug_impl() and in the collector reads,
matching how gcstate->enabled and dlopenflags are already handled. Relaxed
ordering is correct for an independent int flag. These wrappers compile to a
plain load/store in the default (GIL) build, so there is no change there.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions