Skip to content

gh-152718: Reject oversized table counts in the profiling binary reader#152719

Merged
pablogsal merged 5 commits into
python:mainfrom
tonghuaroot:fix-profiling-reader-alloc
Jul 4, 2026
Merged

gh-152718: Reject oversized table counts in the profiling binary reader#152719
pablogsal merged 5 commits into
python:mainfrom
tonghuaroot:fix-profiling-reader-alloc

Conversation

@tonghuaroot

@tonghuaroot tonghuaroot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

This bounds the eager string-table and frame-table allocations in the
_remote_debugging binary profile reader against the file size before
allocating, so a .pyb file whose footer declares an oversized count is
rejected with a ValueError instead of triggering a multi-gigabyte allocation.

Each table entry occupies at least one byte on disk (a string is a >=1-byte
length varint, a frame is six >=1-byte varints plus the opcode byte), so a count
larger than (file_size - table_offset) / MIN_ENTRY_SIZE cannot be backed by
real data. This mirrors the existing RLE-count bound in binary_reader_replay.
Legitimate files are unaffected.

Regression tests in test_binary_format patch the footer counts and assert the
reader rejects both an oversized and a one-past-capacity count, while a count
exactly at the cap still opens.

@pablogsal pablogsal left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

}
#endif

size_t max_frames =

@maurycy maurycy Jul 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about a helper. With these two and this one:

/* Validate RLE count to prevent DoS from malicious files.
* Each RLE sample needs at least 2 bytes (1 byte min varint + 1 status byte).
* Also reject absurdly large counts that would exhaust memory. */
size_t remaining_data = reader->sample_data_size - offset;
size_t max_possible_samples = remaining_data / 2;
if (count > max_possible_samples) {
PyErr_Format(PyExc_ValueError,
"Invalid RLE count %u exceeds maximum possible %zu for remaining data",
count, max_possible_samples);
return -1;
}

it'd be three. Something along the lines of reader_validate_count("RLE", count, reader->sample_data_size - offset, 2) etc.

Generally, wondering about centralizing all these validations in a single place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea — done in 572947c: the string, frame and RLE checks now share a reader_validate_count(what, count, available, min_entry_size) helper. One heads-up: the RLE call sits in the region #152722 also touches, so whichever lands first the other will need a small rebase there. Thanks!

…count

Addresses review feedback: centralize the string/frame/RLE oversized-count
validations behind one helper.
@pablogsal

Copy link
Copy Markdown
Member

@tonghuaroot seems tests are failing can you take a look?

On 32-bit builds a footer frame count of 0xFFFFFFFF overflows the
allocation size, so the size_t overflow guard raises OverflowError before
the count-vs-file-size guard raises ValueError.  Accept either rejection.
@tonghuaroot

Copy link
Copy Markdown
Contributor Author

Thanks @pablogsal, pushed a fix in 7c23d37. The failure was 32-bit only (Win32 and Emscripten): test_open_rejects_frame_count_larger_than_file patches the frame count to 0xFFFFFFFF, which on a 32-bit build overflows frames_count * sizeof(FrameEntry), so the pre-existing SIZEOF_SIZE_T < 8 guard raises OverflowError before the new count-vs-file-size check raises ValueError. Both reject the file, so I relaxed the test to accept either. Re-running CI now.

@pablogsal pablogsal added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label Jul 4, 2026
@pablogsal pablogsal merged commit 8e580b0 into python:main Jul 4, 2026
56 checks passed
@miss-islington-app

Copy link
Copy Markdown

Thanks @tonghuaroot for the PR, and @pablogsal for merging it 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖

@pablogsal

Copy link
Copy Markdown
Member

Thanks for te PR @tonghuaroot

@bedevere-app

bedevere-app Bot commented Jul 4, 2026

Copy link
Copy Markdown

GH-153050 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label Jul 4, 2026
pablogsal pushed a commit that referenced this pull request Jul 4, 2026
…ry reader (GH-152719) (#153050)

gh-152718: Reject oversized table counts in the profiling binary reader (GH-152719)
(cherry picked from commit 8e580b0)

Co-authored-by: tonghuaroot (童话) <tonghuaroot@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants