move json line tracking off the success path by henderkes · Pull Request #22487 · php/php-src

henderkes · 2026-06-27T15:45:37Z

should get rid of most of the performance penalty introduced by b80ffc5

Reasoning: I'm thinking that successful json parses should be much more common than failed ones. So instead of paying a noticeable cost on every successful json parse, that never reports an error, we should instead only pay error reporting cost when it happens. This is a trade-off of paying a higher penalty in the error case to keep the expected case clean.

I've not benchmarked the error path yet, but for running phpstan on phpstan-src, it shaves about a second off of execution time.

henderkes · 2026-06-29T06:10:35Z

build	success ns/op	success total(ms)	failure ns/op	failure total(ms)
gcc	34877.6	6975.51	34213.4	6842.67
gcc patch	30101.7	6020.34	32081.2	6416.25
clang	28449.1	5689.83	27742.5	5548.51
clang patch	27606.1	5521.22	34205.9	6841.18

Script to benchmark:

bench.php

<?php
function build_payloads(): array {
    $rows = [];
    for ($i = 0; $i < 30; $i++) {
        $rows[] = [
            'id'     => $i,
            'name'   => "item_\u{00e9}\u{2603}_$i",
            'active' => ($i % 2 === 0),
            'score'  => $i * 1.5 + 0.25,
            'tags'   => ['alpha', 'beta', "gamma\ndelta", "quote\"inside"],
            'meta'   => ['nested' => ['deep' => ['value' => "text with \"quotes\" and \\ backslash and /slash"]]],
            'note'   => null,
            'big'    => 123456789012345,
        ];
    }
    $data = ['rows' => $rows, 'count' => 30, 'ok' => true, 'ratio' => 3.14159];
    $doc  = json_encode($data);
    $bad = $doc;
    $pos = strrpos($bad, 'backslash');
    if ($pos === false) {
        fwrite(STDERR, "payload generation failed: marker not found\n");
        exit(2);
    }
    $bad[$pos] = "\x01";
    return ['success' => $doc, 'failure' => $bad];
}

function run_case(string $doc, int $iters): array {
    $ok = 0; $fail = 0;
    $t = hrtime(true);
    for ($i = 0; $i < $iters; $i++) {
        $r = json_decode($doc, true);
        if ($r === null && json_last_error() !== JSON_ERROR_NONE) {
            $fail++;
        } else {
            $ok++;
        }
    }
    $ns = hrtime(true) - $t;
    return [
        'iters'    => $iters,
        'ok'       => $ok,
        'fail'     => $fail,
        'ns_total' => $ns,
        'ns_op'    => $iters > 0 ? $ns / $iters : 0.0,
        'bytes'    => strlen($doc),
        'last_err' => json_last_error_msg(),
    ];
}

$args  = array_slice($argv, 1);
$iters = 200000;
$csv   = false;
$emit  = null;
for ($i = 0; $i < count($args); $i++) {
    $a = $args[$i];
    if ($a === '--csv') {
        $csv = true;
    } elseif ($a === '--emit') {
        $emit = $args[++$i] ?? '.';
    } elseif ($a !== '' && $a === (string) (int) $a) {
        $iters = (int) $a;
    } else {
        fwrite(STDERR, "unknown arg: $a\n");
        exit(2);
    }
}

$payloads = build_payloads();
if ($emit !== null) {
    file_put_contents("$emit/success.json", $payloads['success']);
    file_put_contents("$emit/failure.json", $payloads['failure']);
    printf("wrote %s/success.json (%d bytes) and %s/failure.json (%d bytes)\n",
        $emit, strlen($payloads['success']), $emit, strlen($payloads['failure']));
    exit(0);
}

$probe_ok   = json_decode($payloads['success'], true);
$ok_digest  = md5(serialize($probe_ok));
json_decode($payloads['failure'], true);
$fail_msg   = json_last_error_msg();
for ($i = 0; $i < 1000; $i++) { json_decode($payloads['success'], true); }

$results = [];
foreach (['success', 'failure'] as $case) {
    $results[$case] = run_case($payloads[$case], $iters);
}

$banner = sprintf("php %s | %s | iters=%d",
    PHP_VERSION, defined('PHP_ZTS') && PHP_ZTS ? 'ZTS' : 'NTS', $iters);
if ($csv) {
    foreach ($results as $case => $r) {
        printf("%s,%d,%d,%d,%d,%.1f,%d\n",
            $case, $r['iters'], $r['ok'], $r['fail'], $r['ns_total'], $r['ns_op'], $r['bytes']);
    }
    exit(0);
}

echo $banner . "\n";
echo "success decode digest: $ok_digest\n";
echo "failure error message: $fail_msg\n";
printf("%-9s %12s %12s %14s %10s %10s\n",
    'case', 'iters', 'bytes', 'total(ms)', 'ns/op', 'ok/fail');
foreach ($results as $case => $r) {
    printf("%-9s %12d %12d %14.2f %10.1f %6d/%-6d\n",
        $case, $r['iters'], $r['bytes'], $r['ns_total'] / 1e6, $r['ns_op'],
        $r['ok'], $r['fail']);
}

…22487

TimWolla

LGTM, except remark.

bukka

Can we add more tests. I would like to see multiline not ended string here because currently it report start of the string so how it's going to work on multiple lines.

In general I don't have a problem with this. I just want more test so it doesn't end up like before.

bukka · 2026-06-30T10:18:35Z

 bool(false)
 int(3)
-string(71) "Control character error, possibly incorrectly encoded near location 1:9"
+string(72) "Control character error, possibly incorrectly encoded near location 1:24"


I just confused by this because this completely wrong test - it doesn't test UTF-16 key but missing " after value. The new position looks correct, it's just so confusing

Relatedly: #22527.

As we just discussed in DMs: I think the test is not trying to say that the key is an erroneous UTF-16 sequence, but instead that the key is UTF-16 and the error is afterwards.

Basically this is testing that the old broken column calculation works correctly for UTF-16 keys.

This test can probably just be removed entirely, because we no longer handle unicode escape sequences in a special way.

@bukka updated the description, but left out extra tests for this issue in this PR because Tim already pushed in #22528

henderkes · 2026-06-30T10:41:52Z

@TimWolla I switched to uint64_t everywhere for this now. size_t was 32bit on x86-windows, it's now 64 bits. Perhaps we can relax this to 32 bit in general, probably not very realistic someone would reach the limit?

I'll look into more tests and finding any issues related to the previous change.

TimWolla · 2026-06-30T10:45:48Z

Perhaps we can relax this to 32 bit in general, probably not very realistic someone would reach the limit?

4 GB of JSON is unlikely, but possible (for 64 platforms). A single 64-bit increment during regular parsing should be plenty fast even on 32-bit platforms, I consider using uint64_t everywhere to be acceptable.

github-actions Bot added Category: Build System Extension: json Extension: tokenizer ABI break labels Jun 27, 2026

yyjson: move decode error line tracking off the hot path

8255bed

henderkes force-pushed the perf/yyjson branch from a01f119 to 8255bed Compare June 28, 2026 02:55

henderkes changed the title ~~move json line tracking off the hot path~~ move json line tracking off the success path Jun 28, 2026

henderkes marked this pull request as ready for review June 28, 2026 15:43

henderkes requested a review from bukka as a code owner June 28, 2026 15:43

TimWolla requested changes Jun 29, 2026

View reviewed changes

Comment thread ext/json/php_json_scanner.h Outdated

Comment thread ext/json/json_parser.y Outdated

TimWolla mentioned this pull request Jun 29, 2026

Incorrect error column in PHP 8.6 JSON parser #22514

Open

henderkes added 2 commits June 30, 2026 10:17

suggestion: use uint64_t instead of size_t

2d7b217

column tracking: count by source characters, not by decoded ones php#…

9c2ec07

…22487

TimWolla approved these changes Jun 30, 2026

View reviewed changes

Comment thread ext/json/json_parser.y Outdated

Comment thread ext/json/json_parser.y Outdated

TimWolla reviewed Jun 30, 2026

View reviewed changes

Comment thread ext/json/json_parser.y Outdated

bukka requested changes Jun 30, 2026

View reviewed changes

json: switch line and column from size_t to uint64_t

38f5357

update test descriptions

7a8595a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

move json line tracking off the success path#22487

move json line tracking off the success path#22487
henderkes wants to merge 5 commits into
php:masterfrom
henderkes:perf/yyjson

henderkes commented Jun 27, 2026 •

edited

Loading

Uh oh!

henderkes commented Jun 29, 2026

Uh oh!

Uh oh!

Uh oh!

TimWolla left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bukka left a comment

Uh oh!

bukka Jun 30, 2026

Uh oh!

TimWolla Jun 30, 2026

Uh oh!

TimWolla Jun 30, 2026

Uh oh!

henderkes Jun 30, 2026 •

edited

Loading

Uh oh!

henderkes commented Jun 30, 2026

Uh oh!

TimWolla commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

henderkes commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

henderkes commented Jun 29, 2026

Uh oh!

Uh oh!

Uh oh!

TimWolla left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bukka left a comment

Choose a reason for hiding this comment

Uh oh!

bukka Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

TimWolla Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

TimWolla Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

henderkes Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

henderkes commented Jun 30, 2026

Uh oh!

TimWolla commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

henderkes commented Jun 27, 2026 •

edited

Loading

henderkes Jun 30, 2026 •

edited

Loading