Skip to content

move json line tracking off the success path#22487

Open
henderkes wants to merge 5 commits into
php:masterfrom
henderkes:perf/yyjson
Open

move json line tracking off the success path#22487
henderkes wants to merge 5 commits into
php:masterfrom
henderkes:perf/yyjson

Conversation

@henderkes

@henderkes henderkes commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

should get rid of most of the performance penalty introduced by b80ffc5

cross-link #22277 (comment)

Reasoning: I'm thinking that successful json parses should be much more common than failed ones. So instead of paying a noticeable cost on every successful json parse, that never reports an error, we should instead only pay error reporting cost when it happens. This is a trade-off of paying a higher penalty in the error case to keep the expected case clean.

I've not benchmarked the error path yet, but for running phpstan on phpstan-src, it shaves about a second off of execution time.

@henderkes henderkes changed the title move json line tracking off the hot path move json line tracking off the success path Jun 28, 2026
@henderkes henderkes marked this pull request as ready for review June 28, 2026 15:43
@henderkes henderkes requested a review from bukka as a code owner June 28, 2026 15:43
@henderkes

Copy link
Copy Markdown
Contributor Author
buildsuccess ns/opsuccess total(ms)failure ns/opfailure total(ms)
gcc34877.66975.5134213.46842.67
gcc patch30101.76020.3432081.26416.25
clang28449.15689.8327742.55548.51
clang patch27606.15521.2234205.96841.18

Script to benchmark:

bench.php
<?php
function build_payloads(): array {
    $rows = [];
    for ($i = 0; $i < 30; $i++) {
        $rows[] = [
            'id'     => $i,
            'name'   => "item_\u{00e9}\u{2603}_$i",
            'active' => ($i % 2 === 0),
            'score'  => $i * 1.5 + 0.25,
            'tags'   => ['alpha', 'beta', "gamma\ndelta", "quote\"inside"],
            'meta'   => ['nested' => ['deep' => ['value' => "text with \"quotes\" and \\ backslash and /slash"]]],
            'note'   => null,
            'big'    => 123456789012345,
        ];
    }
    $data = ['rows' => $rows, 'count' => 30, 'ok' => true, 'ratio' => 3.14159];
    $doc  = json_encode($data);
    $bad = $doc;
    $pos = strrpos($bad, 'backslash');
    if ($pos === false) {
        fwrite(STDERR, "payload generation failed: marker not found\n");
        exit(2);
    }
    $bad[$pos] = "\x01";
    return ['success' => $doc, 'failure' => $bad];
}

function run_case(string $doc, int $iters): array {
    $ok = 0; $fail = 0;
    $t = hrtime(true);
    for ($i = 0; $i < $iters; $i++) {
        $r = json_decode($doc, true);
        if ($r === null && json_last_error() !== JSON_ERROR_NONE) {
            $fail++;
        } else {
            $ok++;
        }
    }
    $ns = hrtime(true) - $t;
    return [
        'iters'    => $iters,
        'ok'       => $ok,
        'fail'     => $fail,
        'ns_total' => $ns,
        'ns_op'    => $iters > 0 ? $ns / $iters : 0.0,
        'bytes'    => strlen($doc),
        'last_err' => json_last_error_msg(),
    ];
}

$args  = array_slice($argv, 1);
$iters = 200000;
$csv   = false;
$emit  = null;
for ($i = 0; $i < count($args); $i++) {
    $a = $args[$i];
    if ($a === '--csv') {
        $csv = true;
    } elseif ($a === '--emit') {
        $emit = $args[++$i] ?? '.';
    } elseif ($a !== '' && $a === (string) (int) $a) {
        $iters = (int) $a;
    } else {
        fwrite(STDERR, "unknown arg: $a\n");
        exit(2);
    }
}

$payloads = build_payloads();
if ($emit !== null) {
    file_put_contents("$emit/success.json", $payloads['success']);
    file_put_contents("$emit/failure.json", $payloads['failure']);
    printf("wrote %s/success.json (%d bytes) and %s/failure.json (%d bytes)\n",
        $emit, strlen($payloads['success']), $emit, strlen($payloads['failure']));
    exit(0);
}

$probe_ok   = json_decode($payloads['success'], true);
$ok_digest  = md5(serialize($probe_ok));
json_decode($payloads['failure'], true);
$fail_msg   = json_last_error_msg();
for ($i = 0; $i < 1000; $i++) { json_decode($payloads['success'], true); }

$results = [];
foreach (['success', 'failure'] as $case) {
    $results[$case] = run_case($payloads[$case], $iters);
}

$banner = sprintf("php %s | %s | iters=%d",
    PHP_VERSION, defined('PHP_ZTS') && PHP_ZTS ? 'ZTS' : 'NTS', $iters);
if ($csv) {
    foreach ($results as $case => $r) {
        printf("%s,%d,%d,%d,%d,%.1f,%d\n",
            $case, $r['iters'], $r['ok'], $r['fail'], $r['ns_total'], $r['ns_op'], $r['bytes']);
    }
    exit(0);
}

echo $banner . "\n";
echo "success decode digest: $ok_digest\n";
echo "failure error message: $fail_msg\n";
printf("%-9s %12s %12s %14s %10s %10s\n",
    'case', 'iters', 'bytes', 'total(ms)', 'ns/op', 'ok/fail');
foreach ($results as $case => $r) {
    printf("%-9s %12d %12d %14.2f %10.1f %6d/%-6d\n",
        $case, $r['iters'], $r['bytes'], $r['ns_total'] / 1e6, $r['ns_op'],
        $r['ok'], $r['fail']);
}

Comment thread ext/json/php_json_scanner.h Outdated
Comment thread ext/json/json_parser.y Outdated

@TimWolla TimWolla left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except remark.

Comment thread ext/json/json_parser.y Outdated
Comment thread ext/json/json_parser.y Outdated
Comment thread ext/json/json_parser.y Outdated

@bukka bukka left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add more tests. I would like to see multiline not ended string here because currently it report start of the string so how it's going to work on multiple lines.

In general I don't have a problem with this. I just want more test so it doesn't end up like before.

bool(false)
int(3)
string(71) "Control character error, possibly incorrectly encoded near location 1:9"
string(72) "Control character error, possibly incorrectly encoded near location 1:24"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just confused by this because this completely wrong test - it doesn't test UTF-16 key but missing " after value. The new position looks correct, it's just so confusing

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly: #22527.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we just discussed in DMs: I think the test is not trying to say that the key is an erroneous UTF-16 sequence, but instead that the key is UTF-16 and the error is afterwards.

Basically this is testing that the old broken column calculation works correctly for UTF-16 keys.

This test can probably just be removed entirely, because we no longer handle unicode escape sequences in a special way.

@henderkes henderkes Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bukka updated the description, but left out extra tests for this issue in this PR because Tim already pushed in #22528

@henderkes

Copy link
Copy Markdown
Contributor Author

@TimWolla I switched to uint64_t everywhere for this now. size_t was 32bit on x86-windows, it's now 64 bits. Perhaps we can relax this to 32 bit in general, probably not very realistic someone would reach the limit?

I'll look into more tests and finding any issues related to the previous change.

@TimWolla

Copy link
Copy Markdown
Member

Perhaps we can relax this to 32 bit in general, probably not very realistic someone would reach the limit?

4 GB of JSON is unlikely, but possible (for 64 platforms). A single 64-bit increment during regular parsing should be plenty fast even on 32-bit platforms, I consider using uint64_t everywhere to be acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants