move json line tracking off the success path#22487
Conversation
Script to benchmark: bench.php<?php
function build_payloads(): array {
$rows = [];
for ($i = 0; $i < 30; $i++) {
$rows[] = [
'id' => $i,
'name' => "item_\u{00e9}\u{2603}_$i",
'active' => ($i % 2 === 0),
'score' => $i * 1.5 + 0.25,
'tags' => ['alpha', 'beta', "gamma\ndelta", "quote\"inside"],
'meta' => ['nested' => ['deep' => ['value' => "text with \"quotes\" and \\ backslash and /slash"]]],
'note' => null,
'big' => 123456789012345,
];
}
$data = ['rows' => $rows, 'count' => 30, 'ok' => true, 'ratio' => 3.14159];
$doc = json_encode($data);
$bad = $doc;
$pos = strrpos($bad, 'backslash');
if ($pos === false) {
fwrite(STDERR, "payload generation failed: marker not found\n");
exit(2);
}
$bad[$pos] = "\x01";
return ['success' => $doc, 'failure' => $bad];
}
function run_case(string $doc, int $iters): array {
$ok = 0; $fail = 0;
$t = hrtime(true);
for ($i = 0; $i < $iters; $i++) {
$r = json_decode($doc, true);
if ($r === null && json_last_error() !== JSON_ERROR_NONE) {
$fail++;
} else {
$ok++;
}
}
$ns = hrtime(true) - $t;
return [
'iters' => $iters,
'ok' => $ok,
'fail' => $fail,
'ns_total' => $ns,
'ns_op' => $iters > 0 ? $ns / $iters : 0.0,
'bytes' => strlen($doc),
'last_err' => json_last_error_msg(),
];
}
$args = array_slice($argv, 1);
$iters = 200000;
$csv = false;
$emit = null;
for ($i = 0; $i < count($args); $i++) {
$a = $args[$i];
if ($a === '--csv') {
$csv = true;
} elseif ($a === '--emit') {
$emit = $args[++$i] ?? '.';
} elseif ($a !== '' && $a === (string) (int) $a) {
$iters = (int) $a;
} else {
fwrite(STDERR, "unknown arg: $a\n");
exit(2);
}
}
$payloads = build_payloads();
if ($emit !== null) {
file_put_contents("$emit/success.json", $payloads['success']);
file_put_contents("$emit/failure.json", $payloads['failure']);
printf("wrote %s/success.json (%d bytes) and %s/failure.json (%d bytes)\n",
$emit, strlen($payloads['success']), $emit, strlen($payloads['failure']));
exit(0);
}
$probe_ok = json_decode($payloads['success'], true);
$ok_digest = md5(serialize($probe_ok));
json_decode($payloads['failure'], true);
$fail_msg = json_last_error_msg();
for ($i = 0; $i < 1000; $i++) { json_decode($payloads['success'], true); }
$results = [];
foreach (['success', 'failure'] as $case) {
$results[$case] = run_case($payloads[$case], $iters);
}
$banner = sprintf("php %s | %s | iters=%d",
PHP_VERSION, defined('PHP_ZTS') && PHP_ZTS ? 'ZTS' : 'NTS', $iters);
if ($csv) {
foreach ($results as $case => $r) {
printf("%s,%d,%d,%d,%d,%.1f,%d\n",
$case, $r['iters'], $r['ok'], $r['fail'], $r['ns_total'], $r['ns_op'], $r['bytes']);
}
exit(0);
}
echo $banner . "\n";
echo "success decode digest: $ok_digest\n";
echo "failure error message: $fail_msg\n";
printf("%-9s %12s %12s %14s %10s %10s\n",
'case', 'iters', 'bytes', 'total(ms)', 'ns/op', 'ok/fail');
foreach ($results as $case => $r) {
printf("%-9s %12d %12d %14.2f %10.1f %6d/%-6d\n",
$case, $r['iters'], $r['bytes'], $r['ns_total'] / 1e6, $r['ns_op'],
$r['ok'], $r['fail']);
} |
bukka
left a comment
There was a problem hiding this comment.
Can we add more tests. I would like to see multiline not ended string here because currently it report start of the string so how it's going to work on multiple lines.
In general I don't have a problem with this. I just want more test so it doesn't end up like before.
| bool(false) | ||
| int(3) | ||
| string(71) "Control character error, possibly incorrectly encoded near location 1:9" | ||
| string(72) "Control character error, possibly incorrectly encoded near location 1:24" |
There was a problem hiding this comment.
I just confused by this because this completely wrong test - it doesn't test UTF-16 key but missing " after value. The new position looks correct, it's just so confusing
There was a problem hiding this comment.
As we just discussed in DMs: I think the test is not trying to say that the key is an erroneous UTF-16 sequence, but instead that the key is UTF-16 and the error is afterwards.
Basically this is testing that the old broken column calculation works correctly for UTF-16 keys.
This test can probably just be removed entirely, because we no longer handle unicode escape sequences in a special way.
|
@TimWolla I switched to uint64_t everywhere for this now. size_t was 32bit on x86-windows, it's now 64 bits. Perhaps we can relax this to 32 bit in general, probably not very realistic someone would reach the limit? I'll look into more tests and finding any issues related to the previous change. |
4 GB of JSON is unlikely, but possible (for 64 platforms). A single 64-bit increment during regular parsing should be plenty fast even on 32-bit platforms, I consider using uint64_t everywhere to be acceptable. |
should get rid of most of the performance penalty introduced by b80ffc5
cross-link #22277 (comment)
Reasoning: I'm thinking that successful json parses should be much more common than failed ones. So instead of paying a noticeable cost on every successful json parse, that never reports an error, we should instead only pay error reporting cost when it happens. This is a trade-off of paying a higher penalty in the error case to keep the expected case clean.
I've not benchmarked the error path yet, but for running phpstan on phpstan-src, it shaves about a second off of execution time.