fix: safe opcache_reset#2073
Conversation
# Conflicts: # caddy/caddy_test.go
|
Hi @AlliBalliBaba, any news on this side? 🙂 |
|
I experimented a lot with juggling thread states, but not sure it's possible to make a reset fully safe from our side alone. There are 2 main race conditions:
The first race condition kind of gets mitigated by the changes in this PR. THe second one would need to somehow be fixed in php-src or requires a hook to reject opcache_resets. |
|
Can we try doing what we do with the environment functions and simply override/replace the existing one:
|
|
Yeah I tried that as well. But an But maybe overwriting the user function would still be a good first step 👍 |
I tried this and it kind of works, but workers are an issue. Only got it working by locking new requests, draining all workers, doing the reset, restarting all workers and then unlocking. |
|
Oh I just realized that overwriting |
|
Yes, I hard linked because opcache is always active in 8.5+ (which is where I test). I'll tidy up my changes and merge them on top of this branch when I get to it. |
|
Maybe we can make it a requirement also for other PHP versions, not sure how many installations there are without it. |
|
You already had logic to check whether opcache exists in the branch so I only had to pick in a minor change here. Let's see if tests pass. Edit: well, apparently not. I only tested with 8.5 locally and guess what's passing x) Okay, I think I get the issue now. Hard linking to opcache won't help because the init chain for shared extensions is different than for static extensions, which is why it works for me locally, but not in CI in < 8.5. Need to wait with actually calling opcache reset until the extension is properly loaded. |
e54cd96 to
5c8e340
Compare
5c8e340 to
1d75824
Compare
e7bd25a to
0d87765
Compare
|
Alright, we're down to php 8.2 segfaulting. I'm not terribly keep to find out why. |
|
Down to only debian 8.2 failing... |
|
@henderkes do you remember what exactly fixed 8.2 for you before? I think I re-introduced the failure for some 8.2 versions while fixing a different race. |
It turned out that the failures were unrelated to the branch changes. I merged main so if it's still happening now it'll be something different. |
Not sure why... I've ran the exact same docker container tests 100x locally and didn't run into the failure a single time. |
…rankenphp into fix/opcache-safe-reset
Replace the ad-hoc opcache test with the canonical concurrency test and fixtures from PR php#2073 (caddy TestOpcacheReset, opcache_reset.php, require.php): 500 mixed sleep.php / opcache_reset.php requests across 20 workers. Verified passing against this implementation with opcache enabled and every opcache_reset() routed through the thread-safe coordinator. https://claude.ai/code/session_01K4jwAnp9mA9ApgaFiJDf6d
|
Hey, quick question, would be really great if we could get this PR merged soonish, because it's blocking a lot of our team members. Is there any estimation for how long this takes to fix or can we assist in some way? |
|
so i even checked out this branch and built the docker image for it locally, but i still run into |
|
zend_mm_heap_corrupted is generally an error in upstream php-src or caused by a not-fully-zts-compatible extension. This PR here would fix segfaults, rather than zend memory manager logic errors |
|
Is there anything I can do to help get this merged? Properly reloading frankenkphp is of high priority to me (#2266) |
|
@AlliBalliBaba I compiled the current branch and ran it on our test system with our symfony application. I had no issues. |
|
Yes just specific problems you have been having regarding opcache. Or the watcher in general. People mainly have been reporting crashes during wordpress or moodle installation (like #1737). |
|
Zero problems with opcache. But our symfony app is pretty clean and relatively new version, so probably a lot less troublesome to run than Moodle or Wordpress |
PR php#2073 drains all threads for explicit opcache_reset() calls, but WordPress triggers implicit restarts via opcache_invalidate() filling opcache memory → zend_accel_schedule_restart() → restart_pending → the actual reset runs during the next php_request_startup(). That path bypasses the opcache_reset() override and races on shared memory. Add a pthread_rwlock around the request lifecycle: - Normal requests: read lock (concurrent, single atomic CAS) - When zend_accel_schedule_restart_hook fires: set an atomic flag - Next php_request_startup(): sees flag → write lock (exclusive), blocks until all current requests complete, resets safely - After startup: unlock, all threads resume Combined with PR php#2073, this covers both explicit opcache_reset() (thread drain) and implicit OOM/hash-overflow restarts (rwlock). Testing: without this patch, crashes every ~10 min on a WordPress multisite. With PR php#2073 alone, same crash frequency. With the rwlock from our PR php#2349 alone, crashes reduced to every 6-12 hours (worker threads not covered). Combined fix expected to eliminate both paths.
Idea to fix #1737
Just a WIP, to test in CI.