Skip to content

feat(context): all-native OTEL context write API — Phase 1 (PROF-15271)#631

Draft
rkennke wants to merge 8 commits into
mainfrom
all-native-context-storage
Draft

feat(context): all-native OTEL context write API — Phase 1 (PROF-15271)#631
rkennke wants to merge 8 commits into
mainfrom
all-native-context-storage

Conversation

@rkennke

@rkennke rkennke commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 1 of eliminating the virtual-thread use-after-free in OTEL trace-context storage
(PROF-15271), by moving context writes to an all-native path that resolves the current
carrier's OTEP record inside the JNI call — where the mounted virtual thread is pinned to its
carrier by the native frame, so the write is race-free by construction with no cached
DirectByteBuffer to dangle.

This PR is purely additive: it adds the new API alongside the existing DirectByteBuffer
(DBB) path (now @Deprecated) and changes no behavior. The crash fix itself lands in Phase 2,
when dd-trace-java switches to the new API; until then carrier-scoping (#625 / 94686da) remains
the mitigation.

Full rationale, options considered (A–E), the JMH campaign, the consumer audit, and the
expand → migrate → contract plan are in the design note:
doc/plans/2026-07-02-all-native-context-storage-design.md.

What's in this PR

  • Native write primitives (javaApi.cpp): setTraceContext0/clearTraceContext0 (combined
    per-scope activate/deactivate) and setContextValue0/clearContextValue0 (single attribute),
    using the same detach → mutate → attach valid-flag protocol as ThreadContext so the
    async-signal-handler sampler (ContextApi::get) stays coherent. The native path self-initializes
    the LRS attrs_data entry (no dependency on the ThreadContext ctor).
  • Public API + global value cache (JavaProfiler, ContextValueCache): provisional names
    setTraceContext/clearTraceContext/setContextValue/clearContextValue. The
    value → (encoding, utf8) memoization moves out of the per-thread ThreadContext into a
    process-wide lock-free cache (safe because the encoding is a process-global immutable ID).
  • Deprecations: the DBB write/read surface and the ThreadContext/ScopeStack/ContextSetter
    helper classes are marked @Deprecated (non-breaking) pointing at the new API.
  • Tests (AllNativeContextTest, 9): round-trip, DBB-equivalence, clear semantics,
    span-transition attribute reset, single-attr set/clear, overflow, rejected values,
    DBB↔native coexistence, and coherence of native writes from 512 mounted virtual threads.
  • Perf-guard benchmark (ContextCombinedBenchmark).

Results

  • Tests: all 9 new tests pass on JDK 21 (debug); the existing context suite (Otel / Carrier /
    Tag / ProcessContext) is unregressed.
  • Perf (production per-scope activate+deactivate cycle, carrier mode): all-native
    ~136 ns vs DBB ~189 ns — ~28% faster (~24% thread mode), a win on virtual threads too, and
    mode-independent. The DBB baseline is the faithful production cost (each of its 6 calls pays a
    currentContext()/CarrierThreadLocal lookup the 2-call native cycle avoids).

Migration (expand → migrate → contract)

  1. Expand — this PR (java-profiler): add the new API alongside the DBB path. Additive.
  2. Migrate (dd-trace-java): switch DatadogProfilingIntegration to the combined API. Crash
    fix + perf win land here.
  3. Contract (java-profiler): delete the DBB / CarrierThreadLocal / OtelContextStorage
    subsystem. Breaking, gated on all-consumers-migrated.

Reviewer notes

  • Public API names are provisional — the one Phase-1 item to coordinate with the dd-trace-java
    team so Phase 2 is a clean swap.
  • Signal-safety is the key review area: the native detach/attach + release-fence protocol vs
    the sampler's acquire load of valid (ContextApi::get). Verify on aarch64 (trivial on x86).
  • The full JMH campaign + native prototypes are preserved for reproducibility on branch
    context-storage-benchmark-experiment.
  • Consumer audit: the DBB API's only production consumer is the dd-trace-java bridge; everything
    else is test/benchmark. Open item: confirm ddprof-lib standalone Maven publication (external
    consumers) before the Phase-3 removal.

🤖 Generated with Claude Code

rkennke and others added 8 commits July 3, 2026 08:35
Captures the UAF problem, options A-E, the JMH benchmark campaign and its
governing principle, the combined-API design, the consumer audit, and the
expand -> migrate -> contract migration plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add native setTraceContext0/clearTraceContext0 (combined per-scope
activate/deactivate) and setContextValue0/clearContextValue0 (single attribute),
plus attrs_data compaction and LRS helpers, in javaApi.cpp, with matching private
native declarations on JavaProfiler.

Each resolves the current carrier's OtelThreadContextRecord via
ProfiledThread::current(). A JNI native frame pins a mounted virtual thread to its
carrier for the duration of the call, so the resolved record is guaranteed live and
cannot migrate mid-write -- there is no cached per-thread DirectByteBuffer to
dangle. Writes follow the same detach -> mutate -> attach valid-flag protocol as
ThreadContext, keeping the async-signal-handler sampler (ContextApi::get) coherent.

Native layer only; the public Java API and value cache built on top follow in a
later commit. The DirectByteBuffer path is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-15271)

Add the public all-native context API on JavaProfiler (provisional names, pending
dd-trace-java coordination): setTraceContext/clearTraceContext (combined per-scope
activate/deactivate) and setContextValue/clearContextValue (single attribute),
each backed by the native primitives.

Relocate the value->(encoding,utf8) memoization out of the per-thread ThreadContext
into a new process-wide ContextValueCache. Because the encoding is a process-global
immutable ID (registerConstant0), a shared lock-free cache is correct: immutable
entries published via a single AtomicReferenceArray slot store (no torn reads), a
miss re-registers idempotently, and collisions self-heal. This avoids duplicating
the table across every carrier/virtual thread and needs no CarrierThreadLocal.
registerConstant0 is made package-private for reuse (relocates in phase 3).

Coexists with the DirectByteBuffer path (both write the same native record); the
DBB path is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…5271)

Mark the DBB write/read surface `@Deprecated` pointing at the all-native API:
JavaProfiler.setContext/clearContext/setContextAttribute/clearContextAttribute/
setContextAttributesByIdAndBytes/copyTags/getThreadContext, and the DBB-path helper
classes ThreadContext, ScopeStack, and ContextSetter.

Non-breaking (annotations only) -- signals the migration for dd-trace-java (phase 2)
ahead of removal (phase 3). Java has no -Werror, so the deprecation notes are
informational; the DBB path still functions unchanged during coexistence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…PROF-15271)

setTraceContext0/clearTraceContext0 now write the fixed LRS entry header
(key_index=0, length=16) in addition to its hex value, instead of relying on the
ThreadContext constructor to have initialized it. This makes the all-native path
correct on a record that only saw the ProfiledThread zero-init -- i.e. when no
DirectByteBuffer/ThreadContext was ever created (the phase-2 pure-native case),
where attrs_data[1] would otherwise be 0 and mis-frame the LRS entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AllNativeContextTest exercises setTraceContext/clearTraceContext/setContextValue/
clearContextValue: round-trip readback, equivalence with the DirectByteBuffer path,
clear semantics, span-transition attribute reset, single-attribute set/clear,
attrs_data overflow, rejected (null/oversized) values, DBB<->native coexistence on
one thread, and coherence of native writes from many mounted virtual threads (the
migration-safe path).

Read-back uses the ThreadContext DBB read methods as an oracle -- a view over the
same native record the all-native path writes. All 9 pass on JDK 21 (debug); the
existing context suite (Otel/Carrier/Tag/ProcessContext) is unregressed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROF-15271)

Point the design note appendix at branch context-storage-benchmark-experiment
(pushed to origin), which preserves the full JMH campaign + native prototypes for
reproducibility, rather than describing scaffolding that lives only on that branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le (PROF-15271)

ContextCombinedBenchmark compares the production per-scope activate+deactivate cycle:
the combined native calls (setTraceContext + clearTraceContext) vs the deprecated
DirectByteBuffer sequence as dd-trace-java actually drives it (profiler.setContext +
2x setContextAttribute + clear...), on platform + mounted virtual threads, both storage
modes.

Measured (JDK 21, debug, carrier): native ~136 ns vs DBB ~189 ns -- native ~28% faster,
mode-independent, and a win on vthreads too (~147 vs ~193 ns). The DBB baseline here is
the faithful production cost: each of its 6 calls pays a currentContext()/
CarrierThreadLocal lookup, which the 2-call combined native cycle avoids.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dd-octo-sts

dd-octo-sts Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #28657970875 | Commit: 2e7b756 | Duration: 13m 44s (longest job)

2 of 32 test jobs failed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Failed Tests

glibc-aarch64/debug / 8-j9

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 11-j9

Job: View logs

No detailed failure information available. Check the job logs.

Summary: Total: 32 | Passed: 30 | Failed: 2


Updated: 2026-07-03 11:53:20 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant