BQAA (Java): preview-readiness fixes — redaction, table bootstrap, drop stats, reliability#1318
BQAA (Java): preview-readiness fixes — redaction, table bootstrap, drop stats, reliability#1318caohy1988 wants to merge 4 commits into
Conversation
…p stats, reliability) Addresses google#1316. Highlights: - P1: secret redaction in JsonFormatter; empty-STATE_DELTA guard; DAY time-partitioning + retry-after-failure table bootstrap; root_agent_name initialization; no-current-agent guard. - P2: drop-row accounting via getDropStats(); StreamWriter regional routing; JVM shutdown-hook drain; schema auto-upgrade always diffs (+ drift warning). - P3: config numeric validation; view-identifier validation; truncation depth guard; hot-path INFO->FINE; EventData license header + Javadoc fix. Not built/tested locally; unit tests still to add. Deep-semantic parity items (span-tree-under-OTel, full ADK envelope, long-running tool resume, off-thread first-event ensure) intentionally deferred; see PR description. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Review against #1316 preview-readiness issueI re-reviewed this PR against the tracked preview blockers in #1316. This is a strong partial fix, but I would keep it draft for now. A few issue-tracked blockers remain, and I found two regressions/holes in the new implementation that should be fixed before marking ready. Findings
Coverage against #1316Good fixes landed: redaction, empty Still intentionally unresolved from the issue: HITL/user-message resume parity, added Python event types and views, ambient OTel/internal span-tree semantics, plugin-owned OTel span export, full ReadinessRecommendation: keep this PR as draft until the P1s above are fixed and tests are added. Current GH status also has |
…, license, agent fallback - F1: maybeUpgradeSchema now returns success; ensureTableExists only marks the table ready when create/existing/upgrade all succeed (failed upgrade retries). - F2: add missing Apache license header to PluginState.java. - F3: fold drop counters AFTER BatchProcessor.close() so final-drain drops count. - F4: schema drift check now also compares Field.Mode (was type-only), incl. nested fields via recursion. - F5: agent column falls back to Event.author (via EventData.fallbackAgentName) on source-event rows (STATE_DELTA/A2A/AGENT_RESPONSE) before the "unknown" sentinel, matching the Python plugin. Still untested locally; unit tests to follow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks — all five findings validated against source and fixed in 11d2fab. Point by point:
Everything under "still intentionally unresolved" in your review remains deferred (span-tree/OTel semantics, ADK envelope, HITL long-running resume + pair keys, off-thread first-event ensure, default naming, per-invocation writer scaling, DLQ/salvage). Keeping the PR draft: still not built locally and unit tests for these behaviors are pending. The |
Deferred google#1316 P1 item: span_id/parent_span_id now always come from the BQAA internal execution tree (getCurrentSpanAndParent) instead of preferring ambient OpenTelemetry span IDs, so parent_span_id always references another logged row rather than an unlogged framework span. Ambient OTel still governs trace_id via getTraceId. The completed/error callbacks (LLM/tool) and getCompletedEventData now always record the popped internal span, and the now-dead getAmbientSpanAndParent() helper is removed. Tests (JUnit4/vintage, matching existing style): - JsonFormatterTest: sensitive-key redaction (top-level, case-insensitive + temp: prefix, nested; is_truncated unaffected) and recursion depth guard. - BigQueryLoggerConfigTest (new): build() rejects non-positive batchSize/queueMaxSize/maxContentLength. - TraceManagerTest: initTraceIfNeeded sets root agent name; null-agent keeps the sentinel. Still not executed locally; CI verifies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Two follow-ups in 08fe970: 1. Internal span tree under ambient OTel (previously deferred #1316 P1). Note: this does not stop the plugin from creating real OTel spans — the duplicate-export concern is separate and still deferred. This change only fixes which span IDs land in the rows. 2. Unit tests (JUnit4/vintage, matching existing style):
Still not executed locally (no Maven build in my environment), so CI is the gate. Remaining deferred items from your review are unchanged: ADK envelope, HITL long-running resume + pair keys, plugin-owned OTel span export, off-thread first-event ensure, default naming, per-invocation writer scaling, DLQ/salvage. |
…n STRUCT mode drift - Tests no longer expect a STATE_DELTA row for events that never set a state delta; onEventCallback_populatesCorrectFields now sets a real stateDelta and asserts it lands in attributes, and a new test pins down that empty deltas emit no row. - schemaFieldsMatch now checks mode drift on STRUCT columns themselves (e.g. NULLABLE vs REPEATED content_parts) before recursing into subfields, via a shared warnOnIncompatibleDrift helper; added a regression test capturing the warning. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Hi @caohy1988, thank you for your contribution! We appreciate you taking the time to submit this pull request. I noticed that the pre-checks are currently failing and make sure your PR consist of single commit as per Contribution Policy. Could you please take a look and address those comments? |
Summary
Addresses preview-readiness gaps in the Java BigQuery Agent Analytics (BQAA) plugin tracked in #1316, bringing it closer to parity with the Python plugin and hardening the write path.
Important
This change was not built or tested locally (the full Maven build was not run in the authoring environment). Please rely on CI and review. Unit tests for the new behaviors still need to be added — see "Tests to add" below. Findings were verified by direct source review against the issue-pinned SHA
ce6af87, which is byte-identical to currentmainfor this package.Fixes #1316 (partially — remaining deferred items listed below).
P1 — preview blockers
JsonFormatter): sensitive keys (client_secret,access_token,refresh_token,id_token,api_key,password, and anytemp:*key) are now redacted to[REDACTED]during truncation, matching the Python plugin's_SENSITIVE_KEYS. Case-insensitive; redaction does not set theis_truncatedflag (parity). This covers tool args/results, session state, usage metadata, and custom-tag paths that flow throughsmartTruncate.STATE_DELTAguard (onEventCallback): aSTATE_DELTArow is only emitted whenevent.actions().stateDelta()is non-empty, matching Python (no row for empty deltas).ensureTableExists/ensureTableExistsOnce):timestamp(parity with Python; enables partition pruning), in addition to the existing clustering.tableEnsuredis now set only after a successful setup, so a transient first-run failure (auth blip, missing dataset, quota) is retried on later events instead of permanently disabling table create/upgrade for the plugin instance.already exists) is treated as success rather than a hard failure.root_agent_nameis populated (TraceManager.initTraceIfNeeded, called fromgetAttributes):attributes.root_agent_namenow resolves to the real root agent name instead of the sentinel default. Null-safe.resolveAgentName): theagentcolumn is resolved defensively; workflow-driven callbacks with no current agent fall back to"unknown"instead of NPE-dropping the row.P2 — reliability hardening
BatchProcessor+PluginState+getDropStats()): rows lost to a full queue, append errors, or serialization errors are now counted and exposed viaplugin.getDropStats()(queue_full/append_error/serialization_error). Counters survive per-invocation processor churn by folding into plugin-level totals at close.StreamWriteris now built with.setLocation(config.location())so append RPCs route to the dataset's region (previously only the control-plane client set location; non-US/EU appends risked stream-not-found).state.close()bounded byshutdownTimeout) so a host that never callsclose()still flushes queued rows at exit. The package-private test constructor does not register a hook (avoids hook leakage in tests).maybeUpgradeSchema): removed the version-label-only early return that Python deliberately avoids — a table stamped with the current label but missing columns is now reconciled. Incompatible non-STRUCT type drift is logged (was silently ignored, later surfacing as opaque append failures).P3 — operational polish
batchSize,queueMaxSize, andmaxContentLengthare validated as positive inbuild().createAnalyticsViewsrefuses to interpolate a project/dataset/table/viewPrefix that isn't a safe identifier (guards the string-formatted DDL).recursiveSmartTruncate(preventsStackOverflowErroron pathological nesting) + a log line on the string-fallback path.isProcessed, pending-task wait/removal, stale-span clearing, table-exists).EventData.java: added the missing Apache license header and fixed stale_log_eventJavadoc.Deferred (intentionally not in this PR)
These need behavior/parity decisions and tests I did not want to ship blind:
attributes.adkenvelope parity (BQAA Java preview readiness: parity and safety gaps vs Python plugin #1316 P2): route/render/rewind metadata, source-event node/branch/scope, long-running pair keys.TOOL_COMPLETEDwithpause_kind/function_call_idpair keys (BQAA Java preview readiness: parity and safety gaps vs Python plugin #1316 P1).ensureTableExistsis still synchronous on the first event's callback. Moving it off-thread safely needs append-gating so early rows aren't lost; deferred to avoid an untested concurrency change. (Note for whoever picks this up: snapshot any mutableInvocationContext/Sessiondata before crossing threads.)agent_analytics.events+createViews=falsevs Pythonagent_events+create_views=True): a product/docs decision, not a code fix.Plugin-owned OTel span export (#1316 comment) is related to the span-tree item and is deferred with it.
Tests to add
is_truncatedunaffected by redaction.state_deltaproduces noSTATE_DELTArow.getDropStats()increments on queue-full / append-error / serialization-error paths.batchSize/queueMaxSize/maxContentLength.close()is not called explicitly.🤖 Generated with Claude Code