Add a metastore read replica role for read-only routing#6548

Draft
shuheiktgw wants to merge 11 commits into quickwit-oss:main from shuheiktgw:metastore-read-replica-role

@shuheiktgw

Collaborator

Description

This is an alternative to #6538. Instead of running a single metastore deployment and routing read-only requests to a PostgreSQL read replica via a per-request gRPC header, this PR introduces a dedicated metastore_read_replica node role: read-only metastore pods are deployed separately (connected to the replica), discovered through the cluster, and searcher/analytics reads are routed to them at the service-discovery layer.

Routing by which node you talk to (rather than a per-request header) means no proto/codegen/interceptor changes, and it isolates read load onto separate processes that can scale and fail independently of the write-critical primary metastore.

How it works

  1. Operators run a separate set of pods with --service metastore_read_replica and metastore_read_replica_uri pointing at the PostgreSQL read replica (one shared config; the role selects which URI is used).
  2. A read-replica node resolves metastore_read_replica_uri over a read-only connection (migrations skipped — the replica is migrated by the primary) and serves the same quickwit.metastore.MetastoreService gRPC service, advertising the metastore_read_replica role in the cluster.
  3. ReadReplicaRoutingMetastore (given to the searcher service + DataFusion) routes the stale-tolerant search/analytics reads (index_metadata, indexes_metadata, list_indexes_metadata, list_splits, list_metrics_splits, list_sketch_splits) to read-replica nodes when any are connected, and everything else (writes, non-hot-path reads) to the primary. When no replica is deployed it degrades to the primary, so the feature is fully opt-in.
  4. Writes always go to the primary, so even a stray write through the routing client is served correctly rather than failing on the read-only replica.
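The routing rule in steps 3 and 4 can be sketched as a pure decision function. This is a minimal illustration, not the PR's code: `Rpc`, `Target`, and `Router` are hypothetical names, and an atomic counter stands in for the live connection set of the read replica balance channel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Where a request ends up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Primary,
    ReadReplica,
}

/// Hypothetical enum standing in for the metastore RPC being issued.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rpc {
    IndexMetadata,
    IndexesMetadata,
    ListIndexesMetadata,
    ListSplits,
    ListMetricsSplits,
    ListSketchSplits,
    // Two stand-ins for "everything else" (writes, non-hot-path reads).
    CreateIndex,
    PublishSplits,
}

impl Rpc {
    /// True for the stale-tolerant search/analytics reads listed in step 3.
    fn is_stale_tolerant_read(self) -> bool {
        matches!(
            self,
            Rpc::IndexMetadata
                | Rpc::IndexesMetadata
                | Rpc::ListIndexesMetadata
                | Rpc::ListSplits
                | Rpc::ListMetricsSplits
                | Rpc::ListSketchSplits
        )
    }
}

/// Hypothetical router. The counter is an assumption standing in for the
/// number of currently connected read-replica nodes.
struct Router {
    connected_replicas: Arc<AtomicUsize>,
}

impl Router {
    /// Decide per request: a replica only for stale-tolerant reads, and only
    /// when at least one replica is connected; the primary otherwise.
    fn route(&self, rpc: Rpc) -> Target {
        let has_replica = self.connected_replicas.load(Ordering::Relaxed) > 0;
        if has_replica && rpc.is_stale_tolerant_read() {
            Target::ReadReplica
        } else {
            Target::Primary
        }
    }
}
```

With zero connected replicas every call to `route` returns `Target::Primary`, which is the opt-in degradation described in step 3; a write such as `Rpc::PublishSplits` returns `Target::Primary` regardless of the replica count.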

Trade-offs

  • Staleness: search-path reads can lag the primary by the replica's replication delay (seconds). Acceptable for logs/traces; scoped to searchers only, so the control plane / indexers / janitor keep read-your-writes against the primary.
  • A new node role + Deployment is the operational cost; in exchange there is zero generated-code churn and strong read/write process isolation.

Testing

  • quickwit-config: role parsing, opt-in default set, and all validation paths.
  • ReadReplicaRoutingMetastore: read routing (replica when connected, primary when not) through the public MetastoreService interface.
  • PostgreSQL-gated test resolving a read-only metastore against a real database and serving a read.
  • clippy --workspace --all-features --tests, cargo +nightly fmt --check, log-format / license / typos all clean.

Known limitation / follow-up

There is no automated end-to-end multi-node test (searcher → read-replica node over gRPC): the ClusterSandbox harness uses an in-memory metastore, and this feature is PostgreSQL-only. Covering it needs a PostgreSQL-backed sandbox variant (proposed as a follow-up). Manual production-path verification via two metastore roles against a real DB is the interim check.

🤖 Generated with Claude Code

Introduce the `metastore_read_replica` service role and the
`metastore_read_replica_uri` node config option: the foundation for
routing read-only metastore traffic to a PostgreSQL read replica.

- Add `QuickwitService::MetastoreReadReplica`, parsed from
  `metastore_read_replica` / `metastore-read-replica`.
- Split `QuickwitService::default_services()` from `supported_services()`
  so the new role is opt-in and never enabled implicitly on all-in-one
  nodes.
- Add the optional `metastore_read_replica_uri` field (env
  `QW_METASTORE_READ_REPLICA_URI`), redacted alongside `metastore_uri`.
- Validate that the role requires a PostgreSQL `metastore_read_replica_uri`
  and cannot be co-located with the `metastore` role.
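
A rough sketch of how the opt-in default set and the validation rules above might fit together. The `Service` enum, its reduced variant list, and the `validate` function are hypothetical simplifications rather than the actual `quickwit-config` types, and accepting both `postgres://` and `postgresql://` schemes is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced stand-in for `QuickwitService`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Service {
    Metastore,
    MetastoreReadReplica,
    Searcher,
    Indexer,
}

impl Service {
    /// Accepts both the snake_case and kebab-case spellings of the new role.
    fn parse(name: &str) -> Option<Service> {
        match name {
            "metastore" => Some(Service::Metastore),
            "metastore_read_replica" | "metastore-read-replica" => {
                Some(Service::MetastoreReadReplica)
            }
            "searcher" => Some(Service::Searcher),
            "indexer" => Some(Service::Indexer),
            _ => None,
        }
    }

    /// Every role a node may be asked to run.
    fn supported() -> HashSet<Service> {
        HashSet::from([
            Service::Metastore,
            Service::MetastoreReadReplica,
            Service::Searcher,
            Service::Indexer,
        ])
    }

    /// Roles enabled when none are specified: the read replica is opt-in.
    fn default_set() -> HashSet<Service> {
        let mut services = Service::supported();
        services.remove(&Service::MetastoreReadReplica);
        services
    }
}

/// Checks the two rules: the role needs a PostgreSQL read-replica URI and
/// cannot run alongside the `metastore` role.
fn validate(services: &HashSet<Service>, read_replica_uri: Option<&str>) -> Result<(), String> {
    if !services.contains(&Service::MetastoreReadReplica) {
        return Ok(());
    }
    if services.contains(&Service::Metastore) {
        return Err("metastore_read_replica cannot be co-located with metastore".to_string());
    }
    match read_replica_uri {
        Some(uri) if uri.starts_with("postgres://") || uri.starts_with("postgresql://") => Ok(()),
        Some(_) => Err("metastore_read_replica_uri must be a PostgreSQL URI".to_string()),
        None => Err("metastore_read_replica requires metastore_read_replica_uri".to_string()),
    }
}
```

In this sketch a node configured with only the read-replica role and a `postgres://` URI passes validation, while a missing URI, a non-PostgreSQL URI, or co-location with `Service::Metastore` each produce an error.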

Add the resolution plumbing for connecting to a PostgreSQL read replica
over a read-only connection.

- Add `MetastoreFactoryOptions { read_only }` and thread it through
  `MetastoreFactory::resolve`.
- Add `MetastoreResolver::resolve_read_only`, which rejects any non-
  PostgreSQL backend.
- Key the PostgreSQL factory cache on `(uri, options)` so the read-write
  and read-only clients get distinct connection pools.
- Add `PostgresqlMetastore::new_read_only`: a read-only connection pool
  with migrations skipped (the replica is migrated by the primary).
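
To illustrate why the cache key includes the options, here is a small sketch of a factory that caches one pool per `(uri, options)` pair. `FactoryOptions`, `Pool`, and `Factory` are hypothetical stand-ins for `MetastoreFactoryOptions`, the PostgreSQL connection pool, and the PostgreSQL metastore factory; the real resolution is asynchronous and opens actual connections.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `MetastoreFactoryOptions`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)]
struct FactoryOptions {
    read_only: bool,
}

/// Hypothetical stand-in for a connection pool.
#[derive(Debug)]
struct Pool {
    uri: String,
    read_only: bool,
}

/// Caches one pool per `(uri, options)` pair.
#[derive(Default)]
struct Factory {
    cache: Mutex<HashMap<(String, FactoryOptions), Arc<Pool>>>,
}

impl Factory {
    /// Returns the cached pool for this exact key, creating it on first use.
    fn resolve(&self, uri: &str, options: FactoryOptions) -> Arc<Pool> {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry((uri.to_string(), options))
            .or_insert_with(|| {
                Arc::new(Pool {
                    uri: uri.to_string(),
                    read_only: options.read_only,
                })
            })
            .clone()
    }
}
```

Keyed on the URI alone, a read-only request for a URI that was already resolved read-write would get the read-write pool back; with the composite key the two requests yield distinct `Arc<Pool>` values, and repeated requests with the same key share one.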

Resolve and expose a metastore gRPC server when the
`metastore_read_replica` role is enabled, backed by a read-only
connection to `metastore_read_replica_uri`.

- The read replica server reuses the metrics + load-shed layers but omits
  the control-plane event layers, which only wrap write RPCs.
- A read-replica node is exempted from the control-plane connectivity
  wait, like a primary metastore node, so dedicated replica pods start
  independently.
- Extract `metastore_max_in_flight_requests` shared by both roles.

Add `ReadReplicaRoutingMetastore`, a `MetastoreService` wrapper that
routes the stale-tolerant reads issued by the search and analytics paths
(`index_metadata`, `indexes_metadata`, `list_indexes_metadata`,
`list_splits`, `list_metrics_splits`, `list_sketch_splits`) to read
replica nodes when any are connected, and everything else (writes and
non-hot-path reads) to the primary.

- Routing is decided per request from the read replica balance channel's
  live connection set, so it degrades to the primary when no replica is
  deployed. The check is a synchronous `watch` read with no borrow held
  across an await.
- Wire it into the searcher service and the DataFusion session builder, so
  all search (REST, Elasticsearch, gRPC) and metrics analytics benefit,
  while REST admin handlers keep read-your-writes against the primary.
- Document `metastore_read_replica_uri` in the node config reference and
  the example `quickwit.yaml`.
- Add a PostgreSQL-gated test that resolves a read-only metastore against
  a real database and verifies it serves read RPCs.

Replace the full `MetastoreService` implementation on
`ReadReplicaRoutingMetastore` (which forced ~45 delegating methods, most
of them writes) with a narrow, read-only `MetastoreReadService` trait —
the read-only subset of the metastore RPCs (`index_metadata`,
`list_indexes_metadata`, `list_splits`, `list_metrics_splits`,
`list_sketch_splits`).

- `MetastoreServiceClient` implements `MetastoreReadService`;
  `MetastoreReadServiceClient = Arc<dyn MetastoreReadService>`.
- `ReadReplicaRoutingMetastore` now implements only the 5-method trait, so
  writes are excluded at the type level rather than delegated.
- The search and DataFusion read paths take `MetastoreReadServiceClient` /
  `&dyn MetastoreReadService`; `list_parquet_splits_*` take
  `&dyn MetastoreReadService`. `single_node_search` keeps its concrete
  `MetastoreServiceClient` parameter and adapts internally.

Addresses review feedback that the wrapper should expose a Go-style
read-only interface instead of reimplementing the whole service.
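
The shape of that narrow interface can be sketched as follows. This is a simplified, synchronous illustration with a single method: `ReadService`, `ReadServiceClient`, `FullClient`, and `RoutingReader` are hypothetical stand-ins for `MetastoreReadService`, `MetastoreReadServiceClient`, `MetastoreServiceClient`, and `ReadReplicaRoutingMetastore`, and a static `Option` replaces the live replica connection check.

```rust
use std::sync::Arc;

/// Hypothetical read-only trait; the real one is async and has five methods.
trait ReadService: Send + Sync {
    fn list_splits(&self, index_id: &str) -> Vec<String>;
}

/// Shared handle to anything that can serve reads.
type ReadServiceClient = Arc<dyn ReadService>;

/// Stand-in for the full client: it can read and write.
struct FullClient {
    name: &'static str,
}

impl FullClient {
    /// A write. It is deliberately not part of `ReadService`.
    fn publish_split(&self, _index_id: &str, _split_id: &str) {}
}

impl ReadService for FullClient {
    fn list_splits(&self, index_id: &str) -> Vec<String> {
        vec![format!("{}:{}", self.name, index_id)]
    }
}

/// Stand-in for the routing wrapper: it implements only the read trait,
/// so callers holding it cannot issue writes at all.
struct RoutingReader {
    primary: ReadServiceClient,
    replica: Option<ReadServiceClient>,
}

impl ReadService for RoutingReader {
    fn list_splits(&self, index_id: &str) -> Vec<String> {
        // Prefer the replica when one is available, else use the primary.
        self.replica
            .as_ref()
            .unwrap_or(&self.primary)
            .list_splits(index_id)
    }
}

/// A read path that only asks for the capability it needs.
fn search(metastore: &dyn ReadService, index_id: &str) -> Vec<String> {
    metastore.list_splits(index_id)
}
```

Because `search` accepts `&dyn ReadService`, it works with either the full client or the routing wrapper, and a call such as `publish_split` on the wrapper does not compile rather than being delegated.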