Add Store.get_many bulk fetch API by TomNicholas · Pull Request #4112 · zarr-developers/zarr-python

TomNicholas · 2026-07-01T20:55:13Z

Adds a public, overridable Store.get_many for fetching many values — each a whole key or a (key, byte_range) pair — in a single call, so a store can batch/coalesce reads that land in the same underlying object instead of issuing one get per key.

It generalizes Store.get_ranges (many ranges of one key) to many keys, yielding (request_index, Buffer | None) batches in completion order. The ABC default is a concurrent fan-out over get; FsspecStore overrides it to coalesce via fsspec's cat_ranges. Coalescing tuning is left to each store rather than exposed on the interface.
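To make the default behaviour concrete, here is a minimal sketch of a concurrent fan-out over `get` that yields `(request_index, value)` batches in completion order. It is not the code in this PR: `ToyStore`, the `Request` alias, the `(start, end)` tuple used as a byte range, and the `collect` helper are all hypothetical stand-ins, and plain `bytes` is used in place of `Buffer`.

```python
import asyncio
from collections.abc import AsyncIterator, Sequence
from typing import Optional, Union

# Hypothetical request type: a bare key, or (key, (start, end)) with end exclusive.
Request = Union[str, tuple[str, tuple[int, int]]]
Result = tuple[int, Optional[bytes]]


class ToyStore:
    """In-memory stand-in for a store; not zarr's Store class."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def get(
        self, key: str, byte_range: Optional[tuple[int, int]] = None
    ) -> Optional[bytes]:
        value = self._data.get(key)
        if value is None:
            return None  # missing key
        if byte_range is None:
            return value
        start, end = byte_range
        return value[start:end]

    async def get_many(self, requests: Sequence[Request]) -> AsyncIterator[list[Result]]:
        """Fan out one `get` per request and yield results as they complete."""

        async def fetch(index: int, request: Request) -> Result:
            if isinstance(request, str):
                return index, await self.get(request)
            key, byte_range = request
            return index, await self.get(key, byte_range)

        tasks = [asyncio.ensure_future(fetch(i, r)) for i, r in enumerate(requests)]
        for finished in asyncio.as_completed(tasks):
            # Each batch holds a single result here; a coalescing override
            # could yield several results per batch instead.
            yield [await finished]


async def collect(store: ToyStore, requests: Sequence[Request]) -> dict[int, Optional[bytes]]:
    """Drain get_many and map each request index back to its value."""
    results: dict[int, Optional[bytes]] = {}
    async for batch in store.get_many(requests):
        results.update(batch)
    return results
```

Because results arrive in completion order, the caller relies on the request index, not on position in the stream, to match each value to its request.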

Motivation — xref #1806 (batched Store API), #1758 (request coalescing), and zarr-developers/VirtualiZarr#947 (files-as-shards / consolidating small reads): a custom store such as VirtualiZarr's ManifestStore or icechunk's IcechunkStore can override get_many to merge many small chunk reads into fewer requests.

Draft — feedback on the signature welcome.

codecov · 2026-07-01T20:59:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.53%. Comparing base (1ab9953) to head (9e6eeae).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4112      +/-   ##
==========================================
+ Coverage   93.50%   93.53%   +0.02%     
==========================================
  Files          90       90              
  Lines       11981    12013      +32     
==========================================
+ Hits        11203    11236      +33     
+ Misses        778      777       -1

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/zarr/abc/store.py | `96.38% <100.00%> (+0.20%)` | ⬆️ |
| src/zarr/storage/_fsspec.py | `91.50% <100.00%> (+0.17%)` | ⬆️ |
| src/zarr/testing/store.py | `99.46% <100.00%> (+0.02%)` | ⬆️ |

... and 1 file with indirect coverage changes


Add a public, overridable `Store.get_many` that retrieves many values at once - each request being a whole key or a `(key, byte_range)` pair. It generalizes `Store.get_ranges` (many ranges of one key) to many keys, and yields `(request_index, Buffer | None)` batches in completion order so a store can coalesce reads that land in the same underlying object. The ABC default fetches requests concurrently with `get`, so every store works out of the box; stores with a bulk backend override it (`FsspecStore` coalesces via fsspec's `cat_ranges`). Coalescing tuning is left to each store rather than exposed on the interface. This restores and generalizes the batched-fetch capability of the v2 `getitems` Store API (see zarr-developers/zarr-python#1806).

Implement Store.get_many on ManifestStore: resolve the requested chunk keys through the manifests to (source file, byte range), group the requests by source file, and fetch each group with obstore's range-coalescing reader so that virtual references lying within `coalesce_max_gap_bytes` of each other in the same file are served by a single, larger request instead of one request per chunk. Keys that are not plain manifest-backed chunks (metadata, inlined, or missing chunks) are served individually via `get`. This is the same technique object_store / async-tiff use to read many tiles efficiently, applied to virtual chunk references, and derives an "effective shard index" from the manifests at read time. It requires no file-format assumptions and no spec changes. The coalescing gap is configurable via a new `coalesce_max_gap_bytes` constructor argument (default 1 MiB, 0 disables). Depends on the cross-key `Store.get_many` API in zarr-python (zarr-developers/zarr-python#4112, #4113); until that is released the method is a dormant override. See zarr-developers/VirtualiZarr#947.
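The resolve-and-group step described above can be sketched roughly as follows. This is an illustration only, not ManifestStore's code: the `Ref` tuple, the plain-dict `manifest`, and `group_by_source` are hypothetical names, and any key without a manifest entry (metadata, missing chunks) is routed to a fallback list that would be served one by one via `get`.

```python
from collections import defaultdict
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical virtual reference: where a chunk's bytes live."""

    path: str
    offset: int
    length: int


def group_by_source(
    keys: list[str], manifest: dict[str, Ref]
) -> tuple[dict[str, list[tuple[int, int, int]]], list[int]]:
    """Group request indices by source file; collect non-manifest keys separately."""
    groups: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    fallback: list[int] = []
    for index, key in enumerate(keys):
        ref = manifest.get(key)
        if ref is None:
            # Not a manifest-backed chunk: serve it individually.
            fallback.append(index)
        else:
            groups[ref.path].append((index, ref.offset, ref.length))
    for refs in groups.values():
        # Sort by offset so neighbouring ranges in a file sit next to each other.
        refs.sort(key=lambda r: r[1])
    return dict(groups), fallback
```

Keeping the original request index alongside each range is what lets the results be reported back against the caller's request order after the reads are regrouped per file.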

Implement Store.get_many on ManifestStore: resolve the requested chunk keys through the manifests to (source file, byte range), group the requests by source file, coalesce each group into runs, and serve each run with a single ranged read that is sliced back into per-chunk buffers. Keys that are not plain manifest-backed chunks (metadata, inlined, or missing chunks) are served individually via `get`. Coalescing uses two knobs (the object_store / async-tiff model): a run merges references whose gap is <= `coalesce_max_gap_bytes` as long as the resulting read stays <= `coalesce_max_bytes`. The gap defaults to 0 - merge only adjacent references, a pure win with no wasted bytes - because benchmarking a 2D map-tile query against a remote Met Office file showed that bridging larger gaps pulls in the chunks that sit between rows (a 2D box is contiguous along one axis but strided along the other), reading ~3x the needed bytes and running slower than no coalescing at all. Merging only adjacent references was ~2.8x faster than the per-chunk baseline with zero over-read. `max_bytes` (default 8 MiB) bounds the size of any single read. Depends on the cross-key `Store.get_many` API in zarr-python (zarr-developers/zarr-python#4112, #4113); until that is released the method is a dormant override. See zarr-developers/VirtualiZarr#947.
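A rough sketch of the two-knob coalescing rule and the slice-back step, for the ranges of a single source file. The function names `coalesce` and `slice_run` and the `(index, offset, length)` tuples are hypothetical; only the defaults (gap 0, 8 MiB cap) are taken from the description above, and the real implementation may differ in detail.

```python
Member = tuple[int, int, int]  # (request_index, offset, length), hypothetical layout
Run = tuple[int, int, list[Member]]  # (run_start, run_end, members), end exclusive


def coalesce(
    refs: list[Member], max_gap_bytes: int = 0, max_bytes: int = 8 * 1024 * 1024
) -> list[Run]:
    """Merge byte ranges of one file into runs, each served by one ranged read."""
    runs: list[Run] = []
    for index, offset, length in sorted(refs, key=lambda r: r[1]):
        end = offset + length
        if runs:
            run_start, run_end, members = runs[-1]
            gap = offset - run_end  # <= 0 when ranges touch or overlap
            new_end = max(run_end, end)
            # Merge only if the gap is small enough AND the read stays bounded.
            if gap <= max_gap_bytes and new_end - run_start <= max_bytes:
                runs[-1] = (run_start, new_end, members + [(index, offset, length)])
                continue
        runs.append((offset, end, [(index, offset, length)]))
    return runs


def slice_run(run: Run, payload: bytes) -> dict[int, bytes]:
    """Cut the bytes of one coalesced read back into per-request buffers."""
    run_start, _run_end, members = run
    return {
        index: payload[offset - run_start : offset - run_start + length]
        for index, offset, length in members
    }
```

With the default gap of 0, three 10-byte references at offsets 0, 10 and 30 become two reads (bytes 0 to 20 and bytes 30 to 40) with no over-read. Raising the gap to 10 merges them into a single 40-byte read for 30 needed bytes, which is the kind of over-read the benchmark above found to be counterproductive.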

TomNicholas mentioned this pull request Jul 1, 2026

Use Store.get_many for whole-chunk reads in BatchedCodecPipeline #4113

Draft

TomNicholas force-pushed the feat/store-get-many branch from 24890a9 to 9e6eeae Compare July 1, 2026 21:00

TomNicholas mentioned this pull request Jul 1, 2026

Coalescing ManifestStore.get_many zarr-developers/VirtualiZarr#1033

Draft

TomNicholas wants to merge 1 commit into zarr-developers:main from TomNicholas:feat/store-get-many
