Use Store.get_many for whole-chunk reads in BatchedCodecPipeline by TomNicholas · Pull Request #4113 · zarr-developers/zarr-python

TomNicholas · 2026-07-01T20:55:23Z

Builds on #4112. BatchedCodecPipeline.read now fetches a whole (non-sharded) request with a single Store.get_many call instead of one get per chunk, so a store can batch/coalesce the underlying reads — independently of codec_pipeline.batch_size, which still governs only decode batching.

The sharding codec's partial-decode path is unchanged, and stores without a specialized get_many fall back to the previous concurrent per-chunk behavior.

Motivation — xref #1758 (request coalescing), #1806 (batched Store API), and zarr-developers/VirtualiZarr#947 (files-as-shards / consolidating small reads).

Stacked on #4112 — its commit is the first one here; review after it. Draft.

Add a public, overridable `Store.get_many` that retrieves many values at once, each request being a whole key or a `(key, byte_range)` pair. It generalizes `Store.get_ranges` (many ranges of one key) to many keys, and yields `(request_index, Buffer | None)` batches in completion order so a store can coalesce reads that land in the same underlying object. The ABC default fetches requests concurrently with `get`, so every store works out of the box; stores with a bulk backend override it (`FsspecStore` coalesces via fsspec's `cat_ranges`). Coalescing tuning is left to each store rather than exposed on the interface. This restores and generalizes the batched-fetch capability of the v2 `getitems` Store API (see #1806).
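To make the described contract concrete, here is a minimal sketch of the fallback behavior: one concurrent `get` per request, yielded as `(request_index, value)` batches in completion order. It is not the code in this PR. `ToyStore`, the `Request` alias, and the use of plain `bytes` in place of `Buffer` are assumptions made for illustration.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Sequence
from typing import Optional, Union

# Hypothetical request type: a whole key, or (key, (start, end)) for a byte range.
Request = Union[str, tuple[str, tuple[int, int]]]


class ToyStore:
    """In-memory stand-in for a store; this is NOT zarr's Store ABC."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def get(
        self, key: str, byte_range: Optional[tuple[int, int]] = None
    ) -> Optional[bytes]:
        value = self._data.get(key)
        if value is None or byte_range is None:
            return value
        start, end = byte_range
        return value[start:end]

    async def get_many(
        self, requests: Sequence[Request]
    ) -> AsyncIterator[Sequence[tuple[int, Optional[bytes]]]]:
        """Fallback: one concurrent `get` per request, yielded as each completes."""

        async def fetch(index: int, request: Request) -> tuple[int, Optional[bytes]]:
            if isinstance(request, str):
                return index, await self.get(request)
            key, byte_range = request
            return index, await self.get(key, byte_range)

        tasks = [asyncio.ensure_future(fetch(i, r)) for i, r in enumerate(requests)]
        for finished in asyncio.as_completed(tasks):
            # Each batch here holds one result; a coalescing store could yield several.
            yield [await finished]
```

Because every result carries its request index, the caller does not depend on the order in which batches arrive, which is what leaves a store free to regroup the reads.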

BatchedCodecPipeline.read now fetches the encoded bytes for an entire (non-sharded) read with a single Store.get_many call, instead of one Store.get per chunk. It drives get_many over all chunk keys, scatters the completion-ordered (index, buffer) results back into position, and feeds them to the per-batch decode path. This lets a store batch or coalesce the underlying reads (e.g. FsspecStore via cat_ranges, or a custom store such as virtualizarr's ManifestStore / icechunk's IcechunkStore that overrides get_many) regardless of codec_pipeline.batch_size, which still governs only decode batching. The sharding codec's partial-decode path is untouched, and stores without a specialized get_many fall back to the previous concurrent per-chunk gets.
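The scatter step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline code: `fake_get_many`, `fetch_in_order`, and `decode_batches` are hypothetical names, and "decoding" is a stand-in that just converts bytes to text.

```python
import asyncio
from collections.abc import Iterator, Sequence
from typing import Optional


async def fake_get_many(keys: Sequence[str]):
    """Stand-in for a store's get_many: yields (index, bytes | None) batches out of order."""
    yield [(2, b"c2"), (0, b"c0")]
    yield [(1, None)]  # a missing chunk


async def fetch_in_order(keys: Sequence[str]) -> list[Optional[bytes]]:
    # Pre-size the result list, then drop each result into its request slot.
    results: list[Optional[bytes]] = [None] * len(keys)
    async for batch in fake_get_many(keys):
        for index, buffer in batch:
            results[index] = buffer
    return results


def decode_batches(
    buffers: Sequence[Optional[bytes]], batch_size: int
) -> Iterator[list[Optional[str]]]:
    # Decoding is still batched separately from fetching.
    for start in range(0, len(buffers), batch_size):
        batch = buffers[start : start + batch_size]
        yield [None if b is None else b.decode() for b in batch]
```

In this sketch the fetch sees all keys at once, while `batch_size` only affects how the already-fetched buffers are grouped for decoding.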

codecov · 2026-07-01T21:00:15Z

Codecov Report

❌ Patch coverage is 93.15068% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.51%. Comparing base (1ab9953) to head (4f1ad9f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zarr/core/codec_pipeline.py | 87.50% | 5 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4113   +/-   ##
=======================================
  Coverage   93.50%   93.51%           
=======================================
  Files          90       90           
  Lines       11981    12051   +70     
=======================================
+ Hits        11203    11269   +66     
- Misses        778      782    +4

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/zarr/abc/store.py | `96.38% <100.00%> (+0.20%)` | ⬆️ |
| src/zarr/storage/_fsspec.py | `91.50% <100.00%> (+0.17%)` | ⬆️ |
| src/zarr/testing/store.py | `99.46% <100.00%> (+0.02%)` | ⬆️ |
| src/zarr/core/codec_pipeline.py | `94.26% <87.50%> (-1.03%)` | ⬇️ |

... and 1 file with indirect coverage changes

ilan-gold · 2026-07-02T10:27:59Z

Can you highlight what the relation of this to #3925 is?

TomNicholas · 2026-07-02T15:21:55Z

Yes, good question: get_ranges only handles a single key.

For my use case (fetching virtual chunks that are separate keys but happen to be part of the same netCDF file object) I need a method for bulk fetching multiple keys. Then my store (virtualizarr.ManifestStore/IcechunkStore) can use its manifests at runtime to find which chunks actually happen to live close to each other in the same object, then fetch those in one coalesced request. See zarr-developers/VirtualiZarr#1033 for an example implementation of get_many. I've seen 10x speedups for some queries with this addition.
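As a rough illustration of the idea (not the VirtualiZarr implementation), the sketch below groups requested chunk keys by the object they live in and merges byte ranges that are adjacent or within a small gap. The manifest layout `key -> (path, offset, length)`, the `MergedRead` type, and the `max_gap` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MergedRead:
    path: str
    start: int
    end: int  # exclusive
    # (request_index, offset, length) for each chunk served by this read
    members: list[tuple[int, int, int]] = field(default_factory=list)


def coalesce(
    manifest: dict[str, tuple[str, int, int]],
    keys: list[str],
    max_gap: int = 0,
) -> list[MergedRead]:
    # Group requests by the underlying object they point into.
    by_path: dict[str, list[tuple[int, int, int]]] = {}
    for index, key in enumerate(keys):
        path, offset, length = manifest[key]
        by_path.setdefault(path, []).append((offset, length, index))

    merged: list[MergedRead] = []
    for path, entries in by_path.items():
        entries.sort()  # by offset within one object
        current = None
        for offset, length, index in entries:
            if current is not None and offset - current.end <= max_gap:
                # Close enough to the previous range: extend the same read.
                current.end = max(current.end, offset + length)
            else:
                current = MergedRead(path, offset, offset + length)
                merged.append(current)
            current.members.append((index, offset, length))
    return merged
```

After fetching one merged range, each member's bytes would be the slice starting at `offset - start` with the member's `length`, tagged with its request index so results can be returned in any order.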

d-v-b · 2026-07-02T15:33:41Z

+    ) -> AsyncIterator[Sequence[tuple[int, Buffer | None]]]:
+        """Retrieve many values, possibly from different keys, at once.
+
+        This is the bulk counterpart to :meth:`get`: the whole set of requests


rst-style docstring -> mkdocs-style docstring

d-v-b · 2026-07-02T15:34:53Z

+        """
+        # Local imports to avoid an import cycle at module load time.
+        from zarr.core.common import concurrent_map
+        from zarr.core.config import config


what if the concurrency is a plain keyword-only parameter for this function?
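One possible reading of this suggestion, sketched outside zarr: the concurrency limit is passed as a keyword-only argument instead of being looked up from global config inside the function. `fetch_all` and its `get` callback are hypothetical names, and the default of 4 is arbitrary.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence


async def fetch_all(
    get: Callable[[str], Awaitable[object]],
    keys: Sequence[str],
    *,
    concurrency: int = 4,
) -> list[object]:
    # The semaphore caps how many `get` calls are in flight at once.
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(key: str) -> object:
        async with semaphore:
            return await get(key)

    # gather preserves the order of `keys` in its result list.
    return await asyncio.gather(*(bounded(key) for key in keys))
```

With this shape the caller decides the limit per call, and the function needs no config import.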

ilan-gold · 2026-07-02T15:49:02Z

> See zarr-developers/VirtualiZarr#1033 for an example implementation of get_many. I've seen 10x speedups for some queries with this addition.

Ah nice! Now I get what you mean by "is handed to the store in a single call, so an implementation can fetch..."

TomNicholas added 2 commits July 1, 2026 17:00

TomNicholas force-pushed the feat/pipeline-use-get-many branch from d8a292d to 4f1ad9f Compare July 1, 2026 21:00

TomNicholas mentioned this pull request Jul 1, 2026

Coalescing ManifestStore.get_many zarr-developers/VirtualiZarr#1033

Draft

d-v-b reviewed Jul 2, 2026

TomNicholas wants to merge 2 commits into zarr-developers:main from TomNicholas:feat/pipeline-use-get-many

3 participants
