Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612)#3617
Merged
Merged
Conversation
…3612) _partition_bumps rescanned the full locs array with a boolean mask once per chunk, so dask graph construction ran in O(n_chunks * n_bumps) before any compute. Bucket each bump into its chunk in a single searchsorted pass instead (byte-identical partitions, ~85x faster at 25.6k chunks). Add a benchmark (benchmarks/benchmarks/bump.py) and a regression test that locks the partition against the original boolean-mask reference. Sweep: performance. Backends: dask+numpy, dask+cupy (shared helper).
Contributor
Author
PR Review: Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612)Blockers (must fix before merge)None. Suggestions (should fix, not blocking)None. Nits (optional improvements)
What looks good
Checklist
|
Contributor
Author
|
Follow-up on the review nits:
No blockers or suggestions were raised. Tests still green (19 passed, includes cupy + dask+cupy on GPU). |
…e-bump-2026-07-02
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3612.
_partition_bumpsassigned each bump to the dask chunk holding its centre pixel by looping over every chunk and rescanning the fulllocsarray with a fresh boolean mask. That is O(n_chunks x n_bumps), and it runs eagerly during graph construction, before any.compute(). On a finely chunked raster the rescan dominates the runtime.This replaces the double loop with a single
np.searchsortedpass that buckets every bump into its chunk, then groups by chunk. Complexity drops to about O(n_bumps log n_chunks). The output is byte-identical to the old code, including chunk-local coordinates and intra-chunk order, so compute results do not change.Measured on the standalone helper (count = 100k):
Backends
The change is in
_partition_bumps, shared by_bump_dask_numpyand_bump_dask_cupy. The numpy and cupy (non-dask) paths do not touch this helper and are unaffected. Verified on a CUDA host: the cupy and dask+cupy tests pass.Benchmark
Adds
benchmarks/benchmarks/bump.py. Thedaskcase is finely chunked and times graph construction only, so a return of the per-chunk rescan would show up as a regression.Test plan
pytest xrspatial/tests/test_bump.py(19 passed, includes cupy and dask+cupy on GPU)test_partition_bumps_matches_bruteforcechecks the vectorised partition against the original boolean-mask reference across an irregular chunk layoutFound by /sweep-performance.