Parallelize convolve_2d numpy kernel (#3615)#3616
Open
brendancol wants to merge 1 commit into
Open
Conversation
_convolve_2d_numpy iterated with numba.prange but was decorated @jit(nopython=True, nogil=True) with no parallel=True, so prange degraded to a serial range. On a 20-core host a 2000x2000 float64 raster with a 15x15 kernel dropped from ~356 ms to ~53 ms end-to-end once parallel=True was enabled, with identical results. Add parallel=True and serialize the kernel launch behind a module-level threading.Lock, since dask calls it per chunk under a threaded scheduler and numba's workqueue layer is not threadsafe across host threads (SIGABRT on macOS, same hazard as #3141 fixed in terrain.py). A single numpy call takes the lock uncontended and still runs across all cores. Add benchmarks/benchmarks/convolution.py (Convolve2d, CircleKernel) so the regression cannot silently return.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #3615.
Problem
_convolve_2d_numpyiterates its two outer loops withnumba.prangebut was decorated@jit(nopython=True, nogil=True)with noparallel=True. Numba only parallelizesprangewhenparallel=Trueis set, so as shipped the loop ran serial on a single core. This is the main CPU convolution path, used by the numpy backend and per chunk by the dask+numpy backend.Fix
parallel=Trueto the decorator.threading.Lock. Dask calls the kernel per chunk under its threaded scheduler, and numba's defaultworkqueuethreading layer is not threadsafe across host threads (SIGABRT on macOS). This is the same hazard and fix already applied toterrain.py(Streaming reproject thread pool aborts the process when numba parallel kernels run concurrently #3141). A single numpy call takes the lock uncontended and still runs across all cores; concurrent dask chunk calls run one at a time, each internally parallel.The cupy and dask+cupy paths use a separate CUDA kernel and are untouched.
Measurement
20-core host, 2000x2000 float64 raster, 15x15 kernel, end-to-end through
convolve_2d:Results are identical (verified
np.allclose(..., equal_nan=True)).Verification
xrspatial/tests/test_convolution.py: 6 passed.nan,nearest,reflect,wrap): match.Benchmark
Adds
benchmarks/benchmarks/convolution.pywithConvolve2d(numpy/cupy/dask across sizes and kernel sizes) andCircleKernel, so the serial regression cannot silently return.Performance context