Perf: faster get_largest_connected_component_mask (bincount + LUT gather) by aymuos15 · Pull Request #8978 · Project-MONAI/MONAI

aymuos15 · 2026-07-02T21:37:31Z

Fixes #8974 .

Description

get_largest_connected_component_mask (monai/transforms/utils.py) ranked component sizes by gathering every non-background voxel through lib.nonzero(features) and then built the mask with lib.isin over the whole label field. Both allocate large transient arrays. This counts labels with a single full-field lib.bincount (zeroing index 0 to drop background), and builds the mask with a boolean lookup-table gather (keep[features]) instead of isin. Output is bit-identical, verified across 2D and 3D at several foreground fractions and component counts, and the numpy and cupy/cucim paths are covered unchanged.

The isin replacement is the dominant win on both time and peak memory; the bincount change removes the redundant nonzero index arrays and gathered copy on top.

case	metric	before	after	improvement
3D 192^3, fg 50%	time	82.0 ms	8.6 ms	9.5x
3D 192^3, fg 50%	peak transient mem	108.0 MB	7.2 MB	15x
2D 1024^2, fg 60%	time	21.5 ms	1.8 ms	11.7x
2D 1024^2, fg 60%	peak transient mem	18.0 MB	1.4 MB	12.9x
3D 128^3, fg 5% (48k comps)	time	10.7 ms	6.2 ms	1.7x

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).

Rank component sizes with a full-field bincount (zeroing background at index 0) instead of gathering every non-background voxel through lib.nonzero, and build the output mask with a boolean lookup-table gather instead of lib.isin over the whole label field. Both avoid large transient allocations; output is bit-identical. Covers the numpy and cupy/cucim paths unchanged. Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>

coderabbitai · 2026-07-02T21:37:58Z

📝 Walkthrough

Walkthrough

The change refactors get_largest_connected_component_mask in monai/transforms/utils.py to compute label voxel counts via bincount on flattened features, zero out the background count, rank labels by descending count via argsort, and build a boolean keep lookup table indexed directly into the feature map, replacing the prior nonzero-based extraction and isin masking approach.

Estimated code review effort: 2 (Simple) | ~10 minutes

Changes

File	Summary
monai/transforms/utils.py	Replaced nonzero/isin-based component selection with bincount-based ranking and boolean indexing

Related issues: Addresses #8974 (materialized redundant nonzero index arrays before bincount).

Related PRs: None noted.

Suggested labels: performance, transforms

Suggested reviewers: MONAI transforms maintainers

Poem:
Bincount hops, no nonzero fuss,
Background zeroed, no extra muss.
Argsort ranks the counts with speed,
A boolean table's all we need.
Same result, less memory greed. 🐇

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly names the affected function and the performance-focused change.
Description check	✅ Passed	The PR description includes the fix reference, summary, and change-type checklist, with only non-critical items left unchecked.
Linked Issues check	✅ Passed	The code matches `#8974` by counting over the full label field and zeroing background before ranking components.
Out of Scope Changes check	✅ Passed	The extra lookup-table mask build is still part of the same function-level optimization and not unrelated scope.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

monai/transforms/utils.py (1)

1227-1235: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Use argpartition instead of full argsort for top-k selection.

lib.argsort(counts)[::-1][:num_components] sorts the entire counts array just to pick the top num_components. Since only the largest few are needed, argpartition avoids the full O(n log n) sort and better matches this PR's own perf-optimization goal.

♻️ Proposed refactor

-        # argsort[::-1] gives labels of the largest components; keep the first n
-        features_to_keep = lib.argsort(counts)[::-1][:num_components]
+        # argpartition avoids a full sort when only the top-n labels are needed
+        features_to_keep = lib.argpartition(counts, -num_components)[-num_components:]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/transforms/utils.py` around lines 1227 - 1235, The top-k selection in
the label-count ranking is doing a full sort via
lib.argsort(counts)[::-1][:num_components], which is unnecessary for selecting
only the largest components. Update the selection logic in the component-ranking
code around the bincount/counts handling to use lib.argpartition for the top
num_components labels, then keep the existing boolean lookup-table flow
(keep/features_to_keep/out) to preserve behavior while improving performance.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@monai/transforms/utils.py`:
- Around line 1227-1235: The KeepLargestConnectedComponent ranking path in
utils.py is not covered for equal-sized connected components, so the current
argsort-based selection can change silently. Add a regression test for the
KeepLargestConnectedComponent behavior with tie-sized components that exercises
the ranking branch and asserts the chosen components remain deterministic; use
the same multi-component setup already used in the existing connected-component
tests to locate the relevant case.

---

Nitpick comments:
In `@monai/transforms/utils.py`:
- Around line 1227-1235: The top-k selection in the label-count ranking is doing
a full sort via lib.argsort(counts)[::-1][:num_components], which is unnecessary
for selecting only the largest components. Update the selection logic in the
component-ranking code around the bincount/counts handling to use
lib.argpartition for the top num_components labels, then keep the existing
boolean lookup-table flow (keep/features_to_keep/out) to preserve behavior while
improving performance.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26c10d81-e6f4-4fe3-b7ad-e8038af614ce

📥 Commits

Reviewing files that changed from the base of the PR and between f1dcac4 and 6c354ca.

📒 Files selected for processing (1)

monai/transforms/utils.py

coderabbitai · 2026-07-02T21:41:37Z

+        # bincount counts every label; index 0 is background, so drop it before ranking
+        counts = lib.bincount(features.reshape(-1))
+        counts[0] = 0
+        # argsort[::-1] gives labels of the largest components; keep the first n
+        features_to_keep = lib.argsort(counts)[::-1][:num_components]
+        # boolean lookup-table gather over the label field, cheaper than isin
+        keep = lib.zeros(counts.shape[0], dtype=bool)
+        keep[features_to_keep] = True
+        out = keep[features]


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find tests covering get_largest_connected_component_mask with multi-component scenarios rg -n "get_largest_connected_component_mask|num_components" --type=py -g '*test*'

Repository: Project-MONAI/MONAI

Length of output: 157

🏁 Script executed:

#!/bin/bash set -euo pipefail printf '\n== Implementation locations ==\n' rg -n "def get_largest_connected_component_mask|get_largest_connected_component_mask\(" monai -g '*.py' printf '\n== Candidate test mentions ==\n' rg -n "largest_connected_component|connected_component|num_components|bincount|argsort|isin" tests monai -g '*test*.py' -g '*.py' || true printf '\n== File inventory around tests ==\n' fd -a -t f 'test.*\.py$|.*_test\.py$' tests monai 2>/dev/null | head -n 200

Repository: Project-MONAI/MONAI

Length of output: 50376

🏁 Script executed:

#!/bin/bash set -euo pipefail printf '\n== Implementation slice ==\n' sed -n '1191,1260p' monai/transforms/utils.py printf '\n== Test file slice ==\n' sed -n '300,430p' tests/transforms/test_keep_largest_connected_component.py

Repository: Project-MONAI/MONAI

Length of output: 7866

🏁 Script executed:

#!/bin/bash set -euo pipefail printf '\n== Test file top ==\n' sed -n '1,140p' tests/transforms/test_keep_largest_connected_component.py printf '\n== grid_6 definition and nearby fixtures ==\n' rg -n "grid_6|grid_4|grid_1" tests/transforms/test_keep_largest_connected_component.py python3 - <<'PY' import ast, pathlib path = pathlib.Path('tests/transforms/test_keep_largest_connected_component.py') src = path.read_text() tree = ast.parse(src) for node in ast.walk(tree): if isinstance(node, ast.Assign): for t in node.targets: if isinstance(t, ast.Name) and t.id == 'grid_6': print('\n== Parsed grid_6 value ==') print(ast.literal_eval(node.value)) PY

Repository: Project-MONAI/MONAI

Length of output: 5570

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 - <<'PY' from collections import deque grid = [[0, 0, 1, 1, 0, 0, 1], [0, 0, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 0, 1], [0, 0, 0, 1, 0, 0, 1]] H, W = len(grid), len(grid[0]) for conn in (4, 8): seen = [[False]*W for _ in range(H)] sizes = [] for i in range(H): for j in range(W): if grid[i][j] == 0 or seen[i][j]: continue q = deque([(i,j)]) seen[i][j] = True size = 0 while q: x,y = q.popleft() size += 1 for dx,dy in ([(-1,0),(1,0),(0,-1),(0,1)] if conn==4 else [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]): nx, ny = x+dx, y+dy if 0 <= nx < H and 0 <= ny < W and not seen[nx][ny] and grid[nx][ny] != 0: seen[nx][ny] = True q.append((nx, ny)) sizes.append(size) print(conn, sorted(sizes, reverse=True), "num_components=", len(sizes)) PY

Repository: Project-MONAI/MONAI

Length of output: 223

Add a tie-case regression for KeepLargestConnectedComponent. The current tests already hit the ranking branch with multi-component inputs, but none cover equal-sized components, so the new argsort ordering stays unpinned.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@monai/transforms/utils.py` around lines 1227 - 1235, The KeepLargestConnectedComponent ranking path in utils.py is not covered for equal-sized connected components, so the current argsort-based selection can change silently. Add a regression test for the KeepLargestConnectedComponent behavior with tie-sized components that exercises the ranking branch and asserts the chosen components remain deterministic; use the same multi-component setup already used in the existing connected-component tests to locate the relevant case.

Source: Path instructions

aymuos15 requested review from KumoLiu, Nic-Ma and ericspod as code owners July 2, 2026 21:37

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Perf: faster get_largest_connected_component_mask (bincount + LUT gather)#8978

Perf: faster get_largest_connected_component_mask (bincount + LUT gather)#8978
aymuos15 wants to merge 1 commit into
Project-MONAI:devfrom
aymuos15:perf/largest-cc-bincount-lut

aymuos15 commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

aymuos15 commented Jul 2, 2026

Description

Types of changes

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading