Experimental performance family (default-off): load-balance infrastructure, active-box windowing, block-structured AMR, hybrid WENO/Riemann sensors by sbryngelson · Pull Request #1628 · MFlowCode/MFC

sbryngelson · 2026-07-03T19:14:57Z

Summary

This PR adds an opt-in, default-off family of performance features and the measurement
infrastructure they rest on. With all flags at their defaults the only touched production
path is s_mpi_decompose_computational_domain, refactored to compute its equal split
through the new m_box module (byte-identical; covered by the existing suite).

Load-balance infrastructure (common + sim):

m_box: t_box + partition arithmetic; shared by the decomposer, AMR, and the
weighted splitter.
m_load_weight + load_weight_wrt: per-cell load-weight field (active-box, EL-bubble,
IB, phase-change Newton-iteration contributors) with field output and a per-rank
imbalance metric.
m_sfc_partition + sfc_partition_wrt: Morton-SFC tile ordering and chains-on-chains
balanced partition, reported as a predicted-imbalance diagnostic.
m_load_balance + load_balance: experimental weighted static Cartesian decomposition
at init (requires parallel_io), with a min-cells feasibility floor and, when AMR is
on, fine-work-aware weighting with a deterministic feasibility clamp.
m_rank_timing + rank_time_wrt: per-rank compute-time diagnostic (halo exchange
excluded; device-synced on GPU).
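As a rough illustration of the SFC diagnostic described above, the sketch below orders tiles by a Morton key, cuts the resulting weight chain into contiguous pieces, and reports a max-over-mean imbalance. It is a minimal sketch under stated assumptions, not the code in m_sfc_partition: the function names, the prefix-sum cutting heuristic (simpler than an optimal chains-on-chains solver), and the tile weights are all hypothetical.

```python
from itertools import accumulate


def morton_key_2d(i, j, bits=16):
    """Interleave the bits of tile indices (i, j) into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b)        # i occupies even bit positions
        key |= ((j >> b) & 1) << (2 * b + 1)    # j occupies odd bit positions
    return key


def contiguous_partition(weights, nparts):
    """Cut an ordered chain of weights into nparts contiguous, non-empty pieces.

    Prefix-sum heuristic (not an optimal chains-on-chains solver): piece p ends
    at the first element whose running sum reaches (p + 1) / nparts of the total.
    Returns half-open (start, stop) index pairs. Assumes len(weights) >= nparts.
    """
    n = len(weights)
    total = sum(weights)
    prefix = list(accumulate(weights))
    pieces, start = [], 0
    for p in range(nparts - 1):
        target = total * (p + 1) / nparts
        stop = start
        while stop < n and prefix[stop] < target:
            stop += 1
        # Include the element that crossed the target, but leave at least one
        # element for each remaining piece and never emit an empty piece.
        stop = max(start + 1, min(stop + 1, n - (nparts - 1 - p)))
        pieces.append((start, stop))
        start = stop
    pieces.append((start, n))
    return pieces


def imbalance(weights, pieces):
    """Max-over-mean load of a partition; 1.0 means perfectly balanced."""
    loads = [sum(weights[a:b]) for a, b in pieces]
    return max(loads) / (sum(loads) / len(loads))


# Four tiles of a 2x2 tile grid with one expensive tile (hypothetical weights).
tiles = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 5.0}
ordered = sorted(tiles, key=lambda ij: morton_key_2d(*ij))
chain = [tiles[ij] for ij in ordered]
print(ordered, contiguous_partition(chain, 2))
```

Comparing `imbalance` for the weighted cut against an equal-count cut of the same chain gives the kind of "current versus predicted" number the diagnostic reports.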

Active-box windowing (sim):

m_active_box + active_box: restricts reconstruction/Riemann/RK windows to a
light-cone-grown box around non-ambient flow; a debug tripwire guards under-growth.
Golden-tested (ECABA006) to stay a strict subset while matching the full-domain
solution.
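The light-cone growth can be pictured in one dimension as follows. This is a sketch only: the function names, the `margin` parameter, and the tolerance are assumptions, and the tripwire here simply checks cells outside the box, which is simpler than the inner-margin check the PR's commits describe.

```python
import math


def grow_active_box(lo, hi, max_speed, dt, dx, n_cells, margin=1):
    """Grow the inclusive index window [lo, hi] by the light-cone distance.

    A signal moving at max_speed covers at most max_speed * dt / dx cells per
    step; `margin` (hypothetical) adds a safety layer. The result is clamped
    to the domain [0, n_cells - 1].
    """
    grow = math.ceil(max_speed * dt / dx) + margin
    return max(0, lo - grow), min(n_cells - 1, hi + grow)


def tripwire(field, ambient, lo, hi, tol=1e-12):
    """Return True if any cell outside [lo, hi] has left the ambient state,
    i.e. the box was not grown far enough."""
    return any(abs(v - ambient) > tol
               for k, v in enumerate(field) if k < lo or k > hi)
```

With `max_speed * dt / dx = 0.4` the box grows by one cell plus the margin per step, so work stays confined to the window until the disturbance approaches the domain edge.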

Block-structured AMR (sim):

m_amr + m_amr_registers + amr: a two-level 2:1 refined hierarchy with
conservative restriction / conservative-linear prolongation, per-stage flux registers
with Berger–Colella refluxing, gradient-based dynamic regrid (amr_regrid_int,
amr_tag_eps, amr_buf), optional dt/2 subcycling (amr_subcycle), multi-rank
operation with a mirror-decomposed fine level (patches may span rank boundaries; fine
halo exchange; distributed flux registers; rank-local regrid), and GPU builds
(device-resident fine fields and registers, on-device ghost fill/RK/restriction).
Requires WENO, SSP-RK3, model_eqns=2, single fluid (checker-enforced).
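For readers less familiar with the transfer operators named above, here is a minimal 1D sketch of conservative restriction and conservative-linear prolongation at a 2:1 ratio. The minmod slope limiter and the zero slope at the array ends are assumptions made for the example; the PR text does not say which slope the implementation uses.

```python
def minmod(a, b):
    """Return the smaller-magnitude argument if signs agree, else zero."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def restrict(fine):
    """Conservative 2:1 restriction: each coarse cell is the mean of its two children."""
    return [0.5 * (fine[2 * i] + fine[2 * i + 1]) for i in range(len(fine) // 2)]


def prolong_linear(coarse):
    """Conservative-linear 2:1 prolongation.

    Each coarse value u gets children u - s/4 and u + s/4, where s is a limited
    slope per coarse cell width. The children average back to u, so
    restrict(prolong_linear(c)) reproduces c.
    """
    n = len(coarse)
    fine = []
    for i, u in enumerate(coarse):
        if 0 < i < n - 1:
            slope = minmod(u - coarse[i - 1], coarse[i + 1] - u)
        else:
            slope = 0.0  # assumption: no slope at the ends of the array
        fine += [u - 0.25 * slope, u + 0.25 * slope]
    return fine
```

The round trip `restrict(prolong_linear(c)) == c` is the property that keeps the coarse and fine levels consistent in total mass.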

Hybrid reconstruction/flux sensors (sim):

hybrid_weno (+hybrid_weno_eps): linear-optimal reconstruction in smooth cells, full
WENO only at flagged discontinuities (Jameson-type density+pressure sensor,
stencil-dilated, halo-aware).
hybrid_riemann (+hybrid_smooth_flux): cheap central/Rusanov flux in smooth cells,
full HLLC at discontinuities (5- and 6-equation blocks).
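A Jameson-type sensor with stencil dilation can be sketched in 1D as below. The normalized second-difference form is the standard one; combining density and pressure by taking the maximum, the threshold value, and the dilation width are assumptions for illustration and may differ from what hybrid_weno does.

```python
def jameson_sensor(q):
    """Normalized second difference of q; zero at the two end cells."""
    n = len(q)
    s = [0.0] * n
    for i in range(1, n - 1):
        num = abs(q[i + 1] - 2.0 * q[i] + q[i - 1])
        den = abs(q[i + 1]) + 2.0 * abs(q[i]) + abs(q[i - 1])
        s[i] = num / den if den > 0.0 else 0.0
    return s


def flag_cells(rho, p, eps, dilate):
    """Flag cells where either sensor exceeds eps, then widen each flag by
    `dilate` cells on both sides so the whole stencil sees the full scheme."""
    s_rho, s_p = jameson_sensor(rho), jameson_sensor(p)
    raw = [max(a, b) > eps for a, b in zip(s_rho, s_p)]
    n = len(raw)
    return [any(raw[max(0, i - dilate):min(n, i + dilate + 1)]) for i in range(n)]


# Sod-like jump in the middle of a 10-cell line (hypothetical data).
rho = [1.0] * 5 + [0.125] * 5
p = [1.0] * 5 + [0.1] * 5
flags = flag_cells(rho, p, eps=0.01, dilate=2)
```

Cells with `flags[i]` false would take the cheap path (linear-optimal reconstruction or a central/Rusanov flux), and flagged cells the full WENO/HLLC path.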

Motivation

Measured rank imbalance on heterogeneous-cost workloads (bubbles, IB, phase change)
motivates first-class measurement tools; the active box and hybrid sensors give direct
speedups on localized-flow / mostly-smooth cases; AMR concentrates resolution where the
flow needs it, and the load-balance coupling keeps the refined work spread across ranks.

Testing

Four new golden tests: 5ECBB926 (AMR static patch), 1CBACEB5 (AMR dynamic regrid),
852CCB81 (AMR subcycling), ECABA006 (active_box 3D strict-subset).
Locally verified on gfortran/CPU: three-target build; full precheck; the 4 new goldens
plus 4 existing goldens (incl. both periodic-IBM cases, exercising the merge of
#1618, "Fix periodic ib issues") all pass; load_balance+amr np=2 end-to-end smoke
produces the analytically predicted weighted offsets and completes; amr np=2
spanning-patch run completes.
GPU: nvfortran OpenACC build of the simulation target (the GPU-accelerated
executable) verified locally (see PR checks for the full matrix).
All parameters default off; case_validator entries, case.md docs, and
module_categories are included.

Known-untested configurations

Delegated to CI: Cray ftn, Intel ifx, AMD flang, OpenMP target offload, single/mixed
precision. Hybrid WENO/Riemann ship without a dedicated golden case (flagging for
reviewer judgment; the sensors are default-off and checker-guarded).

Review guide

The 75 commits are arc-ordered and cleanly arc-separable — reviewing by arc is much
easier than by file:

2760da7d…2bb5fdc4 active-box (11)
bbf6b2a9…14b837c6 load-weight field + contributors (8)
0161fac0…2795e266 SFC partition diagnostic (6)
6df9c1f0…c43c02a5 weighted decomposition (load_balance) (8)
95398eb3…cc7882d1 rank timing (4)
21c60ffa…5082b535 hybrid WENO/Riemann (10)
74b58771…de244407 m_box refactor + validation hygiene (4)
352f564e…03b59516 AMR: static hierarchy → restriction/prolongation → fine advance →
refluxing → regrid → subcycling → multi-rank → GPU → mirror decomposition →
load-balance coupling (20)
a1a7e3ad merge of upstream/master (num_procs_x/y/z promotion adopted from #1618, "Fix periodic ib issues")

Addendum: features added after the initial draft

Multi-fluid AMR (5-equation multi-component): per-fluid conservative reflux (per-fluid mass defects ~1e-15 through refluxed+subcycled+regridding advance), sum-preserving volume-fraction prolongation (n−1 fractions + closure), mpp_lim required for num_fluids > 1; shock–material-interface demo validated. Known bounded limitation: alpha-sum deviation up to ~5.7e-3 at coarse cells historically hosting a patch face during shock crossing (non-growing, mpp_lim-damped; the volume-fraction K-term is deliberately not refluxed — it is non-conservative).
3D validation: free-stream exact (0.0) in 3D with subcycling+regrid armed; 3D blast regrid-tracked at ~1e-14 defects; np=2 seam-spanning element-exact; code audit found no y/z asymmetry.
Golden coverage is now 5 cases: 1D static / 1D regrid / 1D subcycle / 3D static / 1D two-fluid.
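The "n−1 fractions + closure" idea from the multi-fluid item above can be shown in a few lines. This is a sketch with hypothetical names: only the first n−1 volume fractions are prolonged, and the last is recovered from the unit-sum constraint, so each fine child sums to one regardless of how the individual fractions were interpolated.

```python
def close_fractions(partial):
    """Given n-1 prolonged volume fractions, append the n-th so the set sums to one."""
    return list(partial) + [1.0 - sum(partial)]


# Two fine children of one coarse cell, three fluids (hypothetical values).
children = [close_fractions([0.30, 0.20]), close_fractions([0.35, 0.25])]
```

Prolonging all n fractions independently with a limiter would not guarantee the unit sum, which is the reason for closing on the last fraction.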

Further additions

AMR restart (SP10): fine-level save/restore with regridded-box persistence, both IO modes (serial per-rank + parallel MPI-IO); restart continuation element-exact (save-then-restart == straight run), same-num_procs required (np-flexible restart is future work).
Viscous AMR (SP11): viscous prohibition lifted. Viscous stress/work is refluxed through the existing registers (it enters the rhs as a flux_src_n face-flux difference, the same form as the advective flux), so coarse/fine boundaries match the total flux. Energy-conservation defect is 0.0; the accuracy triplet is coarse 2.49e-4 ≫ two-level 6.89e-5 ≈ fine 5.04e-5. A fine-ghost-coordinate bug (the viscous gradient used a stale coarse dx at the fine subdomain/patch edge; it was invisible to WENO, which uses only interior dx) was found by an np=2 exactness probe and fixed, and the fine viscous seam is now byte-exact across ranks. Residual: a bounded (~1e-6) np-dependence remains only at the coarse/fine patch boundary, from prolongation-derived ghost gradients (AMR's inherently approximate coupling zone). The density-gradient tagger senses shear poorly (a buffered/static patch is recommended; error-estimator taggers are future work).
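The refluxing that both the advective and viscous fluxes go through can be illustrated in 1D. The sketch below assumes a single patch covering coarse cells lo..hi, face k being the left face of cell k, and fine fluxes already time-averaged over any substeps; the function name and argument layout are hypothetical, not the m_amr_registers interface.

```python
def reflux_1d(u, dt_over_dx, lo, hi, coarse_flux, fine_flux_lo, fine_flux_hi):
    """Berger-Colella style flux correction for the two coarse cells that
    border a fine patch covering coarse cells lo..hi (inclusive).

    The neighbours were advanced with coarse fluxes at the patch faces; the
    patch interior saw the fine fluxes. Replacing the coarse face flux by the
    fine one in the neighbours restores discrete conservation.
    """
    out = list(u)
    # Cell lo-1: the patch's left face (face lo) is this cell's right face.
    out[lo - 1] -= dt_over_dx * (fine_flux_lo - coarse_flux[lo])
    # Cell hi+1: the patch's right face (face hi+1) is this cell's left face.
    out[hi + 1] += dt_over_dx * (fine_flux_hi - coarse_flux[hi + 1])
    return out
```

Because the correction is just a face-flux difference, any term that enters the right-hand side in flux form can be accumulated in the same registers.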

Multi-block AMR + terminology

Multi-block AMR (SP12a): tagged cells are clustered (Berger–Rigoutsos + a min-separation merge) into a LIST of separated refinement blocks, so multiple separated features (e.g. a shock and a contact that have separated) each get their own tight block instead of one bounding box wastefully spanning both plus the smooth gap between — measured 66% fewer cells refined on a two-interface Sod, conservation and np=1==np=2 element-exactness preserved. amr_max_blocks (default 4; N fixed-size slots, ~N× device memory — compute efficiency is the goal, memory efficiency a follow-up), amr_cluster_eff (default 0.7). Fine blocks stay ≥ buff_size apart ⇒ no fine–fine coupling; all existing per-block machinery (multi-rank, GPU, subcycle, viscous, multi-fluid) loops over the block list unchanged.
Terminology: AMR's refinement regions are now called blocks (amr_block_beg/end, amr_max_blocks) — disambiguated from MFC's initial-condition patch_icpp. (Draft-stage rename; golden values unchanged.)
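To make the clustering step concrete, here is a simplified 1D sketch. It splits a block only at its widest hole until every block reaches a target fill efficiency, then merges blocks that sit too close together. Full Berger–Rigoutsos clustering also uses inflection points of the tag signature, which this sketch omits; the function names are hypothetical, and the defaults simply mirror the amr_cluster_eff and amr_max_blocks defaults quoted above.

```python
def efficiency(cells):
    """Fraction of the bounding interval of `cells` that is actually tagged."""
    return len(cells) / (cells[-1] - cells[0] + 1)


def cluster_1d(tagged, min_eff=0.7, max_blocks=4):
    """Group tagged cell indices into at most max_blocks inclusive (lo, hi) blocks."""
    cells = sorted(set(tagged))
    if not cells:
        return []
    groups = [cells]
    while len(groups) < max_blocks:
        worst = min(groups, key=efficiency)
        if efficiency(worst) >= min_eff:
            break
        # Split the least efficient group at its widest hole.
        k = max(range(1, len(worst)), key=lambda m: worst[m] - worst[m - 1])
        groups.remove(worst)
        groups += [worst[:k], worst[k:]]
    return sorted((g[0], g[-1]) for g in groups)


def merge_close(blocks, min_sep):
    """Merge sorted blocks separated by fewer than min_sep untagged cells."""
    if not blocks:
        return []
    merged = [blocks[0]]
    for lo, hi in blocks[1:]:
        if lo - merged[-1][1] - 1 < min_sep:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged
```

For tags at cells 3..6 and 20..22, a single bounding box would refine 20 cells at 35% efficiency, while the split yields two fully tagged blocks covering 7 cells.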

Euler-Euler bubbles under AMR (SP13)

Bubbly-flow AMR (monodisperse, polytropic — the simplest Euler-Euler config): bubble moments (in q_cons) are refluxed by the existing register machinery; prolongation is realizability-preserving (radius moment nR > 0 maintained across coarse→fine, analogous to the multi-fluid volume-fraction closure). Validated: conservation defects ~1e-15 through refluxed+subcycled+regridding advance, moments stay realizable, AMR beats the coarse solution, np=1==np=2 element-exact. Non-polytropic / QBMM / polydisperse / Lagrangian bubbles remain explicitly gated (future work — non-polytropic additionally needs per-block pb/mv handling).

Phase-change (relax) under AMR (SP15)

Phase-change / pressure relaxation now works under AMR: the per-cell equilibration (relax) runs on each fine block before restriction (a new s_amr_relax_fine), so the refined solution is properly relaxed. Cell-local — no reflux, no c/f coupling. Machine-precision conservation, free-stream preserved, np=1==np=2 bit-exact. Config: model_eqns=2, relax=T, num_fluids>1, mpp_lim=T.
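As a hedged illustration of the configuration listed above, a case-file fragment might look like the following. The keys model_eqns, relax, num_fluids, and mpp_lim come from the text; treating amr as a logical flag written as "T" and choosing num_fluids = 2 are assumptions, and a real case needs many more entries (grid, time stepping, patches, fluid properties).

```python
import json

# Fragment only: the AMR + phase-change switches named in the PR text.
amr_relax_case = {
    "model_eqns": 2,
    "num_fluids": 2,   # any value > 1 per the PR text; 2 is an arbitrary choice
    "relax": "T",
    "mpp_lim": "T",
    "amr": "T",        # assumption: amr is a logical flag
}

print(json.dumps(amr_relax_case))
```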

Validation hardening (blind spots closed)

GPU correctness of all AMR physics: the AMR goldens (which exercise two-fluid, viscous, bubbles, multi-block, phase-change) were executed on NVIDIA GPU (V100, OpenACC/nvhpc) and match the CPU results within tolerance — 9/9. Every physics rung is GPU-correct, not merely GPU-compilable.
Cross-feature interaction coverage: added combined-feature goldens — viscous+multifluid+multiblock+subcycle and bubbles+multiblock+subcycle+regrid — both conserve to ~1e-16, with the viscous+multifluid+multiblock case verified np=1==np=2 element-exact (reflux + fine halo + multi-block + physics all correct together under MPI).

Chemistry under AMR (SP16) + surface-tension limitation

Chemistry (reactions + advection) under AMR, multi-rank: species mass fractions get a sum/positivity-preserving prolongation closure (like the multi-fluid volume-fraction closure); reactions run per-cell on the fine blocks; a chemistry+AMR temperature-ghost MPI-exchange bug (uninitialized seam-ghost temperature → NaN in the reacting-EOS Newton solve at rank boundaries) was found and fixed, so np=1==np=2 is element-exact (species bit-identical). Machine-precision conservation, realizability exact. Added the suite's first multi-rank AMR golden.
Surface tension is explicitly unsupported under AMR (documented prohibit): the capillary force depends on the interface normal, which the prolonged fine ghost color cannot reproduce consistently across a 2:1 coarse/fine boundary; three fixes were attempted and diagnosed (conservation is structural, but the seam force imbalance is not tamable by fine-block-only corrections). See the AMR docs section.

Further physics rungs (SP17–SP20)

Chemistry species diffusion under AMR (SP17): species diffusion now works under AMR — its flux_src is captured by the coarse/fine flux registers (mirroring the viscous path) and refluxed, and the temperature ghost is exchanged at rank seams (the same broadening the reactions fix required). Removed the diffusion prohibit; np=1==np=2 element-exact, machine-precision conservation.
Non-polytropic + polydisperse bubbles under AMR (SP18): Euler-Euler bubble support (SP13, previously monodisperse+polytropic only) now covers non-polytropic and polydisperse (nb ≥ 1) configs, with a per-block moment-realizability floor applied to all positive moments on prolongation. Conservation is machine-precision for polytropic; the non-polytropic source-term model carries a ~7e-10 defect that is np-invariant (identical np=1/np=2 — a model property, not an AMR decomposition leak). Removed the polytropic/monodisperse gates.
QBMM bubbles under AMR (SP19): polytropic QBMM is supported — the CHyQMOM 6-moment set lives entirely in q_cons and is injected piecewise-constant at prolongation so every fine/ghost child inherits a realizable moment set (variance c20 > 0), keeping the inversion NaN-free; moments reflux/restrict on the standard conservative path. np=2 element-exact, conservation ~1e-15. Non-polytropic QBMM stays gated (its pb/mv quadrature side-state is a global array the fine advance would corrupt through the swap).
Static immersed boundaries under AMR (SP20): a fixed, single, non-STL body on a static block (amr_regrid_int = 0) is now resolved on the refined level — each fine block carries its own fine-grid IB markers/ghost points computed from the geometry, and the fine advance applies the ghost-cell IB correction per RK stage. The IB forcing is non-conservative by construction at the body (ghost-cell method), while the flux reflux still conserves to machine precision away from it. Moving/multi-body/STL/dynamic-regrid-with-IB remain gated. A body straddling a rank seam is rejected at startup (the fine-IB image-point stencil across the seam is not yet decomposition-exact) rather than silently producing a small surface error — keep the body within a single rank's subdomain.


sbryngelson added 30 commits June 28, 2026 17:18

2760da7 feat(sim): add active_box parameter and m_active_box skeleton (no behavior change)
f257117 fix(sim): export ab_ambient from m_active_box per documented interface
2078c55 feat(sim): detect ambient state and initialize active box from IC support
5891b94 feat(sim): grow active box by light-cone with single device-update choke-point
19c06a2 feat(sim): restrict RK update and convert window to the active box
0beb6b1 feat(sim): restrict reconstruction and Riemann windows to the active box
c79c45b feat(sim): add debug envelope tripwire for active-box under-growth
1896baa fix(sim): active-box tripwire checks inner margin (outer layer is frozen-ambient)
a72f6b5 test(sim): golden regression for active_box on a 3D shock case
1f37c90 test(sim): strengthen active_box golden so box stays a strict subset at save step
e1f7ad5 fix(sim): make active-box convert bounds device-resident (GPU present-clause)
2bb5fdc fix(sim): active-box final-review fixes (init ab_active, tighten gate, correct growth comment)
bbf6b2a feat(sim): add load_weight_wrt param and m_load_weight skeleton (no behavior change)
e474a22 feat(sim): load_weight base + active-box contributor with field output
796f884 feat(sim): per-rank load-imbalance metric
fa1629b feat(sim): bubble (EE/EL) load-weight contributors
e54c161 fix(sim): gate bubble load-weight contributions by the active box
7c2fb22 feat(sim): IB load-weight contributor
a12b5f5 feat(sim): phase-change Newton-iteration load-weight contributor
14b837c fix(sim): load-weight metric on parallel_io path + EE reads q_cons (final-review fixes)
0161fac feat(sim): add sfc_partition params and m_sfc_partition skeleton (no behavior change)
5b30010 fix(sim): export m_sfc_partition state + guard partition_tile_size>=1
71b503d feat(sim): aggregate per-cell load weight into global tile weights
0d2ac75 feat(sim): Morton space-filling-curve tile ordering
0ba87e2 feat(sim): chains-on-chains balanced contiguous SFC partition
f7617e8 feat(sim): report SFC-partition predicted imbalance + call wiring
2795e26 fix(sim): self-contained SFC report (current + predicted + gain)
6df9c1f spike(sim): prove weighted re-decompose + re-read mechanism
5570a71 feat(sim): weighted-split function for load-balanced decomposition
8c01acb feat(sim): global axis marginals from per-cell load weight

sbryngelson added 30 commits July 3, 2026 16:58

3003068 feat(sim): viscous AMR (SP11) - viscous flux registers, c/f-matched total fluxes
6129592 fix(sim): AMR fine ghost coordinates for viscous stencil (SP11 np=2 exactness)
3c6e77d merge: viscous AMR (SP11) + fine-ghost-coordinate fix into the PR branch
931470d refactor(sim): AMR patch slots (amr_fine -> amr_slots, fixed pool array); num_patches=1 behavior-identical
bc32128 merge: AMR patch-slot infrastructure (SP12a T1) into PR branch
8bb32ed feat(sim): AMR multi-patch - Berger-Rigoutsos clustering, min-separation merge, per-slot advance (SP12a)
11d46e3 refactor: rename AMR "patch" -> "block" (params, symbols, docs) to disambiguate from IC patches; golden values unchanged
fc65bad merge: AMR multi-patch (SP12a) + patch->block rename into PR branch
7bf7bee feat(sim): Euler-Euler bubbles AMR (SP13) - realizability-preserving moment prolongation, per-block bubble state
00c9378 merge: Euler-Euler bubbles AMR (SP13) into PR branch
7bc7cde feat(sim): phase-change (relax) AMR (SP15) - per-block pressure relaxation on the fine level
17a8e51 merge: phase-change (relax) AMR (SP15) into PR branch
a429c68 test(sim): AMR cross-feature golden coverage (viscous+multifluid+multiblock+subcycle, bubbles+subcycle+regrid)
b816bc3 merge: AMR GPU-correctness + cross-feature golden coverage (blind spots closed)
5b72e1d docs(sim): document AMR+surface_tension limitation as an explicit prohibit (diffuse-interface c/f normal inconsistency; 3 attempts diagnosed)
86c1038 feat(sim): chemistry AMR (SP16) - species sum/positivity prolongation closure, per-block reactions, multi-rank temperature-ghost exchange (diffusion gated)
37265de merge: chemistry AMR (SP16, multi-rank) + surface-tension limitation docs into PR branch
267660c docs: AMR feature documentation page
8354b1f merge: AMR documentation page into PR branch
cdf9239 feat(sim): chemistry diffusion under AMR (SP17) - flux_src reflux at the c/f boundary; lift the diffusion gate
dbb1331 feat(sim): non-polytropic + polydisperse bubbles AMR (SP18) - per-block moment realizability
f16d52d docs: update AMR support matrix - chemistry diffusion + non-polytropic/polydisperse bubbles now supported
5522823 merge: finish chemistry (diffusion, SP17) + bubbles (non-polytropic/polydisperse, SP18) + AMR docs
1a5193d feat(sim): QBMM bubbles AMR (SP19) - fine-resolution moment inversion, realizability-preserving prolongation
21d5965 feat(sim): static immersed-boundary AMR (SP20) - per-block fine IB markers/ghost points, body-driven tagging
49db985 fix(sim): gate IB body straddling a rank seam under AMR (fine-IB image-point stencil not decomposition-exact there)
e035a10 docs(amr): support matrix for QBMM-polytropic + static IB; seam-straddle now aborts
2dacb4b fix(test): remove stray merge marker left in cases.py during SP20 cherry-pick
83a6f01 style: format m_checker AMR prohibit block after SP19/SP20 fold
b4463ef merge: QBMM (SP19) + static IB (SP20) AMR into PR branch
# Conflicts: # src/simulation/m_ibm.fpp

sbryngelson wants to merge 111 commits into MFlowCode:master from sbryngelson:up/mega
