Add AMD GPU support via ROCm/HIP by jeffdaily · Pull Request #94 · MIT-Lu-Lab/cuPDLPx

jeffdaily · 2026-06-19T20:15:55Z

This adds optional AMD GPU support to cuPDLPx through ROCm/HIP, alongside the existing CUDA path. The CUDA build is unchanged when the new option is off.

What changed

The port keeps every device source in its existing CUDA spelling and routes the CUDA APIs to their HIP equivalents through a single compatibility header, internal/cuda_to_hip.h. On a HIP build that header maps the CUDA runtime, cuBLAS, cuSPARSE, and CUB symbols to hipRT, hipBLAS, hipSPARSE, and hipCUB; on a CUDA build it includes the standard CUDA headers, so nothing about the NVIDIA path changes. internal/cusparse_compat.h selects the standard hipsparseSpMV path on ROCm, since hipSPARSE does not provide the cusparseSpMVOp variant.

The build system gains a USE_HIP option (off by default). When enabled, the project is configured with the HIP language, the .cu sources are compiled as HIP, and the targets link hipBLAS, hipSPARSE, and hipCUB instead of the CUDA libraries. GPU architectures are chosen with CMAKE_HIP_ARCHITECTURES (defaulting to gfx90a). On Windows, the CLI-only mps_parser.c is excluded from the core library because it relies on strtok_r.

The interface test gains a case that runs the GPU solver path with presolve disabled, so the hipBLAS/hipSPARSE execution path is exercised end to end. Enabling the test suite (-DCUPDLPX_BUILD_TESTS=ON) to run the new case surfaced four stale zero_tolerance assignments in test_interface.c referencing a matrix_desc_t field that no longer exists; they are removed so the file compiles.

Building for AMD GPUs

cmake -B build -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a -DCMAKE_PREFIX_PATH=/opt/rocm
cmake --build build --clean-first

Set CMAKE_HIP_ARCHITECTURES to match the target GPU (for example gfx90a for MI200, gfx1100 for RDNA3 desktop, or gfx1201 for RDNA4). If the ROCm install is not on CMake's default search path, point -DCMAKE_PREFIX_PATH at it (e.g. /opt/rocm) so find_package can locate hip, hipBLAS, hipSPARSE, and hipCUB. The resulting ./build/cupdlpx binary is invoked exactly as in the CUDA build. The README documents this alongside the CUDA build instructions.

Validation

Built and run on an AMD Instinct MI200 (gfx90a) with ROCm 7.2.1: the full interface test suite passes, including the GPU solver case that exercises the hipBLAS and hipSPARSE paths (Status: OPTIMAL). The same configuration builds cleanly for gfx1100 (RDNA3) and gfx1201 (RDNA4). The CUDA build path is unaffected by these changes.

This work was prepared with assistance from Claude, an AI assistant by Anthropic.

This adds an optional AMD GPU build to cuPDLPx through ROCm/HIP, alongside the existing CUDA path. The CUDA build is unchanged when USE_HIP is off. To review: start with internal/cuda_to_hip.h, which routes the CUDA runtime, cuBLAS, cuSPARSE, and CUB symbols used by the solver to their hipRT, hipBLAS, hipSPARSE, and hipCUB equivalents on a HIP build, and includes the standard CUDA headers otherwise. The device sources keep their CUDA spelling and are compiled as HIP. internal/cusparse_compat.h selects the standard hipsparseSpMV path on ROCm, since hipSPARSE does not provide the cusparseSpMVOp variant. CMakeLists.txt gains a USE_HIP option (off by default). When enabled the project is configured with the HIP language, the .cu sources are compiled as HIP, and the targets link hipBLAS, hipSPARSE, and hipCUB instead of the CUDA libraries. GPU architectures are chosen with CMAKE_HIP_ARCHITECTURES, defaulting to gfx90a. On Windows the CLI-only mps_parser.c is excluded from the core library because it relies on strtok_r. The interface test gains a case that runs the GPU solver path with presolve disabled, exercising the hipBLAS and hipSPARSE execution path end to end. Test Plan: Built and ran on an AMD Instinct MI200 (gfx90a) with ROCm 7.2.1: ``` cmake -B build -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a -DCMAKE_PREFIX_PATH=/opt/rocm \ -DCUPDLPX_BUILD_CLI=ON -DCUPDLPX_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release cmake --build build -j$(nproc) ./build/tests/test_interface ``` The interface suite passes, including the GPU solver case (Status: OPTIMAL). The same configuration builds cleanly for gfx1100 (RDNA3) and gfx1201 (RDNA4); the device code objects are identical across the documentation and formatting commits that followed validation. The CUDA build path is unaffected by these changes. This work was authored with the assistance of Claude, an AI assistant by Anthropic.

The ROCm support commit routed all CUDA/HIP includes through internal/cuda_to_hip.h and pulled it into utils.h and internal_types.h, which are included by the C translation units (cli.c, cupdlpx.c, mps_parser.c, presolve.c). On the CUDA path that header included <cub/device/device_reduce.cuh> unconditionally; cub is C++ only, so the C compiler failed with "unknown type name 'namespace'", breaking every CUDA build job (all Linux and Windows toolchains, CUDA 12.4 through 13.1). The HIP path was unaffected because its hipcub include was already guarded with #ifdef __cplusplus. The fix mirrors that guard on the CUDA branch: the cub header is only included for C++ translation units (the .cu device sources that actually use cub::DeviceReduce). The change is entirely within the #else CUDA branch, so the HIP/ROCm device code is unchanged. Authored with assistance from Claude. Test Plan: reproduced and verified the CUDA path locally with the CUDA 12.8 toolkit (gcc 13, ninja), matching the upstream CI configure: ``` cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release \ -DCUPDLPX_BUILD_TESTS=OFF -DCMAKE_CUDA_ARCHITECTURES=80 cmake --build build --clean-first ``` Before: cc -std=gnu99 -c src/cupdlpx.c fails on cub/device/device_reduce.cuh. After: clean build, links cupdlpx and libcupdlpx.so with 0 errors.

ZedongPeng

Thanks a lot for adding ROCm/HIP support, @jeffdaily. This is a genuinely useful contribution, and I really appreciate the clean implementation. I left a few comments, mostly minor ones related to project consistency.

Addresses review feedback on the ROCm support PR. cuda_to_hip.h already includes the CUDA runtime/cuBLAS/cuSPARSE headers on CUDA builds and their HIP equivalents on ROCm builds, so the per-file `#if !defined(USE_HIP)` include blocks in utils.h and preconditioner.cu were redundant. They are removed in favor of relying on cuda_to_hip.h alone. The USE_HIP definition moves from directory-scoped add_compile_definitions to target_compile_definitions on the cupdlpx_compile_flags interface target so it travels reliably to every consumer, including the Python bindings. test_interface Test 9 previously only checked for a non-NULL result, so it could not catch a wrong answer. It now asserts TERMINATION_REASON_OPTIMAL and the known optimum (objective 3.0 within 1e-4). The default 1e-4 relative tolerance stops the solver around 3.0005, so the test tightens the convergence tolerance to 1e-8, after which the objective reaches the true optimum. On the device-link question: the HIP build does not use relocatable device code (-fgpu-rdc is off), so each object is compiled whole-program and is self-contained; no archive-boundary device link is needed for the static lib. The CUDA path keeps CUDA_SEPARABLE_COMPILATION/CUDA_RESOLVE_DEVICE_SYMBOLS, which it does require. The rationale is now recorded as comments in CMake. Also drops the added per-file copyright/author lines from cuda_to_hip.h to match the project's existing header convention. This work was authored with the assistance of Claude, an AI assistant. Test Plan: Build and run the interface tests on gfx90a (MI250X, ROCm 7.2.1): ``` cmake -S . -B build -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a \ -DCUPDLPX_BUILD_CLI=ON -DCUPDLPX_BUILD_TESTS=ON -DCUPDLPX_BUILD_PYTHON=OFF \ -DCMAKE_BUILD_TYPE=Release cmake --build build -j$(nproc) HIP_VISIBLE_DEVICES=0 ./build/tests/test_interface # 9/9 pass, Test 9 obj=3.000000001 HIP_VISIBLE_DEVICES=0 ./build/cupdlpx 2club200v15p5scn.mps.gz . # OPTIMAL, obj -121.2216698 ``` Build and exercise the ROCm Python extension with a HIP-aware C++ compiler: ``` cmake -S . -B pybuild -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a \ -DCMAKE_CXX_COMPILER=amdclang++ -DCUPDLPX_BUILD_PYTHON=ON -DCMAKE_BUILD_TYPE=Release cmake --build pybuild -j$(nproc) # import + solve -> Status OPTIMAL, ObjVal 3.0, X [1, 2] ```

ZedongPeng · 2026-07-02T18:44:57Z

Thanks again for this, @jeffdaily.

One more thing before we merge: could you add a CI job that builds the HIP/ROCm path? Our workflows currently only exercise the CUDA build, so the HIP path can regress silently whenever someone touches the shared headers or CMake. A compile-only job (-DUSE_HIP=ON, configure + build, no GPU execution) would be enough to guard against that — the GitHub-hosted runners don't have AMD GPUs, so it would mirror how the CUDA jobs build-check without running on device.

jeffdaily added 2 commits June 19, 2026 20:11

ZedongPeng self-requested a review June 22, 2026 18:24

ZedongPeng requested changes Jun 30, 2026

View reviewed changes

Comment thread internal/utils.h Outdated

Comment thread test/test_interface.c

Comment thread CMakeLists.txt Outdated

Comment thread CMakeLists.txt

Comment thread internal/cuda_to_hip.h

ZedongPeng and others added 2 commits June 30, 2026 16:21

Merge branch 'main' into moat-port

991a9b6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add AMD GPU support via ROCm/HIP#94

Add AMD GPU support via ROCm/HIP#94
jeffdaily wants to merge 4 commits into
MIT-Lu-Lab:mainfrom
jeffdaily:moat-port

jeffdaily commented Jun 19, 2026

Uh oh!

ZedongPeng left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ZedongPeng commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jeffdaily commented Jun 19, 2026

What changed

Building for AMD GPUs

Validation

Uh oh!

ZedongPeng left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ZedongPeng commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants