The float32 VORTEXM4 SSYRK kernel computes incorrect results.
Here is a repo with a vibe-coded minimal reproducer:
https://github.com/ogrisel/repro_pypi_numpy_openblas_float32_matmul/tree/main/openblas_ssyrk_repro
On my M4 laptop I get:
./repro_ssyrk testdata/A.bin .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
OpenBLAS repro: float32 SSYRK vs SGEMM for A @ A.T
matrix: testdata/A.bin (300 x 672, row-major float32)
BLAS: .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
SSYRK (RowMajor, Upper, NoTrans, n=300, k=672, lda=672, ldc=300):
max = 0.000000 expected ~185.7 => FAIL
SGEMM control (RowMajor, NoTrans, Trans, m=300, n=300, k=672):
max = 185.703522 expected ~185.7 => OK
Naive float64 reference GEMM:
max = 185.703644
Reproduced: SSYRK broken, SGEMM OK (matches NumPy issue).
By contrast, I get the expected results when forcing the NEOVERSEN1 kernel instead:
OPENBLAS_CORETYPE=NEOVERSEN1 ./repro_ssyrk testdata/A.bin .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
OpenBLAS repro: float32 SSYRK vs SGEMM for A @ A.T
matrix: testdata/A.bin (300 x 672, row-major float32)
BLAS: .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
OPENBLAS_CORETYPE=NEOVERSEN1
SSYRK (RowMajor, Upper, NoTrans, n=300, k=672, lda=672, ldc=300):
max = 185.703613 expected ~185.7 => OK
SGEMM control (RowMajor, NoTrans, Trans, m=300, n=300, k=672):
max = 185.703613 expected ~185.7 => OK
Naive float64 reference GEMM:
max = 185.703644
Both paths OK on this host.
Originally reported at: numpy/numpy#31776
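For anyone who wants to check from Python without building the C reproducer, here is a rough sketch of the same comparison through NumPy. It rests on assumptions that are not part of the reproducer above: that NumPy dispatches `A @ A.T` to SSYRK when both operands view the same buffer, that copying the transposed operand makes it fall back to SGEMM, and that random data of the same shape is enough to exercise the kernel (the reproducer uses `testdata/A.bin`, which this sketch does not load). The helper name `compare_gram` is made up for illustration.

```python
import numpy as np


def compare_gram(a):
    """Return the max of A @ A.T computed three ways.

    Order of the returned values: aliased float32 product (assumed to hit
    SSYRK), non-aliased float32 product (assumed to hit SGEMM), and a
    float64 reference.
    """
    a32 = np.ascontiguousarray(a, dtype=np.float32)

    # Both operands view the same buffer, so NumPy may pick the SYRK path.
    gram_syrk = a32 @ a32.T

    # Copying the transposed operand breaks the aliasing; the assumption is
    # that NumPy then uses the general SGEMM path.
    gram_gemm = a32 @ a32.T.copy()

    # float64 reference, also computed without aliasing.
    a64 = a32.astype(np.float64)
    gram_ref = a64 @ a64.T.copy()

    return float(gram_syrk.max()), float(gram_gemm.max()), float(gram_ref.max())


if __name__ == "__main__":
    # Same shape as in the report (300 x 672), but random data: an assumption,
    # since the original matrix comes from testdata/A.bin.
    rng = np.random.default_rng(0)
    a = rng.standard_normal((300, 672), dtype=np.float32)
    syrk_max, gemm_max, ref_max = compare_gram(a)
    print(f"aliased A @ A.T (SSYRK?): max = {syrk_max:.6f}")
    print(f"copied  A @ A.T (SGEMM?): max = {gemm_max:.6f}")
    print(f"float64 reference:        max = {ref_max:.6f}")
```

If the kernel is affected, the first value would be expected to differ clearly from the other two. Running the script with `OPENBLAS_CORETYPE=NEOVERSEN1` set in the environment, as in the second run above, should give a comparison against the other kernel.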