float32 SSYRK wrong on ARM64 VORTEXM4 (RowMajor, Upper, NoTrans) #5873

@ogrisel

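The float32 SSYRK kernel used for the VORTEXM4 core type computes incorrect results: in the reproducer below it returns a maximum of 0.0 where ~185.7 is expected, while SGEMM on the same data is correct.

Here is a repo with a vibe-coded minimal reproducer: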

https://github.com/ogrisel/repro_pypi_numpy_openblas_float32_matmul/tree/main/openblas_ssyrk_repro

On my M4 laptop I get:

./repro_ssyrk testdata/A.bin .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
OpenBLAS repro: float32 SSYRK vs SGEMM for A @ A.T
  matrix: testdata/A.bin (300 x 672, row-major float32)
  BLAS:   .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib

SSYRK (RowMajor, Upper, NoTrans, n=300, k=672, lda=672, ldc=300):
  max = 0.000000  expected ~185.7  => FAIL

SGEMM control (RowMajor, NoTrans, Trans, m=300, n=300, k=672):
  max = 185.703522  expected ~185.7  => OK

Naive float64 reference GEMM:
  max = 185.703644

Reproduced: SSYRK broken, SGEMM OK (matches NumPy issue).

In contrast, I get the expected results when forcing the NEOVERSEN1 kernel instead:

OPENBLAS_CORETYPE=NEOVERSEN1 ./repro_ssyrk testdata/A.bin .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
OpenBLAS repro: float32 SSYRK vs SGEMM for A @ A.T
  matrix: testdata/A.bin (300 x 672, row-major float32)
  BLAS:   .venv/lib/python3.13/site-packages/numpy/.dylibs/libscipy_openblas64_.dylib
  OPENBLAS_CORETYPE=NEOVERSEN1

SSYRK (RowMajor, Upper, NoTrans, n=300, k=672, lda=672, ldc=300):
  max = 185.703613  expected ~185.7  => OK

SGEMM control (RowMajor, NoTrans, Trans, m=300, n=300, k=672):
  max = 185.703613  expected ~185.7  => OK

Naive float64 reference GEMM:
  max = 185.703644

Both paths OK on this host.
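
For Python users, the same override can in principle be requested from inside a script. The snippet below is a minimal sketch, not a confirmed workaround: it assumes the OpenBLAS bundled with NumPy honors `OPENBLAS_CORETYPE` as the run above suggests, and that the variable is only read when the library is first loaded. The helper name `request_openblas_coretype` is hypothetical.

```python
import os
import sys


def request_openblas_coretype(coretype="NEOVERSEN1"):
    """Set OPENBLAS_CORETYPE for this process (hypothetical helper).

    Returns True if NumPy has not been imported yet, i.e. the setting
    still has a chance to take effect when OpenBLAS is loaded.
    Assumption: OpenBLAS reads the variable once, at load time.
    """
    os.environ["OPENBLAS_CORETYPE"] = coretype
    return "numpy" not in sys.modules


# Call this before the first `import numpy` anywhere in the process.
in_time = request_openblas_coretype()
print("OPENBLAS_CORETYPE =", os.environ["OPENBLAS_CORETYPE"])
print("set before NumPy import:", in_time)
```

If the helper reports that NumPy was already imported, setting the variable in the shell, as in the command above, is the safer route.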

Originally reported at: numpy/numpy#31776
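
The comparison made by the reproducer can also be sketched from Python. This is an illustration under stated assumptions, not the code from the linked repo: it assumes that NumPy dispatches `a @ a.T` to SYRK when both operands share the same buffer, and that passing a separate copy as the right operand takes the general GEMM path. The function name `check_self_product` is hypothetical, and the random matrix is only a stand-in for `testdata/A.bin`, so it may not trigger the failure.

```python
import numpy as np


def check_self_product(a, rtol=1e-3):
    """Compare float32 A @ A.T against a float64 reference (hypothetical helper)."""
    a = np.ascontiguousarray(a, dtype=np.float32)

    # Same buffer on both sides: assumed to be routed to SYRK by NumPy.
    same_buffer = a @ a.T
    # Separate copy on the right: assumed to use the general GEMM path.
    separate_copy = a @ a.copy().T
    # Float64 reference, analogous to the naive reference GEMM in the repro.
    a64 = a.astype(np.float64)
    reference = a64 @ a64.T

    ref_max = float(np.abs(reference).max())

    def matches(c):
        # Absolute tolerance is scaled by the largest reference entry.
        return bool(np.allclose(c, reference, rtol=rtol, atol=rtol * ref_max))

    return {
        "same_buffer_max": float(np.abs(same_buffer).max()),
        "separate_copy_max": float(np.abs(separate_copy).max()),
        "reference_max": ref_max,
        "same_buffer_ok": matches(same_buffer),
        "separate_copy_ok": matches(separate_copy),
    }


# Stand-in data with the same shape as the matrix in the report (300 x 672).
rng = np.random.default_rng(0)
a = rng.standard_normal((300, 672)).astype(np.float32)
result = check_self_product(a)
for key, value in result.items():
    print(f"{key}: {value}")
```

On an affected setup, one would expect `same_buffer_ok` to be false while `separate_copy_ok` stays true, mirroring the FAIL/OK pair in the output above. Loading the actual matrix from the repro repository instead of random data is the more reliable check.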
