Commit Graph

2705 Commits

Author SHA1 Message Date
teddygood
6bb0dbfd3c Use generic SDOT kernel for WASM128_GENERIC 2026-03-19 13:54:58 +09:00
teddygood
99d05575d0 Enable SAXPY for WebAssembly SIMD backend 2026-03-18 21:27:45 +09:00
teddygood
7ff3588833 Refine WebAssembly SIMD backend scope 2026-03-18 17:24:02 +09:00
teddygood
53d0be88f8 Add WebAssembly SIMD backend for universal intrinsics 2026-03-18 03:23:31 +09:00
Martin Kroeker
7a95460bb1 Merge pull request #5680 from teddygood/wasm128-generic-target-exp
Add WebAssembly SIMD SGEMM and DGEMM kernels
2026-03-17 14:25:39 +01:00
Martin Kroeker
a1fd7a4658 Merge pull request #5677 from CheryDan/riscv/zdrot
Optimize ZROT_RVV for the non-unit-stride case
2026-03-17 11:10:32 +01:00
teddygood
86d1451cbe Add WebAssembly SIMD GEMM kernels 2026-03-17 05:51:54 +09:00
daichengrong
aa967ef6ba Optimize ZROT_RVV for the non-unit-stride case
Optimize the RVV implementation of ZROT when inc_x and inc_y are
non-unit strides (inc_x != 1, inc_y != 1).

Reorder several operations to reduce vector register pressure and
avoid unnecessary vector register spill to the stack. This helps GCC
keep vector values in registers and reduces redundant spill/reload
instructions, improving runtime performance.

No functional change.

Signed-off-by: daichengrong <daichengrong@iscas.ac.cn>
2026-03-16 14:22:54 +08:00
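For orientation, the operation this commit touches can be sketched as a scalar reference loop. This is not the RVV kernel and not code from the repository. It assumes the kernel applies a plane rotation with real `c` and `s` to interleaved complex vectors, as the reference BLAS routine ZDROT does (an inference from the `riscv/zdrot` branch name). The name `zdrot_ref` is hypothetical, and strides are taken to be counted in complex elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, for illustration only.
 * x and y hold interleaved (re, im) pairs; inc_x and inc_y are
 * strides in complex elements and are assumed positive here. */
static void zdrot_ref(size_t n, double *x, ptrdiff_t inc_x,
                      double *y, ptrdiff_t inc_y, double c, double s)
{
    assert(inc_x > 0 && inc_y > 0);
    for (size_t i = 0; i < n; i++) {
        double *xp = x + 2 * (ptrdiff_t)i * inc_x;
        double *yp = y + 2 * (ptrdiff_t)i * inc_y;
        /* Load both operands before storing, since each output
         * depends on the old values of x and y. */
        double xr = xp[0], xi = xp[1];
        double yr = yp[0], yi = yp[1];
        xp[0] = c * xr + s * yr;
        xp[1] = c * xi + s * yi;
        yp[0] = c * yr - s * xr;
        yp[1] = c * yi - s * xi;
    }
}
```

With non-unit strides the loads and stores become strided rather than contiguous, which is the case the commit message says was reordered to reduce register pressure.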
Martin Kroeker
4a888bcb73 set USE_TRMM for WASM 2026-03-15 23:07:16 +01:00
Martin Kroeker
ef3315527f Don't include the CPUID code in WebAssembly builds 2026-03-15 19:30:13 +01:00
Martin Kroeker
48f0a0f0ec Generate WASM kernel including existing intrinsics-based kernels 2026-03-15 19:28:08 +01:00
Martin Kroeker
cc64ce68c3 Create generic C KERNEL as baseline for WASM 2026-03-15 19:26:42 +01:00
Martin Kroeker
37262654d9 Merge pull request #5667 from fadara01/accelerate_sve128_sbgemm
Accelerate SVE128 SBGEMM/BGEMM
2026-03-06 09:14:44 +01:00
Fadi Arafeh
f30202b705 Accelerate SVE128 SBGEMM/BGEMM
This accelerates SBGEMM/BGEMM by extending the existing 8x4 kernel to 8x8 (unrolling N by 8)

Not sure if it's a good idea to delete the previous 8x4 kernel?

Here are the speedups on single core Neoverse-V2 (SVE128) compared to prev state:

Per-shape speedup
  M=N=K=64: SBGEMM 1.164x (16.42%), BGEMM 1.133x (13.30%)
  M=N=K=128: SBGEMM 1.220x (22.02%), BGEMM 1.186x (18.56%)
  M=N=K=256: SBGEMM 1.241x (24.08%), BGEMM 1.235x (23.54%)
  M=N=K=512: SBGEMM 1.240x (23.95%), BGEMM 1.227x (22.75%)
  M=N=K=1024: SBGEMM 1.251x (25.11%), BGEMM 1.232x (23.23%)
  M=N=K=2048: SBGEMM 1.235x (23.47%), BGEMM 1.246x (24.64%)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-03-05 13:50:07 +00:00
Martin Kroeker
df29cc0205 Use AVX2 in the tail loop too for consistent FMA rounding 2026-03-03 15:51:51 +01:00
Martin Kroeker
ef27ec6bed Add pragma to limit optimization level 2026-02-22 13:42:41 +01:00
Martin Kroeker
30cf14c548 Merge pull request #5640 from ChipKerchner/RVV_Narrow_Accumulate_FP16_GEMM
Added ability to accumulate in FP16.  Convert BF16 to FP32.  For FP16 and BF16 GEMM in RISC-V (BF16 now works for pre-RVA23)
2026-02-20 14:22:27 +01:00
Martin Kroeker
46b963b9a0 Use generic C kernels for SCAL on FreeBSD 2026-02-19 22:46:03 +01:00
Chip Kerchner
efe63e7970 Add pre-RVA23 to BF16 GEMM. 2026-02-15 15:49:59 +00:00
Chip Kerchner
1d6aa0dc31 Add dummy memsets - just in case. 2026-02-13 20:03:35 +00:00
Chip Kerchner
7a1d23400f Add flag for not converting A & B - will be used in future to do conversion during packing. 2026-02-13 19:00:41 +00:00
Chip Kerchner
1cc377ef61 Only convert B if M is greater or equal to 4. 2026-02-13 18:14:11 +00:00
Chip Kerchner
0acb60aab3 Conversion from BF16 to FP32 only once. 2026-02-13 17:55:15 +00:00
Chip Kerchner
9701a80a9f One small change. 2026-02-12 20:35:41 +00:00
Chip Kerchner
4121a22c02 Convert BF16 values once (and vectorized). 2026-02-12 18:45:39 +00:00
Chip Kerchner
33560437f5 Convert inputs from BF16 to FP32 and use FP32 vector madds. 18% faster. 2026-02-11 19:50:48 +00:00
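The conversion named in this commit can be illustrated in portable C. A BF16 value is the upper 16 bits of an IEEE 754 binary32 value, so widening is a 16-bit left shift of the bit pattern. The sketch below is scalar and only shows the idea; the function names are hypothetical and the actual kernel uses RVV intrinsics that are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one BF16 bit pattern to FP32 by placing it in the high half. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* bit cast without aliasing issues */
    return f;
}

/* Hypothetical helper: dot product of two BF16 vectors, converting
 * each input to FP32 and accumulating with FP32 multiply-adds. */
static float bf16_dot_fp32(const uint16_t *a, const uint16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += bf16_to_fp32(a[i]) * bf16_to_fp32(b[i]);
    return acc;
}
```

For example, the BF16 pattern `0x3F80` widens to `1.0f` and `0x4000` to `2.0f`.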
Chip Kerchner
e3cb067bf4 Fixed MADD to use float16 values. Use LMUL = 2 in main loop. Now 1.85X faster on BananaPi. 2026-02-11 00:27:27 +00:00
Chip Kerchner
74d9fe2832 Forgot to add definition. 2026-02-10 19:00:26 +00:00
Chip Kerchner
aa1cebd45b 128-bit versions. 2026-02-10 18:30:02 +00:00
Chip Kerchner
b5f2a50fe9 Added ability to accumulate in FP16 for GEMM. Widens once at the end of loops. 2026-02-10 17:30:05 +00:00
Martin Kroeker
69d92490c1 move inclusion of sme_abi header into the conditional section 2026-01-29 22:24:00 +01:00
Martin Kroeker
601bdde8ec fix stack location of dummy2 flag 2026-01-27 22:40:50 +01:00
Martin Kroeker
d53d2b11a9 fix stack location of dummy2 flag 2026-01-27 22:39:37 +01:00
Martin Kroeker
861b3db733 Reuse ?SUM kernels from ThunderX2T99 2026-01-20 15:42:09 +01:00
Martin Kroeker
71261a7b3f Trivially derive optimized S/DSUM for existing SASUM/DASUM kernels 2026-01-20 15:38:50 +01:00
Jameson Nash
a18a4ee08a arm64: fix clang ICE on Windows for zdot_thunderx2t99.c
Guard .align directive to avoid internal compiler error on
AArch64 Windows with clang.

See: https://github.com/llvm/llvm-project/issues/149547
See: #5076

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:36:14 +00:00
Martin Kroeker
4001d7a74f Increase LA464/16MB DGEMM_R for minimal spacing of 64 to MAX(p,q) 2026-01-17 20:53:51 +01:00
Martin Kroeker
7acf919836 typo 2026-01-15 00:03:24 +01:00
Martin Kroeker
d49df4c579 force linking to clang_rt_builtins when using LLVM for AppleM4 2026-01-14 23:08:58 +01:00
Martin Kroeker
88c583ed49 Update Makefile 2026-01-14 17:00:45 +01:00
Martin Kroeker
d3e4b41136 remove cpu=apple-m4 as not required and less portable 2026-01-14 13:55:16 +01:00
Martin Kroeker
31bb6ca7df Apple Clang requires +sme in the arch string for M4 2026-01-13 21:10:07 +01:00
Martin Kroeker
aafd3cb0db Merge branch 'OpenMathLib:develop' into issue5414 2026-01-12 00:51:25 +01:00
Martin Kroeker
4d08156266 Use the generic C kernel for DNRM2 2026-01-11 21:58:31 +01:00
Martin Kroeker
6de062cfc2 Merge branch 'OpenMathLib:develop' into issue5414 2026-01-11 17:45:11 +01:00
Martin Kroeker
d1de282a4e Improve the precision of S/CNRM2 by summing in double precision 2026-01-11 13:04:00 +01:00
Martin Kroeker
a9a6edaf17 Adapt for DYNAMIC_ARCH with multiple ...preprocess symbols 2026-01-09 15:29:36 +01:00
Martin Kroeker
2d46f1ec65 Merge branch 'develop' into issue5414 2026-01-09 15:04:06 +01:00
Martin Kroeker
c040d5ed86 Merge pull request #5591 from quic/topic/ssyr2k_direct_sme1
Support for SME1 based ssyr2k_direct kernel for cblas_ssyr2k level 3 API
2026-01-08 15:47:38 +01:00
Zhiqing xie
6939a43c3b Support for SME1 based ssyr2k_direct kernel for cblas_ssyr2k level 3 API 2026-01-08 11:09:04 +08:00