Commit Graph

2693 Commits

Author SHA1 Message Date
Martin Kroeker
37262654d9 Merge pull request #5667 from fadara01/accelerate_sve128_sbgemm
Some checks failed
apple m / build (cmake, gfortran, 0, 0) (push) Has been cancelled
apple m / build (cmake, gfortran, 0, 1) (push) Has been cancelled
apple m / build (cmake, gfortran, 1, 0) (push) Has been cancelled
apple m / build (cmake, gfortran, 1, 1) (push) Has been cancelled
apple m / build (make, gfortran, 0, 0) (push) Has been cancelled
apple m / build (make, gfortran, 0, 1) (push) Has been cancelled
apple m / build (make, gfortran, 1, 0) (push) Has been cancelled
apple m / build (make, gfortran, 1, 1) (push) Has been cancelled
arm64 graviton cirun / build (cmake, gfortran) (push) Has been cancelled
arm64 graviton cirun / build (make, gfortran) (push) Has been cancelled
c910v qemu test / TEST (riscv64-linux-gnu, NO_SHARED=1 TARGET=C910V, C910V, riscv64-unknown-linux-gnu) (push) Has been cancelled
c910v qemu test / TEST (riscv64-linux-gnu, NO_SHARED=1 TARGET=RISCV64_GENERIC, RISCV64_GENERIC, riscv64-linux-gnu) (push) Has been cancelled
Run codspeed benchmarks / benchmarks (make, gfortran, ubuntu-22.04, 3.12) (push) Has been cancelled
Publish docs via GitHub Pages / Deploy docs (push) Has been cancelled
continuous build / build (cmake, clang, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, macos-latest) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang-21, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang-21, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, clang-21, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, gcc, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, gcc, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, gcc, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang, gfortran, macos-latest) (push) Has been cancelled
continuous build / build (make, clang, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, clang, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang-21, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang-21, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, clang-21, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, gcc, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, gcc, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, gcc, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / msys2 (None, fc, int32, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, CLANG64, mingw-w64-clang-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, MINGW32, mingw-w64-i686) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int64, -DBINARY=64 -DINTERFACE64=1, CLANG64, mingw-w64-clang-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int64, -DBINARY=64 -DINTERFACE64=1, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / cross_build (DYNAMIC_ARCH=1 TARGET=GENERIC, mips64el, mips64el-linux-gnuabi64) (push) Has been cancelled
continuous build / cross_build (TARGET=EV4, alpha, alpha-linux-gnu) (push) Has been cancelled
continuous build / cross_build (TARGET=MIPS1004K, mipsel, mipsel-linux-gnu) (push) Has been cancelled
continuous build / cross_build (TARGET=RISCV64_GENERIC, riscv64, riscv64-linux-gnu) (push) Has been cancelled
continuous build / neoverse_build (push) Has been cancelled
harmonyos / build (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=GENERIC, DYNAMIC_ARCH, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA264, LA264, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA464, LA464, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA64_GENERIC, LA64_GENERIC, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON2K1000, LOONGSON2K1000, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON3R5, LOONGSON3R5, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSONGENERIC, LOONGSONGENERIC, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=GENERIC, DYNAMIC_ARCH) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA264, LA264) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA464, LA464) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA64_GENERIC, LA64_GENERIC) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON2K1000, LOONGSON2K1000) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON3R5, LOONGSON3R5) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSONGENERIC, LOONGSONGENERIC) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=I6400, I6400, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=I6500, I6500, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=MIPS64_GENERIC, MIPS64_GENERIC, mips64el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=P6600, P6600, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=SICORTEX, SICORTEX, mips64el-linux-gnuabi64) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_GENERIC BINARY=64 ARCH=riscv64 DYNAMIC_ARCH=1, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64, DYNAMIC_ARCH=1) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_ZVL128B BINARY=64 ARCH=riscv64, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64, RISCV64_ZVL128B) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_ZVL256B BINARY=64 ARCH=riscv64 BUILD_BFLOAT16=1 BUILD_HFLOAT16=1, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64,zfh=true,zvfh=true,zvfbfwma=true, RISCV64_ZVL256B) (push) Has been cancelled
Windows ARM64 CI / build (push) Has been cancelled
Accelerate SVE128 SBGEMM/BGEMM
2026-03-06 09:14:44 +01:00
Fadi Arafeh
f30202b705 Accelerate SVE128 SBGEMM/BGEMM
This accelerates SBGEMM/BGEMM by extending the existing 8x4 kernel to 8x8 (unrolling N by 8)

Not sure if it's a good idea to delete the previous 8x4 kernel?

Here are the speedups on single core Neoverse-V2 (SVE128) compared to prev state:

Per-shape speedup
  M=N=K=64: SBGEMM 1.164x (16.42%), BGEMM 1.133x (13.30%)
  M=N=K=128: SBGEMM 1.220x (22.02%), BGEMM 1.186x (18.56%)
  M=N=K=256: SBGEMM 1.241x (24.08%), BGEMM 1.235x (23.54%)
  M=N=K=512: SBGEMM 1.240x (23.95%), BGEMM 1.227x (22.75%)
  M=N=K=1024: SBGEMM 1.251x (25.11%), BGEMM 1.232x (23.23%)
  M=N=K=2048: SBGEMM 1.235x (23.47%), BGEMM 1.246x (24.64%)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-03-05 13:50:07 +00:00
Martin Kroeker
df29cc0205 Use AVX2 in the tail loop too for consistent FMA rounding 2026-03-03 15:51:51 +01:00
Martin Kroeker
ef27ec6bed Add pragma to limit optimization level 2026-02-22 13:42:41 +01:00
Martin Kroeker
30cf14c548 Merge pull request #5640 from ChipKerchner/RVV_Narrow_Accumulate_FP16_GEMM
Some checks failed
apple m / build (cmake, gfortran, 0, 0) (push) Has been cancelled
apple m / build (cmake, gfortran, 0, 1) (push) Has been cancelled
apple m / build (cmake, gfortran, 1, 0) (push) Has been cancelled
apple m / build (cmake, gfortran, 1, 1) (push) Has been cancelled
apple m / build (make, gfortran, 0, 0) (push) Has been cancelled
apple m / build (make, gfortran, 0, 1) (push) Has been cancelled
apple m / build (make, gfortran, 1, 0) (push) Has been cancelled
apple m / build (make, gfortran, 1, 1) (push) Has been cancelled
arm64 graviton cirun / build (cmake, gfortran) (push) Has been cancelled
arm64 graviton cirun / build (make, gfortran) (push) Has been cancelled
c910v qemu test / TEST (riscv64-linux-gnu, NO_SHARED=1 TARGET=C910V, C910V, riscv64-unknown-linux-gnu) (push) Has been cancelled
c910v qemu test / TEST (riscv64-linux-gnu, NO_SHARED=1 TARGET=RISCV64_GENERIC, RISCV64_GENERIC, riscv64-linux-gnu) (push) Has been cancelled
Run codspeed benchmarks / benchmarks (make, gfortran, ubuntu-22.04, 3.12) (push) Has been cancelled
Publish docs via GitHub Pages / Deploy docs (push) Has been cancelled
continuous build / build (cmake, clang, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, macos-latest) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang-21, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang-21, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, clang-21, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, gcc, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, gcc, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, gcc, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang, gfortran, macos-latest) (push) Has been cancelled
continuous build / build (make, clang, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, clang, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang-21, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang-21, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, clang-21, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, gcc, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, gcc, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, gcc, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / msys2 (None, fc, int32, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, CLANG64, mingw-w64-clang-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, MINGW32, mingw-w64-i686) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int64, -DBINARY=64 -DINTERFACE64=1, CLANG64, mingw-w64-clang-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int64, -DBINARY=64 -DINTERFACE64=1, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / cross_build (DYNAMIC_ARCH=1 TARGET=GENERIC, mips64el, mips64el-linux-gnuabi64) (push) Has been cancelled
continuous build / cross_build (TARGET=EV4, alpha, alpha-linux-gnu) (push) Has been cancelled
continuous build / cross_build (TARGET=MIPS1004K, mipsel, mipsel-linux-gnu) (push) Has been cancelled
continuous build / cross_build (TARGET=RISCV64_GENERIC, riscv64, riscv64-linux-gnu) (push) Has been cancelled
continuous build / neoverse_build (push) Has been cancelled
harmonyos / build (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=GENERIC, DYNAMIC_ARCH, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA264, LA264, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA464, LA464, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA64_GENERIC, LA64_GENERIC, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON2K1000, LOONGSON2K1000, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON3R5, LOONGSON3R5, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSONGENERIC, LOONGSONGENERIC, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=GENERIC, DYNAMIC_ARCH) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA264, LA264) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA464, LA464) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA64_GENERIC, LA64_GENERIC) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON2K1000, LOONGSON2K1000) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON3R5, LOONGSON3R5) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSONGENERIC, LOONGSONGENERIC) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=I6400, I6400, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=I6500, I6500, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=MIPS64_GENERIC, MIPS64_GENERIC, mips64el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=P6600, P6600, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=SICORTEX, SICORTEX, mips64el-linux-gnuabi64) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_GENERIC BINARY=64 ARCH=riscv64 DYNAMIC_ARCH=1, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64, DYNAMIC_ARCH=1) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_ZVL128B BINARY=64 ARCH=riscv64, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64, RISCV64_ZVL128B) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_ZVL256B BINARY=64 ARCH=riscv64 BUILD_BFLOAT16=1 BUILD_HFLOAT16=1, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64,zfh=true,zvfh=true,zvfbfwma=true, RISCV64_ZVL256B) (push) Has been cancelled
Windows ARM64 CI / build (push) Has been cancelled
Added ability to accumulate in FP16.  Convert BF16 to FP32.  For FP16 and BF16 GEMM in RISC-V (BF16 now works for pre-RVA23)
2026-02-20 14:22:27 +01:00
Martin Kroeker
46b963b9a0 Use generic C kernels for SCAL on FreeBSD 2026-02-19 22:46:03 +01:00
Chip Kerchner
efe63e7970 Add pre-RVA23 to BF16 GEMM. 2026-02-15 15:49:59 +00:00
Chip Kerchner
1d6aa0dc31 Add dummy memsets - just in case. 2026-02-13 20:03:35 +00:00
Chip Kerchner
7a1d23400f Add flag for not converting A & B - will be used in future to do conversion during packing. 2026-02-13 19:00:41 +00:00
Chip Kerchner
1cc377ef61 Only convert B if M is greater or equal to 4. 2026-02-13 18:14:11 +00:00
Chip Kerchner
0acb60aab3 Conversion from BF16 to FP32 only once. 2026-02-13 17:55:15 +00:00
Chip Kerchner
9701a80a9f One small change. 2026-02-12 20:35:41 +00:00
Chip Kerchner
4121a22c02 Convert BF16 values once (and vectorized). 2026-02-12 18:45:39 +00:00
Chip Kerchner
33560437f5 Convert inputs from BF16 to FP32 and use FP32 vector madds. 18% faster. 2026-02-11 19:50:48 +00:00
Chip Kerchner
e3cb067bf4 Fixed MADD to use float16 values. Use LMUL = 2 in main loop. Now 1.85X faster on BananaPi. 2026-02-11 00:27:27 +00:00
Chip Kerchner
74d9fe2832 Forget to add defintion. 2026-02-10 19:00:26 +00:00
Chip Kerchner
aa1cebd45b 128-bit versions. 2026-02-10 18:30:02 +00:00
Chip Kerchner
b5f2a50fe9 Added ability to accumulate in FP16 for GEMM. Widens once at the end of loops. 2026-02-10 17:30:05 +00:00
Martin Kroeker
69d92490c1 move inclusion of sme_abi header into the conditional section 2026-01-29 22:24:00 +01:00
Martin Kroeker
601bdde8ec fix stack location of dummy2 flag 2026-01-27 22:40:50 +01:00
Martin Kroeker
d53d2b11a9 fix stack location of dummy2 flag 2026-01-27 22:39:37 +01:00
Martin Kroeker
861b3db733 Reuse ?SUM kernels from ThunderX2T99 2026-01-20 15:42:09 +01:00
Martin Kroeker
71261a7b3f Trivially derive optimized S/DSUM for existing SASUM/DASUM kernels 2026-01-20 15:38:50 +01:00
Jameson Nash
a18a4ee08a arm64: fix clang ICE on Windows for zdot_thunderx2t99.c
Guard .align directive to avoid internal compiler error on
AArch64 Windows with clang.

See: https://github.com/llvm/llvm-project/issues/149547
See: #5076

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:36:14 +00:00
Martin Kroeker
4001d7a74f Increase LA464/16MB DGEMM_R for minimal spacing of 64 to MAX(p,q) 2026-01-17 20:53:51 +01:00
Martin Kroeker
7acf919836 typo 2026-01-15 00:03:24 +01:00
Martin Kroeker
d49df4c579 force linking to clang_rt_builtins when using LLVM for AppleM4 2026-01-14 23:08:58 +01:00
Martin Kroeker
88c583ed49 Update Makefile 2026-01-14 17:00:45 +01:00
Martin Kroeker
d3e4b41136 remove cpu=apple-m4 as not required and less portable 2026-01-14 13:55:16 +01:00
Martin Kroeker
31bb6ca7df Apple Clang requires +sme in the arch string for M4 2026-01-13 21:10:07 +01:00
Martin Kroeker
aafd3cb0db Merge branch 'OpenMathLib:develop' into issue5414 2026-01-12 00:51:25 +01:00
Martin Kroeker
4d08156266 Use the generic C kernel for DNRM2 2026-01-11 21:58:31 +01:00
Martin Kroeker
6de062cfc2 Merge branch 'OpenMathLib:develop' into issue5414 2026-01-11 17:45:11 +01:00
Martin Kroeker
d1de282a4e Improve the precision of S/CNRM2 by summing in double precision 2026-01-11 13:04:00 +01:00
Martin Kroeker
a9a6edaf17 Adapt for DYNAMIC_ARCH with multiple ...preprocess symbols 2026-01-09 15:29:36 +01:00
Martin Kroeker
2d46f1ec65 Merge branch 'develop' into issue5414 2026-01-09 15:04:06 +01:00
Martin Kroeker
c040d5ed86 Merge pull request #5591 from quic/topic/ssyr2k_direct_sme1
Some checks failed
apple m / build (cmake, gfortran, 0, 0) (push) Has been cancelled
apple m / build (cmake, gfortran, 0, 1) (push) Has been cancelled
apple m / build (cmake, gfortran, 1, 0) (push) Has been cancelled
apple m / build (cmake, gfortran, 1, 1) (push) Has been cancelled
apple m / build (make, gfortran, 0, 0) (push) Has been cancelled
apple m / build (make, gfortran, 0, 1) (push) Has been cancelled
apple m / build (make, gfortran, 1, 0) (push) Has been cancelled
apple m / build (make, gfortran, 1, 1) (push) Has been cancelled
arm64 graviton cirun / build (cmake, gfortran) (push) Has been cancelled
arm64 graviton cirun / build (make, gfortran) (push) Has been cancelled
c910v qemu test / TEST (riscv64-linux-gnu, NO_SHARED=1 TARGET=C910V, C910V, riscv64-unknown-linux-gnu) (push) Has been cancelled
c910v qemu test / TEST (riscv64-linux-gnu, NO_SHARED=1 TARGET=RISCV64_GENERIC, RISCV64_GENERIC, riscv64-linux-gnu) (push) Has been cancelled
Run codspeed benchmarks / benchmarks (make, gfortran, ubuntu-22.04, 3.12) (push) Has been cancelled
Publish docs via GitHub Pages / Deploy docs (push) Has been cancelled
continuous build / build (cmake, clang, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, macos-latest) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, clang, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang-21, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, clang-21, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, clang-21, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, gcc, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (cmake, gcc, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (cmake, gcc, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang, gfortran, macos-latest) (push) Has been cancelled
continuous build / build (make, clang, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, clang, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang-21, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, clang-21, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, clang-21, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, gcc, flang, ubuntu-latest) (push) Has been cancelled
continuous build / build (make, gcc, gfortran, ubuntu-24.04-arm) (push) Has been cancelled
continuous build / build (make, gcc, gfortran, ubuntu-latest) (push) Has been cancelled
continuous build / msys2 (None, fc, int32, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, CLANG64, mingw-w64-clang-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, MINGW32, mingw-w64-i686) (push) Has been cancelled
continuous build / msys2 (Release, fc, int32, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int64, -DBINARY=64 -DINTERFACE64=1, CLANG64, mingw-w64-clang-x86_64) (push) Has been cancelled
continuous build / msys2 (Release, fc, int64, -DBINARY=64 -DINTERFACE64=1, UCRT64, mingw-w64-ucrt-x86_64) (push) Has been cancelled
continuous build / cross_build (DYNAMIC_ARCH=1 TARGET=GENERIC, mips64el, mips64el-linux-gnuabi64) (push) Has been cancelled
continuous build / cross_build (TARGET=EV4, alpha, alpha-linux-gnu) (push) Has been cancelled
continuous build / cross_build (TARGET=MIPS1004K, mipsel, mipsel-linux-gnu) (push) Has been cancelled
continuous build / cross_build (TARGET=RISCV64_GENERIC, riscv64, riscv64-linux-gnu) (push) Has been cancelled
continuous build / neoverse_build (push) Has been cancelled
harmonyos / build (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=GENERIC, DYNAMIC_ARCH, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA264, LA264, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA464, LA464, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA64_GENERIC, LA64_GENERIC, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON2K1000, LOONGSON2K1000, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON3R5, LOONGSON3R5, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSONGENERIC, LOONGSONGENERIC, loongarch64-linux-gnu) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=GENERIC, DYNAMIC_ARCH) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA264, LA264) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA464, LA464) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LA64_GENERIC, LA64_GENERIC) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON2K1000, LOONGSON2K1000) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSON3R5, LOONGSON3R5) (push) Has been cancelled
loongarch64 clang qemu test / TEST (NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=LOONGSONGENERIC, LOONGSONGENERIC) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=I6400, I6400, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=I6500, I6500, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=MIPS64_GENERIC, MIPS64_GENERIC, mips64el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=P6600, P6600, mipsisa64r6el-linux-gnuabi64) (push) Has been cancelled
mips64 qemu test / TEST (NO_SHARED=1 TARGET=SICORTEX, SICORTEX, mips64el-linux-gnuabi64) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_GENERIC BINARY=64 ARCH=riscv64 DYNAMIC_ARCH=1, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64, DYNAMIC_ARCH=1) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_ZVL128B BINARY=64 ARCH=riscv64, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64, RISCV64_ZVL128B) (push) Has been cancelled
riscv64 zvl256b qemu test / TEST (TARGET=RISCV64_ZVL256B BINARY=64 ARCH=riscv64 BUILD_BFLOAT16=1 BUILD_HFLOAT16=1, rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64,zfh=true,zvfh=true,zvfbfwma=true, RISCV64_ZVL256B) (push) Has been cancelled
Windows ARM64 CI / build (push) Has been cancelled
Nightly-Homebrew-Build / build-OpenBLAS-with-Homebrew (push) Has been cancelled
Support for SME1 based ssyr2k_direct kernel for cblas_ssyr2k level 3 API
2026-01-08 15:47:38 +01:00
Zhiqing xie
6939a43c3b Support for SME1 based ssyr2k_direct kernel for cblas_ssyr2k level 3 API 2026-01-08 11:09:04 +08:00
Amrita H S
b53d18b3ad Fixing warning messages in dgemm and dgemv kernels
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2026-01-06 10:20:56 -06:00
Martin Kroeker
a18a53605e Adjust M4 options to avoid unresolved reference with non-Apple LLVM 2026-01-05 19:10:52 +01:00
Rajalakshmi Srinivasaraghavan
2283fcbbe7 POWER10: Reduce sgemm loop unrolling
With GCC 14, unnecessary move and lxvp instructions appear when unrolling the inner loop for larger sizes.
Reducing the loop unroll factor restores performance to GCC 11.
2026-01-04 17:01:01 -06:00
Martin Kroeker
67fd33e729 syntax fix 2025-12-31 19:46:20 +01:00
Martin Kroeker
5c8cf37d83 Add workaround for current LLVM SME bug on Windows 2025-12-31 15:33:15 +01:00
Martin Kroeker
d39b77748f Make .align conditional on not being on WoA and strip CRLF endings 2025-12-24 20:00:45 +01:00
Martin Kroeker
ac2c66321d remove special handling of C/ZDOT for LLVM on WoA 2025-12-19 17:04:21 +01:00
Martin Kroeker
cfa28bcf71 Support compilation with LLVM for Windows on Arm 2025-12-19 17:00:47 +01:00
Martin Kroeker
e85efb8d86 remove za from clobber lists 2025-12-03 22:40:02 +01:00
pengxu
f6533ccea0 Fix floating point registers ld/st bug of Loongarch 2025-12-03 10:52:59 +08:00
Martin Kroeker
c3c857c95e fix sequence 2025-11-24 22:38:49 +01:00
Martin Kroeker
7ab8dc125d rework ARM64 SME dependency handling 2025-11-24 22:36:02 +01:00