OpenBLAS/kernel
Fadi Arafeh f30202b705 Accelerate SVE128 SBGEMM/BGEMM
This accelerates SBGEMM/BGEMM by extending the existing 8x4 kernel to 8x8 (unrolling N by 8 instead of 4).
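
To illustrate what an 8x8 register tile means, here is a minimal scalar sketch in plain C. It is not the kernel from this commit: the real kernel targets SVE and its packing and instruction selection are not shown on this page. The function name `micro_kernel_8x8`, the packed layouts of `a` and `b`, and the truncating bf16 conversion are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bf16 is the upper 16 bits of an IEEE-754 binary32 value. */
static inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Truncating conversion, used here only to build demo inputs
 * (real code would normally round to nearest even). */
static inline uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

enum { MR = 8, NR = 8 }; /* tile size: 8 rows of C by 8 columns of C */

/* Hypothetical packed layout (an assumption, not OpenBLAS's):
 *   a[k * MR + i] holds A(i, k) for the current 8-row panel,
 *   b[k * NR + j] holds B(k, j) for the current 8-column panel,
 *   c is column-major with leading dimension ldc.
 * Computes C += A * B for one MR x NR tile, accumulating in fp32. */
void micro_kernel_8x8(int K, const uint16_t *a, const uint16_t *b,
                      float *c, int ldc) {
    assert(K >= 0 && ldc >= MR);
    float acc[NR][MR] = {{0.0f}};

    for (int k = 0; k < K; ++k) {
        for (int j = 0; j < NR; ++j) {      /* N unrolled by 8 */
            float bv = bf16_to_f32(b[k * NR + j]);
            for (int i = 0; i < MR; ++i) {  /* M unrolled by 8 */
                acc[j][i] += bf16_to_f32(a[k * MR + i]) * bv;
            }
        }
    }

    /* Write the accumulated tile back into C. */
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i)
            c[j * ldc + i] += acc[j][i];
}
```

In this sketch each value taken from the A panel feeds 8 multiply-adds per k step, whereas an 8x4 tile would use it only 4 times. That extra reuse is the general motivation for a wider N unroll; whether it accounts for all of the measured gain here is not something this page states.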

Open question: is it a good idea to delete the previous 8x4 kernel?

Here are the speedups on a single Neoverse-V2 core (SVE128) compared to the previous state:

Per-shape speedup
  M=N=K=64: SBGEMM 1.164x (16.42%), BGEMM 1.133x (13.30%)
  M=N=K=128: SBGEMM 1.220x (22.02%), BGEMM 1.186x (18.56%)
  M=N=K=256: SBGEMM 1.241x (24.08%), BGEMM 1.235x (23.54%)
  M=N=K=512: SBGEMM 1.240x (23.95%), BGEMM 1.227x (22.75%)
  M=N=K=1024: SBGEMM 1.251x (25.11%), BGEMM 1.232x (23.23%)
  M=N=K=2048: SBGEMM 1.235x (23.47%), BGEMM 1.246x (24.64%)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-03-05 13:50:07 +00:00