Commit Graph

328 Commits

Author SHA1 Message Date
Martin Kroeker
d7b0fccbb4 Enable SME-based kernels for VortexM4 with clang-based compilers only 2025-10-19 13:34:26 -07:00
Martin Kroeker
9bfc3612f9 Merge branch 'OpenMathLib:develop' into issue5414 2025-10-12 09:18:06 -07:00
Martin Kroeker
e40714cabd Merge pull request #5450 from quic/topic/strmm_direct_sme1
Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API
2025-10-11 15:20:19 -07:00
changjua
644ea07ef9 Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API 2025-10-10 10:48:27 +08:00
Martin Kroeker
fc516af155 Merge branch 'develop' into issue5414 2025-10-01 14:12:59 -07:00
Martin Kroeker
e939c6c315 Merge pull request #5471 from quic/topic/ssymm_direct_sme1
Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API
2025-10-01 06:22:36 -07:00
Rajendra Prasad Matcha
19268471cc Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API 2025-09-30 15:05:33 +05:30
Chip Kerchner
92f09a6a98 Add BF16 sbgemm on RISCV. 2025-09-22 14:32:43 +00:00
Martin Kroeker
cb6c4392a5 Make GEMM3M parameters available on 32bit X86-GENERIC 2025-09-10 22:44:14 +02:00
Martin Kroeker
202a7a0e2a Separate VORTEXM4 from VORTEX and ARMV9SME 2025-08-18 01:45:40 -07:00
Chris Sidebottom
114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
2025-08-11 09:25:13 +00:00
Masato Nakagawa
7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2025-07-29 18:54:36 +09:00
Martin Kroeker
c504aedca1 Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
2025-07-25 15:47:06 +02:00
Chris Sidebottom
87247daadc Add NEOVERSEV2 target support
Did a quick run around to make `TARGET=NEOVERSEV2` build successfully.

Fixes #5385
2025-07-24 12:40:31 +01:00
Chris Sidebottom
ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
2025-07-24 10:59:28 +00:00
Chris Sidebottom
740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.

Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-10 23:23:27 +00:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Masato Nakagawa
5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
h-motoki
bba75d5e45 GEMM_PREFERED_SIZE parameter has been changed for A64FX. 2025-06-27 19:37:36 +09:00
Martin Kroeker
d96daa220d Merge pull request #5290 from Srangrang/develop
Add support for FP16 to openBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
davidz-ampere
aa90ab4142 Add support for Ampere AmpereOne processors 2025-06-24 00:12:34 -04:00
davidz-ampere
be68ef03b4 Add support for Ampere processors 2025-06-15 22:00:40 -04:00
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Srangrang
0a967797a1 Add FP16 support for RISCV 2025-05-27 14:34:57 +08:00
Martin Kroeker
a34b487f22 Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN 2025-04-09 17:25:46 +02:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Ye Tao
c748e6a338 optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3 * checkpoint sbgemm for SVE-256 2025-02-03 12:49:49 +00:00
Martin Kroeker
926e56e389 Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R 2024-11-14 14:04:25 -08:00
Martin Kroeker
a47b3c8867 Fix unroll parameter selection for MIPS64_GENERIC 2024-10-13 22:54:34 +02:00
Martin Kroeker
7c4f3638fd switch PPCG4 SGEMM kernel to 4x4 2024-10-03 22:00:15 +02:00
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Chip Kerchner
b1737698db Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons for the SBGEMM unit test cannot be exact due to epsilon differences. 2024-08-13 07:01:21 -05:00
Piotr Kubaj
4c12090776 Fix build on FreeBSD/powerpc64* 2024-07-10 22:21:48 +00:00
gxw
6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Usui, Tetsuzo
ca673ca774 Add GEMM_PREFERED_SIZE parameter for Neoverse V1 2024-04-12 17:21:14 +09:00
Martin Kroeker
93d975d8fd Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Martin Kroeker
ba6d485102 Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE 2024-04-04 18:52:38 +02:00
Martin Kroeker
584e87661d set SWITCH_RATIO for Cortex-A76 2024-04-02 23:10:45 +02:00
Martin Kroeker
b925f61fb0 Add support for Cortex-A76 2024-04-02 19:44:17 +02:00
Rajalakshmi Srinivasaraghavan
f5b2a877e2 POWER9: Use default param values from POWER8 on AIX
AIX uses KERNEL.POWER8 optimization on POWER9 and changing
the default GEMM parameters in param.h to use POWER8 values
on POWER9.
2024-03-20 10:17:49 -05:00
pengxu
4787a55c64 Optimized cgemm kernel 16x4 LASX for LoongArch 2024-02-21 15:28:47 +08:00
pengxu
fe3da43b7d Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch 2024-02-06 11:49:01 +08:00
Martin Kroeker
e5d2725e5a Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Sergei Lewis
1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
kseniyazaytseva
b193ea3d7b Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Dirreke
ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00