Martin Kroeker
d7b0fccbb4
Enable SME-based kernels for VortexM4 with clang-based compilers only
2025-10-19 13:34:26 -07:00
Martin Kroeker
9bfc3612f9
Merge branch 'OpenMathLib:develop' into issue5414
2025-10-12 09:18:06 -07:00
Martin Kroeker
e40714cabd
Merge pull request #5450 from quic/topic/strmm_direct_sme1
...
Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API
2025-10-11 15:20:19 -07:00
changjua
644ea07ef9
Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API
2025-10-10 10:48:27 +08:00
Martin Kroeker
fc516af155
Merge branch 'develop' into issue5414
2025-10-01 14:12:59 -07:00
Martin Kroeker
e939c6c315
Merge pull request #5471 from quic/topic/ssymm_direct_sme1
...
Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API
2025-10-01 06:22:36 -07:00
Rajendra Prasad Matcha
19268471cc
Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API
2025-09-30 15:05:33 +05:30
Chip Kerchner
92f09a6a98
Add BF16 sbgemm on RISCV.
2025-09-22 14:32:43 +00:00
Martin Kroeker
cb6c4392a5
Make GEMM3M parameters available on 32bit X86-GENERIC
2025-09-10 22:44:14 +02:00
Martin Kroeker
202a7a0e2a
Separate VORTEXM4 from VORTEX and ARMV9SME
2025-08-18 01:45:40 -07:00
Chris Sidebottom
114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
...
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
2025-08-11 09:25:13 +00:00
Masato Nakagawa
7e29f11396
Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)
2025-07-29 18:54:36 +09:00
Martin Kroeker
c504aedca1
Merge pull request #5400 from Mousius/neoversev2-target
...
Add NEOVERSEV2 target support
2025-07-25 15:47:06 +02:00
Chris Sidebottom
87247daadc
Add NEOVERSEV2 target support
...
Did a quick run around to make `TARGET=NEOVERSEV2` build successfully.
Fixes #5385
2025-07-24 12:40:31 +01:00
Chris Sidebottom
ea2faf0c9a
Add optimized BGEMM for NEOVERSEN2 target
...
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
2025-07-24 10:59:28 +00:00
Chris Sidebottom
740efd71c4
Add optimized BGEMM kernel for NEOVERSEV1 target
...
This also improves the testing and generic kernel by re-using the BF16
conversion functions.
Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-10 23:23:27 +00:00
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
...
Setting up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
2025-06-27 19:37:36 +09:00
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
...
Add support for FP16 to OpenBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
davidz-ampere
aa90ab4142
Add support for Ampere AmpereOne processors
2025-06-24 00:12:34 -04:00
davidz-ampere
be68ef03b4
Add support for Ampere processors
2025-06-15 22:00:40 -04:00
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
...
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0
Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Srangrang
0a967797a1
Add FP16 support for RISCV
2025-05-27 14:34:57 +08:00
Martin Kroeker
a34b487f22
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
2025-04-09 17:25:46 +02:00
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
...
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
...
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
2025-02-03 12:49:49 +00:00
Martin Kroeker
926e56e389
Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R
2024-11-14 14:04:25 -08:00
Martin Kroeker
a47b3c8867
Fix unroll parameter selection for MIPS64_GENERIC
2024-10-13 22:54:34 +02:00
Martin Kroeker
7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4
2024-10-03 22:00:15 +02:00
gxw
48698b2b1d
LoongArch64: Rename core
...
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Chip Kerchner
b1737698db
Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons for the SBGEMM unit test cannot be exact due to epsilon differences.
2024-08-13 07:01:21 -05:00
Piotr Kubaj
4c12090776
Fix build on FreeBSD/powerpc64*
2024-07-10 22:21:48 +00:00
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
2024-05-08 10:10:26 +08:00
Usui, Tetsuzo
ca673ca774
Add GEMM_PREFERED_SIZE parameter for Neoverse V1
2024-04-12 17:21:14 +09:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
...
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Martin Kroeker
ba6d485102
Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE
2024-04-04 18:52:38 +02:00
Martin Kroeker
584e87661d
set SWITCH_RATIO for Cortex-A76
2024-04-02 23:10:45 +02:00
Martin Kroeker
b925f61fb0
Add support for Cortex-A76
2024-04-02 19:44:17 +02:00
Rajalakshmi Srinivasaraghavan
f5b2a877e2
POWER9: Use default param values from POWER8 on AIX
...
AIX uses the KERNEL.POWER8 optimization on POWER9, so change
the default GEMM parameters in param.h to use POWER8 values
on POWER9.
2024-03-20 10:17:49 -05:00
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
2024-02-21 15:28:47 +08:00
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
2024-02-06 11:49:01 +08:00
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
...
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
...
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
...
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Dirreke
ec89466e14
Add CSKY support
2024-01-16 23:45:06 +08:00