Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
...
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
2025-06-27 19:37:36 +09:00
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
...
Add support for FP16 to OpenBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
davidz-ampere
aa90ab4142
Add support for Ampere AmpereOne processors
2025-06-24 00:12:34 -04:00
davidz-ampere
be68ef03b4
Add support for Ampere processors
2025-06-15 22:00:40 -04:00
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
...
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0
Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Srangrang
0a967797a1
Add FP16 support for RISCV
2025-05-27 14:34:57 +08:00
Martin Kroeker
a34b487f22
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
2025-04-09 17:25:46 +02:00
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
...
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
...
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
2025-02-03 12:49:49 +00:00
Martin Kroeker
926e56e389
Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R
2024-11-14 14:04:25 -08:00
Martin Kroeker
a47b3c8867
Fix unroll parameter selection for MIPS64_GENERIC
2024-10-13 22:54:34 +02:00
Martin Kroeker
7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4
2024-10-03 22:00:15 +02:00
gxw
48698b2b1d
LoongArch64: Rename core
...
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Chip Kerchner
b1737698db
Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons in the SBGEMM unit test cannot be exact due to epsilon differences.
2024-08-13 07:01:21 -05:00
Piotr Kubaj
4c12090776
Fix build on FreeBSD/powerpc64*
2024-07-10 22:21:48 +00:00
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
2024-05-08 10:10:26 +08:00
Usui, Tetsuzo
ca673ca774
Add GEMM_PREFERED_SIZE parameter for Neoverse V1
2024-04-12 17:21:14 +09:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
...
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Martin Kroeker
ba6d485102
Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERED_SIZE
2024-04-04 18:52:38 +02:00
Martin Kroeker
584e87661d
set SWITCH_RATIO for Cortex-A76
2024-04-02 23:10:45 +02:00
Martin Kroeker
b925f61fb0
Add support for Cortex-A76
2024-04-02 19:44:17 +02:00
Rajalakshmi Srinivasaraghavan
f5b2a877e2
POWER9: Use default param values from POWER8 on AIX
...
AIX uses the KERNEL.POWER8 optimization on POWER9, so this changes
the default GEMM parameters in param.h to use POWER8 values
on POWER9.
2024-03-20 10:17:49 -05:00
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
2024-02-21 15:28:47 +08:00
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
2024-02-06 11:49:01 +08:00
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
...
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
...
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
...
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Dirreke
ec89466e14
Add CSKY support
2024-01-16 23:45:06 +08:00
Martin Kroeker
504f9b0c5e
Increase S/D GEMM PQ to match typical L2 size as for NeoverseV1
2024-01-02 18:46:21 +01:00
Martin Kroeker
2802478449
revert change to Loongson2k1000 zgemm
2023-12-30 23:35:51 +01:00
Martin Kroeker
44b5b9e39f
Update C/ZGEMM MN for Loongson2k1000
2023-12-30 22:50:40 +01:00
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
...
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2023-12-29 18:06:26 +08:00
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2023-12-29 17:30:57 +08:00
Darshan Patel
dab0da8243
Update GEMM param for NEOVERSEV1
2023-12-19 13:56:55 +05:30
Octavian Maghiar
e4586e81b8
[RISC-V] Add RISC-V Vector 128-bit target
...
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
...
POWER10 optimizations are disabled when using the default AIX assembler.
As many issues have been fixed recently, this enables the optimization path
for the default assembler.
2023-10-19 18:48:19 -05:00
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2023-08-23 16:08:43 +08:00
gxw
4d0f000db6
MIPS: Enable MSA
2023-08-07 21:00:10 +08:00
gxw
d46772e037
LoongArch64: Add compiler feature checks
2023-08-05 10:21:43 +08:00
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
...
This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom
f971ef55f2
Add ARMV8SVE to AArch64 Dynamic Dispatch
...
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy pasting the same parameters.
To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
...
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
...
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00