132 Commits

Author SHA1 Message Date
Masato Nakagawa
7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2025-07-29 18:54:36 +09:00
youcai
41f9701ebc Fix cmake building with cblas_bgemm 2025-07-23 22:10:53 +08:00
Chris Sidebottom
48394384ef Use correct constants for per-target BGEMM/SBGEMM
This fixes the build and tests on `NEOVERSEV1` target, which was failing
with specific constants for `SBGEMM`

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:23:27 +01:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Chris Sidebottom
7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 2025-07-07 10:13:39 +00:00
Masato Nakagawa
5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
Srangrang
9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 2025-06-15 20:57:05 +08:00
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Martin Kroeker
20f2ba0141 Move declaration of i for pre-C99 compilers 2025-05-21 23:44:17 +02:00
Masato Nakagawa
2351a98005 Update 2D thread-partitioned GEMM for M << N case. 2025-05-21 21:21:52 +09:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Masato Nakagawa
80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 2025-03-11 20:18:20 +09:00
Martin Kroeker
77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 2025-02-15 20:37:48 +01:00
John Hein
6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 2025-02-01 17:22:57 -07:00
Martin Kroeker
8a1710dd0d don't apply switch_ratio to tail of loop 2024-10-06 20:03:32 +02:00
shivammonaka
9e22d70957 Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed parallelly 2024-06-07 08:40:17 +05:30
Martin Kroeker
db070a9223 add gemm_batch drivers 2024-05-31 18:29:27 +02:00
Martin Kroeker
d0794f88dc add gemm_batch driver 2024-05-29 15:49:20 +02:00
yamazaki-mitsufumi
51ab1903e7 Expanding the scope of 2D thread distribution 2024-04-18 18:20:25 +09:00
shivammonaka
d49ebc54e1 Merge branch 'shivam-develop' into shivam-Locks 2024-02-29 11:58:14 +05:30
shivammonaka
bc191015e3 Using OpenMP locks with NUM_PARALLEL 2024-02-29 11:47:05 +05:30
Martin Kroeker
c4bd4a2e5d fix improper function prototypes (empty parentheses) 2023-09-30 12:49:24 +02:00
Chris Sidebottom
32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Honglin Zhu
4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather load and scatter store instructions to get lower latency.
2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Wangyang Guo
3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker
2f8220d757 Add sbgemm 2021-09-14 16:14:43 +02:00
Martin Kroeker
307c4c0786 Fix typo 2021-06-16 13:41:16 +02:00
Martin Kroeker
e83df93975 Work around another recent macro name collision with winnt.h 2021-06-16 12:32:34 +02:00
Martin Kroeker
a554712439 remove extra/intermediate size step for min_jj introduced in PR747 2020-12-08 21:01:36 +01:00
Martin Kroeker
5d26223f4a remove extra/intermediate size step of min_jj from PR747 2020-12-08 20:59:56 +01:00
Martin Kroeker
d3ff1f889f Convert ifndefs to ifneq 2020-11-22 16:27:17 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
006c7f6671 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:06:06 +02:00
Martin Kroeker
886a8e3190 Adapt for supporting only a subset of variable types 2020-10-11 14:57:32 +02:00
Martin Kroeker
ac653c94f3 Merge branch 'develop' into issue2588-cmake 2020-10-11 13:57:07 +02:00
Martin Kroeker
988a6f429e Add BUILD_vartype defines 2020-09-22 23:23:33 +02:00
Martin Kroeker
e5e2fbd593 Support building only selected types 2020-09-22 23:21:30 +02:00
y00512012
06cf73a239 fix a bug of trmm 2020-09-22 16:47:10 +08:00
Martin Kroeker
ddec244a5a Merge pull request #2838 from austinpagan/gordon_trmm
Adding performance patch for trmm, just like trsm (#2836)
2020-09-15 21:17:48 +02:00
fossum
dfeca46098 Adding performance patch for trmm, just like #2836 2020-09-15 08:59:50 -05:00
fossum
274d6e015b Fixing a performance bug in trsm_[LR].c. 2020-09-14 13:10:48 -05:00
Martin Kroeker
330044d821 Fix potential domain error in sqrt 2020-09-05 09:44:33 +02:00
Chen, Guobing
e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
ce45af8151 Update conditional for atomics to use HAVE_C11 2020-07-18 17:09:56 +00:00
Martin Kroeker
6f38de06d2 Update conditional for atomics to use HAVE_C11 2020-07-18 17:09:01 +00:00
Martin Kroeker
5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Ali Saidi
97ce6bbce2 Fix barriers in level3_thread 2020-02-29 17:45:17 +00:00
wjc404
2f96a2c55b Update trmm_R.c 2020-02-05 10:15:02 +08:00