Commit Graph

460 Commits

Author SHA1 Message Date
Martin Kroeker
c1c1285236 Add lower limit for multithreading 2025-10-28 09:40:24 +01:00
Martin Kroeker
8e44cde3f6 Add lower limit for multithreading 2025-10-28 09:39:16 +01:00
Martin Kroeker
75b3e110c4 Add lower limit for multithreading 2025-10-28 09:34:45 +01:00
Martin Kroeker
c5b0d1efd1 Add lower limit for multithreading 2025-10-28 09:33:32 +01:00
Martin Kroeker
e40714cabd Merge pull request #5450 from quic/topic/strmm_direct_sme1
Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API
2025-10-11 15:20:19 -07:00
changjua
644ea07ef9 Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API 2025-10-10 10:48:27 +08:00
Martin Kroeker
fa912ce852 rework definitions of ?FLOAT16_GEMM_GEMV_FORWARD 2025-10-08 11:11:52 +02:00
Chris Sidebottom
37fc3bbca0 Add Infrastructure for SHGEMV
This adds all the relevant bits and pieces to add a `shgemv` path as
well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b`
interfaces.

I've also fixed a few bits and pieces around `shgemm` which didn't build
in a few situations.
2025-10-07 15:03:24 +00:00
Martin Kroeker
e939c6c315 Merge pull request #5471 from quic/topic/ssymm_direct_sme1
Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API
2025-10-01 06:22:36 -07:00
Rajendra Prasad Matcha
19268471cc Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API 2025-09-30 15:05:33 +05:30
Martin Kroeker
e58f6dc50d Add extensions ?GEMM_BATCH_STRIDED and CBLAS_?GEMM_BATCH_STRIDED (#5458)
* Add ?GEMM_BATCH_STRIDED and CBLAS_?GEMM_BATCH_STRIDED
2025-09-26 14:00:47 +02:00
Martin Kroeker
99c077a3a7 Update gemm_batch.c 2025-09-23 15:43:33 +02:00
Martin Kroeker
fc042d928b fix symbol naming (underscoring) 2025-09-21 19:01:11 +02:00
Martin Kroeker
cdeac0a3ae Build non-CBLAS ?gemm_batch too 2025-09-17 11:42:11 -07:00
Martin Kroeker
eb931deb22 Add BLAS interface to ?GEMM_BATCH 2025-09-17 10:23:48 -07:00
Martin Kroeker
30d11bc92c Adjust multithreading threshold and add an intermediate step 2025-07-30 08:13:33 -07:00
Martin Kroeker
a9e8fa06bf Introduce a (crude) threshold to multithreading 2025-07-25 15:15:46 +02:00
Martin Kroeker
965463f177 Include float-bfloat conversion functions in ONLY_CBLAS builds as well 2025-07-24 23:33:20 +02:00
youcai
41f9701ebc Fix cmake building with cblas_bgemm 2025-07-23 22:10:53 +08:00
Martin Kroeker
30dbca5051 fix misleading indentation to silence a gcc warning 2025-07-18 23:51:04 +02:00
Martin Kroeker
39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta
SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3
2025-07-18 23:23:39 +02:00
Rajendra Prasad Matcha
eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 2025-07-17 16:14:31 +05:30
Chris Sidebottom
947d7af4c9 Fix CMake references to bscal and bgemv 2025-07-15 15:41:53 +01:00
Chris Sidebottom
e105411460 Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass the above tests
2025-07-15 14:48:57 +01:00
Chris Sidebottom
740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.

Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-10 23:23:27 +00:00
Chris Sidebottom
66d9185ebe Fix CMake support 2025-07-08 22:49:55 +00:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Sets up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Usui, Tetsuzo
14107e37d9 Add parallel laed3 2025-07-01 22:12:27 +09:00
Martin Kroeker
d96daa220d Merge pull request #5290 from Srangrang/develop
Add support for FP16 to OpenBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Srangrang
ec14e1648c fix: resolve build failure on non-RISCV hosts
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions

Related to PR#5290
Co-authored-by: Martin
2025-06-15 20:25:15 +08:00
Martin Kroeker
5e393f207c fix source file used for sbgemmt/sbgemmtr 2025-06-15 00:06:34 +02:00
Martin Kroeker
11ff18bb0f Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
2025-06-12 01:03:00 -07:00
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Martin Kroeker
42b7d1f897 Fix addressing of alpha in CBLAS 2025-05-21 22:03:38 +02:00
Martin Kroeker
6680e0592f Fix conditional inclusion of SGEMM_KERNEL_DIRECT 2025-05-17 05:12:15 -07:00
Martin Kroeker
70865a894e Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
2025-04-08 13:16:07 -07:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Martin Kroeker
51c1fb1f93 Fix ?spmv build and misinterpretation of NO_LAPACK=0 2025-03-26 23:36:49 +01:00
shubham.chaudhari
8e289ecddc Simplified thread throttling function in gemv 2025-03-18 13:24:05 +05:30
shubham.chaudhari
189dbbc04f Add thread throttling for dynamic arch neoversev1 2025-03-18 13:14:30 +05:30
shubham.chaudhari
b6cb5ece58 Add thread throttling profile for DGEMV on NEOVERSEV1 2025-03-18 13:14:30 +05:30
Martin Kroeker
7338a473a7 Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
2025-03-03 21:45:53 +01:00
Martin Kroeker
09ba099461 make throttling code conditional on SMP 2025-02-25 12:10:48 +01:00
Harishmcw
030ae1fd97 Redefined threading logic for WoA 2025-02-25 15:40:39 +05:30
Martin Kroeker
c03a81b927 Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
2025-02-23 12:16:09 +01:00
Martin Kroeker
75b958a018 Transform the B array back if necessary before returning 2025-02-20 23:54:12 +01:00
Marek Michalowski
650a062e19 Add thread throttling profile for SGEMV on NEOVERSEV2 2025-02-20 10:28:31 +00:00
Marek Michalowski
b723c1b7b7 Add thread throttling profile for SGEMM on NEOVERSEV2 2025-02-20 10:28:21 +00:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30