428 Commits

Author SHA1 Message Date
Martin Kroeker
11ff18bb0f Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
2025-06-12 01:03:00 -07:00
Martin Kroeker
42b7d1f897 Fix addressing of alpha in CBLAS 2025-05-21 22:03:38 +02:00
Martin Kroeker
6680e0592f Fix conditional inclusion of SGEMM_KERNEL_DIRECT 2025-05-17 05:12:15 -07:00
Martin Kroeker
70865a894e Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
2025-04-08 13:16:07 -07:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Martin Kroeker
51c1fb1f93 Fix ?spmv build and misinterpretation of NO_LAPACK=0 2025-03-26 23:36:49 +01:00
shubham.chaudhari
8e289ecddc Simplified thread throttling function in gemv 2025-03-18 13:24:05 +05:30
shubham.chaudhari
189dbbc04f Add thread throttling for dynamic arch neoversev1 2025-03-18 13:14:30 +05:30
shubham.chaudhari
b6cb5ece58 Add thread throttling profile for DGEMV on NEOVERSEV1 2025-03-18 13:14:30 +05:30
Martin Kroeker
7338a473a7 Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
2025-03-03 21:45:53 +01:00
Martin Kroeker
09ba099461 make throttling code conditional on SMP 2025-02-25 12:10:48 +01:00
Harishmcw
030ae1fd97 Redefined threading logic for WoA 2025-02-25 15:40:39 +05:30
Martin Kroeker
c03a81b927 Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
2025-02-23 12:16:09 +01:00
Martin Kroeker
75b958a018 Transform the B array back if necessary before returning 2025-02-20 23:54:12 +01:00
Marek Michalowski
650a062e19 Add thread throttling profile for SGEMV on NEOVERSEV2 2025-02-20 10:28:31 +00:00
Marek Michalowski
b723c1b7b7 Add thread throttling profile for SGEMM on NEOVERSEV2 2025-02-20 10:28:21 +00:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Harish-Gits
daf16b8229 Adjusted GESV threading logic for optimal performance on WoA 2025-02-12 19:25:25 +05:30
Martin Kroeker
60d0be0e97 Update nrm2.c 2025-02-08 23:42:21 +01:00
Martin Kroeker
0fd5448b2c Handle INCX=0 2025-02-08 19:33:05 +01:00
Martin Kroeker
db7e5f1fa7 Update gemmt.c 2025-02-06 21:26:20 +01:00
Martin Kroeker
ff30ac9666 Update Makefile 2025-02-06 19:51:23 +01:00
Martin Kroeker
7c3e169b67 Update gemmt.c 2025-02-06 19:21:08 +01:00
Martin Kroeker
09414a4187 Ensure that GEMMTR name appears in XERBLA if gemmt was called as such 2025-02-06 18:52:00 +01:00
Marek Michalowski
838bb57e27 Merge branch 'develop' into develop 2025-01-24 14:19:35 +00:00
Martin Kroeker
a54f9a9c69 Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
2025-01-23 22:42:12 +01:00
Marek Michalowski
4d5b13f765 Add thread throttling profile for SGEMV on NEOVERSEV1 2025-01-22 10:50:04 +00:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
gxw
e114880dc4 kernel/generic: Fixed cscal and zscal 2025-01-21 11:44:22 +08:00
Annop Wongwathanarat
c8cd8da496 Add thread throttling profile for SGEMM on NEOVERSEV1 2025-01-13 15:43:08 +00:00
Martin Kroeker
a1075477c3 Merge pull request #4994 from martin-frbg/issue4886
Disable multithreading in ?TRTRI for small workloads
2024-12-30 23:10:55 +01:00
Martin Kroeker
0c440f8a27 disable multithreading for small workloads 2024-11-27 23:15:41 +01:00
Martin Kroeker
2a290dfc2c forward GEMM3M calls for GENERIC targets to the regular C/ZGEMM for now 2024-11-14 14:07:08 -08:00
Martin Kroeker
0cf656fd3e Add copies of GEMMT under its new name GEMMTR 2024-10-30 12:55:14 +01:00
Chris Daley
cb48505251 optimize gemv forwarding on ARM64 systems 2024-10-24 21:05:26 -07:00
Chip Kerchner
36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Chip Kerchner
1d51ca5798 Change multi-threading logic for SBGEMV to be the same as SGEMV. 2024-10-11 16:08:48 -05:00
Martin Kroeker
9762464718 Fix CBLAS interface filling in the wrong triangle for Row-Major 2024-10-09 18:06:39 +02:00
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
7878976236 disable forwarding from SBGEMM to SBGEMV for now 2024-08-08 18:03:38 +02:00
Chris Sidebottom
b26424c6a2 Allow opt into GEMM -> GEMV forwarding 2024-07-31 13:09:14 +01:00
Chris Sidebottom
90eb863d4b Re-add accidental removal 2024-07-31 13:09:14 +01:00
Chris Sidebottom
28b5334f22 Complete implementation of GEMV forwarding 2024-07-31 13:09:14 +01:00
Martin Kroeker
3db5dbc88e forward to GEMV when one argument is actually a vector 2024-07-31 13:09:14 +01:00
gxw
f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 2024-07-12 16:09:30 +08:00
Martin Kroeker
2f12a47405 fix build options for CAXPYC/ZAXPYC 2024-06-09 20:32:10 +02:00
Martin Kroeker
db9f7bc552 fix float array types to include bfloat16 2024-06-03 00:22:16 +02:00
Martin Kroeker
076766df4e Update CMakeLists.txt 2024-05-31 18:23:18 +02:00
Martin Kroeker
ff6670cb83 don't generate non-cblas files for gemm_batch 2024-05-30 18:26:02 +02:00