Harishmcw
030ae1fd97
Redefined threading logic for WoA
2025-02-25 15:40:39 +05:30
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
2025-02-12 19:25:25 +05:30
Martin Kroeker
60d0be0e97
Update nrm2.c
2025-02-08 23:42:21 +01:00
Martin Kroeker
0fd5448b2c
Handle INCX=0
2025-02-08 19:33:05 +01:00
Martin Kroeker
db7e5f1fa7
Update gemmt.c
2025-02-06 21:26:20 +01:00
Martin Kroeker
ff30ac9666
Update Makefile
2025-02-06 19:51:23 +01:00
Martin Kroeker
7c3e169b67
Update gemmt.c
2025-02-06 19:21:08 +01:00
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
2025-02-06 18:52:00 +01:00
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
2025-01-24 14:19:35 +00:00
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
2025-01-23 22:42:12 +01:00
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on NEOVERSEV1
2025-01-22 10:50:04 +00:00
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
2025-01-13 15:43:08 +00:00
Martin Kroeker
a1075477c3
Merge pull request #4994 from martin-frbg/issue4886
Disable multithreading in ?TRTRI for small workloads
2024-12-30 23:10:55 +01:00
Martin Kroeker
0c440f8a27
disable multithreading for small workloads
2024-11-27 23:15:41 +01:00
Martin Kroeker
2a290dfc2c
forward GEMM3M calls for GENERIC targets to the regular C/ZGEMM for now
2024-11-14 14:07:08 -08:00
Martin Kroeker
0cf656fd3e
Add copies of GEMMT under its new name GEMMTR
2024-10-30 12:55:14 +01:00
Chris Daley
cb48505251
optimize gemv forwarding on ARM64 systems
2024-10-24 21:05:26 -07:00
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
2024-10-13 13:46:11 -05:00
Chip Kerchner
1d51ca5798
Change multi-threading logic for SBGEMV to be the same as SGEMV.
2024-10-11 16:08:48 -05:00
Martin Kroeker
9762464718
Fix CBLAS interface filling in the wrong triangle for Row-Major
2024-10-09 18:06:39 +02:00
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
2024-08-08 18:03:38 +02:00
Chris Sidebottom
b26424c6a2
Allow opt into GEMM -> GEMV forwarding
2024-07-31 13:09:14 +01:00
Chris Sidebottom
90eb863d4b
Re-add accidental removal
2024-07-31 13:09:14 +01:00
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
2024-07-31 13:09:14 +01:00
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
2024-07-31 13:09:14 +01:00
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
2024-07-12 16:09:30 +08:00
Martin Kroeker
2f12a47405
fix build options for CAXPYC/ZAXPYC
2024-06-09 20:32:10 +02:00
Martin Kroeker
db9f7bc552
fix float array types to include bfloat16
2024-06-03 00:22:16 +02:00
Martin Kroeker
076766df4e
Update CMakeLists.txt
2024-05-31 18:23:18 +02:00
Martin Kroeker
ff6670cb83
don't generate non-cblas files for gemm_batch
2024-05-30 18:26:02 +02:00
Martin Kroeker
362a063396
remove return value
2024-05-29 23:16:58 +02:00
Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
2024-05-29 15:47:02 +02:00
Martin Kroeker
2957281275
Introduce a lower limit for multithreading
2024-05-14 18:59:21 +02:00
Martin Kroeker
5fd871d7ea
Introduce a lower limit for multithreading
2024-05-14 18:48:03 +02:00
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2024-05-10 11:42:53 +08:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Martin Kroeker
d277c6d15b
Merge pull request #4585 from martin-frbg/issue1881
Cap the number of parallel threads for GEMM, GETRF and POTRF to ensure sensible workloads on big systems
2024-04-03 18:35:16 +02:00
Igor Zhuravlov
22d305e2df
fix dtrtrs_ and ztrtrs_ to accept case-insensitive parameters uplo and diag
Changes to be committed:
modified: interface/lapack/trtrs.c
modified: interface/lapack/ztrtrs.c
2024-04-03 19:01:38 +10:00
Martin Kroeker
68ab5185d0
Update potrf.c
2024-03-27 22:10:01 +01:00
Martin Kroeker
19b29b3448
Update getrf.c
2024-03-27 22:09:30 +01:00
Martin Kroeker
a3354a7630
Cap the number of parallel threads
2024-03-27 22:00:30 +01:00
Martin Kroeker
5da4c93ef2
Cap the number of parallel threads
2024-03-27 20:34:55 +01:00
Martin Kroeker
496106642f
Cap the number of parallel threads
2024-03-27 20:32:11 +01:00
Martin Kroeker
cb8131cfd9
Merge pull request #4499 from kseniyazaytseva/new-tests
Tests for BLAS-like and BLAS API
2024-02-25 22:40:59 +01:00
Martin Kroeker
baf88564bc
Fix potential buffer overflow
2024-02-25 19:23:41 +01:00
kseniyazaytseva
7e9b1c0807
fix uninitialized data usage
2024-02-10 00:49:42 +03:00
kseniyazaytseva
c6f30fd414
check for zero inc
2024-02-10 00:48:07 +03:00