Commit Graph

9104 Commits

Author SHA1 Message Date
Annop Wongwathanarat
edaf51dd99 Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
2025-02-28 12:31:50 +00:00
Martin Kroeker
ef9e3f7159 Merge pull request #5149 from martin-frbg/fixup5077-5088
Make the Neoverse GEMM/GEMV throttling code conditional on SMP
2025-02-25 14:01:13 +01:00
Martin Kroeker
09ba099461 make throttling code conditional on SMP 2025-02-25 12:10:48 +01:00
Martin Kroeker
1533fe49be Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
2025-02-24 16:07:06 +01:00
Martin Kroeker
c03a81b927 Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
2025-02-23 12:16:09 +01:00
Martin Kroeker
643966d9c7 Merge pull request #5146 from martin-frbg/issue5123
Fix "dummy2" flag reading in PPC970 S/DSCAL
2025-02-22 21:57:09 +01:00
Martin Kroeker
77fba0f400 Fix "dummy2" flag handling 2025-02-22 20:09:21 +01:00
Ye Tao
f0bea79a6e dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 2025-02-21 10:30:11 +00:00
Martin Kroeker
20d1118865 Merge pull request #5143 from martin-frbg/issue5111
Fix GEMMT transforming the input array B in some complex cases
2025-02-21 09:20:39 +01:00
Martin Kroeker
75b958a018 Transform the B array back if necessary before returning 2025-02-20 23:54:12 +01:00
Marek Michalowski
650a062e19 Add thread throttling profile for SGEMV on NEOVERSEV2 2025-02-20 10:28:31 +00:00
Marek Michalowski
b723c1b7b7 Add thread throttling profile for SGEMM on NEOVERSEV2 2025-02-20 10:28:21 +00:00
Martin Kroeker
ceb8f1e34b Merge pull request #5140 from martin-frbg/issue5139
Add ARM64 options for NVIDIA HPC
2025-02-19 18:17:15 +01:00
Martin Kroeker
f1fa370579 fix missing endif 2025-02-19 15:22:26 +01:00
Martin Kroeker
6d1444be3a Add ARM64 options for NVIDIA HPC 2025-02-19 14:26:43 +01:00
Martin Kroeker
eb84aac7ad Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
2025-02-19 10:56:49 +01:00
Martin Kroeker
abbd78aa59 Merge pull request #5138 from martin-frbg/issue5131
Ensure that gmake builds with flang-new link the flang runtime into the shared library
2025-02-18 09:53:31 +01:00
Martin Kroeker
ebcab90976 Handle flang-new runtime library linking on Linux like classic-flang 2025-02-17 23:12:58 +01:00
Martin Kroeker
ed1584666c Merge pull request #5137 from martin-frbg/issue5136
Fix the CMake build to define USE_TRMM for RISCV64 targets as well
2025-02-17 07:37:07 +01:00
Martin Kroeker
b9ae246f20 define USE_TRMM for RISCV64 targets as well 2025-02-16 23:18:04 +01:00
Martin Kroeker
86cf9d8a2e Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
2025-02-16 19:16:43 +01:00
Martin Kroeker
0b3c56968d Merge pull request #5135 from martin-frbg/ghwf-n2
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
2025-02-16 19:16:10 +01:00
Martin Kroeker
c1bb90a823 remove the express NeoverseN2 target from the Cobalt100 job 2025-02-16 14:23:07 +01:00
Martin Kroeker
77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 2025-02-15 20:37:48 +01:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Martin Kroeker
a64b75a2e0 Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
2025-02-12 22:02:37 +01:00
Martin Kroeker
453efbd103 Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
2025-02-12 21:02:06 +01:00
Martin Kroeker
877d5a5be6 Add -O2 to flang flags when building on WoA in Release mode 2025-02-12 17:01:06 +01:00
Martin Kroeker
8d487ef6eb Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
2025-02-12 14:58:30 +01:00
Harish-Gits
daf16b8229 Adjusted GESV threading logic for optimal performance on WoA 2025-02-12 19:25:25 +05:30
Martin Kroeker
e8b11a126b Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
2025-02-12 12:50:44 +01:00
Martin Kroeker
9a3948df82 Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
2025-02-12 12:50:21 +01:00
Martin Kroeker
7f1f776f58 Update FreeBSD jobs to 14.2 2025-02-12 11:23:02 +01:00
Martin Kroeker
81eed868b6 Restore the non-vectorized code from before PR4880 for POWER8 2025-02-12 09:07:20 +01:00
Martin Kroeker
98b5ef929c Restore the non-vectorized code from before PR4880 for POWER8 2025-02-12 09:04:22 +01:00
gxw
2c4a5cc6e6 LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow the approach of the C implementation: find the maximum value in the
array and divide each element by that maximum to avoid this issue.
2025-02-12 15:48:01 +08:00
gxw
9e75d6b3d1 LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
2025-02-12 14:57:35 +08:00
gxw
e8c740368c LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
2025-02-12 14:52:49 +08:00
Hao Chen
c2212d0abd LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation

Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:52:20 +08:00
Hao Chen
7f1ebc7ae6 LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:44:44 +08:00
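The tie case mentioned in the commit above can be illustrated with a scalar C sketch. The reference BLAS convention for the IAMAX routines is to return the 1-based index of the first element with the largest absolute value; the function name `first_iamaxf` below is hypothetical and the code is not the LSX kernel.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenBLAS symbol): 1-based index of the first
 * element with the largest absolute value, or 0 when n == 0 or incx == 0. */
size_t first_iamaxf(size_t n, const float *x, size_t incx)
{
    if (n == 0 || incx == 0) return 0;

    size_t best = 0;
    float maxabs = fabsf(x[0]);

    for (size_t i = 1; i < n; i++) {
        float a = fabsf(x[i * incx]);
        /* Strict '>' keeps the earliest index when magnitudes are equal. */
        if (a > maxabs) {
            maxabs = a;
            best = i;
        }
    }
    return best + 1;
}
```

For {1, -7, 7, 2} the two middle elements tie in magnitude and the sketch returns 2, the earlier position. A vectorized kernel has to preserve the same ordering when it reduces several lanes to a single index.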
Hao Chen
31d326f895 LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions

Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:44:11 +08:00
Hao Chen
5d6356bc16 LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:39:29 +08:00
Martin Kroeker
f42ce7067f Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
2025-02-09 23:17:20 +01:00
Martin Kroeker
7478c10268 Merge branch 'OpenMathLib:develop' into issue5110 2025-02-09 21:40:02 +01:00
Martin Kroeker
c54f5417cc Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
2025-02-09 21:39:30 +01:00
Martin Kroeker
57208b8bce Disable tests with incx,incy=0 (undefined behavior) 2025-02-09 20:17:29 +01:00
Martin Kroeker
3a4a9b21eb Disable tests with incx,incy=0 (undefined behavior) 2025-02-09 20:16:03 +01:00
Martin Kroeker
60d0be0e97 Update nrm2.c 2025-02-08 23:42:21 +01:00
Martin Kroeker
0fd5448b2c Handle INCX=0 2025-02-08 19:33:05 +01:00