Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
...
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
2025-02-28 12:31:50 +00:00
Martin Kroeker
ef9e3f7159
Merge pull request #5149 from martin-frbg/fixup5077-5088
...
Make the Neoverse GEMM/GEMV throttling code conditional on SMP
2025-02-25 14:01:13 +01:00
Martin Kroeker
09ba099461
make throttling code conditional on SMP
2025-02-25 12:10:48 +01:00
Martin Kroeker
1533fe49be
Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
...
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
2025-02-24 16:07:06 +01:00
Martin Kroeker
c03a81b927
Merge pull request #5141 from michalowski-arm/fork-throttle
...
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
2025-02-23 12:16:09 +01:00
Martin Kroeker
643966d9c7
Merge pull request #5146 from martin-frbg/issue5123
...
Fix "dummy2" flag reading in PPC970 S/DSCAL
2025-02-22 21:57:09 +01:00
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
2025-02-22 20:09:21 +01:00
Ye Tao
f0bea79a6e
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
2025-02-21 10:30:11 +00:00
Martin Kroeker
20d1118865
Merge pull request #5143 from martin-frbg/issue5111
...
Fix GEMMT transforming the input array B in some complex cases
2025-02-21 09:20:39 +01:00
Martin Kroeker
75b958a018
Transform the B array back if necessary before returning
2025-02-20 23:54:12 +01:00
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on NEOVERSEV2
2025-02-20 10:28:31 +00:00
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on NEOVERSEV2
2025-02-20 10:28:21 +00:00
Martin Kroeker
ceb8f1e34b
Merge pull request #5140 from martin-frbg/issue5139
...
Add ARM64 options for NVIDIA HPC
2025-02-19 18:17:15 +01:00
Martin Kroeker
f1fa370579
fix missing endif
2025-02-19 15:22:26 +01:00
Martin Kroeker
6d1444be3a
Add ARM64 options for NVIDIA HPC
2025-02-19 14:26:43 +01:00
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
...
Support for SGEMM_DIRECT Kernel based on SME1
2025-02-19 10:56:49 +01:00
Martin Kroeker
abbd78aa59
Merge pull request #5138 from martin-frbg/issue5131
...
Ensure that gmake builds with flang-new link the flang runtime into the shared library
2025-02-18 09:53:31 +01:00
Martin Kroeker
ebcab90976
Handle flang-new runtime library linking on Linux like classic-flang
2025-02-17 23:12:58 +01:00
Martin Kroeker
ed1584666c
Merge pull request #5137 from martin-frbg/issue5136
...
Fix the CMake build to define USE_TRMM for RISCV64 targets as well
2025-02-17 07:37:07 +01:00
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
2025-02-16 23:18:04 +01:00
Martin Kroeker
86cf9d8a2e
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
...
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
2025-02-16 19:16:43 +01:00
Martin Kroeker
0b3c56968d
Merge pull request #5135 from martin-frbg/ghwf-n2
...
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
2025-02-16 19:16:10 +01:00
Martin Kroeker
c1bb90a823
remove the express NeoverseN2 target from the Cobalt100 job
2025-02-16 14:23:07 +01:00
Martin Kroeker
77c638db67
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
2025-02-15 20:37:48 +01:00
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
...
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
...
Refined GESV Parallelization Logic for Windows on ARM64
2025-02-12 22:02:37 +01:00
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
...
Add -O2 to flang flags when building on WoA in Release mode
2025-02-12 21:02:06 +01:00
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
2025-02-12 17:01:06 +01:00
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
...
LoongArch64: Fixed lapack test for LA264
2025-02-12 14:58:30 +01:00
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
2025-02-12 19:25:25 +05:30
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
...
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
2025-02-12 12:50:44 +01:00
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
...
CirrusCI: Update FreeBSD jobs to 14.2
2025-02-12 12:50:21 +01:00
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
2025-02-12 11:23:02 +01:00
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
2025-02-12 09:07:20 +01:00
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
2025-02-12 09:04:22 +01:00
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
...
For single-precision real and single-precision complex data, converting
to double precision does not prevent overflow (as exposed by the LAPACK tests).
The only solution is to follow the approach of the C code: find the maximum
value in the array and divide each element by that maximum before accumulating.
2025-02-12 15:48:01 +08:00
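The scaling idea described in the commit above can be sketched as follows. This is an illustrative Python version working on ordinary Python floats (doubles), not the LSX assembly or the OpenBLAS C kernel; the function names `naive_nrm2` and `scaled_nrm2` are made up for this example, and finite inputs are assumed.

```python
import math


def naive_nrm2(x):
    """Sum the squares directly; the squares can overflow to infinity."""
    return math.sqrt(sum(v * v for v in x))


def scaled_nrm2(x):
    """Divide by the largest magnitude first so every squared term is <= 1.

    Assumes all inputs are finite (no NaN/Inf handling in this sketch).
    """
    amax = max((abs(v) for v in x), default=0.0)
    if amax == 0.0:
        return 0.0
    ssq = sum((v / amax) * (v / amax) for v in x)
    return amax * math.sqrt(ssq)


big = [1e200, 1e200]
print(naive_nrm2(big))   # the intermediate squares overflow
print(scaled_nrm2(big))  # stays finite
```

The cost of this approach is an extra pass over the data to find the maximum, which is the trade-off the commit accepts in exchange for correct results.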
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
...
Fixed the error when the stride is zero
2025-02-12 14:57:35 +08:00
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
...
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
2025-02-12 14:52:49 +08:00
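Why a zero check on c and s can go wrong with special values may be easier to see in a small sketch. The Python below is only an illustration: `rot_with_shortcut` is a hypothetical variant written for this example and is not claimed to match what the old assembly did.

```python
import math


def rot_reference(x, y, c, s):
    """Apply the plane rotation to every element, with no special cases."""
    new_x = [c * xi + s * yi for xi, yi in zip(x, y)]
    new_y = [c * yi - s * xi for xi, yi in zip(x, y)]
    return new_x, new_y


def rot_with_shortcut(x, y, c, s):
    """Hypothetical variant: skip the arithmetic when c == s == 0."""
    if c == 0.0 and s == 0.0:
        return [0.0] * len(x), [0.0] * len(y)
    return rot_reference(x, y, c, s)


x = [math.nan, 2.0]
y = [1.0, math.inf]
print(rot_reference(x, y, 0.0, 0.0))      # NaN propagates: 0 * NaN and 0 * Inf are NaN
print(rot_with_shortcut(x, y, 0.0, 0.0))  # the shortcut silently yields zeros
```

Tests that feed NaN or Inf through the routine and expect IEEE propagation would see a difference between the two variants.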
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
...
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:52:20 +08:00
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
...
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:44:44 +08:00
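For the tie case mentioned in the commit above, the reference BLAS convention is that i?amax returns the 1-based index of the first element with the largest absolute value. A minimal Python sketch of that convention follows; the function name `iamax` and the list-based interface are assumptions for this example, and it does not model the vectorized LSX code.

```python
def iamax(x):
    """Return the 1-based index of the first element with maximum |value|.

    Returns 0 for an empty input, following the BLAS convention for n < 1.
    """
    best_index = 0
    best_value = -1.0
    for i, v in enumerate(x, start=1):
        # Strict comparison: a later element that only ties does not replace
        # the earlier index.
        if abs(v) > best_value:
            best_value = abs(v)
            best_index = i
    return best_index


print(iamax([1.0, -3.0, 3.0, 2.0]))  # positions 2 and 3 tie; the first one wins
```

A vectorized kernel has to reproduce the same tie-breaking when it reduces across lanes, which is where an index retrieval bug of this kind can arise.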
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
...
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:44:11 +08:00
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
...
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:39:29 +08:00
Martin Kroeker
f42ce7067f
Merge pull request #5116 from martin-frbg/issue5110
...
Handle INCX=0 in ?NRM2
2025-02-09 23:17:20 +01:00
Martin Kroeker
7478c10268
Merge branch 'OpenMathLib:develop' into issue5110
2025-02-09 21:40:02 +01:00
Martin Kroeker
c54f5417cc
Merge pull request #5118 from martin-frbg/zrot_utestext
...
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
2025-02-09 21:39:30 +01:00
Martin Kroeker
57208b8bce
Disable tests with incx,incy=0 (undefined behavior)
2025-02-09 20:17:29 +01:00
Martin Kroeker
3a4a9b21eb
Disable tests with incx,incy=0 (undefined behavior)
2025-02-09 20:16:03 +01:00
Martin Kroeker
60d0be0e97
Update nrm2.c
2025-02-08 23:42:21 +01:00
Martin Kroeker
0fd5448b2c
Handle INCX=0
2025-02-08 19:33:05 +01:00