OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-15 07:51:43 +08:00

Author	SHA1	Message	Date
Annop Wongwathanarat	edaf51dd99	Add sbgemv_t_bfdot kernel for ARM64 This improves performance for sbgemv_t by up to 100x on NEOVERSEV1. The geometric mean speedup is ~61x for M=N=[2,512].	2025-02-28 12:31:50 +00:00
Martin Kroeker	ef9e3f7159	Merge pull request #5149 from martin-frbg/fixup5077-5088 Make the Neoverse GEMM/GEMV throttling code conditional on SMP	2025-02-25 14:01:13 +01:00
Martin Kroeker	09ba099461	make throttling code conditional on SMP	2025-02-25 12:10:48 +01:00
Martin Kroeker	1533fe49be	Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2 dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting	2025-02-24 16:07:06 +01:00
Martin Kroeker	c03a81b927	Merge pull request #5141 from michalowski-arm/fork-throttle Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`	2025-02-23 12:16:09 +01:00
Martin Kroeker	643966d9c7	Merge pull request #5146 from martin-frbg/issue5123 Fix "dummy2" flag reading in PPC970 S/DSCAL	2025-02-22 21:57:09 +01:00
Martin Kroeker	77fba0f400	Fix "dummy2" flag handling	2025-02-22 20:09:21 +01:00
Ye Tao	f0bea79a6e	dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting	2025-02-21 10:30:11 +00:00
Martin Kroeker	20d1118865	Merge pull request #5143 from martin-frbg/issue5111 Fix GEMMT transforming the input array B in some complex cases	2025-02-21 09:20:39 +01:00
Martin Kroeker	75b958a018	Transform the B array back if necessary before returning	2025-02-20 23:54:12 +01:00
Marek Michalowski	650a062e19	Add thread throttling profile for SGEMV on `NEOVERSEV2`	2025-02-20 10:28:31 +00:00
Marek Michalowski	b723c1b7b7	Add thread throttling profile for SGEMM on `NEOVERSEV2`	2025-02-20 10:28:21 +00:00
Martin Kroeker	ceb8f1e34b	Merge pull request #5140 from martin-frbg/issue5139 Add ARM64 options for NVIDIA HPC	2025-02-19 18:17:15 +01:00
Martin Kroeker	f1fa370579	fix missing endif	2025-02-19 15:22:26 +01:00
Martin Kroeker	6d1444be3a	Add ARM64 options for NVIDIA HPC	2025-02-19 14:26:43 +01:00
Martin Kroeker	eb84aac7ad	Merge pull request #5084 from quic/topic/sgemm_direct_sme1 Support for SGEMM_DIRECT Kernel based on SME1	2025-02-19 10:56:49 +01:00
Martin Kroeker	abbd78aa59	Merge pull request #5138 from martin-frbg/issue5131 Ensure that gmake builds with flang-new link the flang runtime into the shared library	2025-02-18 09:53:31 +01:00
Martin Kroeker	ebcab90976	Handle flang-new runtime library linking on Linux like classic-flang	2025-02-17 23:12:58 +01:00
Martin Kroeker	ed1584666c	Merge pull request #5137 from martin-frbg/issue5136 Fix the CMake build to define USE_TRMM for RISCV64 targets as well	2025-02-17 07:37:07 +01:00
Martin Kroeker	b9ae246f20	define USE_TRMM for RISCV64 targets as well	2025-02-16 23:18:04 +01:00
Martin Kroeker	86cf9d8a2e	Merge pull request #5133 from OpenMathLib/revert-4920-issue4917 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"	2025-02-16 19:16:43 +01:00
Martin Kroeker	0b3c56968d	Merge pull request #5135 from martin-frbg/ghwf-n2 CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow	2025-02-16 19:16:10 +01:00
Martin Kroeker	c1bb90a823	remove the express NeoverseN2 target from the Cobalt100 job	2025-02-16 14:23:07 +01:00
Martin Kroeker	77c638db67	Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"	2025-02-15 20:37:48 +01:00
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	2025-02-13 14:54:37 +05:30
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	2025-02-13 14:51:21 +05:30
Martin Kroeker	a64b75a2e0	Merge pull request #5127 from Harishmcw/gesv-threshold Refined GESV Parallelization Logic for Windows on ARM64	2025-02-12 22:02:37 +01:00
Martin Kroeker	453efbd103	Merge pull request #5128 from martin-frbg/issue5120 Add -O2 to flang flags when building on WoA in Release mode	2025-02-12 21:02:06 +01:00
Martin Kroeker	877d5a5be6	Add -O2 to flang flags when building on WoA in Release mode	2025-02-12 17:01:06 +01:00
Martin Kroeker	8d487ef6eb	Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed LoongArch64: Fixed lapack test for LA264	2025-02-12 14:58:30 +01:00
Harish-Gits	daf16b8229	Adjusted GESV threading logic for optimal performance on WoA	2025-02-12 19:25:25 +05:30
Martin Kroeker	e8b11a126b	Merge pull request #5125 from martin-frbg/issue5122 Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code	2025-02-12 12:50:44 +01:00
Martin Kroeker	9a3948df82	Merge pull request #5126 from martin-frbg/cirrusbsd4 CirrusCI: Update FreeBSD jobs to 14.2	2025-02-12 12:50:21 +01:00
Martin Kroeker	7f1f776f58	Update FreeBSD jobs to 14.2	2025-02-12 11:23:02 +01:00
Martin Kroeker	81eed868b6	Restore the non-vectorized code from before PR4880 for POWER8	2025-02-12 09:07:20 +01:00
Martin Kroeker	98b5ef929c	Restore the non-vectorized code from before PR4880 for POWER8	2025-02-12 09:04:22 +01:00
gxw	2c4a5cc6e6	LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S When the data type is single-precision real or single-precision complex, converting it to double precision does not prevent overflow (as exposed in LAPACK tests). The only solution is to follow C's approach: find the maximum value in the array and divide each element by that maximum to avoid this issue	2025-02-12 15:48:01 +08:00
gxw	9e75d6b3d1	LoongArch64: Fixed swap_lsx.S Fixed the error when the stride is zero	2025-02-12 14:57:35 +08:00
gxw	e8c740368c	LoongArch64: Fixed rot_lsx.S and crot_lsx.S Do not check whether the input parameters c and s are zero, as this may cause errors with special values (same as scal). Although OpenBLAS's own test suite doesn't catch this, it will cause LAPACK test cases to fail.	2025-02-12 14:52:49 +08:00
Hao Chen	c2212d0abd	LoongArch64: Fixed copy_lsx.S Fixed incorrect store operation Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:52:20 +08:00
Hao Chen	7f1ebc7ae6	LoongArch64: Fixed iamax_lsx.S Fixed index retrieval issue when there are identical maximum absolute values Signed-off-by: Hao Chen <chenhao@loongson.cn> Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:44:44 +08:00
Hao Chen	31d326f895	LoongArch64: Fixed dot_lsx.S Fixed incorrect register usage in instructions Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:44:11 +08:00
Hao Chen	5d6356bc16	LoongArch64: Fixed amax_lsx.S Fixed register zeroing operation Signed-off-by: Hao Chen <chenhao@loongson.cn> Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:39:29 +08:00
Martin Kroeker	f42ce7067f	Merge pull request #5116 from martin-frbg/issue5110 Handle INCX=0 in ?NRM2	2025-02-09 23:17:20 +01:00
Martin Kroeker	7478c10268	Merge branch 'OpenMathLib:develop' into issue5110	2025-02-09 21:40:02 +01:00
Martin Kroeker	c54f5417cc	Merge pull request #5118 from martin-frbg/zrot_utestext Disable extended utests for CSROT/ZDROT that invoke undefined behavior	2025-02-09 21:39:30 +01:00
Martin Kroeker	57208b8bce	Disable tests with incx,incy=0 (undefined behavior)	2025-02-09 20:17:29 +01:00
Martin Kroeker	3a4a9b21eb	Disable tests with incx,incy=0 (undefined behavior)	2025-02-09 20:16:03 +01:00
Martin Kroeker	60d0be0e97	Update nrm2.c	2025-02-08 23:42:21 +01:00
Martin Kroeker	0fd5448b2c	Handle INCX=0	2025-02-08 19:33:05 +01:00


9104 Commits