OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-05-31 00:45:48 +08:00

Author	SHA1	Message	Date
manjam01	5c4e38ab17	Optimize gemv_n_sve kernel	2025-03-10 16:39:20 +00:00
Martin Kroeker	77fba0f400	Fix "dummy2" flag handling	2025-02-22 20:09:21 +01:00
Martin Kroeker	eb84aac7ad	Merge pull request #5084 from quic/topic/sgemm_direct_sme1 Support for SGEMM_DIRECT Kernel based on SME1	2025-02-19 10:56:49 +01:00
Martin Kroeker	b9ae246f20	define USE_TRMM for RISCV64 targets as well	2025-02-16 23:18:04 +01:00
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	2025-02-13 14:54:37 +05:30
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	2025-02-13 14:51:21 +05:30
Martin Kroeker	8d487ef6eb	Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed LoongArch64: Fixed lapack test for LA264	2025-02-12 14:58:30 +01:00
Martin Kroeker	81eed868b6	Restore the non-vectorized code from before PR4880 for POWER8	2025-02-12 09:07:20 +01:00
Martin Kroeker	98b5ef929c	Restore the non-vectorized code from before PR4880 for POWER8	2025-02-12 09:04:22 +01:00
gxw	2c4a5cc6e6	LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S When the data type is single-precision real or single-precision complex, converting it to double precision does not prevent overflow (as exposed in LAPACK tests). The only solution is to follow C's approach: find the maximum value in the array and divide each element by that maximum to avoid this issue	2025-02-12 15:48:01 +08:00
gxw	9e75d6b3d1	LoongArch64: Fixed swap_lsx.S Fixed the error when the stride is zero	2025-02-12 14:57:35 +08:00
gxw	e8c740368c	LoongArch64: Fixed rot_lsx.S and crot_lsx.S Do not check whether the input parameters c and s are zero, as this may cause errors with special values (same as scal). Although OpenBLAS's own test suite doesn't catch this, it will cause LAPACK test cases to fail.	2025-02-12 14:52:49 +08:00
Hao Chen	c2212d0abd	LoongArch64: Fixed copy_lsx.S Fixed incorrect store operation Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:52:20 +08:00
Hao Chen	7f1ebc7ae6	LoongArch64: Fixed iamax_lsx.S Fixed index retrieval issue when there are identical maximum absolute values Signed-off-by: Hao Chen <chenhao@loongson.cn> Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:44:44 +08:00
Hao Chen	31d326f895	LoongArch64: Fixed dot_lsx.S Fixed incorrect register usage in instructions Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:44:11 +08:00
Hao Chen	5d6356bc16	LoongArch64: Fixed amax_lsx.S Fixed register zeroing operation Signed-off-by: Hao Chen <chenhao@loongson.cn> Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:39:29 +08:00
Ye Tao	c748e6a338	optimized sbgemm kernel for neoverse-v1 (sve-256) Signed-off-by: Ye Tao <ye.tao@arm.com>	2025-02-05 10:06:37 +00:00
Aditya Tewari	4379a6fbe3	* checkpoint sbgemm for SVE-256	2025-02-03 12:49:49 +00:00
Martin Kroeker	d7036cfd74	Remove trailing blanks that break the cmake parser	2025-01-27 09:32:17 +01:00
Martin Kroeker	6e393a5599	Merge branch 'develop' into gemv_t	2025-01-25 12:54:04 +01:00
Martin Kroeker	876ba58e28	Merge pull request #5091 from goplanid/develop Small gemm kernel improvements for AArch64	2025-01-24 10:59:16 +01:00
Martin Kroeker	180ba5e7d0	Merge pull request #5069 from tingboliao/dev_rotm_20250107 Further rearranged the rotm kernel for the different architectures.	2025-01-23 10:16:43 +01:00
Deeksha Goplani	d1bfa979f7	small gemm kernel packing modifications	2025-01-23 09:41:45 +05:30
Martin Kroeker	1a6a9fb22f	add another generator line for rotm	2025-01-23 00:17:04 +01:00
Martin Kroeker	4924319c50	fix position of srotm, qrotm	2025-01-22 16:07:35 +01:00
Martin Kroeker	b58cba9eb6	fix qrotm build rules	2025-01-22 15:51:49 +01:00
tingbo.liao	3c8df6358f	Further rearranged the rotm kernel for the different architectures. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-22 11:41:12 +08:00
Annop Wongwathanarat	c0318cea6e	Simplify gemv_t_sve_v1x3 kernel	2025-01-21 13:40:17 +00:00
Martin Kroeker	87083fdbf6	[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) * Protect align directives in assembly files that are currently problematic with LLVM on WoA * use the armv8 zdot on WoA to work around other LLVM issues	2025-01-18 16:45:56 +01:00
tingbo.liao	ef7f54b357	Optimized the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-15 11:31:28 +08:00
gxw	e0a8216554	LoongArch64: Update dsymv LSX version	2025-01-14 19:45:42 +08:00
gxw	a9070ba3f9	LoongArch64: Update ssymv LSX version	2025-01-14 09:06:59 +00:00
Xi Ruoyao	af10c132b8	LoongArch64: Fix dsymv and ssymv LASX version "fmov.d $f2, $f4" leaves all the bits higher than the 63-th bit unpredictable but it's obvious that the following code uses the value of those high bits. We actually want to replicate the lower 64 bits here, so we should use xvreplve0.d instead. LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to some uarch internal details so the issue was not detected, but for LA664 (Loongson 3[A-Z]-6000) and future uarch we need to do things correctly or we end up getting a lot of test failures. Closes: https://bbs.aosc.io/t/topic/302 Signed-off-by: Xi Ruoyao <xry111@xry111.site>	2025-01-13 22:16:00 +08:00
Martin Kroeker	d74eb02954	Merge pull request #5057 from martin-frbg/issue5050 Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds	2025-01-11 11:33:56 -08:00
Martin Kroeker	30f7a4120b	Merge pull request #5056 from tingboliao/dev_omatcopy_20250108 Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsic.	2025-01-11 09:42:57 -08:00
gxw	20a8e48f25	LoongArch64: Update ssymv LASX version	2025-01-10 16:02:54 +08:00
gxw	e0748588b8	LoongArch64: Update dsymv LASX version	2025-01-10 14:52:57 +08:00
Martin Kroeker	d91d4fa6e9	convert the beta=0 branch to a for loop as well	2025-01-09 23:11:26 +01:00
Martin Kroeker	09e75f1588	fix absurd typo	2025-01-09 00:52:14 +01:00
Martin Kroeker	2891fd8d6d	Replace while loop with for	2025-01-08 23:17:45 +01:00
tingbo.liao	0a5dbf13d3	Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-08 11:00:35 +08:00
Sergey Fedorov	229efa42ff	scal.S: use r11 on 32-bit Darwin on powerpc	2025-01-05 00:31:27 +08:00
Sergey Fedorov	81e1be8d90	Revert "temporarily disable the default S/DSCAL kernel" This reverts commit `9b9c0aa5c9`.	2025-01-04 22:54:54 +08:00
Martin Kroeker	9b9c0aa5c9	temporarily disable the default S/DSCAL kernel	2025-01-03 21:36:46 +01:00
tingbo.liao	c37509c213	Optimize the nrm2_rvv function to further improve performance. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2024-12-31 08:46:55 +08:00
tingbo.liao	0bea1cfd9d	Optimize the zgemm_tcopy_4_rvv function to be compatible with the situations where the vector lengths(vlens) are 128 and 256. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2024-12-24 10:33:27 +08:00
tingbo.liao	d00cc400b1	Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2024-12-18 08:38:30 +08:00
Martin Kroeker	229d8a025e	Merge pull request #4959 from CDAC-Bengaluru/level-1-sve SVE Implementation for Level-1 BLAS Routines	2024-12-13 05:20:51 -08:00
SushilPratap04	3368a4e697	Update swap_kernel_sve.c	2024-12-13 16:47:58 +05:30
CDAC-SSDG	dd71e4234a	Added Updated swap and rot sve kernels.	2024-12-13 11:15:29 +05:30

2415 Commits