OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-05 00:17:12 +08:00

Author	SHA1	Message	Date
guoyuanplct	11ffc8680e	Format the code	2025-04-25 00:27:27 +08:00
guoyuanplct	7616c42095	Optimized RVV_ZVL256B Implementation of zgemv_n The implementation of zgemv_n using RVV_ZVL256B has been optimized. Compared to the previous implementation, it has achieved a 1.5x performance improvement.	2025-04-25 00:05:15 +08:00
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEOVERSEN1	2025-04-17 01:54:10 -07:00
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	2025-04-16 12:55:55 -07:00
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEOVERSEN1	2025-04-16 09:54:30 +00:00
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	2025-04-11 20:39:52 +09:00
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	2025-04-11 20:00:33 +09:00
Martin Kroeker	b30dc9701f	Merge pull request #5215 from annop-w/gemv_t Use SVE kernel for S/DGEMVT for SVE machines	2025-04-10 13:06:07 -07:00
Martin Kroeker	2893d0add4	Merge pull request #5211 from guoyuanplct/develop Optimizing the Implementation of GEMV on the RISC-V V Extension	2025-04-10 09:43:03 -07:00
Annop Wongwathanarat	ec146157d3	Use SVE kernel for S/DGEMVT for SVE machines	2025-04-09 20:38:14 +00:00
Martin Kroeker	70865a894e	Merge pull request #5180 from ywwry66/openmp_use_cmake CMake: Pass `OpenMP` compiler and linker flags through CMake targets	2025-04-08 13:16:07 -07:00
lglglglgy	1ff303f36e	Optimizing the Implementation of GEMV on the RISC-V V Extension Specialized some scenarios, performed loop unrolling, and reduced the number of multiplications.	2025-04-08 21:18:00 +08:00
ColumbusAI	7bf848454d	Update zsum.c: fix a misspelled kernel name so the file compiles. The code used zsum_kernel where it should use zasum_kernel, and it will not compile without the fix.	2025-04-05 09:57:53 -07:00
Egbert Eich	ea6515c4b3	On zarch don't produce objects from assembler with a writable stack section On z-series, the current version of the GNU toolchain produces warnings such as: ``` /usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies executable stack /usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker ``` To prevent this message and make sure we are future proof, add ``` .section .note.GNU-stack,"",@progbits ``` Also add the `.size` bit to give the asm defined functions a proper size in the symbol table. Signed-off-by: Egbert Eich <eich@suse.com>	2025-03-28 18:47:48 +01:00
Ruiyang Wu	02fd1df10b	CMake: Pass `OpenMP` compiler and linker flags through CMake targets Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than passing the compiler and linker flags manually. Furthermore, it allows the user to customize those flags by setting `OpenMP_LANG_FLAGS`, `OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.	2025-03-26 23:09:54 -04:00
Ye Tao	f27ba5efd1	fix bugs in aarch64 sbgemv_n kernel	2025-03-14 17:55:40 +00:00
Annop Wongwathanarat	edef2e4441	Fix bug in ARM64 sbgemv_t	2025-03-13 20:55:31 +00:00
Martin Kroeker	b55ca71d5b	Merge pull request #5182 from annop-w/sgemm_ncopy Optimize aarch64 sgemm_ncopy	2025-03-13 16:04:39 +01:00
Martin Kroeker	2f778554b8	Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16 Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16	2025-03-13 13:50:26 +01:00
Annop Wongwathanarat	9807f56580	Optimize aarch64 sgemm_ncopy	2025-03-13 10:17:43 +00:00
Martin Kroeker	a3e7b16072	Merge pull request #5157 from manaalmj/feature Optimize gemv_n_sve kernel	2025-03-12 21:08:23 +01:00
Ye Tao	4c00099ed6	Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16	2025-03-12 16:20:15 +00:00
Annop Wongwathanarat	a085b6c9ec	Fix aarch64 sbgemv_t compilation error for GCC < 13	2025-03-12 14:52:42 +00:00
manjam01	5c4e38ab17	Optimize gemv_n_sve kernel	2025-03-10 16:39:20 +00:00
Martin Kroeker	1d5ed5c46b	Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2 Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2	2025-03-04 16:39:22 +01:00
Ye Tao	6b8b35cdf2	Fix minor issue of redeclaration of float x0,x1 in sbgemv_n_neon.c	2025-03-03 11:55:27 +00:00
Ye Tao	38ee7c9301	Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2	2025-03-03 11:32:05 +00:00
Martin Kroeker	2b941c44b5	Merge branch 'develop' into sbgemv_n_neon	2025-03-02 22:39:32 +01:00
Ye Tao	35bdbca153	Add sbgemv_n_neon kernel for arm64.	2025-02-28 14:37:06 +00:00
Annop Wongwathanarat	edaf51dd99	Add sbgemv_t_bfdot kernel for ARM64 This improves performance for sbgemv_t by up to 100x on NEOVERSEV1. The geometric mean speedup is ~61x for M=N=[2,512].	2025-02-28 12:31:50 +00:00
Martin Kroeker	77fba0f400	Fix "dummy2" flag handling	2025-02-22 20:09:21 +01:00
Martin Kroeker	eb84aac7ad	Merge pull request #5084 from quic/topic/sgemm_direct_sme1 Support for SGEMM_DIRECT Kernel based on SME1	2025-02-19 10:56:49 +01:00
Martin Kroeker	b9ae246f20	define USE_TRMM for RISCV64 targets as well	2025-02-16 23:18:04 +01:00
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	2025-02-13 14:54:37 +05:30
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	2025-02-13 14:51:21 +05:30
Martin Kroeker	8d487ef6eb	Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed LoongArch64: Fixed lapack test for LA264	2025-02-12 14:58:30 +01:00
Martin Kroeker	81eed868b6	Restore the non-vectorized code from before PR4880 for POWER8	2025-02-12 09:07:20 +01:00
Martin Kroeker	98b5ef929c	Restore the non-vectorized code from before PR4880 for POWER8	2025-02-12 09:04:22 +01:00
gxw	2c4a5cc6e6	LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S When the data type is single-precision real or single-precision complex, converting it to double precision does not prevent overflow (as exposed in LAPACK tests). The only solution is to follow C's approach: find the maximum value in the array and divide each element by that maximum to avoid this issue	2025-02-12 15:48:01 +08:00
gxw	9e75d6b3d1	LoongArch64: Fixed swap_lsx.S Fixed the error when the stride is zero	2025-02-12 14:57:35 +08:00
gxw	e8c740368c	LoongArch64: Fixed rot_lsx.S and crot_lsx.S Do not check whether the input parameters c and s are zero, as this may cause errors with special values (same as scal). Although OpenBLAS's own test suite doesn't catch this, it will cause LAPACK test cases to fail.	2025-02-12 14:52:49 +08:00
Hao Chen	c2212d0abd	LoongArch64: Fixed copy_lsx.S Fixed incorrect store operation Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:52:20 +08:00
Hao Chen	7f1ebc7ae6	LoongArch64: Fixed iamax_lsx.S Fixed index retrieval issue when there are identical maximum absolute values Signed-off-by: Hao Chen <chenhao@loongson.cn> Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:44:44 +08:00
Hao Chen	31d326f895	LoongArch64: Fixed dot_lsx.S Fixed incorrect register usage in instructions Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:44:11 +08:00
Hao Chen	5d6356bc16	LoongArch64: Fixed amax_lsx.S Fixed register zeroing operation Signed-off-by: Hao Chen <chenhao@loongson.cn> Signed-off-by: gxw <guxiwei-hf@loongson.cn>	2025-02-12 14:39:29 +08:00
Ye Tao	c748e6a338	optimized sbgemm kernel for neoverse-v1 (sve-256) Signed-off-by: Ye Tao <ye.tao@arm.com>	2025-02-05 10:06:37 +00:00
Aditya Tewari	4379a6fbe3	* checkpoint sbgemm for SVE-256	2025-02-03 12:49:49 +00:00
Martin Kroeker	d7036cfd74	Remove trailing blanks that break the cmake parser	2025-01-27 09:32:17 +01:00
Martin Kroeker	6e393a5599	Merge branch 'develop' into gemv_t	2025-01-25 12:54:04 +01:00
Martin Kroeker	876ba58e28	Merge pull request #5091 from goplanid/develop Small gemm kernel improvements for AArch64	2025-01-24 10:59:16 +01:00

2444 Commits