OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-05 00:17:12 +08:00

Author	SHA1	Message	Date
Chip Kerchner	03a83778bb	Tie in SHGEMV for RISC-V.	2025-10-08 14:08:29 +00:00
Chip Kerchner	f552040c5d	Fix stride issue.	2025-10-07 17:17:18 +00:00
Chip Kerchner	aecb7f9537	Change signature of SBGEMV.	2025-10-07 13:14:20 +00:00
Chip Kerchner	809e1cba8f	Better FP16 vectorized GEMV - 20% faster.	2025-10-06 13:19:03 +00:00
Chip Kerchner	e07a9ae418	Merge branch 'develop' into vectorSBGEMV	2025-10-03 17:13:29 +00:00
Chip Kerchner	588f0e87cc	Add SBGEMV and SHGEMV routines to RISC-V.	2025-10-03 17:09:16 +00:00
Chip Kerchner	36f9cb85b1	Fix pre-RVV 1.0.	2025-09-30 22:41:31 +00:00
Chip Kerchner	2d82d144e2	Traverse matrix data in a cache-friendly manner for GEMV_N (RISCV).	2025-09-30 21:22:10 +00:00
Chip Kerchner	07d0e742c2	Add vectorized packing for FP16 and BF16. Reactivate vector packing for FP64 transposed.	2025-09-26 14:50:38 +00:00
Chip Kerchner	92f09a6a98	Add BF16 sbgemm on RISCV.	2025-09-22 14:32:43 +00:00
Chip Kerchner	a4abf7828e	Fix _Float16 casting issue and reduce LMUL for certain vector instructions from m2 to m1.	2025-09-18 21:30:22 +00:00
学习中的牛马	8b7e4c2b5c	Merge branch 'OpenMathLib:develop' into develop	2025-09-15 12:08:17 +08:00
Dayuxiaoshui	2265318d3e	Optimize RISC-V RVV omatcopy implementation with latest RVV API Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>	2025-09-15 11:46:50 +08:00
yuanjia	826cb4588f	remove unused variable	2025-09-13 11:35:49 +08:00
yuanjia	53d7452cdf	riscv: gemv_t_vector.c optimize	2025-09-13 11:24:49 +08:00
Dayuxiaoshui	bd45b82ed0	Optimize RISC-V RVV omatcopy_ct implementation with advanced vectorization - Implement block-based memory access optimization (64x64 blocks) - Add 4-way loop unrolling to reduce loop overhead - Optimize VSETVL calls to improve vectorization efficiency - Add software prefetching for better memory access patterns - Implement fast path for small matrices (<64x64) - Add cross-compilation script for RISC-V testing - Improve boundary handling with separate main/tail loops Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>	2025-09-11 20:01:39 +08:00
Dayuxiaoshui	708d586599	Add OMATCOPY_CT performance test with RVV optimization Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>	2025-09-11 19:20:26 +08:00
yuanjia	c2cc7a3602	riscv64: optimize gemv_t_vector.c	2025-08-22 16:14:14 +08:00
Martin Kroeker	9d6df1dd3e	Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking Add and use vectorized packing in ZVL128B and ZVL256B for RISCV	2025-08-16 13:45:35 -07:00
Chip Kerchner	64401b4417	Disable vectorized packing for DGEMM - since it is slower than scalar.	2025-08-13 13:41:12 +00:00
Chip Kerchner	c00afc86a6	Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.	2025-08-12 17:18:56 +00:00
Chip Kerchner	72f082f31d	Fix bad vector zero initializer and other compiler warnings for RISC-V.	2025-07-30 14:04:43 +00:00
Martin Kroeker	e2d941e9af	Declare the "small" kernel static in addition to inline	2025-07-22 11:02:32 +02:00
Martin Kroeker	8214700930	Declare the "small" kernel static in addition to inline	2025-07-22 11:01:37 +02:00
Martin Kroeker	d96daa220d	Merge pull request #5290 from Srangrang/develop Add support for FP16 to openBLAS and shgemm on RISCV	2025-06-24 23:10:15 +02:00
Srangrang	ec14e1648c	fix: resolve non-RISCV host build failed issue - adjust interface to disable "small matrix" pathway - separate HFLOAT16 from BFLOAT16 - remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions Related to PR#5290 Co-authored-by Martin	2025-06-15 20:25:15 +08:00
Martin Kroeker	73af02b89f	use dummy2 as Inf/NAN handling flag	2025-06-12 13:33:56 -07:00
Martin Kroeker	f18b7a46bf	add dummy2 flag handling for inf/nan agnostic zeroing	2025-06-11 01:47:43 -07:00
guoyuanplct	2ae019161a	fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small	2025-06-05 21:53:03 +08:00
Srangrang	fb89820f20	Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into develop	2025-06-04 20:27:05 +08:00
Srangrang	4e1a381e5b	fix: resolve the compilation failure without zfh instruction - modify the macro conditions in Makefile.system - Delete development test code Related to issue#5279	2025-06-04 20:00:12 +08:00
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	2025-06-03 20:14:30 +08:00
guoyuanplct	d2003dc886	del lines	2025-05-29 18:38:22 +08:00
guoyuanplct	45fd2d9b07	Optimized the axpby function.	2025-05-29 17:50:44 +08:00
Srangrang	2996c25c94	add shgemm for RISCV_ZVL128B	2025-05-24 23:55:49 +08:00
guoyuanplct	be9f7550b5	Format Code	2025-05-15 18:55:47 +08:00
guoyuanplct	4d213653d8	kernel/riscv64:Added support for omatcopy on riscv64.	2025-05-15 13:29:14 +08:00
guoyuanplct	9a7e3f102b	kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	2025-05-14 00:09:26 +08:00
guoyuanplct	11ffc8680e	Format the code	2025-04-25 00:27:27 +08:00
guoyuanplct	7616c42095	Optimized RVV_ZVL256B Implementation of zgemv_n The implementation of zgemv_n using RVV_ZVL256B has been optimized. Compared to the previous implementation, it has achieved a 1.5x performance improvement.	2025-04-25 00:05:15 +08:00
lglglglgy	1ff303f36e	Optimizing the Implementation of GEMV on the RISC-V V Extension Specialized some scenarios, performed loop unrolling, and reduced the number of multiplications.	2025-04-08 21:18:00 +08:00
Martin Kroeker	180ba5e7d0	Merge pull request #5069 from tingboliao/dev_rotm_20250107 Further rearranged the rotm kernel for the different architectures.	2025-01-23 10:16:43 +01:00
tingbo.liao	3c8df6358f	Further rearranged the rotm kernel for the different architectures. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-22 11:41:12 +08:00
tingbo.liao	ef7f54b357	Optimized the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-15 11:31:28 +08:00
tingbo.liao	0a5dbf13d3	Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-08 11:00:35 +08:00
tingbo.liao	c37509c213	Optimize the nrm2_rvv function to further improve performance. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2024-12-31 08:46:55 +08:00
tingbo.liao	0bea1cfd9d	Optimize the zgemm_tcopy_4_rvv function to be compatible with the situations where the vector lengths(vlens) are 128 and 256. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2024-12-24 10:33:27 +08:00
tingbo.liao	d00cc400b1	Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2024-12-18 08:38:30 +08:00
Martin Kroeker	a875304eb0	fix inverted conditional for NAN handling	2024-07-26 09:50:20 +02:00
Martin Kroeker	f5d04318e3	Merge branch 'OpenMathLib:develop' into scalfixes	2024-07-21 13:43:43 +02:00

127 Commits