Commit Graph

127 Commits

Author SHA1 Message Date
Chip Kerchner
03a83778bb Tie in SHGEMV for RISC-V. 2025-10-08 14:08:29 +00:00
Chip Kerchner
f552040c5d Fix stride issue. 2025-10-07 17:17:18 +00:00
Chip Kerchner
aecb7f9537 Change signature of SBGEMV. 2025-10-07 13:14:20 +00:00
Chip Kerchner
809e1cba8f Better FP16 vectorized GEMV - 20% faster. 2025-10-06 13:19:03 +00:00
Chip Kerchner
e07a9ae418 Merge branch 'develop' into vectorSBGEMV 2025-10-03 17:13:29 +00:00
Chip Kerchner
588f0e87cc Add SBGEMV and SHGEMV routines to RISC-V. 2025-10-03 17:09:16 +00:00
Chip Kerchner
36f9cb85b1 Fix pre-RVV 1.0. 2025-09-30 22:41:31 +00:00
Chip Kerchner
2d82d144e2 Traverse matrix data in a cache-friendly manner for GEMV_N (RISCV). 2025-09-30 21:22:10 +00:00
Chip Kerchner
07d0e742c2 Add vectorized packing for FP16 and BF16. Reactivate vector packing for FP64 transposed. 2025-09-26 14:50:38 +00:00
Chip Kerchner
92f09a6a98 Add BF16 sbgemm on RISCV. 2025-09-22 14:32:43 +00:00
Chip Kerchner
a4abf7828e Fix _Float16 casting issue and reduce LMUL for certain vector instructions from m2 to m1. 2025-09-18 21:30:22 +00:00
学习中的牛马
8b7e4c2b5c Merge branch 'OpenMathLib:develop' into develop 2025-09-15 12:08:17 +08:00
Dayuxiaoshui
2265318d3e Optimize RISC-V RVV omatcopy implementation with latest RVV API

Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-15 11:46:50 +08:00
yuanjia
826cb4588f remove unused variable 2025-09-13 11:35:49 +08:00
yuanjia
53d7452cdf riscv: gemv_t_vector.c optimize 2025-09-13 11:24:49 +08:00
Dayuxiaoshui
bd45b82ed0 Optimize RISC-V RVV omatcopy_ct implementation with advanced vectorization
- Implement block-based memory access optimization (64x64 blocks)
- Add 4-way loop unrolling to reduce loop overhead
- Optimize VSETVL calls to improve vectorization efficiency
- Add software prefetching for better memory access patterns
- Implement fast path for small matrices (<64x64)
- Add cross-compilation script for RISC-V testing
- Improve boundary handling with separate main/tail loops

Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-11 20:01:39 +08:00
Dayuxiaoshui
708d586599 Add OMATCOPY_CT performance test with RVV optimization
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-11 19:20:26 +08:00
yuanjia
c2cc7a3602 riscv64: optimize gemv_t_vector.c 2025-08-22 16:14:14 +08:00
Martin Kroeker
9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
2025-08-16 13:45:35 -07:00
Chip Kerchner
64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 2025-08-13 13:41:12 +00:00
Chip Kerchner
c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 2025-08-12 17:18:56 +00:00
Chip Kerchner
72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 2025-07-30 14:04:43 +00:00
Martin Kroeker
e2d941e9af Declare the "small" kernel static in addition to inline 2025-07-22 11:02:32 +02:00
Martin Kroeker
8214700930 Declare the "small" kernel static in addition to inline 2025-07-22 11:01:37 +02:00
Martin Kroeker
d96daa220d Merge pull request #5290 from Srangrang/develop
Add support for FP16 to openBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Srangrang
ec14e1648c fix: resolve build failure on non-RISCV hosts
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions

Related to PR#5290
Co-authored-by: Martin
2025-06-15 20:25:15 +08:00
Martin Kroeker
73af02b89f use dummy2 as Inf/NAN handling flag 2025-06-12 13:33:56 -07:00
Martin Kroeker
f18b7a46bf add dummy2 flag handling for inf/nan agnostic zeroing 2025-06-11 01:47:43 -07:00
guoyuanplct
2ae019161a fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small 2025-06-05 21:53:03 +08:00
Srangrang
fb89820f20 Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into develop 2025-06-04 20:27:05 +08:00
Srangrang
4e1a381e5b fix: resolve the compilation failure when the zfh instruction set is unavailable
- modify the macro conditions in Makefile.system
- Delete development test code

Related to issue#5279
2025-06-04 20:00:12 +08:00
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which require RVV 1.0 support

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
guoyuanplct
d2003dc886 del lines 2025-05-29 18:38:22 +08:00
guoyuanplct
45fd2d9b07 Optimized the axpby function. 2025-05-29 17:50:44 +08:00
Srangrang
2996c25c94 add shgemm for RISCV_ZVL128B 2025-05-24 23:55:49 +08:00
guoyuanplct
be9f7550b5 Format Code 2025-05-15 18:55:47 +08:00
guoyuanplct
4d213653d8 kernel/riscv64: Added support for omatcopy on riscv64. 2025-05-15 13:29:14 +08:00
guoyuanplct
9a7e3f102b kernel/riscv64: Fixed openblas_utest_ext failures in c/zgemv and some c/zgbmv tests: 2025-05-14 00:09:26 +08:00
guoyuanplct
11ffc8680e Format the code 2025-04-25 00:27:27 +08:00
guoyuanplct
7616c42095 Optimized RVV_ZVL256B Implementation of zgemv_n
The implementation of zgemv_n using RVV_ZVL256B has been optimized.
Compared to the previous implementation, it has achieved a 1.5x
performance improvement.
2025-04-25 00:05:15 +08:00
lglglglgy
1ff303f36e Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
2025-04-08 21:18:00 +08:00
Martin Kroeker
180ba5e7d0 Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
2025-01-23 10:16:43 +01:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
tingbo.liao
ef7f54b357 Optimized gemm_tcopy_8_rvv to be compatible with vector lengths (VLEN) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-15 11:31:28 +08:00
tingbo.liao
0a5dbf13d3 Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsics.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-08 11:00:35 +08:00
tingbo.liao
c37509c213 Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-31 08:46:55 +08:00
tingbo.liao
0bea1cfd9d Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (VLEN) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-24 10:33:27 +08:00
tingbo.liao
d00cc400b1 Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-18 08:38:30 +08:00
Martin Kroeker
a875304eb0 fix inverted conditional for NAN handling 2024-07-26 09:50:20 +02:00
Martin Kroeker
f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00