Chip Kerchner
03a83778bb
Tie in SHGEMV for RISC-V.
2025-10-08 14:08:29 +00:00
Chip Kerchner
f552040c5d
Fix stride issue.
2025-10-07 17:17:18 +00:00
Chip Kerchner
aecb7f9537
Change signature of SBGEMV.
2025-10-07 13:14:20 +00:00
Chip Kerchner
809e1cba8f
Better FP16 vectorized GEMV - 20% faster.
2025-10-06 13:19:03 +00:00
Chip Kerchner
e07a9ae418
Merge branch 'develop' into vectorSBGEMV
2025-10-03 17:13:29 +00:00
Chip Kerchner
588f0e87cc
Add SBGEMV and SHGEMV routines to RISC-V.
2025-10-03 17:09:16 +00:00
Chip Kerchner
36f9cb85b1
Fix pre-RVV 1.0.
2025-09-30 22:41:31 +00:00
Chip Kerchner
2d82d144e2
Traverse matrix data in a cache-friendly manner for GEMV_N (RISCV).
2025-09-30 21:22:10 +00:00
Chip Kerchner
07d0e742c2
Add vectorized packing for FP16 and BF16. Reactivate vector packing for FP64 transposed.
2025-09-26 14:50:38 +00:00
Chip Kerchner
92f09a6a98
Add BF16 sbgemm on RISCV.
2025-09-22 14:32:43 +00:00
Chip Kerchner
a4abf7828e
Fix _Float16 casting issue and reduce LMUL for certain vector instructions from m2 to m1.
2025-09-18 21:30:22 +00:00
学习中的牛马
8b7e4c2b5c
Merge branch 'OpenMathLib:develop' into develop
2025-09-15 12:08:17 +08:00
Dayuxiaoshui
2265318d3e
Optimize RISC-V RVV omatcopy implementation with latest RVV API
...
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-15 11:46:50 +08:00
yuanjia
826cb4588f
remove unused variable
2025-09-13 11:35:49 +08:00
yuanjia
53d7452cdf
riscv: gemv_t_vector.c optimize
2025-09-13 11:24:49 +08:00
Dayuxiaoshui
bd45b82ed0
Optimize RISC-V RVV omatcopy_ct implementation with advanced vectorization
...
- Implement block-based memory access optimization (64x64 blocks)
- Add 4-way loop unrolling to reduce loop overhead
- Optimize VSETVL calls to improve vectorization efficiency
- Add software prefetching for better memory access patterns
- Implement fast path for small matrices (<64x64)
- Add cross-compilation script for RISC-V testing
- Improve boundary handling with separate main/tail loops
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-11 20:01:39 +08:00
Dayuxiaoshui
708d586599
Add OMATCOPY_CT performance test with RVV optimization
...
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-11 19:20:26 +08:00
yuanjia
c2cc7a3602
riscv64: optimize gemv_t_vector.c
2025-08-22 16:14:14 +08:00
Martin Kroeker
9d6df1dd3e
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
...
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
2025-08-16 13:45:35 -07:00
Chip Kerchner
64401b4417
Disable vectorized packing for DGEMM - since it is slower than scalar.
2025-08-13 13:41:12 +00:00
Chip Kerchner
c00afc86a6
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.
2025-08-12 17:18:56 +00:00
Chip Kerchner
72f082f31d
Fix bad vector zero initializer and other compiler warnings for RISC-V.
2025-07-30 14:04:43 +00:00
Martin Kroeker
e2d941e9af
Declare the "small" kernel static in addition to inline
2025-07-22 11:02:32 +02:00
Martin Kroeker
8214700930
Declare the "small" kernel static in addition to inline
2025-07-22 11:01:37 +02:00
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
...
Add support for FP16 to openBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Srangrang
ec14e1648c
fix: resolve non-RISCV host build failed issue
...
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by: Martin
2025-06-15 20:25:15 +08:00
Martin Kroeker
73af02b89f
use dummy2 as Inf/NaN handling flag
2025-06-12 13:33:56 -07:00
Martin Kroeker
f18b7a46bf
add dummy2 flag handling for Inf/NaN-agnostic zeroing
2025-06-11 01:47:43 -07:00
guoyuanplct
2ae019161a
fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
2025-06-05 21:53:03 +08:00
Srangrang
fb89820f20
Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into develop
2025-06-04 20:27:05 +08:00
Srangrang
4e1a381e5b
fix: resolve the compilation failure when the zfh instructions are not available
...
- modify the macro conditions in Makefile.system
- Delete development test code
Related to issue#5279
2025-06-04 20:00:12 +08:00
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
...
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which require RVV 1.0 support
Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
guoyuanplct
d2003dc886
del lines
2025-05-29 18:38:22 +08:00
guoyuanplct
45fd2d9b07
Optimized the axpby function.
2025-05-29 17:50:44 +08:00
Srangrang
2996c25c94
add shgemm for RISCV_ZVL128B
2025-05-24 23:55:49 +08:00
guoyuanplct
be9f7550b5
Format Code
2025-05-15 18:55:47 +08:00
guoyuanplct
4d213653d8
kernel/riscv64: Added support for omatcopy on riscv64.
2025-05-15 13:29:14 +08:00
guoyuanplct
9a7e3f102b
kernel/riscv64: Fixed the bug that caused openblas_utest_ext to fail in the c/zgemv and some c/zgbmv tests:
2025-05-14 00:09:26 +08:00
guoyuanplct
11ffc8680e
Format the code
2025-04-25 00:27:27 +08:00
guoyuanplct
7616c42095
Optimized RVV_ZVL256B implementation of zgemv_n
...
The implementation of zgemv_n using RVV_ZVL256B has been optimized.
Compared to the previous implementation, it has achieved a 1.5x
performance improvement.
2025-04-25 00:05:15 +08:00
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
...
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
2025-04-08 21:18:00 +08:00
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
...
Further rearranged the rotm kernel for the different architectures.
2025-01-23 10:16:43 +01:00
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
tingbo.liao
ef7f54b357
Optimized the gemm_tcopy_8_rvv to be compatible with vector lengths (VLEN) of 128 and 256.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-15 11:31:28 +08:00
tingbo.liao
0a5dbf13d3
Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-08 11:00:35 +08:00
tingbo.liao
c37509c213
Optimize the nrm2_rvv function to further improve performance.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-31 08:46:55 +08:00
tingbo.liao
0bea1cfd9d
Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (VLEN) of 128 and 256.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-24 10:33:27 +08:00
tingbo.liao
d00cc400b1
Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 to support compiling with riscv64-unknown-linux-gnu-gcc.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-18 08:38:30 +08:00
Martin Kroeker
a875304eb0
fix inverted conditional for NaN handling
2024-07-26 09:50:20 +02:00
Martin Kroeker
f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes
2024-07-21 13:43:43 +02:00