guoyuanplct
11ffc8680e
Format the code
2025-04-25 00:27:27 +08:00
guoyuanplct
7616c42095
Optimized RVV_ZVL256B Implementation of zgemv_n
The implementation of zgemv_n using RVV_ZVL256B has been optimized.
Compared to the previous implementation, it has achieved a 1.5x
performance improvement.
2025-04-25 00:05:15 +08:00
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
2025-04-17 01:54:10 -07:00
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
2025-04-16 12:55:55 -07:00
Annop Wongwathanarat
d535728803
Improve performance for SGEMVN on NEOVERSEN1
2025-04-16 09:54:30 +00:00
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
2025-04-11 20:39:52 +09:00
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
2025-04-11 20:00:33 +09:00
Martin Kroeker
b30dc9701f
Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
2025-04-10 13:06:07 -07:00
Martin Kroeker
2893d0add4
Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
2025-04-10 09:43:03 -07:00
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
2025-04-09 20:38:14 +00:00
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
2025-04-08 13:16:07 -07:00
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
2025-04-08 21:18:00 +08:00
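The commit body above names loop unrolling and fewer multiplications without giving details. A minimal scalar sketch of what those two ideas can look like for a non-transposed GEMV follows. It is not the RVV kernel from the commit: the function name `gemv_n_unrolled`, the two-column unroll factor, and the choice to fold `alpha` into `x[j]` once per column are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch: y += alpha * A * x for an m-by-n column-major A
 * with leading dimension lda. Not the OpenBLAS RVV kernel. */
void gemv_n_unrolled(size_t m, size_t n, float alpha,
                     const float *a, size_t lda,
                     const float *x, float *y)
{
    size_t j = 0;

    /* Process two columns per pass (assumed unroll factor). */
    for (; j + 2 <= n; j += 2) {
        /* Fold alpha into x once per column instead of once per element. */
        const float t0 = alpha * x[j];
        const float t1 = alpha * x[j + 1];
        const float *a0 = a + j * lda;
        const float *a1 = a0 + lda;
        for (size_t i = 0; i < m; i++) {
            y[i] += t0 * a0[i] + t1 * a1[i];
        }
    }

    /* Remainder column when n is odd. */
    for (; j < n; j++) {
        const float t = alpha * x[j];
        const float *aj = a + j * lda;
        for (size_t i = 0; i < m; i++) {
            y[i] += t * aj[i];
        }
    }
}
```

Folding `alpha` into `x[j]` costs n multiplications up front and removes one multiplication per matrix element from the inner loop.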
ColumbusAI
7bf848454d
Update zsum.c -- fixed spelling error to successfully compile
Fixes a spelling error: zsum_kernel was used where zasum_kernel was intended. The file will not compile without this fix.
2025-04-05 09:57:53 -07:00
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future-proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` directive to give the asm-defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
2025-03-28 18:47:48 +01:00
Ruiyang Wu
02fd1df10b
CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
2025-03-14 17:55:40 +00:00
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
2025-03-13 20:55:31 +00:00
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
Optimize aarch64 sgemm_ncopy
2025-03-13 16:04:39 +01:00
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
2025-03-13 13:50:26 +01:00
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
2025-03-13 10:17:43 +00:00
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel
2025-03-12 21:08:23 +01:00
Ye Tao
4c00099ed6
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
2025-03-12 16:20:15 +00:00
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
2025-03-12 14:52:42 +00:00
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
2025-03-10 16:39:20 +00:00
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
2025-03-04 16:39:22 +01:00
Ye Tao
6b8b35cdf2
Fix minor issue of redeclaration of float x0, x1 in sbgemv_n_neon.c
2025-03-03 11:55:27 +00:00
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
2025-03-03 11:32:05 +00:00
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
2025-03-02 22:39:32 +01:00
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
2025-02-28 14:37:06 +00:00
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
2025-02-28 12:31:50 +00:00
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
2025-02-22 20:09:21 +01:00
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
2025-02-19 10:56:49 +01:00
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
2025-02-16 23:18:04 +01:00
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
2025-02-12 14:58:30 +01:00
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
2025-02-12 09:07:20 +01:00
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
2025-02-12 09:04:22 +01:00
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed by the LAPACK tests).
The only solution is to follow the C implementation's approach: find the maximum value in the
array and divide each element by that maximum to avoid this issue.
2025-02-12 15:48:01 +08:00
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
2025-02-12 14:57:35 +08:00
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
2025-02-12 14:52:49 +08:00
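The rot fix above turns on how IEEE 754 special values behave: `0 * NaN` and `0 * Inf` are both NaN, so a shortcut that skips a term when `c` or `s` is zero changes the result. A minimal scalar sketch follows. The names `rot_no_shortcut` and `rot_propagates_nan` are hypothetical, and this is not the LSX kernel.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch: apply a plane rotation to unit-stride x and y,
 * always evaluating both products, even when c or s is zero. */
void rot_no_shortcut(size_t n, float *x, float *y, float c, float s)
{
    for (size_t i = 0; i < n; i++) {
        float xi = x[i];
        float yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}

/* Returns 1 if a NaN in x reaches y even though s == 0. */
int rot_propagates_nan(void)
{
    float x[1] = {NAN};
    float y[1] = {1.0f};
    rot_no_shortcut(1, x, y, 1.0f, 0.0f);
    /* y = 1*1 - 0*NaN, which is NaN; a "skip when s == 0" shortcut
     * would have left y at 1. */
    return isnan(y[0]) ? 1 : 0;
}
```

With the zero check removed, the result matches what a straightforward evaluation of the two formulas produces, which is presumably what the LAPACK test cases mentioned in the commit expect.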
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:52:20 +08:00
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:44:44 +08:00
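The iamax fix above concerns ties. Reference BLAS defines i?amax as returning the 1-based index of the first element with the maximum absolute value. The commit does not spell out which index the broken kernel returned, so the sketch below shows only the expected convention. The function name `iamax_first` and the unit stride are assumptions.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch: 1-based index of the first element with the largest
 * absolute value; returns 0 for an empty vector. */
size_t iamax_first(size_t n, const float *x)
{
    if (n == 0) {
        return 0;
    }

    size_t best = 0;
    float maxabs = fabsf(x[0]);
    for (size_t i = 1; i < n; i++) {
        float a = fabsf(x[i]);
        /* Strict '>' keeps the earliest index when magnitudes tie. */
        if (a > maxabs) {
            maxabs = a;
            best = i;
        }
    }
    return best + 1;
}
```

A vectorized kernel has to preserve the same tie-breaking when it reduces per-lane candidates into a single index, which is an easy place for a bug of this kind to appear.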
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:44:11 +08:00
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
2025-02-12 14:39:29 +08:00
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
2025-02-03 12:49:49 +00:00
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
2025-01-27 09:32:17 +01:00
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
2025-01-25 12:54:04 +01:00
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
2025-01-24 10:59:16 +01:00