Commit Graph

2399 Commits

Author SHA1 Message Date
Ye Tao
c748e6a338 optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3 * checkpoint sbgemm for SVE-256 2025-02-03 12:49:49 +00:00
Martin Kroeker
d7036cfd74 Remove trailing blanks that break the cmake parser 2025-01-27 09:32:17 +01:00
Martin Kroeker
6e393a5599 Merge branch 'develop' into gemv_t 2025-01-25 12:54:04 +01:00
Martin Kroeker
876ba58e28 Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
2025-01-24 10:59:16 +01:00
Martin Kroeker
180ba5e7d0 Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
2025-01-23 10:16:43 +01:00
Deeksha Goplani
d1bfa979f7 small gemm kernel packing modifications 2025-01-23 09:41:45 +05:30
Martin Kroeker
1a6a9fb22f add another generator line for rotm 2025-01-23 00:17:04 +01:00
Martin Kroeker
4924319c50 fix position of srotm, qrotm 2025-01-22 16:07:35 +01:00
Martin Kroeker
b58cba9eb6 fix qrotm build rules 2025-01-22 15:51:49 +01:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
Annop Wongwathanarat
c0318cea6e Simplify gemv_t_sve_v1x3 kernel 2025-01-21 13:40:17 +00:00
Martin Kroeker
87083fdbf6 [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA

* use the armv8 zdot on WoA to work around other LLVM issues
2025-01-18 16:45:56 +01:00
tingbo.liao
ef7f54b357 Optimized the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-15 11:31:28 +08:00
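Two RVV commits in this log make a kernel work for both VLEN=128 and VLEN=256: gemm_tcopy_8_rvv above and zgemm_tcopy_4_rvv further down. The log does not say how. A common way to write VLEN-agnostic RVV code is strip-mining: on each iteration the code asks the hardware (via vsetvl) how many elements fit in a register group and processes that many. The Python sketch below only models that loop structure. `vlmax` and `strip_mined_copy` are hypothetical names. Choosing vl = min(remaining, VLMAX) is one of the behaviours the RVV spec permits, and it is assumed here for simplicity.

```python
def vlmax(vlen_bits, sew_bits=32, lmul=1):
    """Elements per vector register group: VLMAX = VLEN / SEW * LMUL."""
    return vlen_bits // sew_bits * lmul


def strip_mined_copy(src, vlen_bits, sew_bits=32, lmul=1):
    """Copy src in chunks whose size is picked at run time, as vsetvl would.

    Returns the copy plus the chunk sizes used, so the effect of VLEN is visible.
    Assumption: vl = min(remaining, VLMAX), one choice the RVV spec allows.
    """
    dst = [None] * len(src)
    chunks = []
    i = 0
    while i < len(src):
        vl = min(len(src) - i, vlmax(vlen_bits, sew_bits, lmul))  # models vsetvl
        dst[i:i + vl] = src[i:i + vl]
        chunks.append(vl)
        i += vl
    return dst, chunks


data = list(range(10))
for vlen in (128, 256):
    out, chunks = strip_mined_copy(data, vlen)
    print(vlen, chunks, out == data)
# 128 [4, 4, 2] True
# 256 [8, 2] True
```

With 32-bit elements and LMUL=1, a kernel that hardcodes 8 lanes per register implicitly assumes VLEN=256. At VLEN=128 only 4 lanes are available, so the chunk sizes differ while the copied data stays the same.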
gxw
e0a8216554 LoongArch64: Update dsymv LSX version 2025-01-14 19:45:42 +08:00
gxw
a9070ba3f9 LoongArch64: Update ssymv LSX version 2025-01-14 09:06:59 +00:00
Xi Ruoyao
af10c132b8 LoongArch64: Fix dsymv and ssymv LASX version
"fmov.d $f2, $f4" leaves all the bits higher than the 63-th bit
unpredictable but it's obvious that the following code uses the value of
those high bits.  We actually want to replicate the lower 64 bits here,
so we should use xvreplve0.d instead.

LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to
some uarch internal details so the issue was not detected, but for LA664
(Loongson 3[A-Z]-6000) and future uarch we need to do things correctly
or we end up getting a lot of test failures.

Closes: https://bbs.aosc.io/t/topic/302
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2025-01-13 22:16:00 +08:00
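The fix above depends on a lane-level detail. On LASX the scalar FP register aliases only the low 64 bits of the 256-bit vector register. As a result, fmov.d defines lane 0 and leaves lanes 1 to 3 unpredictable, while xvreplve0.d broadcasts lane 0 to all four lanes. The following plain-Python model shows why a later vector operation that reads every lane gives wrong results in lanes 1 to 3 when only fmov.d was used. It does not use LoongArch intrinsics, and the helper names are hypothetical. Stale register contents stand in for "unpredictable". On LA464 the hardware happened to behave like the broadcast.

```python
# Plain-Python model of a 256-bit LASX register holding four float64 lanes.
# Lane 0 aliases the scalar FP register. Illustration only, not real intrinsics.

def fmov_d(dst, src):
    """Model of fmov.d: only lane 0 is defined afterwards.

    The ISA leaves lanes 1..3 unpredictable; this model keeps the stale
    contents of dst so any later reader of those lanes sees leftover junk.
    """
    out = list(dst)
    out[0] = src[0]
    return out


def xvreplve0_d(src):
    """Model of xvreplve0.d: broadcast lane 0 of src to all four lanes."""
    return [src[0]] * 4


def vmul(a, b):
    """Lane-wise multiply standing in for a vector op that reads every lane."""
    return [x * y for x, y in zip(a, b)]


scalar = [2.0, 0.0, 0.0, 0.0]     # value to broadcast sits in lane 0
stale = [-1.0, 99.0, 99.0, 99.0]  # leftover register contents
x = [1.0, 2.0, 3.0, 4.0]          # four vector elements

wrong = vmul(fmov_d(stale, scalar), x)
right = vmul(xvreplve0_d(scalar), x)
print(wrong)  # [2.0, 198.0, 297.0, 396.0]
print(right)  # [2.0, 4.0, 6.0, 8.0]
```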
Martin Kroeker
d74eb02954 Merge pull request #5057 from martin-frbg/issue5050
Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds
2025-01-11 11:33:56 -08:00
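PR #5057 replaces while loops with for loops in the generic C/ZGEMM_BETA code, and so do the related commits 2891fd8d6d and d91d4fa6e9 further down. The log does not show that code. The following is therefore only a hypothetical Python sketch of what a complex beta-scaling routine computes (C := beta*C). It assumes column-major storage, interleaved real and imaginary parts, and a leading dimension ldc counted in complex elements. Loops bounded by m and n touch exactly m*n elements, which is the property an out-of-bounds fix has to guarantee. The beta == 0 branch writes zeros directly. This follows the BLAS convention that C need not be set on input when beta is zero.

```python
def zgemm_beta_sketch(m, n, beta_r, beta_i, c, ldc):
    """Scale the m x n complex matrix C in place by beta = beta_r + i*beta_i.

    Hypothetical layout: column-major, interleaved (re, im) doubles in a flat
    list, leading dimension ldc in complex elements (ldc >= m). Bounded for
    loops visit exactly m*n elements and never touch padding rows m..ldc-1.
    """
    if beta_r == 0.0 and beta_i == 0.0:
        # beta == 0: overwrite with zeros instead of multiplying, so NaN/Inf
        # already stored in C cannot leak into the result.
        for j in range(n):
            for i in range(m):
                k = 2 * (i + j * ldc)
                c[k] = 0.0
                c[k + 1] = 0.0
        return
    for j in range(n):
        for i in range(m):
            k = 2 * (i + j * ldc)
            re, im = c[k], c[k + 1]
            c[k] = beta_r * re - beta_i * im
            c[k + 1] = beta_r * im + beta_i * re


# Usage: 2 x 2 matrix stored with ldc = 3; the third row of each column is
# padding, filled with the sentinel -7.0, and must stay unchanged.
c = [1.0, 2.0, 3.0, 4.0, -7.0, -7.0,
     5.0, 6.0, 7.0, 8.0, -7.0, -7.0]
zgemm_beta_sketch(2, 2, 0.0, 1.0, c, 3)  # beta = i
print(c)  # [-2.0, 1.0, -4.0, 3.0, -7.0, -7.0, -6.0, 5.0, -8.0, 7.0, -7.0, -7.0]
```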
Martin Kroeker
30f7a4120b Merge pull request #5056 from tingboliao/dev_omatcopy_20250108
Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsic.
2025-01-11 09:42:57 -08:00
gxw
20a8e48f25 LoongArch64: Update ssymv LASX version 2025-01-10 16:02:54 +08:00
gxw
e0748588b8 LoongArch64: Update dsymv LASX version 2025-01-10 14:52:57 +08:00
Martin Kroeker
d91d4fa6e9 convert the beta=0 branch to a for loop as well 2025-01-09 23:11:26 +01:00
Martin Kroeker
09e75f1588 fix absurd typo 2025-01-09 00:52:14 +01:00
Martin Kroeker
2891fd8d6d Replace while loop with for 2025-01-08 23:17:45 +01:00
tingbo.liao
0a5dbf13d3 Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-08 11:00:35 +08:00
Sergey Fedorov
229efa42ff scal.S: use r11 on 32-bit Darwin on powerpc 2025-01-05 00:31:27 +08:00
Sergey Fedorov
81e1be8d90 Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
2025-01-04 22:54:54 +08:00
Martin Kroeker
9b9c0aa5c9 temporarily disable the default S/DSCAL kernel 2025-01-03 21:36:46 +01:00
tingbo.liao
c37509c213 Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-31 08:46:55 +08:00
tingbo.liao
0bea1cfd9d Optimize the zgemm_tcopy_4_rvv function to be compatible with the situations where the vector lengths(vlens) are 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-24 10:33:27 +08:00
tingbo.liao
d00cc400b1 Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-18 08:38:30 +08:00
Martin Kroeker
229d8a025e Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
2024-12-13 05:20:51 -08:00
SushilPratap04
3368a4e697 Update swap_kernel_sve.c 2024-12-13 16:47:58 +05:30
CDAC-SSDG
dd71e4234a Added Updated swap and rot sve kernels. 2024-12-13 11:15:29 +05:30
CDAC-SSDG
06ffd411a5 Update KERNEL.ARMV8SVE 2024-12-13 11:05:47 +05:30
CDAC-SSDG
765850194e Delete kernel/arm64/swap_kernel_sve.c 2024-12-13 11:02:01 +05:30
CDAC-SSDG
c17c19fbcf Delete kernel/arm64/swap_kernel_c.c 2024-12-13 11:01:46 +05:30
CDAC-SSDG
f6416c0e37 Delete kernel/arm64/swap.c 2024-12-13 11:01:32 +05:30
CDAC-SSDG
3b7b74664c Delete kernel/arm64/scal_kernel_sve.c 2024-12-13 11:01:03 +05:30
CDAC-SSDG
95a97012e8 Delete kernel/arm64/scal_kernel_c.c 2024-12-13 11:00:45 +05:30
CDAC-SSDG
5540f2121e Delete kernel/arm64/scal.c 2024-12-13 11:00:12 +05:30
CDAC-SSDG
f62519cc87 Delete kernel/arm64/rot_kernel_sve.c 2024-12-13 10:59:35 +05:30
CDAC-SSDG
10857c9df4 Delete kernel/arm64/rot_kernel_c.c 2024-12-13 10:58:51 +05:30
CDAC-SSDG
b9f51a5cf7 Delete kernel/arm64/rot.c 2024-12-13 10:58:06 +05:30
Martin Kroeker
81666de4ef Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
2024-12-05 14:43:03 -08:00
Martin Kroeker
3345007d8f retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN 2024-12-05 21:12:06 +01:00
Martin Kroeker
5fe983db29 retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies 2024-12-05 21:09:53 +01:00
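Commits 3345007d8f and 5fe983db29 retire the thunderx2 NRM2 kernels because of inaccuracies and NaN handling, without detailing the failure. For reference, the classic way to compute a robust 2-norm, used by older reference BLAS DNRM2, keeps a running scale and a scaled sum of squares, so no intermediate square overflows or underflows. The Python sketch below illustrates that technique only. It is not the NEON or thunderx2 kernel. The explicit NaN and infinity checks are an added assumption about the desired semantics, and incx is assumed positive.

```python
import math


def nrm2_scaled(x, incx=1):
    """Euclidean norm of x[0], x[incx], ... via running scale and ssq.

    Invariant: norm so far = scale * sqrt(ssq), with scale = max |x_i| seen,
    so every squared ratio is <= 1 and nothing overflows. Assumes incx > 0.
    """
    scale, ssq = 0.0, 1.0
    saw_inf = False
    for v in x[::incx]:
        if math.isnan(v):
            return math.nan          # NaN takes precedence over everything
        a = abs(v)
        if math.isinf(a):
            saw_inf = True           # remember Inf, but keep scanning for NaN
            continue
        if a == 0.0:
            continue
        if scale < a:
            ssq = 1.0 + ssq * (scale / a) ** 2   # rescale to the new maximum
            scale = a
        else:
            ssq += (a / scale) ** 2
    if saw_inf:
        return math.inf
    return scale * math.sqrt(ssq)


print(nrm2_scaled([3.0, 4.0]))           # 5.0
print(nrm2_scaled([3.0, 99.0, 4.0], 2))  # 5.0 (stride 2 skips 99.0)
print(nrm2_scaled([1.0, float("nan")]))  # nan
```

A naive sqrt(sum(v*v)) on [3e200, 4e200] overflows to inf, whereas the scaled form returns approximately 5e200.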
Iha, Taisei
4918beecbe Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 2024-12-02 18:46:00 +09:00
Juliya32
3b2421cba0 Add files via upload 2024-10-30 14:23:42 +05:30
Juliya32
012fe4da36 Delete kernel/arm64/rot_kernel_sve.c 2024-10-30 14:23:15 +05:30