Commit Graph

2375 Commits

Author SHA1 Message Date
Martin Kroeker
2891fd8d6d Replace while loop with for 2025-01-08 23:17:45 +01:00
Sergey Fedorov
229efa42ff scal.S: use r11 on 32-bit Darwin on powerpc 2025-01-05 00:31:27 +08:00
Sergey Fedorov
81e1be8d90 Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
2025-01-04 22:54:54 +08:00
Martin Kroeker
9b9c0aa5c9 temporarily disable the default S/DSCAL kernel 2025-01-03 21:36:46 +01:00
tingbo.liao
c37509c213 Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-31 08:46:55 +08:00
tingbo.liao
0bea1cfd9d Optimize the zgemm_tcopy_4_rvv function to support vector lengths (vlens) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-24 10:33:27 +08:00
tingbo.liao
d00cc400b1 Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2024-12-18 08:38:30 +08:00
Martin Kroeker
229d8a025e Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
2024-12-13 05:20:51 -08:00
SushilPratap04
3368a4e697 Update swap_kernel_sve.c 2024-12-13 16:47:58 +05:30
CDAC-SSDG
dd71e4234a Added Updated swap and rot sve kernels. 2024-12-13 11:15:29 +05:30
CDAC-SSDG
06ffd411a5 Update KERNEL.ARMV8SVE 2024-12-13 11:05:47 +05:30
CDAC-SSDG
765850194e Delete kernel/arm64/swap_kernel_sve.c 2024-12-13 11:02:01 +05:30
CDAC-SSDG
c17c19fbcf Delete kernel/arm64/swap_kernel_c.c 2024-12-13 11:01:46 +05:30
CDAC-SSDG
f6416c0e37 Delete kernel/arm64/swap.c 2024-12-13 11:01:32 +05:30
CDAC-SSDG
3b7b74664c Delete kernel/arm64/scal_kernel_sve.c 2024-12-13 11:01:03 +05:30
CDAC-SSDG
95a97012e8 Delete kernel/arm64/scal_kernel_c.c 2024-12-13 11:00:45 +05:30
CDAC-SSDG
5540f2121e Delete kernel/arm64/scal.c 2024-12-13 11:00:12 +05:30
CDAC-SSDG
f62519cc87 Delete kernel/arm64/rot_kernel_sve.c 2024-12-13 10:59:35 +05:30
CDAC-SSDG
10857c9df4 Delete kernel/arm64/rot_kernel_c.c 2024-12-13 10:58:51 +05:30
CDAC-SSDG
b9f51a5cf7 Delete kernel/arm64/rot.c 2024-12-13 10:58:06 +05:30
Martin Kroeker
81666de4ef Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
2024-12-05 14:43:03 -08:00
Martin Kroeker
3345007d8f retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN 2024-12-05 21:12:06 +01:00
Martin Kroeker
5fe983db29 retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies 2024-12-05 21:09:53 +01:00
Iha, Taisei
4918beecbe Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 2024-12-02 18:46:00 +09:00
Juliya32
3b2421cba0 Add files via upload 2024-10-30 14:23:42 +05:30
Juliya32
012fe4da36 Delete kernel/arm64/rot_kernel_sve.c 2024-10-30 14:23:15 +05:30
Juliya32
d90ee00f85 Delete kernel/arm64/rot_kernel_c.c 2024-10-30 14:22:51 +05:30
Juliya32
668e28adc4 Delete kernel/arm64/rot.c 2024-10-30 14:22:31 +05:30
SushilPratap04
fa880ab1cf Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
2024-10-30 14:09:37 +05:30
SushilPratap04
7822ae9617 Added sve kernels for rot routine. 2024-10-30 14:05:21 +05:30
SushilPratap04
b8bc2a752e Added sve optimized kernels for swap routine 2024-10-30 14:02:57 +05:30
CDAC-SSDG
0667cf6c92 Added optimized scal routine files 2024-10-30 14:01:09 +05:30
gxw
73c6a28073 x86_64: opt somatcopy_ct with AVX 2024-10-29 07:06:15 +00:00
Ayappan Perumal
020cce1068 Fix build issues with gcc compiler as well 2024-10-23 04:24:06 -05:00
Ayappan Perumal
b6ec73e77c Fix AIX build 2024-10-21 07:38:03 -05:00
Martin Kroeker
016bdb9b0b Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
2024-10-18 14:03:06 +02:00
Chip Kerchner
ab71a1edf2 Better VSX. 2024-10-17 08:25:02 -05:00
gxw
bb31bbef52 LoongArch64: Opt somatcopy_ct with LASX 2024-10-17 11:45:13 +00:00
gxw
b37129341b LoongArch64: Opt somatcopy_cn with LASX 2024-10-17 11:27:55 +00:00
gxw
acf6cab304 LoongArch64: Opt somatcopy_rn with LASX 2024-10-17 09:50:02 +00:00
gxw
15edb441bf LoongArch64: Opt somatcopy_rt with LASX 2024-10-17 09:15:42 +00:00
Chip Kerchner
36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Martin Kroeker
e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum
0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
Martin Kroeker
9783dd07ab Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC 2024-10-06 22:43:11 +02:00
Martin Kroeker
c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Martin Kroeker
d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Martin Kroeker
de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores.
The legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00