Martin Kroeker
|
d74eb02954
|
Merge pull request #5057 from martin-frbg/issue5050
Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds
|
2025-01-11 11:33:56 -08:00 |
|
Martin Kroeker
|
30f7a4120b
|
Merge pull request #5056 from tingboliao/dev_omatcopy_20250108
Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsic.
|
2025-01-11 09:42:57 -08:00 |
|
gxw
|
20a8e48f25
|
LoongArch64: Update ssymv LASX version
|
2025-01-10 16:02:54 +08:00 |
|
gxw
|
e0748588b8
|
LoongArch64: Update dsymv LASX version
|
2025-01-10 14:52:57 +08:00 |
|
Martin Kroeker
|
d91d4fa6e9
|
convert the beta=0 branch to a for loop as well
|
2025-01-09 23:11:26 +01:00 |
|
Martin Kroeker
|
09e75f1588
|
fix absurd typo
|
2025-01-09 00:52:14 +01:00 |
|
Martin Kroeker
|
2891fd8d6d
|
Replace while loop with for
|
2025-01-08 23:17:45 +01:00 |
|
tingbo.liao
|
0a5dbf13d3
|
Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsic.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
|
2025-01-08 11:00:35 +08:00 |
|
Sergey Fedorov
|
229efa42ff
|
scal.S: use r11 on 32-bit Darwin on powerpc
|
2025-01-05 00:31:27 +08:00 |
|
Sergey Fedorov
|
81e1be8d90
|
Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
|
2025-01-04 22:54:54 +08:00 |
|
Martin Kroeker
|
9b9c0aa5c9
|
temporarily disable the default S/DSCAL kernel
|
2025-01-03 21:36:46 +01:00 |
|
tingbo.liao
|
c37509c213
|
Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
|
2024-12-31 08:46:55 +08:00 |
|
tingbo.liao
|
0bea1cfd9d
|
Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (vlens) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
|
2024-12-24 10:33:27 +08:00 |
|
tingbo.liao
|
d00cc400b1
|
Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
|
2024-12-18 08:38:30 +08:00 |
|
Martin Kroeker
|
229d8a025e
|
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
|
2024-12-13 05:20:51 -08:00 |
|
SushilPratap04
|
3368a4e697
|
Update swap_kernel_sve.c
|
2024-12-13 16:47:58 +05:30 |
|
CDAC-SSDG
|
dd71e4234a
|
Added updated swap and rot SVE kernels.
|
2024-12-13 11:15:29 +05:30 |
|
CDAC-SSDG
|
06ffd411a5
|
Update KERNEL.ARMV8SVE
|
2024-12-13 11:05:47 +05:30 |
|
CDAC-SSDG
|
765850194e
|
Delete kernel/arm64/swap_kernel_sve.c
|
2024-12-13 11:02:01 +05:30 |
|
CDAC-SSDG
|
c17c19fbcf
|
Delete kernel/arm64/swap_kernel_c.c
|
2024-12-13 11:01:46 +05:30 |
|
CDAC-SSDG
|
f6416c0e37
|
Delete kernel/arm64/swap.c
|
2024-12-13 11:01:32 +05:30 |
|
CDAC-SSDG
|
3b7b74664c
|
Delete kernel/arm64/scal_kernel_sve.c
|
2024-12-13 11:01:03 +05:30 |
|
CDAC-SSDG
|
95a97012e8
|
Delete kernel/arm64/scal_kernel_c.c
|
2024-12-13 11:00:45 +05:30 |
|
CDAC-SSDG
|
5540f2121e
|
Delete kernel/arm64/scal.c
|
2024-12-13 11:00:12 +05:30 |
|
CDAC-SSDG
|
f62519cc87
|
Delete kernel/arm64/rot_kernel_sve.c
|
2024-12-13 10:59:35 +05:30 |
|
CDAC-SSDG
|
10857c9df4
|
Delete kernel/arm64/rot_kernel_c.c
|
2024-12-13 10:58:51 +05:30 |
|
CDAC-SSDG
|
b9f51a5cf7
|
Delete kernel/arm64/rot.c
|
2024-12-13 10:58:06 +05:30 |
|
Martin Kroeker
|
81666de4ef
|
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
|
2024-12-05 14:43:03 -08:00 |
|
Martin Kroeker
|
3345007d8f
|
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
|
2024-12-05 21:12:06 +01:00 |
|
Martin Kroeker
|
5fe983db29
|
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
|
2024-12-05 21:09:53 +01:00 |
|
Iha, Taisei
|
4918beecbe
|
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
|
2024-12-02 18:46:00 +09:00 |
|
Juliya32
|
3b2421cba0
|
Add files via upload
|
2024-10-30 14:23:42 +05:30 |
|
Juliya32
|
012fe4da36
|
Delete kernel/arm64/rot_kernel_sve.c
|
2024-10-30 14:23:15 +05:30 |
|
Juliya32
|
d90ee00f85
|
Delete kernel/arm64/rot_kernel_c.c
|
2024-10-30 14:22:51 +05:30 |
|
Juliya32
|
668e28adc4
|
Delete kernel/arm64/rot.c
|
2024-10-30 14:22:31 +05:30 |
|
SushilPratap04
|
fa880ab1cf
|
Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
|
2024-10-30 14:09:37 +05:30 |
|
SushilPratap04
|
7822ae9617
|
Added sve kernels for rot routine.
|
2024-10-30 14:05:21 +05:30 |
|
SushilPratap04
|
b8bc2a752e
|
Added sve optimized kernels for swap routine
|
2024-10-30 14:02:57 +05:30 |
|
CDAC-SSDG
|
0667cf6c92
|
Added optimized scal routine files
|
2024-10-30 14:01:09 +05:30 |
|
gxw
|
73c6a28073
|
x86_64: opt somatcopy_ct with AVX
|
2024-10-29 07:06:15 +00:00 |
|
Ayappan Perumal
|
020cce1068
|
Fix build issues with gcc compiler as well
|
2024-10-23 04:24:06 -05:00 |
|
Ayappan Perumal
|
b6ec73e77c
|
Fix AIX build
|
2024-10-21 07:38:03 -05:00 |
|
Martin Kroeker
|
016bdb9b0b
|
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
|
2024-10-18 14:03:06 +02:00 |
|
Chip Kerchner
|
ab71a1edf2
|
Better VSX.
|
2024-10-17 08:25:02 -05:00 |
|
gxw
|
bb31bbef52
|
LoongArch64: Opt somatcopy_ct with LASX
|
2024-10-17 11:45:13 +00:00 |
|
gxw
|
b37129341b
|
LoongArch64: Opt somatcopy_cn with LASX
|
2024-10-17 11:27:55 +00:00 |
|
gxw
|
acf6cab304
|
LoongArch64: Opt somatcopy_rn with LASX
|
2024-10-17 09:50:02 +00:00 |
|
gxw
|
15edb441bf
|
LoongArch64: Opt somatcopy_rt with LASX
|
2024-10-17 09:15:42 +00:00 |
|
Chip Kerchner
|
36bd3eeddf
|
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
|
2024-10-13 13:46:11 -05:00 |
|
Martin Kroeker
|
e52d9b4cf1
|
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
|
2024-10-09 20:26:21 +02:00 |
|