CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
2024-12-13 11:01:46 +05:30
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
2024-12-13 11:01:32 +05:30
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
2024-12-13 11:01:03 +05:30
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
2024-12-13 11:00:45 +05:30
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
2024-12-13 11:00:12 +05:30
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
2024-12-13 10:59:35 +05:30
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
2024-12-13 10:58:51 +05:30
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
2024-12-13 10:58:06 +05:30
Juliya32
3b2421cba0
Add files via upload
2024-10-30 14:23:42 +05:30
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
2024-10-30 14:23:15 +05:30
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
2024-10-30 14:22:51 +05:30
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
2024-10-30 14:22:31 +05:30
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
2024-10-30 14:09:37 +05:30
SushilPratap04
7822ae9617
Added sve kernels for rot routine.
2024-10-30 14:05:21 +05:30
SushilPratap04
b8bc2a752e
Added sve optimized kernels for swap routine
2024-10-30 14:02:57 +05:30
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
2024-10-30 14:01:09 +05:30
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
2024-10-23 04:24:06 -05:00
Ayappan Perumal
b6ec73e77c
Fix AIX build
2024-10-21 07:38:03 -05:00
Martin Kroeker
016bdb9b0b
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
2024-10-18 14:03:06 +02:00
Chip Kerchner
ab71a1edf2
Better VSX.
2024-10-17 08:25:02 -05:00
gxw
bb31bbef52
LoongArch64: Opt somatcopy_ct with LASX
2024-10-17 11:45:13 +00:00
gxw
b37129341b
LoongArch64: Opt somatcopy_cn with LASX
2024-10-17 11:27:55 +00:00
gxw
acf6cab304
LoongArch64: Opt somatcopy_rn with LASX
2024-10-17 09:50:02 +00:00
gxw
15edb441bf
LoongArch64: Opt somatcopy_rt with LASX
2024-10-17 09:15:42 +00:00
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
2024-10-13 13:46:11 -05:00
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
2024-10-09 09:42:23 -05:00
Martin Kroeker
9783dd07ab
Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC
2024-10-06 22:43:11 +02:00
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
2024-10-06 19:57:17 +02:00
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
2024-10-03 22:04:20 +02:00
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
2024-09-29 10:13:30 +08:00
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
2024-09-02 22:22:49 +05:30
Martin Kroeker
e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
2024-08-15 22:14:29 +02:00
Chip Kerchner
a0aeba631d
Merge branch 'develop' into betterPowerGEMVTail
2024-08-15 08:00:00 -05:00
Chip Kerchner
083faf7556
Merge branch 'develop' into betterPowerGEMVTail
2024-08-14 15:56:03 -05:00
Chip Kerchner
75472b830a
Merge branch 'develop' into betterPowerGEMVTail
2024-08-14 10:52:46 -05:00
Henry Chen
ef94b96530
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
2024-08-14 18:05:11 +08:00
Martin Kroeker
7ca835a82c
address clang array overflow warning
2024-08-10 13:44:56 +02:00
Martin Kroeker
46e331a917
remove the unworkable GEMM3M restriction from GENERIC again
2024-08-07 19:41:10 +02:00
Martin Kroeker
ccc23338d7
have the dummy GEMM3M kernel at least forward to regular GEMM
2024-08-07 19:39:02 +02:00
Martin Kroeker
f1c9803f9a
add proper return statement
2024-08-04 00:14:31 +02:00
Martin Kroeker
60abcc3991
add proper return statement
2024-08-04 00:13:31 +02:00
Chip Kerchner
1a7b8c650d
Merge branch 'develop' into betterPowerGEMVTail
2024-08-01 14:59:12 -05:00
Martin Kroeker
9afd0c8afd
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
2024-07-31 23:18:01 +02:00
Martin Kroeker
edbf093c98
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
2024-07-31 19:45:15 +02:00
Chris Sidebottom
ba2e989c67
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
Martin Kroeker
a875304eb0
fix inverted conditional for NAN handling
2024-07-26 09:50:20 +02:00
Martin Kroeker
24acdd6bbb
correct offset
2024-07-26 09:49:24 +02:00