Commit Graph

290 Commits

Author SHA1 Message Date
Martin Kroeker
229d8a025e Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
2024-12-13 05:20:51 -08:00
SushilPratap04
3368a4e697 Update swap_kernel_sve.c 2024-12-13 16:47:58 +05:30
CDAC-SSDG
dd71e4234a Added Updated swap and rot sve kernels. 2024-12-13 11:15:29 +05:30
CDAC-SSDG
06ffd411a5 Update KERNEL.ARMV8SVE 2024-12-13 11:05:47 +05:30
CDAC-SSDG
765850194e Delete kernel/arm64/swap_kernel_sve.c 2024-12-13 11:02:01 +05:30
CDAC-SSDG
c17c19fbcf Delete kernel/arm64/swap_kernel_c.c 2024-12-13 11:01:46 +05:30
CDAC-SSDG
f6416c0e37 Delete kernel/arm64/swap.c 2024-12-13 11:01:32 +05:30
CDAC-SSDG
3b7b74664c Delete kernel/arm64/scal_kernel_sve.c 2024-12-13 11:01:03 +05:30
CDAC-SSDG
95a97012e8 Delete kernel/arm64/scal_kernel_c.c 2024-12-13 11:00:45 +05:30
CDAC-SSDG
5540f2121e Delete kernel/arm64/scal.c 2024-12-13 11:00:12 +05:30
CDAC-SSDG
f62519cc87 Delete kernel/arm64/rot_kernel_sve.c 2024-12-13 10:59:35 +05:30
CDAC-SSDG
10857c9df4 Delete kernel/arm64/rot_kernel_c.c 2024-12-13 10:58:51 +05:30
CDAC-SSDG
b9f51a5cf7 Delete kernel/arm64/rot.c 2024-12-13 10:58:06 +05:30
Martin Kroeker
81666de4ef Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
2024-12-05 14:43:03 -08:00
Martin Kroeker
3345007d8f retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN 2024-12-05 21:12:06 +01:00
Martin Kroeker
5fe983db29 retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies 2024-12-05 21:09:53 +01:00
Iha, Taisei
4918beecbe Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 2024-12-02 18:46:00 +09:00
Juliya32
3b2421cba0 Add files via upload 2024-10-30 14:23:42 +05:30
Juliya32
012fe4da36 Delete kernel/arm64/rot_kernel_sve.c 2024-10-30 14:23:15 +05:30
Juliya32
d90ee00f85 Delete kernel/arm64/rot_kernel_c.c 2024-10-30 14:22:51 +05:30
Juliya32
668e28adc4 Delete kernel/arm64/rot.c 2024-10-30 14:22:31 +05:30
SushilPratap04
fa880ab1cf Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
2024-10-30 14:09:37 +05:30
SushilPratap04
7822ae9617 Added sve kernels for rot routine. 2024-10-30 14:05:21 +05:30
SushilPratap04
b8bc2a752e Added sve optimized kernels for swap routine 2024-10-30 14:02:57 +05:30
CDAC-SSDG
0667cf6c92 Added optimized scal routine files 2024-10-30 14:01:09 +05:30
Deeksha Goplani
4894c54055 Improve TN case with further unrolling 2024-09-02 22:22:49 +05:30
Chris Sidebottom
ba2e989c67 Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
Martin Kroeker
fb7c53c5e5 Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker
a4e56e0452 Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
yamazaki-mitsufumi
88caf02f62 Fix ambiguous error on Mac OS 2024-07-25 22:43:13 +09:00
Chris Sidebottom
ea4ab3b310 Better header guard around bridge 2024-07-20 14:39:57 +01:00
Chris Sidebottom
7311d93016 Unroll TT further 2024-07-19 17:51:20 +01:00
Chris Sidebottom
a9edddb695 Unroll TN further 2024-07-18 20:04:15 +01:00
Chris Sidebottom
9984c5ce9d Clean up k2 removal more and unroll SGEMM more 2024-07-18 18:35:43 +01:00
Chris Sidebottom
b1c9fafabb Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 2024-07-18 17:37:18 +01:00
Martin Kroeker
eb4879e04c make NAN handling depend on the dummy2 parameter 2024-07-17 23:24:19 +02:00
iha fujitsu
0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Martin Kroeker
3677b3886c Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Chris Sidebottom
8c472ef7e3 Further tweak small GEMM for AArch64 2024-06-24 10:47:47 +01:00
Martin Kroeker
a2ee4b1966 Merge branch 'OpenMathLib:develop' into issue4728 2024-06-21 09:35:56 +02:00
Martin Kroeker
3ec59922b6 Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker
3d8054fb16 add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker
c7cacd9b38 disable the shortcut for da=0 to ensure proper handling of INF and NAN 2024-06-07 13:48:56 +02:00
Matthias Langer
0050a9660b Correctly detect ARM Neoverse V2 CPUs. 2024-05-16 09:59:52 +00:00
Martin Kroeker
7cfd433d0c revert the C/Z NRM2 kernels to the base NEON kernel as well 2024-04-12 15:34:04 +02:00
Martin Kroeker
441c81026e Add support for Cortex-A76 2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39 Revert S/DNRM2 to the base NEON kernel to fix precision loss 2024-04-02 15:59:20 +02:00
Martin Kroeker
552c521353 remove another early exit for incx < 0 2024-03-12 18:49:27 +01:00
Martin Kroeker
ed532dc75b remove another early exit for incx < 0 2024-03-12 18:47:00 +01:00
Martin Kroeker
e41d01bad9 remove early exit on negative inc_x 2024-03-11 22:53:54 +01:00