Commit Graph

2481 Commits

Author SHA1 Message Date
Martin Kroeker
669c847ceb support extra flag for NaN handling 2025-05-23 05:52:48 -07:00
Martin Kroeker
0b0bb9951d Merge pull request #5265 from guoyuanplct/develop
kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B
2025-05-17 05:08:47 -07:00
guoyuanplct
be9f7550b5 Format Code 2025-05-15 18:55:47 +08:00
guoyuanplct
4d213653d8 kernel/riscv64:Added support for omatcopy on riscv64. 2025-05-15 13:29:14 +08:00
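omatcopy is the OpenBLAS extension for an out-of-place matrix copy with optional scaling and transposition, B = alpha * op(A). The sketch below is a plain scalar reference for column-major storage, useful for checking what a vectorized kernel should produce. The function name `ref_domatcopy_col` and its single `trans` flag are hypothetical. This is not the RISC-V kernel added in the commit.

```c
#include <stddef.h>

/* Hypothetical scalar reference: out-of-place copy with scaling, column-major.
 * trans == 0: B = alpha * A   (B is rows x cols, leading dimension ldb)
 * trans != 0: B = alpha * A^T (B is cols x rows, leading dimension ldb)
 * A is rows x cols with leading dimension lda. */
void ref_domatcopy_col(int trans, size_t rows, size_t cols, double alpha,
                       const double *a, size_t lda, double *b, size_t ldb)
{
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            double v = alpha * a[j * lda + i];
            if (trans)
                b[i * ldb + j] = v;   /* element (i,j) of A lands at (j,i) of B */
            else
                b[j * ldb + i] = v;   /* same position, scaled */
        }
    }
}
```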
Martin Kroeker
8afddc1a81 Merge pull request #5262 from guoyuanplct/develop
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
2025-05-14 02:40:32 -07:00
guoyuanplct
9a7e3f102b kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests: 2025-05-14 00:09:26 +08:00
pengxu
a978ad3180 Loongarch64: add C functions of zgemm_ncopy_16 2025-05-13 16:09:12 +08:00
pengxu
0ccb050583 Loongarch64: fixed cgemm_ncopy_16_lasx 2025-05-13 16:08:33 +08:00
Martin Kroeker
5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
2025-05-10 22:39:32 +02:00
Martin Kroeker
151b74284e Merge pull request #5203 from quic/fix-sgemmdirect-sme1
Add vector registers to clobber list to prevent compiler optimization.
2025-05-09 05:39:47 -07:00
Martin Kroeker
cba32d001a Merge pull request #5245 from guoyuanplct/develop
Optimized RVV_ZVL256B Implementation of zgemv_n
2025-05-01 03:04:38 -07:00
pengxu
f19e72c402 Loongarch64: fixed swap_lasx 2025-04-30 16:42:52 +08:00
pengxu
b471fa337b Loongarch64: fixed snrm2_lasx 2025-04-30 16:42:36 +08:00
pengxu
57bb46bedf Loongarch64: fixed rot_lasx 2025-04-30 16:42:22 +08:00
pengxu
6dc4ca2391 Loongarch64: fixed icamax_lasx 2025-04-30 16:42:12 +08:00
pengxu
b528b1b8ea Loongarch64: fixed iamax_lasx 2025-04-30 16:41:58 +08:00
pengxu
ba9569e382 Loongarch64: fixed dot_lasx 2025-04-30 16:41:48 +08:00
pengxu
dc5fa29851 Loongarch64: fixed cscal_lasx 2025-04-30 16:41:39 +08:00
pengxu
a98dd6d911 Loongarch64: fixed copy_lasx 2025-04-30 16:41:28 +08:00
pengxu
d49319c2d2 Loongarch64: fixed cnrm2_lasx 2025-04-30 16:41:18 +08:00
pengxu
74c97ef814 Loongarch64: fixed cdot_lasx 2025-04-30 16:41:05 +08:00
pengxu
be525521ad Loongarch64: fixed asum_lasx 2025-04-30 16:40:55 +08:00
pengxu
0cd5ca5527 Loongarch64: fixed amax_lasx 2025-04-30 16:40:44 +08:00
guoyuanplct
11ffc8680e Format the code 2025-04-25 00:27:27 +08:00
guoyuanplct
7616c42095 Optimized RVV_ZVL256B Implementation of zgemv_n
The implementation of zgemv_n using RVV_ZVL256B has been optimized.
Compared to the previous implementation, it has achieved a 1.5x
performance improvement.
2025-04-25 00:05:15 +08:00
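The operation being optimized here is y = alpha*A*x + y for complex double data with A stored column-major. A minimal scalar reference, of the kind one might use to validate a vectorized kernel's output, is sketched below. The name `ref_zgemv_n`, unit strides, and interleaved (re, im) storage with `lda` counted in complex elements are assumptions for illustration; this is not the RVV_ZVL256B code.

```c
#include <stddef.h>

/* Hypothetical scalar reference: y = alpha*A*x + y, complex double,
 * column-major A, unit strides, each complex stored as (re, im).
 * lda is measured in complex elements. */
void ref_zgemv_n(size_t m, size_t n, const double alpha[2],
                 const double *a, size_t lda, const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        /* t = alpha * x[j], computed once per column */
        double tr = alpha[0] * x[2 * j]     - alpha[1] * x[2 * j + 1];
        double ti = alpha[0] * x[2 * j + 1] + alpha[1] * x[2 * j];
        const double *col = a + 2 * j * lda;
        for (size_t i = 0; i < m; i++) {
            double ar = col[2 * i], ai = col[2 * i + 1];
            y[2 * i]     += ar * tr - ai * ti;   /* real part of A(i,j) * t */
            y[2 * i + 1] += ar * ti + ai * tr;   /* imaginary part */
        }
    }
}
```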
abhishek-fujitsu
9c02cdb073 optimise dot using thread throttling for NEOVERSE V1 2025-04-23 22:35:05 +05:30
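"Thread throttling" generally means using fewer threads than available when the problem is too small for extra threads to pay off. A minimal sketch of such a heuristic follows. The function `choose_dot_threads` and the `min_per_thread` threshold are hypothetical illustrations; the actual cutoffs chosen for NEOVERSE V1 in this commit are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical heuristic: use at most max_threads, but give each thread
 * at least min_per_thread elements; tiny vectors run single-threaded.
 * The threshold value is an illustrative assumption, not OpenBLAS's. */
int choose_dot_threads(size_t n, int max_threads, size_t min_per_thread)
{
    if (max_threads < 1 || min_per_thread == 0)
        return 1;
    size_t by_size = n / min_per_thread;   /* threads the problem can keep busy */
    if (by_size < 1)
        return 1;
    if (by_size < (size_t)max_threads)
        return (int)by_size;
    return max_threads;
}
```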
Martin Kroeker
d0e8fd6d40 Merge pull request #5239 from annop-w/gemv_n_sve
Use SVE kernel for S/DGEMVN for SVE machines
2025-04-22 10:19:49 -07:00
Iha, Taisei
08b5c18d70 fixed a potential out-of-bounds access in gemv. 2025-04-22 19:56:44 +09:00
Annop Wongwathanarat
e11744a411 Use SVE kernel for S/DGEMVN for SVE machines 2025-04-22 09:40:13 +00:00
Martin Kroeker
db0abfa907 Merge pull request #5238 from martin-frbg/revert5125
remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880
2025-04-22 02:12:19 -07:00
Martin Kroeker
7389b6c483 Merge pull request #5237 from martin-frbg/revert5219
Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
2025-04-21 23:36:23 -07:00
Martin Kroeker
4ec62d7f73 remove non-vectorized code path for power8, restoring PR4880 2025-04-21 23:14:10 +02:00
Martin Kroeker
1df8738f27 Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
2025-04-21 14:03:56 -07:00
Martin Kroeker
99d9f1ff38 Fix conditional 2025-04-21 22:55:45 +02:00
Martin Kroeker
96d80801bc Reinstate the CooperLake microkernel 2025-04-21 22:53:26 +02:00
Martin Kroeker
2e4309315c Merge pull request #5219 from martin-frbg/sbgemvn_cooper
Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
2025-04-20 07:29:20 -07:00
Ubuntu
0cc2485594 Explicit unaligned vector load/stores in PPC64LE GEMV kernels 2025-04-20 08:00:29 +00:00
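The general idea behind explicit unaligned vector accesses is to avoid telling the compiler that a pointer is vector-aligned when it may not be. The portable C sketch below uses `memcpy`, which carries no alignment assumption, so the compiler must emit accesses that are safe at any address. The type `vec4f` and the helper names are hypothetical stand-ins for a 4-float SIMD register. The PPC64LE kernels themselves use architecture-specific intrinsics that are not shown here.

```c
#include <stddef.h>
#include <string.h>

/* vec4f: hypothetical stand-in for a 4 x float SIMD register. */
typedef struct { float v[4]; } vec4f;

/* memcpy makes no alignment assumption, so these are safe at any address. */
static vec4f load4_unaligned(const float *p)
{
    vec4f r;
    memcpy(&r, p, sizeof r);
    return r;
}

static void store4_unaligned(float *p, vec4f x)
{
    memcpy(p, &x, sizeof x);
}

/* y[i] += alpha * x[i]; blocks of 4 via the unaligned helpers, scalar tail. */
void axpy_unaligned(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vec4f xv = load4_unaligned(x + i);
        vec4f yv = load4_unaligned(y + i);
        for (int k = 0; k < 4; k++)
            yv.v[k] += alpha * xv.v[k];
        store4_unaligned(y + i, yv);
    }
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Passing `x + 1` and `y + 1` from ordinary float arrays gives addresses that are typically not 16-byte aligned, which is the case this pattern is meant to handle.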
Martin Kroeker
dd38b4e811 Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
2025-04-17 01:54:10 -07:00
Martin Kroeker
0241d516f6 Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
2025-04-16 12:55:55 -07:00
Annop Wongwathanarat
d535728803 Improve performance for SGEMVN on NEOVERSEN1 2025-04-16 09:54:30 +00:00
Usui, Tetsuzo
d711906e3e Add symv kernels for arm64 2025-04-11 20:39:52 +09:00
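SYMV computes y = alpha*A*x + beta*y for a symmetric A while reading only one stored triangle. The scalar sketch below follows the classic reference-BLAS structure for the upper triangle: each stored off-diagonal element contributes to two entries of y. The name `ref_ssymv_upper` is hypothetical, unit strides are assumed, and this is not the arm64 kernel added in the commit.

```c
#include <stddef.h>

/* Hypothetical scalar reference: y = alpha*A*x + beta*y, A symmetric n x n,
 * column-major, only the upper triangle (i <= j) is read. Unit strides. */
void ref_ssymv_upper(size_t n, float alpha, const float *a, size_t lda,
                     const float *x, float beta, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];   /* beta == 0 ignores old y */
    for (size_t j = 0; j < n; j++) {
        float t1 = alpha * x[j];   /* column j applied to y[0..j-1] */
        float t2 = 0.0f;           /* row j, recovered via symmetry */
        for (size_t i = 0; i < j; i++) {
            y[i] += t1 * a[i + j * lda];
            t2   += a[i + j * lda] * x[i];
        }
        y[j] += t1 * a[j + j * lda] + alpha * t2;
    }
}
```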
Iha, Taisei
f1e628b889 Further performance improvements to [SD]GEMV. 2025-04-11 20:00:33 +09:00
Martin Kroeker
211dfd0754 disable the CooperLake microkernel as it produces wrong results 2025-04-10 22:21:57 +02:00
Martin Kroeker
b30dc9701f Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
2025-04-10 13:06:07 -07:00
Martin Kroeker
2893d0add4 Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
2025-04-10 09:43:03 -07:00
Annop Wongwathanarat
ec146157d3 Use SVE kernel for S/DGEMVT for SVE machines 2025-04-09 20:38:14 +00:00
Martin Kroeker
70865a894e Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
2025-04-08 13:16:07 -07:00
lglglglgy
1ff303f36e Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
2025-04-08 21:18:00 +08:00
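Two of the techniques named above can be shown in a plain C sketch of non-transposed GEMV (y = alpha*A*x + y, column-major): scaling x[j] by alpha once per column instead of once per element reduces the multiplication count, and unrolling over four columns lets each y[i] be loaded and stored once per four columns. The name `sgemv_n_unroll4` and the unroll factor are hypothetical choices for illustration; the commit's RVV kernel is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical sketch: y = alpha*A*x + y, column-major, unit strides.
 * alpha*x[j] is formed once per column, and four columns are handled
 * per pass over y; leftover columns fall through to a scalar loop. */
void sgemv_n_unroll4(size_t m, size_t n, float alpha,
                     const float *a, size_t lda, const float *x, float *y)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        float t0 = alpha * x[j],     t1 = alpha * x[j + 1];
        float t2 = alpha * x[j + 2], t3 = alpha * x[j + 3];
        const float *c0 = a + j * lda, *c1 = c0 + lda;
        const float *c2 = c1 + lda,    *c3 = c2 + lda;
        for (size_t i = 0; i < m; i++)
            y[i] += c0[i] * t0 + c1[i] * t1 + c2[i] * t2 + c3[i] * t3;
    }
    for (; j < n; j++) {   /* remaining 0-3 columns */
        float t = alpha * x[j];
        const float *c = a + j * lda;
        for (size_t i = 0; i < m; i++)
            y[i] += c[i] * t;
    }
}
```

The unrolled loop sums the four column contributions in a different order than a naive column-by-column loop, so results can differ in the last bits for general floating-point inputs.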
ColumbusAI
7bf848454d Update zsum.c -- fixed spelling error to successfully compile
spelling error where zsum_kernel is used and it should be zasum_kernel. Will not compile without fix.
2025-04-05 09:57:53 -07:00
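For context on the two names involved: as we understand the OpenBLAS extensions, the complex asum routine adds |re| + |im| over all elements, while the complex sum routine adds re + im with no absolute values. The hypothetical scalar references below make the difference concrete; they are illustrations only, not the kernels in zsum.c.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference: sum of |re| + |im|, interleaved complex input. */
double ref_dzasum(size_t n, const double *x)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += fabs(x[2 * i]) + fabs(x[2 * i + 1]);
    return s;
}

/* Hypothetical reference: sum of re + im, signs preserved. */
double ref_dzsum(size_t n, const double *x)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[2 * i] + x[2 * i + 1];
    return s;
}
```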
Vaisakh K V
04915be829 Add vector registers to clobber list to prevent compiler optimization.
The SME-based SGEMMDIRECT kernel uses the vector (z) registers; adding them to the clobber list tells the compiler not to assume their values survive the asm block.
2025-04-03 12:18:43 +05:30
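The clobber-list mechanism can be illustrated with GCC/Clang extended asm on a much smaller scale than the SME kernel. Below, the asm uses one SIMD/FP register as scratch and names it in the clobber list, so the compiler will neither assign that register to an operand nor keep a live value in it across the statement. The function `add_via_scratch` and the choice of register (xmm7 on x86-64, v7 on AArch64) are hypothetical; the actual kernel clobbers SVE z registers, which are not shown here.

```c
/* Hypothetical illustration of declaring a scratch vector register as clobbered.
 * Without the clobber entry, the compiler could assume the register still holds
 * whatever it placed there before the asm statement. */
double add_via_scratch(double a, double b)
{
    double r;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("movsd %1, %%xmm7\n\t"
            "addsd %2, %%xmm7\n\t"
            "movsd %%xmm7, %0"
            : "=x"(r)
            : "x"(a), "x"(b)
            : "xmm7");                 /* xmm7 is overwritten as scratch */
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__("fadd d7, %d1, %d2\n\t"
            "fmov %d0, d7"
            : "=w"(r)
            : "w"(a), "w"(b)
            : "v7");                   /* v7 (d7) is overwritten as scratch */
#else
    r = a + b;                         /* portable fallback, no inline asm */
#endif
    return r;
}
```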