Martin Kroeker
fd37406817
Merge branch 'develop' into optimized_gemv_n_1x3
2025-07-08 21:05:30 +02:00
Iha, Taisei
f7ad906b49
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2025-07-04 22:57:44 +09:00
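The A64FX [SD]DOT commit above relies on loop unrolling. The portable C sketch below shows the general technique. It assumes unit strides, and the function name `ddot_unroll4` is hypothetical. This is not the A64FX kernel itself.

```c
#include <stddef.h>

/* Dot product unrolled by 4 with independent accumulators.
   Separate partial sums break the dependency chain on a single
   accumulator, so several multiply-adds can be in flight at once. */
double ddot_unroll4(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main unrolled loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Remainder loop for n not divisible by 4. */
    for (; i < n; i++) {
        s0 += x[i] * y[i];
    }

    /* Combine the partial sums. */
    return (s0 + s1) + (s2 + s3);
}
```

The partial sums are added in a different order than in a simple loop. As a result, the output can differ from a naive dot product in the last bits.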
Martin Kroeker
ee26caffb3
Merge pull request #5309 from davidz-ampere/dev-ampereone
...
Add support for Ampere AmpereOne processors
2025-06-24 12:27:08 +02:00
davidz-ampere
aa90ab4142
Add support for Ampere AmpereOne processors
2025-06-24 00:12:34 -04:00
Ian McInerney
badef1d32e
Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types
2025-06-19 14:26:16 +01:00
davidz-ampere
84730068af
reduce duplicate kernel code
2025-06-17 03:05:34 -04:00
davidz-ampere
be68ef03b4
Add support for Ampere processors
2025-06-15 22:00:40 -04:00
Martin Kroeker
58eeb9041c
fix handling of dummy2
2025-06-12 03:03:01 -07:00
Martin Kroeker
1589d0b21e
Merge pull request #5281 from martin-frbg/zscal_arm64
...
kernel/arm64: fixed cscal and zscal
2025-06-12 01:04:18 -07:00
Sharif Inamdar
8279e68805
Optimize gemv_n_sve_v1x3 kernel
...
- Calculate predicate outside the loop
- Divide matrix in blocks of 3
2025-06-11 10:16:56 +00:00
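The commit body above describes dividing the matrix into blocks of 3. The scalar C sketch below applies that idea to y += alpha * A * x with column-major A. The function name `sgemv_n_block3` is hypothetical. The SVE predicate handling that the commit mentions has no scalar counterpart, so it is not shown.

```c
#include <stddef.h>

/* y += alpha * A * x for column-major A (m rows, n columns, leading dimension lda).
   Columns are consumed three at a time, so each pass over y applies three
   columns' contributions. The leftover 0 to 2 columns are handled afterwards. */
void sgemv_n_block3(size_t m, size_t n, float alpha,
                    const float *a, size_t lda,
                    const float *x, float *y)
{
    size_t j = 0;
    for (; j + 3 <= n; j += 3) {
        const float *a0 = a + j * lda;
        const float *a1 = a0 + lda;
        const float *a2 = a1 + lda;
        const float t0 = alpha * x[j];
        const float t1 = alpha * x[j + 1];
        const float t2 = alpha * x[j + 2];
        for (size_t i = 0; i < m; i++) {
            y[i] += t0 * a0[i] + t1 * a1[i] + t2 * a2[i];
        }
    }
    /* Remaining n % 3 columns, one at a time. */
    for (; j < n; j++) {
        const float *aj = a + j * lda;
        const float t = alpha * x[j];
        for (size_t i = 0; i < m; i++) {
            y[i] += t * aj[i];
        }
    }
}
```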
Arne Juul
5442aff218
Accumulate results in output register explicitly
2025-06-09 19:03:22 +00:00
Martin Kroeker
28f8fdaf0f
support flag for NaN/Inf handling and fix scaling of NaN/Inf values
2025-05-23 14:59:59 +02:00
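The sketch below shows one way a NaN/Inf-handling flag can matter in a scaling routine. It is an assumption about the general problem, not the OpenBLAS change itself. The function `dscal_sketch` and its `propagate_nan` parameter are hypothetical.

```c
#include <stddef.h>

/* x := alpha * x.
   When alpha == 0, a fast path may simply store zeros, but that silently
   turns NaN or Inf inputs into 0. The hypothetical propagate_nan flag
   selects IEEE-style behaviour instead, where 0 * NaN and 0 * Inf both
   yield NaN. */
void dscal_sketch(size_t n, double alpha, double *x, int propagate_nan)
{
    if (alpha == 0.0 && !propagate_nan) {
        for (size_t i = 0; i < n; i++)
            x[i] = 0.0;           /* fast path: ignore existing contents */
        return;
    }
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;            /* NaN/Inf propagate per IEEE 754 */
}
```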
Martin Kroeker
5141a90993
Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
...
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
2025-05-10 22:39:32 +02:00
Martin Kroeker
151b74284e
Merge pull request #5203 from quic/fix-sgemmdirect-sme1
...
Add vector registers to clobber list to prevent compiler optimization.
2025-05-09 05:39:47 -07:00
abhishek-fujitsu
9c02cdb073
optimise dot using thread throttling for NEOVERSE V1
2025-04-23 22:35:05 +05:30
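Thread throttling, as used in the NEOVERSE V1 dot commit above, generally means running on fewer threads when a problem is too small to amortize threading overhead. Below is a minimal sketch of that idea. It assumes a hypothetical minimum work size per thread (`min_per_thread`), not the thresholds the commit actually uses. The function name `throttle_threads` is also hypothetical.

```c
#include <stddef.h>

/* Cap the number of threads so that every thread receives at least
   min_per_thread elements. Small problems therefore run on fewer
   threads (or one) instead of paying synchronisation overhead. */
int throttle_threads(size_t n, size_t min_per_thread, int max_threads)
{
    if (max_threads < 1)
        return 1;                      /* always run on at least one thread */
    if (min_per_thread == 0)
        return max_threads;            /* no minimum: use every thread */

    size_t useful = n / min_per_thread; /* threads that get a full share */
    if (useful < 1)
        return 1;
    if (useful < (size_t)max_threads)
        return (int)useful;
    return max_threads;
}
```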
Martin Kroeker
d0e8fd6d40
Merge pull request #5239 from annop-w/gemv_n_sve
...
Use SVE kernel for S/DGEMVN for SVE machines
2025-04-22 10:19:49 -07:00
Iha, Taisei
08b5c18d70
fixed a potential out-of-bounds access in gemv.
2025-04-22 19:56:44 +09:00
Annop Wongwathanarat
e11744a411
Use SVE kernel for S/DGEMVN for SVE machines
2025-04-22 09:40:13 +00:00
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
...
Improve performance for SGEMVN on NEOVERSEN1
2025-04-17 01:54:10 -07:00
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
...
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
2025-04-16 12:55:55 -07:00
Annop Wongwathanarat
d535728803
Improve performance for SGEMVN on NEOVERSEN1
2025-04-16 09:54:30 +00:00
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
2025-04-11 20:39:52 +09:00
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
2025-04-11 20:00:33 +09:00
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
2025-04-09 20:38:14 +00:00
Vaisakh K V
04915be829
Add vector registers to clobber list to prevent compiler optimization.
...
The SME-based SGEMMDIRECT kernel uses the vector registers (z). Adding them to the
clobber list tells the compiler that these registers are modified, so it does not rely on their contents.
2025-04-03 12:18:43 +05:30
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
2025-03-14 17:55:40 +00:00
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
2025-03-13 20:55:31 +00:00
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
...
Optimize aarch64 sgemm_ncopy
2025-03-13 16:04:39 +01:00
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
...
replace customized bf16_to_fp32 with arm neon vcvtah_f32_bf16
2025-03-13 13:50:26 +01:00
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
2025-03-13 10:17:43 +00:00
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
...
Optimize gemv_n_sve kernel
2025-03-12 21:08:23 +01:00
Ye Tao
4c00099ed6
replace customized bf16_to_fp32 with arm neon vcvtah_f32_bf16
2025-03-12 16:20:15 +00:00
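bfloat16 stores the upper 16 bits of an IEEE-754 binary32 value, so widening it to float is exact. The commit above switches to the Arm intrinsic `vcvtah_f32_bf16`. For comparison, the sketch below performs the same widening in portable C. The name `bf16_to_fp32_portable` is hypothetical, and this is not the helper the commit removed.

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa
   bits of a binary32 float. Widening places those 16 bits in the high
   half of a 32-bit word and reinterprets it (assumes 32-bit IEEE float). */
float bf16_to_fp32_portable(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined type pun */
    return f;
}
```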
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
2025-03-12 14:52:42 +00:00
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
2025-03-10 16:39:20 +00:00
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
...
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
2025-03-04 16:39:22 +01:00
Ye Tao
6b8b35cdf2
fix minor issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
2025-03-03 11:55:27 +00:00
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
2025-03-03 11:32:05 +00:00
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
2025-03-02 22:39:32 +01:00
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
2025-02-28 14:37:06 +00:00
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
...
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
2025-02-28 12:31:50 +00:00
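For reference, the conventional definition of a geometric mean speedup over k measured problem sizes is shown below. Here s_i is the ratio of old to new run time at size i. This assumes the ~61x figure above was computed in this way.

$$
\bar{s} = \left( \prod_{i=1}^{k} s_i \right)^{1/k} = \exp\!\left( \frac{1}{k} \sum_{i=1}^{k} \ln s_i \right), \qquad s_i = \frac{t_i^{\mathrm{old}}}{t_i^{\mathrm{new}}}
$$

The geometric mean is never larger than the arithmetic mean. It is therefore less dominated by the few sizes with the largest (up to 100x) speedups.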
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
...
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
...
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
2025-02-03 12:49:49 +00:00
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
2025-01-25 12:54:04 +01:00
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
...
Small gemm kernel improvements for AArch64
2025-01-24 10:59:16 +01:00
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
...
Further rearranged the rotm kernel for the different architectures.
2025-01-23 10:16:43 +01:00
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
2025-01-23 09:41:45 +05:30
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
...
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
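For context, a rotm kernel implements BLAS ?ROTM, which applies the modified Givens transformation H selected by a flag in `param`. Below is a minimal unit-stride C sketch that follows the reference BLAS parameter layout. The name `drotm_sketch` is hypothetical, and the architecture-specific kernels from the commit above are not shown.

```c
#include <stddef.h>

/* Apply H to each pair (x[i], y[i]):
       x[i] := h11 * x[i] + h12 * y[i]
       y[i] := h21 * x[i] + h22 * y[i]
   Reference BLAS layout: param = {flag, h11, h21, h12, h22}.
     flag == -2.0  H is the identity (nothing to do)
     flag == -1.0  all four entries are taken from param
     flag ==  0.0  h11 = h22 = 1; h21 and h12 from param
     flag ==  1.0  h12 = 1, h21 = -1; h11 and h22 from param */
void drotm_sketch(size_t n, double *x, double *y, const double param[5])
{
    const double flag = param[0];
    double h11, h21, h12, h22;

    if (flag == -2.0)
        return;
    if (flag == -1.0) {
        h11 = param[1]; h21 = param[2]; h12 = param[3]; h22 = param[4];
    } else if (flag == 0.0) {
        h11 = 1.0; h21 = param[2]; h12 = param[3]; h22 = 1.0;
    } else { /* flag == 1.0 */
        h11 = param[1]; h21 = -1.0; h12 = 1.0; h22 = param[4];
    }

    for (size_t i = 0; i < n; i++) {
        const double xi = x[i], yi = y[i];
        x[i] = h11 * xi + h12 * yi;
        y[i] = h21 * xi + h22 * yi;
    }
}
```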
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
2025-01-21 13:40:17 +00:00