OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-15 07:51:43 +08:00

Author	SHA1	Message	Date
Yichao Yu	b94e9b92ad	Fix compilation on ARM Define a dummy function if SME is not supported, following what sgemm does	2025-10-11 20:28:59 -04:00
Martin Kroeker	e40714cabd	Merge pull request #5450 from quic/topic/strmm_direct_sme1 Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API	2025-10-11 15:20:19 -07:00
changjua	644ea07ef9	Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API	2025-10-10 10:48:27 +08:00
Chris Sidebottom	578e7dae85	Fix bf16->f32 conversion for NEOVERSEV1 and NEOVERSEN2 targets This fixes an issue originally introduced with the BGEMM kernel. I've updated the tests to run with `beta=1.0` so as to test loading and updating from C. Alongside this, the tests now return sensible return values to reduce the risk of them being ignored. Also fixed a bug in `generic/gemv_t.c` resulting in weird outputs for `bgemv`.	2025-10-06 18:05:58 +00:00
Rajendra Prasad Matcha	19268471cc	Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API	2025-09-30 15:05:33 +05:30
h-motoki	855945befb	Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E	2025-08-21 20:56:58 +09:00
Martin Kroeker	f3b2a15fad	Merge pull request #5420 from yuanjia111/develop Move the value assignment of vector x in gemv_n_sve.c to the outermos…	2025-08-16 12:06:53 -07:00
yuanjia	803e8d4838	Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce repeated data retrieval. 1. Verified correctness using BLAS-Tester. 2. Verified performance using the built-in benchmark: float and double performance improved by about 60% and about 40% respectively. The test commands are: export OMP_NUM_THREADS=1; numactl -C 10 -l ./sgemv.goto 3000 4000 100 and export OMP_NUM_THREADS=1; numactl -C 10 -l ./dgemv.goto 3000 4000 100	2025-08-12 18:03:16 +08:00
Chris Sidebottom	5f47b872f1	Remove older kernels for BGEMM on NEOVERSEV1	2025-08-11 09:25:19 +00:00
Chris Sidebottom	114316f361	Optimize SBGEMM / BGEMM for NEOVERSEV1 further This changes the kernels to pack full SVE vectors and reduces the overall complexity of the inner GEMM loop.	2025-08-11 09:25:13 +00:00
Martin Kroeker	f1ee61ea30	Include NEON header for the bfloat conversion functions	2025-08-04 00:21:39 -07:00
Martin Kroeker	b3ffd5524a	Include NEON header for the bfloat conversion functions	2025-08-04 00:20:28 -07:00
Martin Kroeker	a5e7c0e3e0	Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 ARM64: Enable bfloat16 kernels by default	2025-07-28 13:39:08 -07:00
abhishek-fujitsu	0bc79da587	add neon header	2025-07-25 11:10:20 +05:30
Chris Sidebottom	ea2faf0c9a	Add optimized BGEMM for NEOVERSEN2 target This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.	2025-07-24 10:59:28 +00:00
Chris Sidebottom	2c3cdaf74e	Optimized BGEMV for NEOVERSEV1 target - Adds bgemv T based off of sbgemv T kernel - Adds bgemv N, which is slightly altered to not use Y as an accumulator because the bf16 output would result in loss of precision - Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with new kernels	2025-07-23 10:51:41 +01:00
Martin Kroeker	39c90f9859	Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3	2025-07-18 23:23:39 +02:00
Rajendra Prasad Matcha	eae0abfdb6	SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.	2025-07-17 16:14:31 +05:30
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-10 23:23:27 +00:00
Martin Kroeker	fd37406817	Merge branch 'develop' into optimized_gemv_n_1x3	2025-07-08 21:05:30 +02:00
Iha, Taisei	f7ad906b49	Performance improvements of [SD]DOT with loop-unrolling on A64FX	2025-07-04 22:57:44 +09:00
Martin Kroeker	ee26caffb3	Merge pull request #5309 from davidz-ampere/dev-ampereone Add support for Ampere AmpereOne processors	2025-06-24 12:27:08 +02:00
davidz-ampere	aa90ab4142	Add support for Ampere AmpereOne processors	2025-06-24 00:12:34 -04:00
Ian McInerney	badef1d32e	Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types	2025-06-19 14:26:16 +01:00
davidz-ampere	84730068af	reduce duplicate kernel code	2025-06-17 03:05:34 -04:00
davidz-ampere	be68ef03b4	Add support for Ampere processors	2025-06-15 22:00:40 -04:00
Martin Kroeker	58eeb9041c	fix handling of dummy2	2025-06-12 03:03:01 -07:00
Martin Kroeker	1589d0b21e	Merge pull request #5281 from martin-frbg/zscal_arm64 kernel/arm64: fixed cscal and zscal	2025-06-12 01:04:18 -07:00
Sharif Inamdar	8279e68805	Optimize gemv_n_sve_v1x3 kernel - Calculate predicate outside the loop - Divide matrix in blocks of 3	2025-06-11 10:16:56 +00:00
Arne Juul	5442aff218	Accumulate results in output register explicitly	2025-06-09 19:03:22 +00:00
Martin Kroeker	28f8fdaf0f	support flag for NaN/Inf handling and fix scaling of NaN/Inf values	2025-05-23 14:59:59 +02:00
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) * Fix ARMV9SME target and add support_sme1 code for MacOS * make sgemm_direct unconditionally available on all arm64 * build a (dummy) sgemm_direct kernel on all arm64 * Update dynamic_arm64.c	2025-05-10 22:39:32 +02:00
Martin Kroeker	151b74284e	Merge pull request #5203 from quic/fix-sgemmdirect-sme1 Add vector registers to clobber list to prevent compiler optimization.	2025-05-09 05:39:47 -07:00
abhishek-fujitsu	9c02cdb073	optimise dot using thread throttling for NEOVERSE V1	2025-04-23 22:35:05 +05:30
Martin Kroeker	d0e8fd6d40	Merge pull request #5239 from annop-w/gemv_n_sve Use SVE kernel for S/DGEMVN for SVE machines	2025-04-22 10:19:49 -07:00
Iha, Taisei	08b5c18d70	fixed a potential out-of-bounds on gemv.	2025-04-22 19:56:44 +09:00
Annop Wongwathanarat	e11744a411	Use SVE kernel for S/DGEMVN for SVE machines	2025-04-22 09:40:13 +00:00
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEONVERSEN1	2025-04-17 01:54:10 -07:00
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	2025-04-16 12:55:55 -07:00
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEONVERSEN1	2025-04-16 09:54:30 +00:00
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	2025-04-11 20:39:52 +09:00
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	2025-04-11 20:00:33 +09:00
Annop Wongwathanarat	ec146157d3	Use SVE kernel for S/DGEMVT for SVE machines	2025-04-09 20:38:14 +00:00
Vaisakh K V	04915be829	Add vector registers to clobber list to prevent compiler optimization. SME based SGEMMDIRECT kernel uses the vector registers (z) and adding clobber list informs compiler not to optimize these registers.	2025-04-03 12:18:43 +05:30
Ye Tao	f27ba5efd1	fix bugs in aarch64 sbgemv_n kernel	2025-03-14 17:55:40 +00:00
Annop Wongwathanarat	edef2e4441	Fix bug in ARM64 sbgemv_t	2025-03-13 20:55:31 +00:00
Martin Kroeker	b55ca71d5b	Merge pull request #5182 from annop-w/sgemm_ncopy Optimize aarch64 sgemm_ncopy	2025-03-13 16:04:39 +01:00
Martin Kroeker	2f778554b8	Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16 replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16	2025-03-13 13:50:26 +01:00
Annop Wongwathanarat	9807f56580	Optimize aarch64 sgemm_ncopy	2025-03-13 10:17:43 +00:00
Martin Kroeker	a3e7b16072	Merge pull request #5157 from manaalmj/feature Optimize gemv_n_sve kernel	2025-03-12 21:08:23 +01:00
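
Several of the commits listed above concern bf16 to f32 conversion in the SBGEMM/BGEMM/BGEMV kernels, including one that replaces a custom bf16_to_fp32 with the NEON intrinsic vcvtah_f32_bf16. As a rough illustration of what such a conversion does, here is a minimal portable sketch in C. It assumes IEEE 754 binary32 floats and treats a bf16 value as the upper 16 bits of the float's bit pattern. The helper names `bf16_to_f32` and `f32_to_bf16_truncate` are hypothetical and are not taken from OpenBLAS; the actual kernels may differ, for example in rounding behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an OpenBLAS function): widen a bf16 bit
 * pattern to a float. bf16 keeps the sign, the 8 exponent bits and the
 * top 7 mantissa bits of a binary32 value, so widening is a 16-bit
 * left shift of the raw bits. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined type punning */
    return f;
}

/* Hypothetical helper: narrow a float to bf16 by truncation (dropping
 * the low 16 bits). This is an assumption made for brevity; production
 * code usually rounds to nearest-even instead. */
static uint16_t f32_to_bf16_truncate(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}
```

Because bf16 carries only 8 bits of significand precision, accumulating directly into a bf16 output loses precision quickly. That is consistent with the BGEMV commit above that avoids using Y as an accumulator.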


360 Commits