OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-15 07:51:43 +08:00

Author	SHA1	Message	Date
Rajendra Prasad Matcha	19268471cc	Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API	2025-09-30 15:05:33 +05:30
Martin Kroeker	e2f9f57433	Merge pull request #5432 from markdryan/markdyan/fix-rvv-detection fix RVV 1.0 detection code	2025-09-05 13:40:23 -07:00
Martin Kroeker	c31861ea62	Merge pull request #5435 from martin-frbg/update_rvv_ci Update the riscv-collab llvm toolchain in CI to its latest nightly build	2025-09-02 14:11:16 -07:00
Martin Kroeker	57c2936a43	Merge branch 'OpenMathLib:develop' into update_rvv_ci	2025-09-02 12:09:30 -07:00
Martin Kroeker	6d070820fc	Merge pull request #5436 from martin-frbg/update_osx_ci Update Mac CI jobs as cmake is preinstalled in the runner images now	2025-09-02 12:09:09 -07:00
Martin Kroeker	1c7251ca20	remove the -llto_library option for any osx fortran compiler	2025-09-02 18:36:02 +02:00
Martin Kroeker	a1331406a3	drop (re)installation of cmake on osx runners	2025-09-02 15:39:08 +02:00
Martin Kroeker	c42fccccb5	Drop installation of cmake	2025-09-02 15:36:32 +02:00
Martin Kroeker	4c1a4e60a6	Update toolchain to its latest nightly build	2025-09-02 14:54:08 +02:00
Mark Ryan	7fcad02dc2	fix RVV 1.0 detection code There were a couple of issues with the detection code used to check for RVV 1.0 on kernels that do not support hwprobe. 1. The vtype clobber was missing 2. The wrong form of vsetvli was being used. The vsetvli x0, x0 form is inappropriate for this use case as it can only be safely used in code where the value of vtype is known. The use of vsetvli x0, x0 here can lead to a failure to detect RVV 1.0, if, for example, the vill bit happens to be set before detect_riscv64_rvv100 is called. We fix both issues by adding the missing clobber and replacing the first parameter to vsetvli with t0 (which we add to our clobbers).	2025-08-28 14:20:37 +00:00
Martin Kroeker	06c09deee9	Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve Implementing SVE in `[SD]AXPY` Kernels for `A64FX` and `Graviton3E`	2025-08-26 01:10:14 -07:00
Martin Kroeker	da7d0f4a38	Merge pull request #5427 from yuanjia111/develop Optimize the gemv_t_vector.c kernel for RISCV64_ZVL256B target	2025-08-25 06:45:44 -07:00
yuanjia	c2cc7a3602	riscv64: optimize gemv_t_vector.c	2025-08-22 16:14:14 +08:00
h-motoki	e23f9c6642	Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve	2025-08-21 22:16:28 +09:00
Martin Kroeker	b3f247ae5a	Merge pull request #5425 from martin-frbg/fixup5389 Increase L2 defaults for RISCV X280 / ZVL256B and ARM SVE targets in CMake cross-compilation	2025-08-21 05:13:34 -07:00
h-motoki	855945befb	Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E	2025-08-21 20:56:58 +09:00
Martin Kroeker	7c1839899e	Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64	2025-08-21 11:57:07 +02:00
Martin Kroeker	9c43301b6d	Merge pull request #5421 from reibax-marcus/develop fix: broken cblas installation when using makefile based builds	2025-08-17 03:03:05 -07:00
Martin Kroeker	9d6df1dd3e	Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking Add and use vectorized packing in ZVL128B and ZVL256B for RISCV	2025-08-16 13:45:35 -07:00
Martin Kroeker	f3b2a15fad	Merge pull request #5420 from yuanjia111/develop Move the value assignment of vector x in gemv_n_sve.c to the outermos…	2025-08-16 12:06:53 -07:00
Chip Kerchner	64401b4417	Disable vectorized packing for DGEMM - since it is slower than scalar.	2025-08-13 13:41:12 +00:00
Martin Kroeker	5e43ba948c	Merge pull request #5419 from Mousius/bgemm-optimisation Optimize SBGEMM / BGEMM for NEOVERSEV1 further	2025-08-13 02:10:20 -07:00
Chip Kerchner	c00afc86a6	Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.	2025-08-12 17:18:56 +00:00
Xabier Marquiegui	3a6b79c50f	fix: broken cblas installation when using makefile based builds Fix cblas.h missing from target directory if NO_CBLAS is defined but has a value that indicates you do want cblas built and installed.	2025-08-12 14:41:15 +02:00
yuanjia	803e8d4838	Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 1. Verify correctness using BLAS-Tester 2. Using the built-in benchmark to verify performance, the performance of float and double type improved by about 60% and about 40% respectively. The test command is: export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100 export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100	2025-08-12 18:03:16 +08:00
Chris Sidebottom	5f47b872f1	Remove older kernels for BGEMM on NEOVERSEV1	2025-08-11 09:25:19 +00:00
Chris Sidebottom	114316f361	Optimize SBGEMM / BGEMM for NEOVERSEV1 further This changes the kernels to pack full SVE vectors and reduces the overall complexity of the inner GEMM loop.	2025-08-11 09:25:13 +00:00
Martin Kroeker	75c6ab4036	CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) * Update to 20.1.8 * fix PATH to avoid the obsolete LLVM19 that appeared in the preinstalled msvc folder hierarchy	2025-08-09 12:28:24 +02:00
Martin Kroeker	5c5f852ee3	Merge pull request #5415 from martin-frbg/Fixum-5399 Fix compilation of the NeoverseN2 SBGEMM kernel	2025-08-04 04:29:26 -07:00
Martin Kroeker	f1ee61ea30	Include NEON header for the bfloat conversion functions	2025-08-04 00:21:39 -07:00
Martin Kroeker	b3ffd5524a	Include NEON header for the bfloat conversion functions	2025-08-04 00:20:28 -07:00
Martin Kroeker	d23680b81d	Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1 Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1	2025-07-30 13:19:50 -07:00
Martin Kroeker	b4cc4be2ce	Merge pull request #5410 from martin-frbg/issue5404 Adjust multithreading threshold in S/DGER and add an intermediate step	2025-07-30 12:16:05 -07:00
Martin Kroeker	0968dddf1a	Merge pull request #5409 from martin-frbg/issue5372 Work around gcc15.1 on POWER misoptimizing DGEMV at -O3	2025-07-30 10:36:39 -07:00
Martin Kroeker	eddfe1e6b3	Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings Fix bad vector zero initializer and other compiler warnings for RISC-V.	2025-07-30 08:43:08 -07:00
Martin Kroeker	30d11bc92c	Adjust multithreading threshold and add an intermediate step	2025-07-30 08:13:33 -07:00
Martin Kroeker	a3b9c933c5	mark xbuffer as volatile to work around gcc15.1 optimizer bug	2025-07-30 17:05:36 +02:00
Chip Kerchner	72f082f31d	Fix bad vector zero initializer and other compiler warnings for RISC-V.	2025-07-30 14:04:43 +00:00
Masato Nakagawa	7e29f11396	Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)	2025-07-29 18:54:36 +09:00
Martin Kroeker	9a64b32b44	Merge pull request #5406 from martin-frbg/fixbgemmtest Fix building of bgemm tests on GEMM3M-capable (x86) targets	2025-07-28 23:17:29 -07:00
Martin Kroeker	b66a01f909	Fix building of bgemm tests on GEMM3M-capable (x86) targets	2025-07-28 22:43:28 +02:00
Martin Kroeker	a5e7c0e3e0	Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 ARM64: Enable bfloat16 kernels by default	2025-07-28 13:39:08 -07:00
abhishek-fujitsu	6356190d06	fix gfortran link path in dynamic_arch.yml	2025-07-28 14:37:51 +05:30
abhishek-fujitsu	4c8dcb3a8f	Darwin/arm64: disable SVE/SME and fix gfortran link path	2025-07-26 16:59:46 +05:30
Martin Kroeker	33b50548eb	Merge pull request #5403 from martin-frbg/issue5402 Introduce a (crude) threshold to multithreading in STRMV/DTRMV	2025-07-25 20:10:47 +02:00
Martin Kroeker	c504aedca1	Merge pull request #5400 from Mousius/neoversev2-target Add NEOVERSEV2 target support	2025-07-25 15:47:06 +02:00
Martin Kroeker	b9e107932a	add NeoverseV2	2025-07-25 15:44:34 +02:00
Martin Kroeker	2f89a5970e	fix NeoverseV2 typo	2025-07-25 15:43:37 +02:00
Martin Kroeker	a9e8fa06bf	Introduce a (crude) threshold to multithreading	2025-07-25 15:15:46 +02:00
Martin Kroeker	b4c2b34a45	Merge pull request #5401 from martin-frbg/followup-5397 Include float-bfloat conversion functions in ONLY_CBLAS builds as well	2025-07-25 13:56:13 +02:00


9530 Commits