OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-05-31 00:45:48 +08:00

Author	SHA1	Message	Date
Martin Kroeker	b2b9abc20b	Revert "Enhancing Core Utilization in BLAS Calls: A Scalable Architecture"	2025-10-22 15:44:43 +02:00
Martin Kroeker	5b640b1cbc	add bgemm_thread_xx	2025-10-16 10:03:04 -07:00
Martin Kroeker	a9a152ebc7	fix bgemv build	2025-10-16 10:00:41 -07:00
Martin Kroeker	a387217a07	Add BGEMV	2025-10-16 05:02:24 -07:00
Martin Kroeker	c92bac1524	Add SHGEMV	2025-10-16 04:57:18 -07:00
Chris Sidebottom	37fc3bbca0	Add Infrastructure for SHGEMV This adds all the relevant bits and pieces to add a `shgemv` path as well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b` interfaces. I've also fixed a few bits and pieces around `shgemm` which didn't build in a few situations.	2025-10-07 15:03:24 +00:00
Chip Kerchner	fc7d6e65a1	Change BF16 warning message.	2025-09-23 20:26:44 +00:00
Chip Kerchner	9427eaf4c4	Reduce flags for BF16 to only needed ones.	2025-09-23 20:24:40 +00:00
Chip Kerchner	3116749717	Disable bf16 flags on RISC-V unless BUILD_BFLOAT16=1	2025-09-23 15:02:20 +00:00
Martin Kroeker	66cc27e75f	Add message for fallback due to unavailable Zfh extension	2025-09-10 10:57:02 +02:00
Martin Kroeker	5ab143736f	Merge pull request #5431 from markdryan/markdryan/riscv-hf16-fix disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1	2025-09-09 14:51:47 -07:00
Martin Kroeker	2fee943edb	Add CMake build support for IBM Z (#5440) * Add ZARCH support, including DYNAMIC_ARCH	2025-09-09 22:18:51 +02:00
Martin Kroeker	7b1f9bedf8	clean up duplicate assignment of cpus newer than POWER10	2025-09-08 15:11:57 +02:00
Mark Ryan	7fcad02dc2	fix RVV 1.0 detection code There were a couple of issues with the detection code used to check for RVV 1.0 on kernels that do not support hwprobe. 1. The vtype clobber was missing 2. The wrong form of vsetvli was being used. The vsetvli x0, x0 form is inappropriate for this use case as it can only be safely used in code where the value of vtype is known. The use of vsetvli x0, x0 here can lead to a failure to detect RVV 1.0, if, for example, the vill bit happens to be set before detect_riscv64_rvv100 is called. We fix both issues by adding the missing clobber and replacing the first parameter to vsetvli with t0 (which we add to our clobbers).	2025-08-28 14:20:37 +00:00
Mark Ryan	ce79fe12fd	disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1 The compiler options that enable 16 bit floating point instructions should not be enabled by default when building the RISCV64_ZVL128B and RISCV64_ZVL256B targets. The zfh and zvfh extensions are not part of the 'V' extension and are not required by any of the RVA profiles. There's no guarantee that kernels built with zfh and zvfh will work correctly on fully compliant RVA23U64 devices. To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B kernels with the half float flags if BUILD_HFLOAT16=1. We also update the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1 and BUILD_HFLOAT16=1 and are running on a device that does not support both Zfh and Zvfh. Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/5428	2025-08-28 09:41:07 +00:00
Masato Nakagawa	7e29f11396	Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)	2025-07-29 18:54:36 +09:00
youcai	41f9701ebc	Fix cmake building with cblas_bgemm	2025-07-23 22:10:53 +08:00
Chris Sidebottom	e105411460	Add infrastructure for bgemv/bscal - Sets up all the various entrypoints for `bgemv` - Adds `bscal` for use in the `bgemv` interface - Adds test cases for comparing `sgemv` and `bgemv` - Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate enough to pass above tests	2025-07-15 14:48:57 +01:00
Martin Kroeker	b37516add6	Add BGEMM parameters	2025-07-10 14:59:01 +02:00
Chris Sidebottom	48394384ef	Use correct constants for per-target BGEMM/SBGEMM This fixes the build and tests on `NEOVERSEV1` target, which was failing with specific constants for `SBGEMM` Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-08 16:23:27 +01:00
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-08 16:22:41 +01:00
Martin Kroeker	3d31887073	Merge pull request #5362 from Mousius/fix-bf16 Fix SBGEMM BFLOAT16 build	2025-07-08 14:35:50 +02:00
Martin Kroeker	0ddf8ebd42	Merge pull request #5354 from pratiklp00/p11 Add Support for POWER11	2025-07-08 11:52:18 +02:00
Chris Sidebottom	7a97c4ca97	Rename HALF -> BFLOAT16 in some more places	2025-07-07 10:13:39 +00:00
Masato Nakagawa	5253c8f165	Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.	2025-06-30 21:35:16 +09:00
Martin Kroeker	8f0a1a3f82	Merge pull request #5303 from martin-frbg/issue5289 Exit if memory allocation keeps failing, instead of retrying forever	2025-06-29 22:47:56 +02:00
Martin Kroeker	9bcffbd655	Declare the server_lock mutex volatile in addition to static	2025-06-29 15:42:43 +02:00
pratiklp00	1dde4a13c0	p11 changes	2025-06-26 00:03:38 -05:00
zhoupeng	134b21ae60	Fix some hyperthreading errors. When there are multiple NUMA nodes and hyper-threading causes adjacent logical cores to share a physical core (e.g., common -> avail[i] = 0x5555555555555555UL), the numa_mapping function should not use a bitmask for filtering, as this would lead to redundant masking with the subsequent local_cpu_map function.	2025-06-25 09:52:26 +08:00
Martin Kroeker	d96daa220d	Merge pull request #5290 from Srangrang/develop Add support for FP16 to openBLAS and shgemm on RISCV	2025-06-24 23:10:15 +02:00
Martin Kroeker	e541bf68f5	support AmpereOne/OneA as NeoverseN1	2025-06-18 09:54:08 +02:00
Srangrang	9f13b2c6ac	style: modify HALF to BFLOAT16 in benchmark folder	2025-06-15 20:57:05 +08:00
Martin Kroeker	31ef2cbbb3	Exit if memory allocation keeps failing, instead of looping forever	2025-06-13 14:11:03 +02:00
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	2025-06-03 20:14:30 +08:00
Martin Kroeker	20f2ba0141	Move declaration of i for pre-C99 compilers	2025-05-21 23:44:17 +02:00
Masato Nakagawa	2351a98005	Update 2D thread-partitioned GEMM for M << N case.	2025-05-21 21:21:52 +09:00
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) * Fix ARMV9SME target and add support_sme1 code for MacOS * make sgemm_direct unconditionally available on all arm64 * build a (dummy) sgemm_direct kernel on all arm64 * Update dynamic_arm64.c	2025-05-10 22:39:32 +02:00
Ruiyang Wu	02fd1df10b	CMake: Pass `OpenMP` compiler and linker flags through CMake targets Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than passing the compiler and linker flags manually. Furthermore, it allows the user to customize those flags by setting `OpenMP_LANG_FLAGS`, `OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.	2025-03-26 23:09:54 -04:00
Masato Nakagawa	80d3c2ad95	Add Improving Load Imbalance in Thread-Parallel GEMM	2025-03-11 20:18:20 +09:00
Martin Kroeker	39eb43d441	Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170) * Tighten memory orders for C11 atomic operations	2025-03-07 13:48:28 +01:00
Martin Kroeker	1533fe49be	Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2 dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting	2025-02-24 16:07:06 +01:00
Ye Tao	f0bea79a6e	dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting	2025-02-21 10:30:11 +00:00
Martin Kroeker	eb84aac7ad	Merge pull request #5084 from quic/topic/sgemm_direct_sme1 Support for SGEMM_DIRECT Kernel based on SME1	2025-02-19 10:56:49 +01:00
Martin Kroeker	77c638db67	Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"	2025-02-15 20:37:48 +01:00
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	2025-02-13 14:54:37 +05:30
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	2025-02-13 14:51:21 +05:30
John Hein	6cd9bbe531	fix signedness of pointer to integer type passed to blas_lock()	2025-02-01 17:22:57 -07:00
Martin Kroeker	a182251284	fix typo	2025-01-02 00:04:33 +01:00
Martin Kroeker	ed95791618	fix conflicting variables	2025-01-01 23:27:38 +01:00
Martin Kroeker	3c3d1c4849	Identify all cores and select the most performant one as TARGET	2025-01-01 22:21:29 +01:00
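
Commit 134b21ae60 above cites an availability mask of 0x5555555555555555UL as its hyper-threading example. As a rough illustration only, the C sketch below shows what such a mask selects: every second logical CPU, which would be one per physical core if adjacent logical CPUs are siblings (an assumption taken from the commit message). The helpers `count_selected` and `cpu_selected` are hypothetical names for this sketch and are not OpenBLAS functions.

```c
#include <stdint.h>

/* Hypothetical helper (not part of OpenBLAS): count how many logical
   CPUs a 64-bit availability mask selects. */
static int count_selected(uint64_t mask) {
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper (not part of OpenBLAS): return 1 if logical CPU
   `cpu` (0..63) is selected by the mask, 0 otherwise. */
static int cpu_selected(uint64_t mask, int cpu) {
    return (int)((mask >> cpu) & UINT64_C(1));
}
```

With the mask from the commit message, `count_selected` reports 32 of 64 logical CPUs, `cpu_selected` is 1 for even CPU numbers and 0 for odd ones. Applying the same kind of filter a second time in a later mapping step is the redundancy the commit message describes.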

814 Commits