Commit Graph

800 Commits

Author SHA1 Message Date
Martin Kroeker
18f9582f3e Add VORTEXM4 2025-08-18 01:54:09 -07:00
Masato Nakagawa
7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2025-07-29 18:54:36 +09:00
youcai
41f9701ebc Fix cmake building with cblas_bgemm 2025-07-23 22:10:53 +08:00
Chris Sidebottom
e105411460 Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass above tests
2025-07-15 14:48:57 +01:00
Martin Kroeker
b37516add6 Add BGEMM parameters 2025-07-10 14:59:01 +02:00
Chris Sidebottom
48394384ef Use correct constants for per-target BGEMM/SBGEMM
This fixes the build and tests on `NEOVERSEV1` target, which was failing
with specific constants for `SBGEMM`

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:23:27 +01:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Martin Kroeker
3d31887073 Merge pull request #5362 from Mousius/fix-bf16
Fix SBGEMM BFLOAT16 build
2025-07-08 14:35:50 +02:00
Martin Kroeker
0ddf8ebd42 Merge pull request #5354 from pratiklp00/p11
Add Support for POWER11
2025-07-08 11:52:18 +02:00
Chris Sidebottom
7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 2025-07-07 10:13:39 +00:00
Masato Nakagawa
5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
Martin Kroeker
8f0a1a3f82 Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
2025-06-29 22:47:56 +02:00
Martin Kroeker
9bcffbd655 Declare the server_lock mutex volatile in addition to static 2025-06-29 15:42:43 +02:00
pratiklp00
1dde4a13c0 p11 changes 2025-06-26 00:03:38 -05:00
zhoupeng
134b21ae60 Fix some hyperthreading errors.
When there are multiple NUMA nodes and hyper-threading causes adjacent logical cores to share a physical core (e.g., common -> avail[i] = 0x5555555555555555UL), the numa_mapping function should not use a bitmask for filtering, as this would lead to redundant masking with the subsequent local_cpu_map function.
2025-06-25 09:52:26 +08:00
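The mask value quoted in the message above, `0x5555555555555555UL`, has every even-numbered bit set, so on a machine where adjacent logical cores share a physical core it selects one logical CPU per core. A minimal sketch of how such a mask can be decoded follows; the helper names `cpu_in_mask` and `count_cpus` are hypothetical and are not OpenBLAS functions.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if logical CPU `cpu` is set in a 64-bit mask. */
int cpu_in_mask(uint64_t mask, unsigned cpu) {
    return cpu < 64 && ((mask >> cpu) & 1u);
}

/* Hypothetical helper: counts the CPUs present in a 64-bit mask by
   clearing the lowest set bit until none remain. */
int count_cpus(uint64_t mask) {
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;
        n++;
    }
    return n;
}
```

With the mask from the message, `count_cpus` reports 32 of the 64 logical CPUs, and only the even-numbered ones pass `cpu_in_mask`.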
Martin Kroeker
d96daa220d Merge pull request #5290 from Srangrang/develop
Add support for FP16 to openBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Martin Kroeker
e541bf68f5 support AmpereOne/OneA as NeoverseN1 2025-06-18 09:54:08 +02:00
Srangrang
9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 2025-06-15 20:57:05 +08:00
Martin Kroeker
31ef2cbbb3 Exit if memory allocation keeps failing, instead of looping forever 2025-06-13 14:11:03 +02:00
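The idea named in this commit title, giving up after repeated allocation failures rather than looping forever, can be sketched as a bounded retry loop. This is an illustration only and not the code from the commit: `alloc_with_retry`, `always_fail`, and the attempt limit are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts calls to the demo allocator below. */
static int failed_attempts = 0;

/* Hypothetical allocator that never succeeds, used to exercise the failure path. */
void *always_fail(size_t size) {
    (void)size;
    failed_attempts++;
    return NULL;
}

/* Hypothetical helper: tries alloc_fn at most max_attempts times and
   returns NULL if every attempt fails, so the caller can report the
   error and exit instead of retrying indefinitely. */
void *alloc_with_retry(size_t size, int max_attempts, void *(*alloc_fn)(size_t)) {
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        void *p = alloc_fn(size);
        if (p != NULL) {
            return p;
        }
    }
    return NULL;
}
```

A caller would check for `NULL`, print a diagnostic, and call `exit(EXIT_FAILURE)`; passing `malloc` as `alloc_fn` gives the normal path.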
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Martin Kroeker
20f2ba0141 Move declaration of i for pre-C99 compilers 2025-05-21 23:44:17 +02:00
Masato Nakagawa
2351a98005 Update 2D thread-partitioned GEMM for M << N case. 2025-05-21 21:21:52 +09:00
Martin Kroeker
5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
2025-05-10 22:39:32 +02:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Masato Nakagawa
80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 2025-03-11 20:18:20 +09:00
Martin Kroeker
39eb43d441 Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170)
* Tighten memory orders for C11 atomic operations
2025-03-07 13:48:28 +01:00
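For readers unfamiliar with memory orders in C11 atomics, the sketch below shows a simple spinlock that uses acquire ordering when taking the lock and release ordering when dropping it. It is a generic illustration under that assumption and does not reproduce the changes made in this commit; the `demo_` names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical spinlock: test-and-set with acquire ordering, so reads and
   writes in the critical section cannot be reordered before the lock is held. */
void demo_lock(atomic_flag *flag) {
    while (atomic_flag_test_and_set_explicit(flag, memory_order_acquire)) {
        /* spin until the previous holder clears the flag */
    }
}

/* Non-blocking variant: returns 1 if the lock was taken, 0 if it was already held. */
int demo_trylock(atomic_flag *flag) {
    return !atomic_flag_test_and_set_explicit(flag, memory_order_acquire);
}

/* Clear with release ordering, so writes made inside the critical section
   are visible to the next thread that acquires the lock. */
void demo_unlock(atomic_flag *flag) {
    atomic_flag_clear_explicit(flag, memory_order_release);
}
```

A lock variable would be declared as `atomic_flag lock = ATOMIC_FLAG_INIT;` and passed by address to these functions.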
Martin Kroeker
1533fe49be Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
2025-02-24 16:07:06 +01:00
Ye Tao
f0bea79a6e dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 2025-02-21 10:30:11 +00:00
Martin Kroeker
eb84aac7ad Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
2025-02-19 10:56:49 +01:00
Martin Kroeker
77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 2025-02-15 20:37:48 +01:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
John Hein
6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 2025-02-01 17:22:57 -07:00
Martin Kroeker
a182251284 fix typo 2025-01-02 00:04:33 +01:00
Martin Kroeker
ed95791618 fix conflicting variables 2025-01-01 23:27:38 +01:00
Martin Kroeker
3c3d1c4849 Identify all cores and select the most performant one as TARGET 2025-01-01 22:21:29 +01:00
Ralf Gommers
765ad8bcd2 Fix guard around alloc_hugetlb, fixes compile warning
The warning was:
```
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c: At top level:
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:2565:14: warning: 'alloc_hugetlb' defined but not used [-Wunused-function]
 2565 | static void *alloc_hugetlb(void *address){
      |              ^~~~~~~~~~~~~
```

The added define is the same as is already present in the TLS part of
`memory.c`. This follows up on gh-4681.
2024-12-18 09:42:05 +01:00
Ralf Gommers
48caf2303d Fix build warning about discarding volatile qualifier in memory.c
The warning was:
```
[4339/5327] Building C object driver/others/CMakeFiles/driver_others.dir/memory.c.o
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c: In function 'blas_shutdown':
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:3257:10: warning: passing argument 1 of 'free' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
 3257 |     free(newmemory);
      |          ^~~~~~~~~
In file included from /home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/common.h:83,
                 from /home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:74:
/home/rgommers/code/pixi-dev-scipystack/openblas/.pixi/envs/default/x86_64-conda-linux-gnu/sysroot/usr/include/stdlib.h:482:25: note: expected 'void *' but argument is of type 'volatile struct newmemstruct *'
  482 | extern void free (void *__ptr) __THROW;
      |                   ~~~~~~^~~~~
```

The use of `volatile` for `newmemstruct` seems on purpose, and there are
more such constructs in this file. The warning appeared after gh-4451
and is correct. The `free` prototype doesn't expect a volatile pointer,
hence this change adds a cast to silence the warning.
2024-12-18 08:53:29 +01:00
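The fix described above, casting away `volatile` before calling `free`, can be shown in isolation. The following is a minimal sketch and not the code from `memory.c`: the fields of `struct newmemstruct` and the `table_` functions are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the structure named in the warning. */
struct newmemstruct {
    void *lock;
    int used;
};

/* Pointer to volatile data, matching the type reported in the warning. */
static volatile struct newmemstruct *newmemory = NULL;

/* Allocate the table; returns 0 on success, -1 on failure. */
int table_init(size_t n) {
    newmemory = calloc(n, sizeof(struct newmemstruct));
    return newmemory != NULL ? 0 : -1;
}

/* Release the table. free() expects a plain void *, so the volatile
   qualifier is cast away explicitly; without the cast, GCC emits
   -Wdiscarded-qualifiers. */
void table_shutdown(void) {
    free((void *)newmemory);
    newmemory = NULL;
}

/* Returns 1 while the table is allocated, 0 otherwise. */
int table_is_allocated(void) {
    return newmemory != NULL;
}
```

The cast changes only the static type seen by the compiler; the pointer value handed to `free` is the same one returned by `calloc`.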
Martin Kroeker
4060dd43e3 Add dummy implementations of openblas_get/set_affinity 2024-11-15 15:16:17 -08:00
Martin Kroeker
8a1710dd0d don't apply switch_ratio to tail of loop 2024-10-06 20:03:32 +02:00
Martin Kroeker
de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker
a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker
d24b3cf393 properly fix buffer allocation and assignment 2024-08-15 15:32:58 +02:00
gxw
fd033467ac Fixed the undefined reference to blas_set_parameter
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
2024-08-15 16:48:48 +08:00
Martin Kroeker
23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it 2024-08-14 10:35:44 +02:00
Martin Kroeker
753c7ebe17 Merge pull request #4835 from martin-frbg/revertwin4359
Temporarily revert to the coarse-grained locking in the Windows thread server
2024-08-07 14:09:32 +02:00
Martin Kroeker
50397e017a Merge pull request #4838 from martin-frbg/fix4662-3
fix invalid ifdef syntax in HUGETLB handling
2024-08-04 11:32:10 +02:00