Commit Graph

781 Commits

Author SHA1 Message Date
Martin Kroeker
e541bf68f5 support AmpereOne/OneA as NeoverseN1 2025-06-18 09:54:08 +02:00
Martin Kroeker
20f2ba0141 Move declaration of i for pre-C99 compilers 2025-05-21 23:44:17 +02:00
Masato Nakagawa
2351a98005 Update 2D thread-partitioned GEMM for M << N case. 2025-05-21 21:21:52 +09:00
Martin Kroeker
5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
2025-05-10 22:39:32 +02:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Masato Nakagawa
80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 2025-03-11 20:18:20 +09:00
Martin Kroeker
39eb43d441 Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170)
* Tighten memory orders for C11 atomic operations
2025-03-07 13:48:28 +01:00
Martin Kroeker
1533fe49be Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
2025-02-24 16:07:06 +01:00
Ye Tao
f0bea79a6e dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 2025-02-21 10:30:11 +00:00
Martin Kroeker
eb84aac7ad Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
2025-02-19 10:56:49 +01:00
Martin Kroeker
77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 2025-02-15 20:37:48 +01:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
John Hein
6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 2025-02-01 17:22:57 -07:00
Martin Kroeker
a182251284 fix typo 2025-01-02 00:04:33 +01:00
Martin Kroeker
ed95791618 fix conflicting variables 2025-01-01 23:27:38 +01:00
Martin Kroeker
3c3d1c4849 Identify all cores and select the most performant one as TARGET 2025-01-01 22:21:29 +01:00
Ralf Gommers
765ad8bcd2 Fix guard around alloc_hugetlb, fixes compile warning
The warning was:
```
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c: At top level:
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:2565:14: warning: 'alloc_hugetlb' defined but not used [-Wunused-function]
 2565 | static void *alloc_hugetlb(void *address){
      |              ^~~~~~~~~~~~~
```

The added define is the same as is already present in the TLS part of
`memory.c`. This follows up on gh-4681.
2024-12-18 09:42:05 +01:00
Ralf Gommers
48caf2303d Fix build warning about discarding volatile qualifier in memory.c
The warning was:
```
[4339/5327] Building C object driver/others/CMakeFiles/driver_others.dir/memory.c.o
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c: In function 'blas_shutdown':
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:3257:10: warning: passing argument 1 of 'free' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
 3257 |     free(newmemory);
      |          ^~~~~~~~~
In file included from /home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/common.h:83,
                 from /home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:74:
/home/rgommers/code/pixi-dev-scipystack/openblas/.pixi/envs/default/x86_64-conda-linux-gnu/sysroot/usr/include/stdlib.h:482:25: note: expected 'void *' but argument is of type 'volatile struct newmemstruct *'
  482 | extern void free (void *__ptr) __THROW;
      |                   ~~~~~~^~~~~
```

The use of `volatile` for `newmemstruct` seems on purpose, and there are
more such constructs in this file. The warning appeared after gh-4451
and is correct. The `free` prototype doesn't expect a volatile pointer,
hence this change adds a cast to silence the warning.
2024-12-18 08:53:29 +01:00
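The cast described in the commit above can be illustrated with a minimal sketch. This is not the code from `memory.c`: the struct fields and the helper `roundtrip` are hypothetical, and only the names `newmemstruct` and `newmemory` are taken from the warning text.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the struct named in the warning;
   the fields are illustrative, not the real layout in memory.c. */
struct newmemstruct {
    void *lock;
    int used;
};

/* Allocate an array behind a volatile-qualified pointer, write and read
   one field, then release it. Returns the value read back, or -1 if the
   allocation fails or count is zero. */
static int roundtrip(size_t count) {
    volatile struct newmemstruct *newmemory;
    int seen;

    if (count == 0)
        return -1;
    newmemory = (volatile struct newmemstruct *)calloc(count, sizeof(struct newmemstruct));
    if (newmemory == NULL)
        return -1;

    newmemory[count - 1].used = 1;
    seen = newmemory[count - 1].used;

    /* free() is declared as taking a plain void *, so passing the
       volatile pointer directly triggers -Wdiscarded-qualifiers.
       The explicit cast drops the qualifier and silences the warning. */
    free((void *)newmemory);
    return seen;
}
```

Calling `roundtrip(4)` exercises the allocation, a volatile access, and the cast before `free`; the cast changes only the pointer's type, not the memory being released.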
Martin Kroeker
4060dd43e3 Add dummy implementations of openblas_get/set_affinity 2024-11-15 15:16:17 -08:00
Martin Kroeker
8a1710dd0d don't apply switch_ratio to tail of loop 2024-10-06 20:03:32 +02:00
Martin Kroeker
de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw
48698b2b1d LoongArch64: Rename core
Use the microarchitecture name instead of meaningless strings to name the core.
The legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker
a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker
d24b3cf393 properly fix buffer allocation and assignment 2024-08-15 15:32:58 +02:00
gxw
fd033467ac Fixed the undefined reference to blas_set_parameter
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
2024-08-15 16:48:48 +08:00
Martin Kroeker
23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it 2024-08-14 10:35:44 +02:00
Martin Kroeker
753c7ebe17 Merge pull request #4835 from martin-frbg/revertwin4359
Temporarily revert to the coarse-grained locking in the Windows thread server
2024-08-07 14:09:32 +02:00
Martin Kroeker
50397e017a Merge pull request #4838 from martin-frbg/fix4662-3
fix invalid ifdef syntax in HUGETLB handling
2024-08-04 11:32:10 +02:00
Martin Kroeker
5257f807a9 fix invalid ifdef syntax in HUGETLB handling 2024-08-04 00:03:17 +02:00
Martin Kroeker
2aed90171a Add riscv sources for DYNAMIC_ARCH 2024-08-03 23:58:10 +02:00
Martin Kroeker
6468dc1142 restore the coarse locking of the pre-4359 version 2024-08-02 16:39:47 +02:00
yamazaki-mitsufumi
821ef34635 Add A64FX to the list of CPUs supported by DYNAMIC_ARCH 2024-07-23 20:44:39 +09:00
Martin Kroeker
a815594fd1 Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker
a373d0f107 Improve the error message for thread creation failure 2024-07-15 18:32:21 +02:00
Mark Ryan
3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three cpu types are
supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
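The detection order described in the commit above (hwprobe first, then hwcap plus an extra check) can be sketched as pure decision logic. This is an illustration under stated assumptions, not the OpenBLAS implementation: `struct rvv_probe`, its fields, and `select_riscv64_core` are hypothetical names, the sketch assumes a V report from hwprobe means RVV 1.0, and the mapping from VLEN to kernel is inferred from the kernel names.

```c
#include <stdbool.h>

/* Hypothetical summary of the probe results; not an OpenBLAS type. */
struct rvv_probe {
    bool hwprobe_available;    /* hwprobe could be queried at all */
    bool hwprobe_reports_v;    /* hwprobe reported V (assumed to mean RVV 1.0) */
    bool hwcap_reports_v;      /* hwcap advertises V (may be RVV 0.7.1) */
    bool extra_check_is_rvv10; /* result of the additional 1.0 vs 0.7.1 check */
    unsigned vlen_bits;        /* vector register length in bits */
};

/* Return the name of the kernel set to use, following the order in the
   commit message: trust hwprobe when present, otherwise require hwcap
   plus the additional check. */
static const char *select_riscv64_core(const struct rvv_probe *p) {
    bool has_rvv10;

    if (p->hwprobe_available)
        has_rvv10 = p->hwprobe_reports_v;
    else
        has_rvv10 = p->hwcap_reports_v && p->extra_check_is_rvv10;

    if (!has_rvv10)
        return "riscv64_generic";
    /* Assumed mapping: pick the widest kernel the hardware can run. */
    if (p->vlen_bits >= 256)
        return "riscv64_zvl256b";
    if (p->vlen_bits >= 128)
        return "riscv64_zvl128b";
    return "riscv64_generic";
}
```

A board that advertises V through hwcap but fails the extra check (the RVV 0.7.1 case mentioned in the commit) falls through to `riscv64_generic`, which is the point of the second check.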
Martin Kroeker
d0b9948b23 Guard against invalid thread_status.queue 2024-06-30 19:31:15 +02:00
Martin Kroeker
7e9a4ba427 Merge pull request #4741 from shivammonaka/Pthread_Scalability_Improvement
Enhancing Core Utilization in BLAS Calls: A Scalable Architecture
2024-06-20 13:36:23 +02:00
Martin Kroeker
9b2a0c79cb Add Zhaoxin KX7000 2024-06-20 09:23:08 +02:00
shivammonaka
9e22d70957 Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed parallelly 2024-06-07 08:40:17 +05:30
Martin Kroeker
db070a9223 add gemm_batch drivers 2024-05-31 18:29:27 +02:00
Martin Kroeker
d0794f88dc add gemm_batch driver 2024-05-29 15:49:20 +02:00
Martin Kroeker
0073affe63 Merge pull request #4693 from goplanid/locks-improvement
Lock Management Improvements for Memory Allocation Efficiency
2024-05-26 23:14:52 +02:00
Martin Kroeker
6ca9ffa7f5 Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
2024-05-14 18:12:43 +02:00
Deeksha Goplani
0dc80a5c8d locks improvement 2024-05-13 22:17:23 +05:30
Martin Kroeker
8da6f7e5f2 Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 2024-05-10 11:42:53 +08:00
Martin Kroeker
5500b4ab26 Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00