631 Commits

Author SHA1 Message Date
Martin Kroeker
fc516af155 Merge branch 'develop' into issue5414 2025-10-01 14:12:59 -07:00
Chip Kerchner
fc7d6e65a1 Change BF16 warning message. 2025-09-23 20:26:44 +00:00
Chip Kerchner
9427eaf4c4 Reduce flags for BF16 to only needed ones. 2025-09-23 20:24:40 +00:00
Chip Kerchner
3116749717 Disable bf16 flags on RISC-V unless BUILD_BFLOAT16=1 2025-09-23 15:02:20 +00:00
Martin Kroeker
66cc27e75f Add message for fallback due to unavailable Zfh extension 2025-09-10 10:57:02 +02:00
Martin Kroeker
5ab143736f Merge pull request #5431 from markdryan/markdryan/riscv-hf16-fix
disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1
2025-09-09 14:51:47 -07:00
Martin Kroeker
2fee943edb Add CMake build support for IBM Z (#5440)
* Add ZARCH support, including DYNAMIC_ARCH
2025-09-09 22:18:51 +02:00
Martin Kroeker
7b1f9bedf8 clean up duplicate assignment of cpus newer than POWER10 2025-09-08 15:11:57 +02:00
Mark Ryan
7fcad02dc2 fix RVV 1.0 detection code
There were a couple of issues with the detection code used to check
for RVV 1.0 on kernels that do not support hwprobe.

1. The vtype clobber was missing
2. The wrong form of vsetvli was being used. The vsetvli x0, x0 form
   is inappropriate for this use case as it can only be safely used
   in code where the value of vtype is known.  The use of vsetvli
   x0, x0 here can lead to a failure to detect RVV 1.0, if,
   for example, the vill bit happens to be set before
   detect_riscv64_rvv100 is called.

We fix both issues by adding the missing clobber and replacing the
first parameter to vsetvli with t0 (which we add to our clobbers).
2025-08-28 14:20:37 +00:00
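The two fixes described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function name `detect_rvv100_sketch` and the `-1` fallback for non-RISC-V builds are assumptions, and the sketch assumes the caller has already confirmed that the V extension is present so that `vsetvli` does not trap.

```c
#include <assert.h>

/*
 * Hypothetical probe: returns 1 if vsetvli left a valid vtype, 0 if vill
 * was set, and -1 when not compiled for RISC-V with the V extension
 * (the -1 case is an assumption made so the sketch builds everywhere).
 */
int detect_rvv100_sketch(void)
{
#if defined(__riscv) && defined(__riscv_vector)
    long supported;

    __asm__ volatile(
        /* Write the new vl to t0 rather than using the "x0, x0" form, so
           the result does not depend on the previous contents of vtype. */
        "vsetvli t0, x0, e8, m1, ta, ma\n\t"
        /* vtype reads as negative when vill is set, positive otherwise. */
        "csrr %0, vtype\n\t"
        "slt %0, x0, %0\n\t"
        : "=r"(supported)
        :
        : "t0", "vtype"); /* both clobbers named in the commit message */

    assert(supported == 0 || supported == 1);
    return (int)supported;
#else
    return -1;
#endif
}
```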
Mark Ryan
ce79fe12fd disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1
The compiler options that enable 16 bit floating point instructions
should not be enabled by default when building the RISCV64_ZVL128B
and RISCV64_ZVL256B targets.  The zfh and zvfh extensions are not part
of the 'V' extension and are not required by any of the RVA profiles.
There's no guarantee that kernels built with zfh and zvfh will work
correctly on fully compliant RVA23U64 devices.

To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B
kernels with the half float flags if BUILD_HFLOAT16=1.  We also update
the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and
RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1
and BUILD_HFLOAT16=1 and are running on a device that does not support
both Zfh and Zvfh.

Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/5428
2025-08-28 09:41:07 +00:00
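The runtime rule described above reduces to a small predicate. The sketch below is a simplified illustration, not the dispatch code from the commit; the function name and its boolean parameters are hypothetical, and other eligibility checks (vector support, vector length) are deliberately left out.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: may the RISCV64_ZVL128B / RISCV64_ZVL256B kernels
 * be selected at runtime?
 *   built_with_hfloat16: the library was built with BUILD_HFLOAT16=1
 *   has_zfh, has_zvfh:   the running device reports these extensions
 */
bool zvl_kernels_usable(bool built_with_hfloat16, bool has_zfh, bool has_zvfh)
{
    /* Without the half-float flags the kernels contain no fp16 code. */
    if (!built_with_hfloat16)
        return true;

    /* With the flags enabled, both extensions must be present. */
    return has_zfh && has_zvfh;
}
```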
Martin Kroeker
18f9582f3e Add VORTEXM4 2025-08-18 01:54:09 -07:00
Martin Kroeker
b37516add6 Add BGEMM parameters 2025-07-10 14:59:01 +02:00
Martin Kroeker
0ddf8ebd42 Merge pull request #5354 from pratiklp00/p11
Add Support for POWER11
2025-07-08 11:52:18 +02:00
Martin Kroeker
8f0a1a3f82 Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
2025-06-29 22:47:56 +02:00
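As a generic illustration of the idea in this change (give up after a bounded number of attempts instead of retrying forever), a minimal sketch might look like this. The names `alloc_with_retry_limit` and `failing_alloc` are hypothetical and do not come from OpenBLAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

/* Try `alloc` up to `max_attempts` times; return NULL if every attempt fails. */
void *alloc_with_retry_limit(alloc_fn alloc, size_t size, int max_attempts)
{
    assert(alloc != NULL && max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        void *p = alloc(size);
        if (p != NULL)
            return p;
    }
    return NULL; /* caller decides how to fail, e.g. report and exit */
}

/* Demo allocator that always fails and counts how often it was called. */
static int failing_alloc_calls = 0;

static void *failing_alloc(size_t size)
{
    (void)size;
    failing_alloc_calls++;
    return NULL;
}
```

A caller that treats exhaustion as fatal would check for `NULL`, print a diagnostic, and call `exit(1)`.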
Martin Kroeker
9bcffbd655 Declare the server_lock mutex volatile in addition to static 2025-06-29 15:42:43 +02:00
pratiklp00
1dde4a13c0 p11 changes 2025-06-26 00:03:38 -05:00
zhoupeng
134b21ae60 Fix some hyperthreading errors.
When there are multiple NUMA nodes and hyper-threading causes adjacent logical cores to share a physical core (e.g., common -> avail[i] = 0x5555555555555555UL), the numa_mapping function should not use a bitmask for filtering, as this would lead to redundant masking with the subsequent local_cpu_map function.
2025-06-25 09:52:26 +08:00
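To make the example mask concrete: `0x5555555555555555UL` has every second bit set, so it marks logical CPUs 0, 2, 4, and so on as available. The helpers below are a hypothetical sketch that only decodes such a mask; they do not reproduce `numa_mapping` or `local_cpu_map`.

```c
#include <assert.h>
#include <stdint.h>

/* Number of logical CPUs whose bit is set in a 64-bit availability mask. */
int count_cpus_in_mask(uint64_t mask)
{
    int count = 0;
    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Returns 1 if logical CPU `cpu` (0..63) is marked available in `mask`. */
int mask_has_cpu(uint64_t mask, int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    return (int)((mask >> cpu) & 1u);
}
```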
Martin Kroeker
d96daa220d Merge pull request #5290 from Srangrang/develop
Add support for FP16 to openBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Martin Kroeker
e541bf68f5 support AmpereOne/OneA as NeoverseN1 2025-06-18 09:54:08 +02:00
Martin Kroeker
31ef2cbbb3 Exit if memory allocation keeps failing, instead of looping forever 2025-06-13 14:11:03 +02:00
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Martin Kroeker
5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
2025-05-10 22:39:32 +02:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Martin Kroeker
39eb43d441 Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170)
* Tighten memory orders for C11 atomic operations
2025-03-07 13:48:28 +01:00
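For readers unfamiliar with memory orders in C11 locking, the usual pattern is acquire ordering when taking a lock and release ordering when dropping it. The sketch below shows that general pattern only; it is not the change made in this commit, and the `sketch_*` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;
} sketch_lock;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the lock was free and is now held by the caller. */
bool sketch_trylock(sketch_lock *l)
{
    /* Acquire: later reads and writes cannot move before this point. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void sketch_lock_acquire(sketch_lock *l)
{
    while (!sketch_trylock(l)) {
        /* spin until the holder releases the lock */
    }
}

void sketch_unlock(sketch_lock *l)
{
    /* Release: earlier reads and writes cannot move after this point. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```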
Martin Kroeker
1533fe49be Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
2025-02-24 16:07:06 +01:00
Ye Tao
f0bea79a6e dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 2025-02-21 10:30:11 +00:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Martin Kroeker
a182251284 fix typo 2025-01-02 00:04:33 +01:00
Martin Kroeker
ed95791618 fix conflicting variables 2025-01-01 23:27:38 +01:00
Martin Kroeker
3c3d1c4849 Identify all cores and select the most performant one as TARGET 2025-01-01 22:21:29 +01:00
Ralf Gommers
765ad8bcd2 Fix guard around alloc_hugetlb, fixes compile warning
The warning was:
```
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c: At top level:
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:2565:14: warning: 'alloc_hugetlb' defined but not used [-Wunused-function]
 2565 | static void *alloc_hugetlb(void *address){
      |              ^~~~~~~~~~~~~
```

The added define is the same as is already present in the TLS part of
`memory.c`. This follows up on gh-4681.
2024-12-18 09:42:05 +01:00
Ralf Gommers
48caf2303d Fix build warning about discarding volatile qualifier in memory.c
The warning was:
```
[4339/5327] Building C object driver/others/CMakeFiles/driver_others.dir/memory.c.o
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c: In function 'blas_shutdown':
/home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:3257:10: warning: passing argument 1 of 'free' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
 3257 |     free(newmemory);
      |          ^~~~~~~~~
In file included from /home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/common.h:83,
                 from /home/rgommers/code/pixi-dev-scipystack/openblas/OpenBLAS/driver/others/memory.c:74:
/home/rgommers/code/pixi-dev-scipystack/openblas/.pixi/envs/default/x86_64-conda-linux-gnu/sysroot/usr/include/stdlib.h:482:25: note: expected 'void *' but argument is of type 'volatile struct newmemstruct *'
  482 | extern void free (void *__ptr) __THROW;
      |                   ~~~~~~^~~~~
```

The use of `volatile` for `newmemstruct` seems on purpose, and there are
more such constructs in this file. The warning appeared after gh-4451
and is correct. The `free` prototype doesn't expect a volatile pointer,
hence this change adds a cast to silence the warning.
2024-12-18 08:53:29 +01:00
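The kind of cast described above can be sketched in isolation. The struct and function names here are hypothetical stand-ins, not the definitions in `memory.c`; the point is only that `free` takes a plain `void *`, so a pointer to a `volatile` object needs an explicit cast.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a bookkeeping struct accessed via a volatile pointer. */
struct memstruct_sketch {
    void *addr;
};

static volatile struct memstruct_sketch *tracked = NULL;

/* Allocate the tracked block; returns 0 on success, -1 on failure. */
int track_new_block(void)
{
    tracked = malloc(sizeof *tracked);
    if (tracked == NULL)
        return -1;
    tracked->addr = NULL;
    return 0;
}

/* Returns 1 while a block is being tracked. */
int has_tracked_block(void)
{
    return tracked != NULL;
}

void release_tracked_block(void)
{
    /* The explicit cast drops the volatile qualifier that free() does not accept. */
    free((void *)tracked);
    tracked = NULL;
}
```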
Martin Kroeker
4060dd43e3 Add dummy implementations of openblas_get/set_affinity 2024-11-15 15:16:17 -08:00
Martin Kroeker
de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker
a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker
d24b3cf393 properly fix buffer allocation and assignment 2024-08-15 15:32:58 +02:00
gxw
fd033467ac Fixed the undefined reference to blas_set_parameter
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
2024-08-15 16:48:48 +08:00
Martin Kroeker
23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it 2024-08-14 10:35:44 +02:00
Martin Kroeker
753c7ebe17 Merge pull request #4835 from martin-frbg/revertwin4359
Temporarily revert to the coarse-grained locking in the Windows thread server
2024-08-07 14:09:32 +02:00
Martin Kroeker
50397e017a Merge pull request #4838 from martin-frbg/fix4662-3
fix invalid ifdef syntax in HUGETLB handling
2024-08-04 11:32:10 +02:00
Martin Kroeker
5257f807a9 fix invalid ifdef syntax in HUGETLB handling 2024-08-04 00:03:17 +02:00
Martin Kroeker
2aed90171a Add riscv sources for DYNAMIC_ARCH 2024-08-03 23:58:10 +02:00
Martin Kroeker
6468dc1142 restore the coarse locking of the pre-4359 version 2024-08-02 16:39:47 +02:00
yamazaki-mitsufumi
821ef34635 Add A64FX to the list of CPUs supported by DYNAMIC_ARCH 2024-07-23 20:44:39 +09:00
Martin Kroeker
a815594fd1 Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker
a373d0f107 Improve the error message for thread creation failure 2024-07-15 18:32:21 +02:00