Commit Graph

831 Commits

Author SHA1 Message Date
Martin Kroeker
ebc3eaf80b Need strings.h for strncasecmp prototype 2026-01-29 22:21:46 +01:00
Martin Kroeker
80995622dd Rename the DllMain copy used in static linking to OpenBLASDllMain 2026-01-22 11:19:04 +01:00
lujiaweics
1f3b81e562 Serialize accesses to parallelized syrk functions from multiple callers, like it was already done for GEMM in level3_thread.c and GEMM3M in level3_gemm3m_thread.c 2026-01-20 21:31:35 +08:00
Martin Kroeker
14594773a0 Merge pull request #5615 from martin-frbg/issue5607
Fix building without multithreading or LAPACK
2026-01-19 00:35:33 +01:00
Martin Kroeker
8742434212 Include thread callback replacement hook in singlethreaded builds as well 2026-01-18 19:55:40 +01:00
Erik Schnetter
55e853a698 Avoid integer overflow in dynamic_riscv64.c
Closes https://github.com/OpenMathLib/OpenBLAS/issues/5608.
2026-01-16 10:36:53 -05:00
Martin Kroeker
6f225daf94 make VORTEXM4 MacOS-only for now 2026-01-15 19:33:29 +01:00
Martin Kroeker
2d46f1ec65 Merge branch 'develop' into issue5414 2026-01-09 15:04:06 +01:00
Martin Kroeker
eb098f67d3 Revert "[WIP,Testing] remove the lock around the thread shutdown function aga…"
This reverts commit ef6f97624b.
2025-12-13 23:49:46 +01:00
mayeut
396137ff27 revert locks introduced in #5170 2025-11-30 07:27:24 +01:00
Martin Kroeker
4af187080a Only add dedicated VORTEXM4 if building with LLVM 2025-11-24 22:15:45 +01:00
Martin Kroeker
ea85b6696f Merge branch 'OpenMathLib:develop' into issue5414 2025-11-23 10:14:07 +01:00
Martin Kroeker
f00c0d0827 CYGWIN builds currently require blas_server_win32 2025-11-07 12:20:31 +01:00
Martin Kroeker
ef6f97624b [WIP,Testing] remove the lock around the thread shutdown function again (#5479)
* remove the lock around the thread shutdown function - server is locked already here
2025-10-30 19:12:47 +01:00
Martin Kroeker
b2b9abc20b Revert "Enhancing Core Utilization in BLAS Calls: A Scalable Architecture" 2025-10-22 15:44:43 +02:00
Martin Kroeker
5b640b1cbc add bgemm_thread_xx 2025-10-16 10:03:04 -07:00
Martin Kroeker
a9a152ebc7 fix bgemv build 2025-10-16 10:00:41 -07:00
Martin Kroeker
a387217a07 Add BGEMV 2025-10-16 05:02:24 -07:00
Martin Kroeker
c92bac1524 Add SHGEMV 2025-10-16 04:57:18 -07:00
Martin Kroeker
20f5ed1a94 Merge branch 'OpenMathLib:develop' into issue5414 2025-10-08 05:27:28 -07:00
Chris Sidebottom
37fc3bbca0 Add Infrastructure for SHGEMV
This adds all the relevant bits and pieces to add a `shgemv` path as
well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b`
interfaces.

I've also fixed a few bits and pieces around `shgemm` which didn't build
in a few situations.
2025-10-07 15:03:24 +00:00
Martin Kroeker
fc516af155 Merge branch 'develop' into issue5414 2025-10-01 14:12:59 -07:00
Chip Kerchner
fc7d6e65a1 Change BF16 warning message. 2025-09-23 20:26:44 +00:00
Chip Kerchner
9427eaf4c4 Reduce flags for BF16 to only needed ones. 2025-09-23 20:24:40 +00:00
Chip Kerchner
3116749717 Disable bf16 flags on RISC-V unless BUILD_BFLOAT16=1 2025-09-23 15:02:20 +00:00
Martin Kroeker
66cc27e75f Add message for fallback due to unavailable Zfh extension 2025-09-10 10:57:02 +02:00
Martin Kroeker
5ab143736f Merge pull request #5431 from markdryan/markdryan/riscv-hf16-fix
disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1
2025-09-09 14:51:47 -07:00
Martin Kroeker
2fee943edb Add CMake build support for IBM Z (#5440)
* Add ZARCH support, including DYNAMIC_ARCH
2025-09-09 22:18:51 +02:00
Martin Kroeker
7b1f9bedf8 clean up duplicate assignment of cpus newer than POWER10 2025-09-08 15:11:57 +02:00
Mark Ryan
7fcad02dc2 fix RVV 1.0 detection code
There were a couple of issues with the detection code used to check
for RVV 1.0 on kernels that do not support hwprobe.

1. The vtype clobber was missing
2. The wrong form of vsetvli was being used. The vsetvli x0, x0 form
   is inappropriate for this use case as it can only be safely used
   in code where the value of vtype is known.  The use of vsetvli
   x0, x0 here can lead to a failure to detect RVV 1.0, if,
   for example, the vill bit happens to be set before
   detect_riscv64_rvv100 is called.

We fix both issues by adding the missing clobber and replacing the
first parameter to vsetvli with t0 (which we add to our clobbers).
2025-08-28 14:20:37 +00:00
Mark Ryan
ce79fe12fd disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1
The compiler options that enable 16-bit floating point instructions
should not be enabled by default when building the RISCV64_ZVL128B
and RISCV64_ZVL256B targets.  The zfh and zvfh extensions are not part
of the 'V' extension and are not required by any of the RVA profiles.
There's no guarantee that kernels built with zfh and zvfh will work
correctly on fully compliant RVA23U64 devices.

To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B
kernels with the half float flags if BUILD_HFLOAT16=1.  We also update
the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and
RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1
and BUILD_HFLOAT16=1 and are running on a device that does not support
both Zfh and Zvfh.

Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/5428
2025-08-28 09:41:07 +00:00
Martin Kroeker
18f9582f3e Add VORTEXM4 2025-08-18 01:54:09 -07:00
Masato Nakagawa
7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2025-07-29 18:54:36 +09:00
youcai
41f9701ebc Fix cmake building with cblas_bgemm 2025-07-23 22:10:53 +08:00
Chris Sidebottom
e105411460 Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
  enough to pass the above tests
2025-07-15 14:48:57 +01:00
Martin Kroeker
b37516add6 Add BGEMM parameters 2025-07-10 14:59:01 +02:00
Chris Sidebottom
48394384ef Use correct constants for per-target BGEMM/SBGEMM
This fixes the build and tests on `NEOVERSEV1` target, which was failing
with specific constants for `SBGEMM`

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:23:27 +01:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Martin Kroeker
3d31887073 Merge pull request #5362 from Mousius/fix-bf16
Fix SBGEMM BFLOAT16 build
2025-07-08 14:35:50 +02:00
Martin Kroeker
0ddf8ebd42 Merge pull request #5354 from pratiklp00/p11
Add Support for POWER11
2025-07-08 11:52:18 +02:00
Chris Sidebottom
7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 2025-07-07 10:13:39 +00:00
Masato Nakagawa
5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
Martin Kroeker
8f0a1a3f82 Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
2025-06-29 22:47:56 +02:00
Martin Kroeker
9bcffbd655 Declare the server_lock mutex volatile in addition to static 2025-06-29 15:42:43 +02:00
pratiklp00
1dde4a13c0 p11 changes 2025-06-26 00:03:38 -05:00
zhoupeng
134b21ae60 Fix some hyperthreading errors.
When there are multiple NUMA nodes and hyper-threading causes adjacent logical cores to share a physical core (e.g., common -> avail[i] = 0x5555555555555555UL), the numa_mapping function should not use a bitmask for filtering, as this would lead to redundant masking with the subsequent local_cpu_map function.
2025-06-25 09:52:26 +08:00
Martin Kroeker
d96daa220d Merge pull request #5290 from Srangrang/develop
Add support for FP16 to OpenBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Martin Kroeker
e541bf68f5 support AmpereOne/OneA as NeoverseN1 2025-06-18 09:54:08 +02:00
Srangrang
9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 2025-06-15 20:57:05 +08:00
Martin Kroeker
31ef2cbbb3 Exit if memory allocation keeps failing, instead of looping forever 2025-06-13 14:11:03 +02:00