Martin Kroeker
d030f81380
Merge pull request #5369 from martin-frbg/lapack1144
Fix workspace allocation in LAPACKE strsen/dtrsen (Reference-LAPACK PR 1144)
2025-07-10 10:46:15 +02:00
Martin Kroeker
b746f0eda3
Allocate IWORK to hold at least the one element for workspace queries
2025-07-10 08:58:16 +02:00
Martin Kroeker
b8f66ba0ee
Merge pull request #5367 from Mousius/bgemm-init
Temporarily disable test_bgemm
2025-07-10 00:57:41 +02:00
Martin Kroeker
cdebb4fd4b
Merge pull request #5365 from martin-frbg/issue5324
Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds using CMake
2025-07-09 22:50:54 +02:00
Martin Kroeker
ff614575c9
Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds
2025-07-09 14:44:25 +02:00
Martin Kroeker
0e11537cab
Merge pull request #5357 from Mousius/bgemm-init
Add infrastructure for BGEMM
2025-07-09 09:34:58 +02:00
Chris Sidebottom
8cd4be8d47
Temporarily disable test_bgemm
2025-07-09 08:27:18 +01:00
Chris Sidebottom
66d9185ebe
Fix CMake support
2025-07-08 22:49:55 +00:00
Martin Kroeker
98aefb70b4
Merge pull request #5292 from isharif168/optimized_gemv_n_1x3
Optimize gemv_n_sve_v1x3 kernel
2025-07-08 21:05:43 +02:00
Martin Kroeker
fd37406817
Merge branch 'develop' into optimized_gemv_n_1x3
2025-07-08 21:05:30 +02:00
Chris Sidebottom
48394384ef
Use correct constants for per-target BGEMM/SBGEMM
This fixes the build and tests on `NEOVERSEV1` target, which was failing
with specific constants for `SBGEMM`
Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:23:27 +01:00
Chris Sidebottom
73bf0b941a
Add bgemm to gensymbol
2025-07-08 16:22:43 +01:00
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Martin Kroeker
15d6e58510
Merge pull request #5364 from martin-frbg/blashalf
change BLAS_HALF to BLAS_BFLOAT16 in parallelized POTRF (another missed rename)
2025-07-08 17:14:50 +02:00
Martin Kroeker
04bb5acd79
change BLAS_HALF to BLAS_BFLOAT16 (another missed rename)
2025-07-08 14:40:22 +02:00
Martin Kroeker
3d31887073
Merge pull request #5362 from Mousius/fix-bf16
Fix SBGEMM BFLOAT16 build
2025-07-08 14:35:50 +02:00
Martin Kroeker
0ddf8ebd42
Merge pull request #5354 from pratiklp00/p11
Add Support for POWER11
2025-07-08 11:52:18 +02:00
Martin Kroeker
d2ea9bbb6d
Merge pull request #5363 from guoyuanplct/develop
Update CONTRIBUTORS.md
2025-07-08 11:47:18 +02:00
guoyuanplct
4ff549a450
Update CONTRIBUTORS.md
2025-07-08 17:16:51 +08:00
guoyuanplct
309c48e327
Update CONTRIBUTORS.md
2025-07-08 17:13:27 +08:00
Chris Sidebottom
552e1c7a7a
Correct compiler flags for NEOVERSEV1 target
2025-07-07 11:26:36 +00:00
Chris Sidebottom
46b9b7a080
Also enable BFLOAT16 for make cirun
2025-07-07 10:41:12 +00:00
Chris Sidebottom
eaaa628af2
Enable BUILD_BFLOAT16 in cirun
2025-07-07 10:20:17 +00:00
Chris Sidebottom
7a97c4ca97
Rename HALF -> BFLOAT16 in some more places
2025-07-07 10:13:39 +00:00
Martin Kroeker
ee6560c89f
Merge pull request #5360 from sertonix/cpuid-arm
Fix cpuid.S on arm
2025-07-07 07:41:56 +02:00
Sertonix
8d11e4630c
Fix cpuid.S on arm
The ARM assembly syntax differs a bit
Fixes 61b9339d3a getarch/cpuid.S: Fix warning about executable stack
Signed-off-by: Sertonix <sertonix@posteo.net>
2025-07-06 23:48:10 +02:00
Martin Kroeker
03a4afcf14
Merge pull request #5359 from martin-frbg/gitign_isnan
update gitignore configuration
2025-07-05 22:26:55 +02:00
Martin Kroeker
901de8f33a
remove lapacke_mangling.h and add la_xisnan.mod
2025-07-05 20:35:16 +02:00
Martin Kroeker
ce6991780a
Merge pull request #5356 from ilina-linaro/ilina-woa
Update README.md to include Windows on Arm64
2025-07-05 19:07:45 +02:00
Martin Kroeker
df013c5e28
Merge pull request #5358 from iha-taisei/dot_unroll
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2025-07-04 23:38:32 +02:00
Iha, Taisei
f7ad906b49
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2025-07-04 22:57:44 +09:00
Lina Iyer
7f360001f9
Update README.md to include Windows on Arm64
Update README.md to indicate that binaries are available for Windows on ARM64
2025-07-03 07:15:20 -06:00
Martin Kroeker
36c2589d3a
Merge pull request #5355 from tetsuzo-usui/add_parallel_laed3
Improve [SD]SYEVD performance by parallelizing [SD]LAED3
2025-07-02 09:14:03 +02:00
Usui, Tetsuzo
14107e37d9
Add parallel laed3
2025-07-01 22:12:27 +09:00
Martin Kroeker
a06bcf836b
Merge pull request #5353 from nakagawa-fj/feature/gemm_divide_rate_for_A64FX
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX
2025-07-01 14:06:53 +02:00
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
Martin Kroeker
8f0a1a3f82
Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
2025-06-29 22:47:56 +02:00
Martin Kroeker
2c0dd2468e
Merge pull request #5350 from martin-frbg/issue5341
Declare the server_lock mutex volatile in addition to static
2025-06-29 21:10:18 +02:00
Martin Kroeker
7ae24d0b85
Merge pull request #5351 from martin-frbg/lapack1140
Fix documentation error and ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
2025-06-29 19:20:17 +02:00
Martin Kroeker
5aeca597fe
Fix documentation error and ordering bug (Reference-LAPACK PR 1140)
2025-06-29 17:42:15 +02:00
Martin Kroeker
dcb289539b
Merge pull request #5344 from MaartenBaert/fix-dlasd7
LAPACK: Fix documentation error and ordering bug in DLASD7
2025-06-29 17:39:41 +02:00
Martin Kroeker
9bcffbd655
Declare the server_lock mutex volatile in addition to static
2025-06-29 15:42:43 +02:00
Martin Kroeker
334cd242d4
Merge pull request #5348 from hideaki-motoki/issue5343_prefered_size_for_a64fx
Setting `GEMM_PREFERED_SIZE` parameter for `A64FX`
2025-06-27 14:57:37 +02:00
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
2025-06-27 19:37:36 +09:00
Martin Kroeker
4062c10370
Merge pull request #5345 from OpenMathLib/revert-5251-issue5250
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errors/invalid inputs"
2025-06-27 09:45:10 +02:00
Martin Kroeker
b78d1dc0ae
Merge pull request #5342 from martin-frbg/cmake_ampere
Add CMake build settings for the Ampere One cpu
2025-06-26 18:46:33 +02:00
Martin Kroeker
83a01d29ca
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errors/invalid inputs"
2025-06-26 17:47:20 +02:00
Martin Kroeker
560fa88c96
Add cross-build parameters for Ampere One
2025-06-26 10:57:30 +02:00
Martin Kroeker
55bb5ef867
Add compiler options for Ampere One
2025-06-26 10:50:44 +02:00
Maarten Baert
b37889e52d
Merge branch 'OpenMathLib:develop' into fix-dlasd7
2025-06-26 09:29:07 +02:00