Chris Sidebottom
eaaa628af2
Enable BUILD_BFLOAT16 in cirun
2025-07-07 10:20:17 +00:00
Chris Sidebottom
7a97c4ca97
Rename HALF -> BFLOAT16 in some more places
2025-07-07 10:13:39 +00:00
Martin Kroeker
ee6560c89f
Merge pull request #5360 from sertonix/cpuid-arm
Fix cpuid.S on arm
2025-07-07 07:41:56 +02:00
Sertonix
8d11e4630c
Fix cpuid.S on arm
The ARM assembly syntax differs a bit
Fixes 61b9339d3a getarch/cpuid.S: Fix warning about executable stack
Signed-off-by: Sertonix <sertonix@posteo.net>
2025-07-06 23:48:10 +02:00
Martin Kroeker
03a4afcf14
Merge pull request #5359 from martin-frbg/gitign_isnan
update gitignore configuration
2025-07-05 22:26:55 +02:00
Martin Kroeker
901de8f33a
remove lapacke_mangling.h and add la_xisnan.mod
2025-07-05 20:35:16 +02:00
Martin Kroeker
ce6991780a
Merge pull request #5356 from ilina-linaro/ilina-woa
Update README.md to include Windows on Arm64
2025-07-05 19:07:45 +02:00
Martin Kroeker
df013c5e28
Merge pull request #5358 from iha-taisei/dot_unroll
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2025-07-04 23:38:32 +02:00
Iha, Taisei
f7ad906b49
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2025-07-04 22:57:44 +09:00
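The loop-unrolling technique named in this commit can be illustrated with a generic C sketch. This is not the A64FX kernel from the commit, which is not shown here: the function name ddot_unrolled4 is hypothetical, and the sketch assumes a plain scalar loop rather than SVE code. Four independent accumulators let consecutive multiply-adds proceed without waiting on one another, and a tail loop handles lengths that are not a multiple of four.

```c
#include <stddef.h>

/* Hypothetical illustration of a 4-way unrolled double-precision dot product.
   Not the OpenBLAS A64FX kernel; a scalar sketch of the unrolling idea only. */
double ddot_unrolled4(size_t n, const double *x, const double *y)
{
    /* Independent partial sums break the single add dependency chain. */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Tail loop: the remaining n % 4 elements. */
    for (; i < n; i++)
        s0 += x[i] * y[i];

    /* Combine the partial sums. */
    return (s0 + s1) + (s2 + s3);
}
```

For example, with x = {1, 2, 3} and y = {4, 5, 6}, ddot_unrolled4(3, x, y) returns 32. Because the partial sums are added in a different order than in a simple loop, results for general floating-point inputs may differ from a naive loop in the last bits.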
Lina Iyer
7f360001f9
Update README.md to include Windows on Arm64
Update README.md to indicate that binaries are available for Windows on ARM64
2025-07-03 07:15:20 -06:00
Martin Kroeker
36c2589d3a
Merge pull request #5355 from tetsuzo-usui/add_parallel_laed3
Improve [SD]SYEVD performance by parallelizing [SD]LAED3
2025-07-02 09:14:03 +02:00
Usui, Tetsuzo
14107e37d9
Add parallel laed3
2025-07-01 22:12:27 +09:00
Martin Kroeker
a06bcf836b
Merge pull request #5353 from nakagawa-fj/feature/gemm_divide_rate_for_A64FX
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX
2025-07-01 14:06:53 +02:00
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
2025-06-30 21:35:16 +09:00
Martin Kroeker
8f0a1a3f82
Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
2025-06-29 22:47:56 +02:00
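A bounded retry loop is one way to express the behaviour described in this merge: give up after a fixed number of failed attempts and exit, rather than spinning forever. The sketch below is hypothetical. alloc_with_retry, alloc_or_exit, MAX_ALLOC_RETRIES and the always_fail stand-in allocator are illustrative names, not OpenBLAS code, and the limit of 10 attempts is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical retry limit; the real value used by OpenBLAS is not shown here. */
#define MAX_ALLOC_RETRIES 10

typedef void *(*alloc_fn)(size_t);

/* Try the allocator a bounded number of times; return NULL if every attempt fails. */
void *alloc_with_retry(alloc_fn allocator, size_t size)
{
    for (int attempt = 0; attempt < MAX_ALLOC_RETRIES; attempt++) {
        void *p = allocator(size);
        if (p != NULL)
            return p;
    }
    return NULL;
}

/* Report the failure and terminate instead of retrying indefinitely. */
void *alloc_or_exit(size_t size)
{
    void *p = alloc_with_retry(malloc, size);
    if (p == NULL) {
        fprintf(stderr, "memory allocation failed after %d attempts\n",
                MAX_ALLOC_RETRIES);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Stand-in allocator that always fails, simulating persistent out-of-memory;
   it counts how often it was called. */
static int failing_calls = 0;
static void *always_fail(size_t size)
{
    (void)size;
    failing_calls++;
    return NULL;
}
```

With always_fail, alloc_with_retry returns NULL after exactly MAX_ALLOC_RETRIES calls; returning NULL from the inner helper leaves the exit decision to a single caller such as alloc_or_exit.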
Martin Kroeker
2c0dd2468e
Merge pull request #5350 from martin-frbg/issue5341
Declare the server_lock mutex volatile in addition to static
2025-06-29 21:10:18 +02:00
Martin Kroeker
7ae24d0b85
Merge pull request #5351 from martin-frbg/lapack1140
Fix documentation error and ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
2025-06-29 19:20:17 +02:00
Martin Kroeker
5aeca597fe
Fix documentation error and ordering bug (Reference-LAPACK PR 1140)
2025-06-29 17:42:15 +02:00
Martin Kroeker
dcb289539b
Merge pull request #5344 from MaartenBaert/fix-dlasd7
LAPACK: Fix documentation error and ordering bug in DLASD7
2025-06-29 17:39:41 +02:00
Martin Kroeker
9bcffbd655
Declare the server_lock mutex volatile in addition to static
2025-06-29 15:42:43 +02:00
Martin Kroeker
334cd242d4
Merge pull request #5348 from hideaki-motoki/issue5343_prefered_size_for_a64fx
Setting `GEMM_PREFERED_SIZE` parameter for `A64FX`
2025-06-27 14:57:37 +02:00
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
2025-06-27 19:37:36 +09:00
Martin Kroeker
4062c10370
Merge pull request #5345 from OpenMathLib/revert-5251-issue5250
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errors/invalid inputs"
2025-06-27 09:45:10 +02:00
Martin Kroeker
b78d1dc0ae
Merge pull request #5342 from martin-frbg/cmake_ampere
Add CMake build settings for the Ampere One cpu
2025-06-26 18:46:33 +02:00
Martin Kroeker
83a01d29ca
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errors/invalid inputs"
2025-06-26 17:47:20 +02:00
Martin Kroeker
560fa88c96
Add cross-build parameters for Ampere One
2025-06-26 10:57:30 +02:00
Martin Kroeker
55bb5ef867
Add compiler options for Ampere One
2025-06-26 10:50:44 +02:00
Maarten Baert
b37889e52d
Merge branch 'OpenMathLib:develop' into fix-dlasd7
2025-06-26 09:29:07 +02:00
Martin Kroeker
11ce79a4f0
Merge pull request #5329 from foxtran/fix/docs
Update FAQ
2025-06-25 16:44:44 +02:00
Maarten Baert
0904a42fa4
Fix documentation error and ordering bug in DLASD7
2025-06-25 15:50:41 +02:00
Martin Kroeker
d24195e9a1
Merge pull request #5295 from Pengzhou0810/develop
Fix some hyperthreading errors.
2025-06-25 11:09:46 +02:00
zhoupeng
134b21ae60
Fix some hyperthreading errors.
When there are multiple NUMA nodes and hyper-threading causes adjacent logical cores to share a physical core (e.g., common->avail[i] = 0x5555555555555555UL), the numa_mapping function should not use a bitmask for filtering, as this would lead to redundant masking with the subsequent local_cpu_map function.
2025-06-25 09:52:26 +08:00
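The mask value quoted in this commit can be checked with a little bit arithmetic. 0x5555555555555555 sets every even-numbered bit, so, assuming logical CPUs 2k and 2k+1 are hyperthread siblings on physical core k (the "adjacent logical cores" case the commit message describes), it selects exactly one logical CPU per physical core. The helper names popcount64 and one_thread_per_core are hypothetical and not taken from the OpenBLAS source.

```c
#include <stdint.h>

/* Number of set bits, i.e. the number of logical CPUs the mask selects. */
static int popcount64(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Assumption: logical CPUs 2k and 2k+1 are siblings on physical core k.
   Returns 1 if the mask selects at most one logical CPU per physical core. */
static int one_thread_per_core(uint64_t mask)
{
    for (int core = 0; core < 32; core++) {
        uint64_t pair = (mask >> (2 * core)) & 0x3u;
        if (pair == 0x3u)
            return 0; /* both siblings of this core are selected */
    }
    return 1;
}
```

Here popcount64(0x5555555555555555ULL) returns 32 (the even-numbered CPUs 0, 2, ..., 62), and one_thread_per_core returns 1 for that mask but 0 for an all-ones mask.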
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
Add support for FP16 to OpenBLAS and shgemm on RISCV
2025-06-24 23:10:15 +02:00
Martin Kroeker
fdc1c32340
Merge pull request #5336 from martin-frbg/issue5332
Use response files on old PPC/Intel Macs in single-target builds too
2025-06-24 21:58:58 +02:00
Martin Kroeker
5aa483e16c
Use response files on old PPC/Intel Macs in single-target builds too
2025-06-24 17:37:34 +02:00
Martin Kroeker
12591caa91
Merge pull request #5334 from azuresky01/develop
Fix INTERFACE64 builds on Loongarch64 with LLVM
2025-06-24 16:09:25 +02:00
Martin Kroeker
ee26caffb3
Merge pull request #5309 from davidz-ampere/dev-ampereone
Add support for Ampere AmpereOne processors
2025-06-24 12:27:08 +02:00
Martin Kroeker
8b08df5c5a
Merge pull request #5335 from martin-frbg/issue5330
Remove non-portable option from objcopy calls in the CMake build
2025-06-24 12:25:46 +02:00
Martin Kroeker
3bba35b8f7
Remove non-portable option from objcopy calls
2025-06-24 09:01:47 +02:00
azuresky01
8953ba9c2f
Fix INTERFACE64 builds on Loongarch64 with LLVM
fix https://github.com/OpenMathLib/OpenBLAS/issues/5331
2025-06-24 14:27:15 +08:00
davidz-ampere
aa90ab4142
Add support for Ampere AmpereOne processors
2025-06-24 00:12:34 -04:00
Igor S. Gerasimov
46b0dfef8f
Use links to issues
2025-06-21 11:35:02 +02:00
Igor S. Gerasimov
83efceb3cd
Keep dgemm_snb_1thread.png in repo
2025-06-21 11:24:42 +02:00
Martin Kroeker
b4945057b7
Merge pull request #5319 from imciner2/im/armtypes
Update SBGEMM neoversev1 kernel to use standard C types
2025-06-20 06:02:22 -07:00
Martin Kroeker
b3904aeed7
Merge pull request #5323 from imciner2/im/ofast
Switch power to use O3 instead of Ofast
2025-06-20 05:55:21 -07:00
Ian McInerney
721c80644b
Switch power to use O3 instead of Ofast
Ofast enables possibly unsafe optimizations in addition to O3. This
appears to have been added and then just continually copied into later
Power architectures, and it wasn't included in the CMake build system
when that was introduced.
Replace this with O3 so that the same level of optimization is done by
the compiler.
2025-06-20 09:23:05 +01:00
Ian McInerney
badef1d32e
Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types
2025-06-19 14:26:16 +01:00
Martin Kroeker
4e6da5ed34
Update version to 0.3.30.dev
2025-06-19 11:57:35 +02:00
Martin Kroeker
8dff37827e
Update version to 0.3.30.dev
2025-06-19 11:56:55 +02:00
Martin Kroeker
c055c36b40
Merge pull request #5317 from OpenMathLib/release-0.3.0
merge back from 0.3.0 to copy tag
2025-06-19 02:56:01 -07:00