Commit Graph

9530 Commits

Author SHA1 Message Date
Rajendra Prasad Matcha
19268471cc Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API 2025-09-30 15:05:33 +05:30
Martin Kroeker
e2f9f57433 Merge pull request #5432 from markdryan/markdyan/fix-rvv-detection
fix RVV 1.0 detection code
2025-09-05 13:40:23 -07:00
Martin Kroeker
c31861ea62 Merge pull request #5435 from martin-frbg/update_rvv_ci
Update the riscv-collab llvm toolchain in CI to its latest nightly build
2025-09-02 14:11:16 -07:00
Martin Kroeker
57c2936a43 Merge branch 'OpenMathLib:develop' into update_rvv_ci 2025-09-02 12:09:30 -07:00
Martin Kroeker
6d070820fc Merge pull request #5436 from martin-frbg/update_osx_ci
Update Mac CI jobs as cmake is preinstalled in the runner images now
2025-09-02 12:09:09 -07:00
Martin Kroeker
1c7251ca20 remove the -llto_library option for any osx fortran compiler 2025-09-02 18:36:02 +02:00
Martin Kroeker
a1331406a3 drop (re)installation of cmake on osx runners 2025-09-02 15:39:08 +02:00
Martin Kroeker
c42fccccb5 Drop installation of cmake 2025-09-02 15:36:32 +02:00
Martin Kroeker
4c1a4e60a6 Update toolchain to its latest nightly build 2025-09-02 14:54:08 +02:00
Mark Ryan
7fcad02dc2 fix RVV 1.0 detection code
There were a couple of issues with the detection code used to check
for RVV 1.0 on kernels that do not support hwprobe.

1. The vtype clobber was missing
2. The wrong form of vsetvli was being used. The vsetvli x0, x0 form
   is inappropriate for this use case as it can only be safely used
   in code where the value of vtype is known.  The use of vsetvli
   x0, x0 here can lead to a failure to detect RVV 1.0, if,
   for example, the vill bit happens to be set before
   detect_riscv64_rvv100 is called.

We fix both issues by adding the missing clobber and replacing the
first parameter to vsetvli with t0 (which we add to our clobbers).
2025-08-28 14:20:37 +00:00
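
The two fixes described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the function name `detect_rvv100_sketch` is hypothetical, the `__riscv_vector` guard is an assumption chosen so the block only assembles when the toolchain has the vector extension enabled, and the caller is assumed to have already established that some vector unit exists (the instructions trap otherwise).

```c
#include <stdint.h>

/* Hypothetical sketch: returns 1 if RVV 1.0 appears usable, else 0.
 * Assumption: the caller has already confirmed that a vector unit is
 * present, because vsetvli and the vtype CSR read trap without one. */
int detect_rvv100_sketch(void)
{
#if defined(__riscv) && defined(__riscv_vector)
    uint64_t ok;
    __asm__ volatile(
        /* rd = t0 (not x0): requests vl = VLMAX, so the result does not
         * depend on whatever vtype held before this point. */
        "vsetvli t0, x0, e8, m1, ta, ma\n\t"
        /* vill is the top bit of vtype, so a rejected configuration
         * reads back as a negative value. */
        "csrr %0, vtype\n\t"
        /* ok = (0 < vtype) */
        "slt %0, x0, %0\n\t"
        : "=r"(ok)
        :
        : "t0", "vtype"); /* the two clobbers named in the commit message */
    return (int)ok;
#else
    /* Not a RISC-V build with vectors enabled: report "not detected". */
    return 0;
#endif
}
```

With `vsetvli x0, x0, ...` the instruction keeps the current `vl` and is only well defined when the previous `vtype` is known, which is why a stale `vill` bit could make detection fail.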
Martin Kroeker
06c09deee9 Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve
Implementing SVE in `[SD]AXPY` Kernels for `A64FX` and `Graviton3E`
2025-08-26 01:10:14 -07:00
Martin Kroeker
da7d0f4a38 Merge pull request #5427 from yuanjia111/develop
Optimize the gemv_t_vector.c kernel for RISCV64_ZVL256B target
2025-08-25 06:45:44 -07:00
yuanjia
c2cc7a3602 riscv64: optimize gemv_t_vector.c 2025-08-22 16:14:14 +08:00
h-motoki
e23f9c6642 Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve 2025-08-21 22:16:28 +09:00
Martin Kroeker
b3f247ae5a Merge pull request #5425 from martin-frbg/fixup5389
Increase L2 defaults for RISCV X280 / ZVL256B and ARM SVE targets in CMake cross-compilation
2025-08-21 05:13:34 -07:00
h-motoki
855945befb Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E 2025-08-21 20:56:58 +09:00
Martin Kroeker
7c1839899e Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64 2025-08-21 11:57:07 +02:00
Martin Kroeker
9c43301b6d Merge pull request #5421 from reibax-marcus/develop
fix: broken cblas installation when using makefile based builds
2025-08-17 03:03:05 -07:00
Martin Kroeker
9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
2025-08-16 13:45:35 -07:00
Martin Kroeker
f3b2a15fad Merge pull request #5420 from yuanjia111/develop
Move the value assignment of vector x in gemv_n_sve.c to the outermos…
2025-08-16 12:06:53 -07:00
Chip Kerchner
64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 2025-08-13 13:41:12 +00:00
Martin Kroeker
5e43ba948c Merge pull request #5419 from Mousius/bgemm-optimisation
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
2025-08-13 02:10:20 -07:00
Chip Kerchner
c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 2025-08-12 17:18:56 +00:00
Xabier Marquiegui
3a6b79c50f fix: broken cblas installation when using makefile based builds
Fix cblas.h missing from target directory if NO_CBLAS is defined but has
a value that indicates you do want cblas built and installed.
2025-08-12 14:41:15 +02:00
yuanjia
803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval.
1. Verified correctness using BLAS-Tester.
2. Verified performance using the built-in benchmark: float and double performance improved by about 60% and about 40%, respectively. The test commands are:
     export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
     export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100
2025-08-12 18:03:16 +08:00
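
The general idea behind the commit above, fetching each element of x once in the outer loop rather than repeatedly in the inner loop, can be illustrated with a scalar sketch. This is not the SVE kernel from `gemv_n_sve.c`: the function name `gemv_n_sketch`, its signature, and the column-major layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch of y += alpha * A * x for a column-major
 * m-by-n matrix A (no transpose) with leading dimension lda. */
void gemv_n_sketch(size_t m, size_t n, float alpha,
                   const float *a, size_t lda,
                   const float *x, float *y)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++) {
        /* x[j] is read and scaled once per column, in the outer loop,
         * instead of being re-read on every inner iteration. */
        const float temp = alpha * x[j];
        const float *col = a + j * lda;
        for (size_t i = 0; i < m; i++) {
            y[i] += temp * col[i];
        }
    }
}
```

For a 2-by-2 matrix stored as `{1, 2, 3, 4}` with `x = {1, 1}`, `alpha = 2` and `y` initially zero, the call leaves `y` equal to `{8, 12}`.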
Chris Sidebottom
5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 2025-08-11 09:25:19 +00:00
Chris Sidebottom
114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
2025-08-11 09:25:13 +00:00
Martin Kroeker
75c6ab4036 CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411)
* Update to 20.1.8

* fix PATH to avoid the obsolete LLVM19 that appeared in the preinstalled msvc folder hierarchy
2025-08-09 12:28:24 +02:00
Martin Kroeker
5c5f852ee3 Merge pull request #5415 from martin-frbg/Fixum-5399
Fix compilation of the NeoverseN2 SBGEMM kernel
2025-08-04 04:29:26 -07:00
Martin Kroeker
f1ee61ea30 Include NEON header for the bfloat conversion functions 2025-08-04 00:21:39 -07:00
Martin Kroeker
b3ffd5524a Include NEON header for the bfloat conversion functions 2025-08-04 00:20:28 -07:00
Martin Kroeker
d23680b81d Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1
Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1
2025-07-30 13:19:50 -07:00
Martin Kroeker
b4cc4be2ce Merge pull request #5410 from martin-frbg/issue5404
Adjust multithreading threshold in S/DGER and add an intermediate step
2025-07-30 12:16:05 -07:00
Martin Kroeker
0968dddf1a Merge pull request #5409 from martin-frbg/issue5372
Work around gcc15.1 on POWER misoptimizing DGEMV at -O3
2025-07-30 10:36:39 -07:00
Martin Kroeker
eddfe1e6b3 Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings
Fix bad vector zero initializer and other compiler warnings for RISC-V.
2025-07-30 08:43:08 -07:00
Martin Kroeker
30d11bc92c Adjust multithreading threshold and add an intermediate step 2025-07-30 08:13:33 -07:00
Martin Kroeker
a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 2025-07-30 17:05:36 +02:00
Chip Kerchner
72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 2025-07-30 14:04:43 +00:00
Masato Nakagawa
7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2025-07-29 18:54:36 +09:00
Martin Kroeker
9a64b32b44 Merge pull request #5406 from martin-frbg/fixbgemmtest
Fix building of bgemm tests on GEMM3M-capable (x86) targets
2025-07-28 23:17:29 -07:00
Martin Kroeker
b66a01f909 Fix building of bgemm tests on GEMM3M-capable (x86) targets 2025-07-28 22:43:28 +02:00
Martin Kroeker
a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16
ARM64: Enable bfloat16 kernels by default
2025-07-28 13:39:08 -07:00
abhishek-fujitsu
6356190d06 fix gfortran link path in dynamic_arch.yml 2025-07-28 14:37:51 +05:30
abhishek-fujitsu
4c8dcb3a8f Darwin/arm64: disable SVE/SME and fix gfortran link path 2025-07-26 16:59:46 +05:30
Martin Kroeker
33b50548eb Merge pull request #5403 from martin-frbg/issue5402
Introduce a (crude) threshold to multithreading in STRMV/DTRMV
2025-07-25 20:10:47 +02:00
Martin Kroeker
c504aedca1 Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
2025-07-25 15:47:06 +02:00
Martin Kroeker
b9e107932a add NeoverseV2 2025-07-25 15:44:34 +02:00
Martin Kroeker
2f89a5970e fix NeoverseV2 typo 2025-07-25 15:43:37 +02:00
Martin Kroeker
a9e8fa06bf Introduce a (crude) threshold to multithreading 2025-07-25 15:15:46 +02:00
Martin Kroeker
b4c2b34a45 Merge pull request #5401 from martin-frbg/followup-5397
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
2025-07-25 13:56:13 +02:00