Commit Graph

488 Commits

Author SHA1 Message Date
Martin Kroeker
9bfc3612f9 Merge branch 'OpenMathLib:develop' into issue5414 2025-10-12 09:18:06 -07:00
Martin Kroeker
4291fa2f7a fix misnaming of NVHPC as NVC in ARM64 compiler option selection 2025-10-10 10:51:07 +02:00
Martin Kroeker
20f5ed1a94 Merge branch 'OpenMathLib:develop' into issue5414 2025-10-08 05:27:28 -07:00
Chris Sidebottom
37fc3bbca0 Add Infrastructure for SHGEMV
This adds all the relevant bits and pieces to add a `shgemv` path as
well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b`
interfaces.

I've also fixed a few bits and pieces around `shgemm` which didn't build
in a few situations.
2025-10-07 15:03:24 +00:00
Martin Kroeker
fc516af155 Merge branch 'develop' into issue5414 2025-10-01 14:12:59 -07:00
Martin Kroeker
4ef70b490c Add support for passing the HFLOAT16 option where required 2025-09-27 22:19:23 +02:00
Martin Kroeker
2fee943edb Add CMake build support for IBM Z (#5440)
* Add ZARCH support, including DYNAMIC_ARCH
2025-09-09 22:18:51 +02:00
Martin Kroeker
7c1839899e Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64 2025-08-21 11:57:07 +02:00
Martin Kroeker
4609732e69 Relax version number requirement for AppleClang 2025-08-18 14:54:20 -07:00
Martin Kroeker
bf98e448eb Add VORTEXM4 to DYNAMIC_ARCH list 2025-08-18 14:43:08 -07:00
Martin Kroeker
426b5f23ed Add compiler options for VORTEXM4 2025-08-18 14:35:36 -07:00
Martin Kroeker
4328c91e27 relax requirements in compiler SME capability check 2025-08-18 14:34:51 -07:00
Martin Kroeker
c794d0a4ce Add VORTEXM4 2025-08-18 14:33:24 -07:00
Martin Kroeker
a4f5fec46e Add compiler options for VORTEXM4 2025-08-18 14:32:07 -07:00
Martin Kroeker
c504aedca1 Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
2025-07-25 15:47:06 +02:00
Martin Kroeker
2f89a5970e fix NeoverseV2 typo 2025-07-25 15:43:37 +02:00
Chris Sidebottom
87247daadc Add NEOVERSEV2 target support
Did a quick run around to make `TARGET=NEOVERSEV2` build successfully.

Fixes #5385
2025-07-24 12:40:31 +01:00
Martin Kroeker
a5b55f6fe3 remove CBLAS restriction on GEMM_GEMV forwarding 2025-07-24 09:30:58 +02:00
Martin Kroeker
82954ba4ca Update ?GEMM-to-?GEMV forwarding settings 2025-07-23 23:24:42 +02:00
Martin Kroeker
b24212f5df fix numbers 2025-07-21 22:54:52 +02:00
Martin Kroeker
6ff06f5483 Add cross-compilation data for RISCV64 targets 2025-07-21 22:42:15 +02:00
Chris Sidebottom
947d7af4c9 Fix CMake references to bscal and bgemv 2025-07-15 15:41:53 +01:00
Chris Sidebottom
72d2ebb4dd Re-add GEMV fallback for Level3 2025-07-15 15:00:20 +01:00
Chris Sidebottom
e105411460 Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass above tests
2025-07-15 14:48:57 +01:00
Chris Sidebottom
66d9185ebe Fix CMake support 2025-07-08 22:49:55 +00:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Chris Sidebottom
552e1c7a7a Correct compiler flags for NEOVERSEV1 target 2025-07-07 11:26:36 +00:00
Usui, Tetsuzo
14107e37d9 Add parallel laed3 2025-07-01 22:12:27 +09:00
Martin Kroeker
560fa88c96 Add cross-build parameters for Ampere One 2025-06-26 10:57:30 +02:00
Martin Kroeker
55bb5ef867 Add compiler options for Ampere One 2025-06-26 10:50:44 +02:00
Srangrang
0a967797a1 Add FP16 support for RISCV 2025-05-27 14:34:57 +08:00
Martin Kroeker
f2022c23ac Remove sve capability from NeoverseN1 and specify CortexX2/A?10 as arm8.4a 2025-05-19 16:08:12 +02:00
Martin Kroeker
d9369bda1e Update and amend parameters for Neoverse cpus 2025-04-16 01:09:57 -07:00
Ruiyang Wu
1b0c0f00e9 CMake: Avoid mixed OpenMP linkage 2025-03-26 23:52:13 -04:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Martin Kroeker
b34235ca66 Fix inclusion of deprecated interfaces and cgesvdq/strsyl3 2025-03-12 22:41:50 +01:00
Martin Kroeker
f1fa370579 fix missing endif 2025-02-19 15:22:26 +01:00
Martin Kroeker
6d1444be3a Add ARM64 options for NVIDIA HPC 2025-02-19 14:26:43 +01:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Martin Kroeker
877d5a5be6 Add -O2 to flang flags when building on WoA in Release mode 2025-02-12 17:01:06 +01:00
Martin Kroeker
262018f14c Merge pull request #5092 from XiWeiGu/la64_fixed_cmake
LoongArch64: Fixed cmake
2025-01-23 13:54:27 +01:00
Martin Kroeker
180ba5e7d0 Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
2025-01-23 10:16:43 +01:00
gxw
1ebcbdbab3 LoongArch64: Fixed the issue of using the old-style TARGET in cmake builds 2025-01-23 09:08:42 +00:00
Martin Kroeker
111c9b0733 Add translations for C_COMPILER and OSNAME 2025-01-22 19:51:43 +01:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
Martin Kroeker
fbf594b62f Guard against empty CMAKE_Fortran_COMPILER_ID 2024-12-24 13:34:33 +01:00
Martin Kroeker
d78fbe425c Assume no underline suffixes on symbols when compiling with ifx on Windows 2024-12-23 19:04:50 +01:00
Martin Kroeker
30188a55d1 Don't assume underlined symbols for ifx; make cpuid.S inclusion conditional 2024-12-23 19:02:34 +01:00
Martin Kroeker
32319a33ac Add options for Intel oneAPI 2025.0 ifx on Windows 2024-12-23 19:00:48 +01:00