389 Commits

Author SHA1 Message Date
Ye Tao
7321444660 enable sbgemm to be forwarded to sbgemv on arm64 2025-05-12 15:37:32 +00:00
Harmen Stoppels
51ba70f47b test_potrs.c: remove pragma darwin-aarch64 support
Using GCC 14.2.0 on Darwin, the pragma ultimately causes a linker error
"ld: invalid r_symbolnum=". The current workaround is to use the old
linker, but (a) it's deprecated and (b) it can produce libraries that
are subsequently not linkable with the newer linker in dependents: the
new ld64 does not link to libraries with duplicate rpaths created by the
classic linker.
2025-04-10 15:20:34 +02:00
Martin Kroeker
1ed962d259 Fix compilation with xcode16.3/clang17/gcc14 2025-04-06 10:44:48 -07:00
Vaisakh K V
f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 2025-02-13 14:54:37 +05:30
Vaisakh K V
d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Martin Kroeker
c1258662db Merge branch 'OpenMathLib:develop' into m3m_exprec 2024-12-30 15:58:15 +01:00
Martin Kroeker
9db51f790a Remove any optimization flags from DEBUG builds on POWER architecture 2024-11-17 23:19:58 +01:00
Martin Kroeker
d04686acd8 Re-enable the EXPRECISION option for non-Windows x86/x86_64 2024-11-14 14:09:01 -08:00
Caroline Newcombe
760bf7aa37 Update Fortran return for complex data types (Cray and Nvidia compilers) 2024-11-13 14:05:20 -06:00
Chip Kerchner
36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Martin Kroeker
a492181665 filter out Loongarch -mabi options for flang-new 2024-10-03 15:58:47 +02:00
Martin Kroeker
a1073f5eed Merge pull request #4900 from XiWeiGu/la64_core_rename
LoongArch64: Rename core
2024-10-01 15:29:16 +02:00
gxw
48698b2b1d LoongArch64: Rename core
Use the microarchitecture name instead of meaningless strings to name the core;
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
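The three renames listed in the commit above can be pictured as a small alias table. This is only a sketch of the idea, not the code from the commit: the `core_alias` struct, the `la64_aliases` table and the `la64_canonical_name` helper are hypothetical names introduced for illustration, and only the three name pairs come from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a legacy core name with its new name. */
typedef struct {
    const char *legacy;
    const char *current;
} core_alias;

/* The three renames listed in the commit message. */
static const core_alias la64_aliases[] = {
    {"LOONGSONGENERIC", "LA64_GENERIC"},
    {"LOONGSON3R5",     "LA464"},
    {"LOONGSON2K1000",  "LA264"},
};

/* Return the new name for a legacy core name; any other string is
 * returned unchanged, so new-style names pass straight through. */
static const char *la64_canonical_name(const char *name)
{
    size_t count = sizeof la64_aliases / sizeof la64_aliases[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, la64_aliases[i].legacy) == 0)
            return la64_aliases[i].current;
    }
    return name;
}
```

Under this sketch, a build that still passes a legacy name would resolve it to the new one, which is one way the legacy names could remain usable.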
Martin Kroeker
969bb949b1 Strip any mtune option from FFLAGS if the compiler is flang-new 2024-09-19 11:10:28 +02:00
Martin Kroeker
383e0b133e remove suppression of gcc14's incompatible pointer error 2024-09-11 22:21:09 +02:00
Martin Kroeker
42d8865234 fix typo 2024-08-01 12:24:45 +02:00
Martin Kroeker
fcb88b9d52 enable GEMM/GEMV forwarding for riscv and ppc 2024-07-31 23:21:35 +02:00
Chris Sidebottom
b26424c6a2 Allow opt into GEMM -> GEMV forwarding 2024-07-31 13:09:14 +01:00
Martin Kroeker
a4e56e0452 Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
yamazaki-mitsufumi
821ef34635 Add A64FX to the list of CPUs supported by DYNAMIC_ARCH 2024-07-23 20:44:39 +09:00
Mark Ryan
3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three CPU types are
supported: riscv64_generic, riscv64_zvl256b and riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
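The detection order described in the riscv64 commit above (hwprobe first, then hwcap plus an extra check) can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not the code from the commit: the `riscv64_probe` struct, its field names, the enum and both functions are hypothetical, the real system queries are replaced by precomputed flags, and choosing the kernel by comparing VLEN against 256 and 128 is an assumption based on the kernel names.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the three cpu types named in the commit. */
typedef enum {
    CORE_RISCV64_GENERIC,
    CORE_RISCV64_ZVL128B,
    CORE_RISCV64_ZVL256B
} riscv64_core;

/* Hypothetical snapshot of what the probing code found out.
 * In a real build these flags would come from the kernel interfaces. */
typedef struct {
    bool hwprobe_available;   /* the hwprobe query could be made */
    bool hwprobe_has_rvv10;   /* hwprobe reports RVV 1.0 */
    bool hwcap_has_v;         /* hwcap advertises V */
    bool passes_rvv10_check;  /* extra check telling RVV 1.0 from 0.7.1 */
    unsigned vlen_bits;       /* vector register length in bits */
} riscv64_probe;

/* Prefer hwprobe; without it, V in hwcap alone is not trusted because
 * some boards advertise V while only supporting RVV 0.7.1. */
static bool has_rvv10(const riscv64_probe *p)
{
    if (p->hwprobe_available)
        return p->hwprobe_has_rvv10;
    return p->hwcap_has_v && p->passes_rvv10_check;
}

/* Pick a kernel set: vector kernels need RVV 1.0, and the VLEN
 * thresholds below are an assumption drawn from the kernel names. */
static riscv64_core select_core(const riscv64_probe *p)
{
    if (!has_rvv10(p))
        return CORE_RISCV64_GENERIC;
    if (p->vlen_bits >= 256)
        return CORE_RISCV64_ZVL256B;
    if (p->vlen_bits >= 128)
        return CORE_RISCV64_ZVL128B;
    return CORE_RISCV64_GENERIC;
}
```

With this shape, a device that advertises V through hwcap but fails the extra check falls back to the generic kernels, which matches the commit's concern about RVV 0.7.1 boards.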
gxw
8ab2e9ec65 LoongArch: DGEMM small matrix opt 2024-06-04 16:52:45 +08:00
Martin Kroeker
4376b6f7d2 Restore Loongson LA64ARCH handling 2024-05-07 14:42:01 +02:00
Martin Kroeker
fc10673fd3 Merge branch 'develop' into hugetlb-doc 2024-05-07 13:31:39 +02:00
Martin Kroeker
9c4e10fbd1 sort hugetlb and shm alloc options 2024-05-04 14:48:02 +02:00
Martin Kroeker
7c915e64ca Silence a GCC14 warning/error in the f2c-converted LAPACK 2024-04-30 17:48:14 +02:00
Martin Kroeker
ae695d4ca0 Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
7cd438a5ac loongarch64: Fixed clang compilation issues 2024-04-23 19:19:11 +08:00
Martin Kroeker
0ec0746ae4 Update Makefile.system 2024-04-18 16:11:20 +02:00
Martin Kroeker
d6b0badc05 Fix declarations for EMBEDDED 2024-04-18 16:06:21 +02:00
Martin Kroeker
00ee5d0367 On ARM, do not assume -marm by default if OS_EMBEDDED=1 2024-04-12 15:59:45 +02:00
Chip Kerchner
1c13cda3fc Remove -openmp flag from XLF (since it doesn't support it). 2024-04-10 15:16:47 -05:00
Martin Kroeker
52b71a1673 Filter out FFLAGS that flang-new from LLVM18 no longer supports (#4569)
2024-03-22 17:02:39 +01:00
Martin Kroeker
a14176440a Add version macro for GCC12 2024-03-10 23:22:05 +01:00
Martin Kroeker
56fad407d1 Merge pull request #4527 from ChipKerchner/fixAIXBuildIssues
Fix LAPACK unit testing build issues.
2024-03-05 17:55:08 +01:00
Chris Sidebottom
7a6fa699f2 Small GEMM for AArch64
This is a fairly conservative addition of small matrix kernels using
SVE.
2024-03-04 15:48:47 +00:00
Martin Kroeker
d1409407a0 Omit redundant prefixes or suffixes in library naming 2024-02-27 21:05:59 +01:00
Chip-Kerchner
3e030cc5fe Fix LAPACK unit testing build issues. Limit AIX builds to 32 threads (to eliminate failures on some systems). 2024-02-26 12:46:05 -06:00
Martin Kroeker
2e86faa657 Merge branch 'develop' into issue4468 2024-02-23 11:39:49 +01:00
Ayappan Perumal
892f8ff3e5 Shared library support for AIX 2024-02-22 07:05:37 -06:00
Martin Kroeker
ca6b4961e4 updates to fix option conflicts and config file generation 2024-02-15 14:31:11 +01:00
Martin Kroeker
bb96e466ae Introduce LIBNAMEPREFIX to avoid messing with the internal LIBPREFIX 2024-02-09 15:50:11 +01:00
Martin Kroeker
1ed69ea1c0 improve naming 2024-02-06 23:35:12 +01:00
Martin Kroeker
63fbffddf8 Add option FIXED_LIBNAME to suppress versioning and softlinking 2024-02-05 21:44:03 +01:00
Dirreke
ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Chris Sidebottom
dc20a78188 Use functionally equivalent dynamic targets
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2023-12-23 12:45:27 +00:00
Martin Kroeker
47b03fd4b4 Copy XCode15-specific workaround to Fortran flags to fix build of tests 2023-11-18 23:45:02 +01:00
Martin Kroeker
9c3c1cfbd6 Merge pull request #4304 from martin-frbg/issue4277
Move clang/gfortran OpenMP dependency rewriting out of f_check
2023-11-11 20:58:21 +01:00
Martin Kroeker
1a308a0066 Move OpenMP dependency handling for clang/gfortran combo 2023-11-10 15:27:46 +01:00
Chip Kerchner
206e76187e Fix FCOMMON_OPT for power. Error out for certain C and Fortran compiler combos in AIX. 2023-11-07 18:08:57 -06:00