Commit Graph

122 Commits

Author SHA1 Message Date
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The kernels use the ZVFH and ZFH extensions and need RVV 1.0 support.

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Ruiyang Wu
02fd1df10b CMake: Pass OpenMP compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
2025-03-26 23:09:54 -04:00
Martin Kroeker
2332ea7e7a fix misleading indentation 2024-11-06 18:35:31 +01:00
Martin Kroeker
73e13b0273 flesh out HERK prototype 2024-08-12 14:45:40 +02:00
Martin Kroeker
824306baab flesh out HERK prototype 2024-08-12 14:44:13 +02:00
Mark Ryan
3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three CPU types are
supported: riscv64_generic, riscv64_zvl256b and riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
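The detection order described in this commit message (hwprobe first, then hwcap plus an extra version check) can be sketched as a pure decision function. This is a minimal sketch under stated assumptions, not the OpenBLAS code: the `probe_t` struct, the `core_t` enum and both function names are hypothetical, the probe results are passed in as plain values so the logic can be followed without riscv64 hardware, and the mapping from VLEN to kernel is an assumption based on the VLEN=256 and VLEN=128 test machines mentioned above.

```c
#include <stdbool.h>

/* Hypothetical labels for the three CPU types named in the commit message. */
typedef enum { CORE_GENERIC, CORE_ZVL128B, CORE_ZVL256B } core_t;

/* Hypothetical summary of what the platform reported. */
typedef struct {
    bool hwprobe_available; /* the hwprobe query could be made */
    bool hwprobe_has_v;     /* hwprobe reports the V extension */
    bool hwcap_has_v;       /* hwcap advertises V (could still be RVV 0.7.1) */
    bool rvv10_check_ok;    /* additional check that tells RVV 1.0 from 0.7.1 */
    unsigned vlen_bits;     /* vector register length in bits */
} probe_t;

/* Prefer hwprobe; fall back to hwcap plus the extra check only when
 * hwprobe is not available. */
static bool has_rvv10(const probe_t *p)
{
    if (p->hwprobe_available)
        return p->hwprobe_has_v;
    return p->hwcap_has_v && p->rvv10_check_ok;
}

/* Assumed mapping: pick the widest kernel the reported VLEN can run. */
static core_t select_core(const probe_t *p)
{
    if (!has_rvv10(p))
        return CORE_GENERIC;
    if (p->vlen_bits >= 256)
        return CORE_ZVL256B;
    if (p->vlen_bits >= 128)
        return CORE_ZVL128B;
    return CORE_GENERIC;
}
```

For a board that advertises V through hwcap but fails the extra version check, as described for RVV 0.7.1 hardware, `select_core` in this sketch returns `CORE_GENERIC`.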
Martin Kroeker
2dda40d280 use atomic operations as in the corresponding getrf 2024-03-28 11:33:31 +01:00
Dirreke
ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Martin Kroeker
1d4aa8d7d5 fix improper function prototypes (empty parentheses) 2023-09-30 13:00:51 +02:00
Martin Kroeker
f4f31fb53b fix improper function prototypes (empty parentheses) 2023-09-30 12:59:44 +02:00
gxw
d15e0a055c LoongArch64: Fixed compilation issues when DYNAMIC_ARCH is enabled 2023-09-27 10:05:27 +08:00
Martin Kroeker
3b6050ac04 clarify the comment on the out-of-bounds check from #723 2023-08-26 02:00:00 +02:00
Martin Kroeker
22a402bc2c clarify the comment on the out-of-bounds check from #723 2023-08-26 01:58:08 +02:00
Martin Kroeker
437c0bf2b4 Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom
32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
6c431239da Split test condition in LU computation - non-denormal for computation, exact zero for reporting singularity 2023-03-29 22:14:21 +02:00
Martin Kroeker
12aabb9f9b fix conditional 2023-03-29 09:44:33 +02:00
Martin Kroeker
f3d21039ce Improve fix from PR3924 (#3941)
* compare denominator against DBL_MIN rather than a somewhat arbitrary small number near it
2023-03-16 15:09:32 +01:00
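The idea of comparing a denominator against DBL_MIN, together with the later split between a non-denormal test for computation and an exact-zero test for reporting singularity, might look roughly like the following. This is an illustrative sketch only: `scale_column` is a hypothetical helper, not a function from this repository, and the choice to fall back to element-wise division for subnormal pivots is an assumption about how such a guard is typically used.

```c
#include <float.h>
#include <stddef.h>

/* Divide n column entries by the pivot.
 * Returns 1 if the pivot is exactly zero (singular), otherwise 0. */
static int scale_column(double *col, size_t n, double pivot)
{
    size_t i;
    double mag = pivot < 0.0 ? -pivot : pivot;

    /* Only an exact zero is reported as singular; the column is left as is. */
    if (pivot == 0.0)
        return 1;

    if (mag >= DBL_MIN) {
        /* Normal pivot: the reciprocal cannot overflow, so multiply. */
        double r = 1.0 / pivot;
        for (i = 0; i < n; i++)
            col[i] *= r;
    } else {
        /* Subnormal pivot: 1.0 / pivot could overflow, so divide directly. */
        for (i = 0; i < n; i++)
            col[i] /= pivot;
    }
    return 0;
}
```

Using DBL_MIN from `<float.h>` as the threshold avoids picking an arbitrary small constant, which is the point the commit message makes.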
Martin Kroeker
3d27cbd9a3 avoid overflow in division 2023-02-26 23:44:14 +01:00
Martin Kroeker
a39ced0551 avoid overflow in division 2023-02-26 23:42:20 +01:00
Martin Kroeker
aa2a2d9c01 Conditionally compile files that may get replaced by ReLAPACK 2022-11-08 12:04:46 +01:00
Martin Kroeker
7656aba00e Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
40003f8edb Fix pivot offset calculation for negative incx 2022-01-17 00:11:18 +01:00
Martin Kroeker
57e2a72f40 Fix pivot offset calculation for negative incx 2022-01-17 00:10:21 +01:00
Martin Kroeker
3b6293f5a0 Fix offset calculation for negative incx 2022-01-17 00:09:14 +01:00
Martin Kroeker
afa0cece5c Fix pivot offset calculation for negative incx 2022-01-17 00:08:20 +01:00
Martin Kroeker
eca2f50b48 Fix pivot offset calculation for negative incx 2022-01-17 00:07:33 +01:00
Martin Kroeker
0e9e951306 Fix pivot offset calculation for negative incx 2022-01-17 00:06:41 +01:00
Martin Kroeker
1b49ef8dcf Fix pivot index for negative increments 2022-01-17 00:05:33 +01:00
Martin Kroeker
6b407a16cb fix function typecasts 2021-12-21 18:51:28 +01:00
Martin Kroeker
aecb4a5e8d fix function typecasts 2021-12-21 18:50:22 +01:00
Martin Kroeker
c49d46f25f fix function typecast 2021-12-21 18:49:18 +01:00
gxw
af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
Zhang Xianyi
d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker
4bb73c0171 Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker
32733ded04 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker
b27ca78a21 Adapt to having only a subset of variable types supported 2020-10-11 14:46:24 +02:00
Martin Kroeker
93454022a9 Adapt to having only a subset of variable types supported 2020-10-11 14:45:40 +02:00
Martin Kroeker
20cf1d773f Adapt to having only a subset of variable types supported 2020-10-11 14:44:56 +02:00
Martin Kroeker
5c657fffad Adapt to having only a subset of variable types supported 2020-10-11 14:44:13 +02:00
Martin Kroeker
b262058059 Adapt to having only a subset of variable types supported 2020-10-11 14:43:13 +02:00
Martin Kroeker
bc319cee82 Adapt to having only a subset of variable types supported 2020-10-11 14:42:26 +02:00
Martin Kroeker
e5966f8606 Adapt to having only a subset of variable types supported 2020-10-11 14:41:43 +02:00
Martin Kroeker
9df12eb08f Adapt to having only a subset of variable types supported 2020-10-11 14:40:51 +02:00
Martin Kroeker
cf53970bcb Adapt to having only a subset of variable types supported 2020-10-11 14:40:06 +02:00
Martin Kroeker
dcd51d5c72 Adapt to having only a subset of variable types supported 2020-10-11 14:39:19 +02:00
Martin Kroeker
b8f95354c7 Adapt to having only a subset of variable types supported 2020-10-11 14:38:25 +02:00
Martin Kroeker
f194ad59e1 Use _Atomic instead of volatile where available (file moved from ../getrf)
I must have misplaced this in ../getrf when I made that change in March 2018 (40160ff).
The only changes since then were:
* "RFC : Add half precision gemm for bfloat16 in OpenBLAS", committed by Rajalakshmi Srinivasaraghavan on 14 Apr 2020 as 7ebbb50
* "Change _STDC_VERSION__ to __STDC_VERSION__", committed by Zhiyong Dang on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
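Using `_Atomic` where available and `volatile` otherwise can be sketched as below. This is a minimal sketch, not the file's actual code: `job_flag_t`, `flag_set`, `flag_get` and `SKETCH_HAVE_ATOMICS` are hypothetical names, and the availability test here uses the standard `__STDC_VERSION__` macro directly rather than any project-specific build macro.

```c
/* Pick an atomic flag type when the compiler provides C11 atomics,
 * and fall back to a volatile one otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define SKETCH_HAVE_ATOMICS 1
typedef _Atomic long job_flag_t;
#else
#define SKETCH_HAVE_ATOMICS 0
typedef volatile long job_flag_t;
#endif

/* Publish a value to other threads. */
static void flag_set(job_flag_t *f, long v)
{
#if SKETCH_HAVE_ATOMICS
    atomic_store(f, v);
#else
    *f = v; /* volatile only prevents caching in a register, no ordering */
#endif
}

/* Read the value published by another thread. */
static long flag_get(job_flag_t *f)
{
#if SKETCH_HAVE_ATOMICS
    return atomic_load(f);
#else
    return *f;
#endif
}
```

The note about `_STDC_VERSION__` versus `__STDC_VERSION__` in the message above matters for a guard like this one: with the misspelled macro the condition is never true, so the `volatile` fallback would always be selected.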
Martin Kroeker
4fda217f99 Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Martin Kroeker
bbe119ee3b Update conditional for atomics to use HAVE_C11 2020-07-18 17:19:59 +00:00