Martin Kroeker
643a0b53b0
Allow VortexM4 on the direct_SME fast path only for clang-based compilers
2025-10-19 13:37:38 -07:00
Martin Kroeker
20f5ed1a94
Merge branch 'OpenMathLib:develop' into issue5414
2025-10-08 05:27:28 -07:00
Martin Kroeker
fa912ce852
rework definitions of ?FLOAT16_GEMM_GEMV_FORWARD
2025-10-08 11:11:52 +02:00
Chris Sidebottom
37fc3bbca0
Add Infrastructure for SHGEMV
...
This adds all the relevant bits and pieces to add a `shgemv` path as
well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b`
interfaces.
I've also fixed a few bits and pieces around `shgemm` which didn't build
in a few situations.
2025-10-07 15:03:24 +00:00
Martin Kroeker
1b88c9c742
remove debugging printouts
2025-08-24 13:48:22 -07:00
Martin Kroeker
7f89c6f353
SME-based direct sgemm currently requires leading dimensions to be the same as the matrix dimensions
2025-08-23 14:20:15 -07:00
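A minimal sketch of the kind of guard this constraint implies, assuming row-major, non-transposed operands (A is m x k, B is k x n, C is m x n). The function name `sme_direct_ld_ok` is hypothetical and is not the actual check in gemm.c.

```c
#include <stdbool.h>

/* Hypothetical helper (not OpenBLAS API): true when every operand is stored
 * contiguously, i.e. each leading dimension equals its matrix's row length.
 * Assumed layout: row-major, no transposes, A m x k, B k x n, C m x n. */
static bool sme_direct_ld_ok(int m, int n, int k, int lda, int ldb, int ldc)
{
    (void)m; /* the row count does not constrain any leading dimension */
    return lda == k && ldb == n && ldc == n;
}
```

Calls whose operands are padded submatrices (for example lda larger than k) would fail this check and fall back to the regular blocked path.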
Martin Kroeker
de91afd2ae
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64
2025-08-18 01:44:21 -07:00
Martin Kroeker
39c90f9859
Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta
...
SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3
2025-07-18 23:23:39 +02:00
Rajendra Prasad Matcha
eae0abfdb6
SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.
2025-07-17 16:14:31 +05:30
Chris Sidebottom
66d9185ebe
Fix CMake support
2025-07-08 22:49:55 +00:00
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
...
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Srangrang
ec14e1648c
fix: resolve non-RISCV host build failed issue
...
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by: Martin
2025-06-15 20:25:15 +08:00
Martin Kroeker
6680e0592f
Fix conditional inclusion of SGEMM_KERNEL_DIRECT
2025-05-17 05:12:15 -07:00
Martin Kroeker
09ba099461
make throttling code conditional on SMP
2025-02-25 12:10:48 +01:00
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on NEOVERSEV2
2025-02-20 10:28:21 +00:00
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
...
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
2025-02-13 14:51:21 +05:30
Chris Daley
cb48505251
optimize gemv forwarding on ARM64 systems
2024-10-24 21:05:26 -07:00
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
2024-10-13 13:46:11 -05:00
gxw
48698b2b1d
LoongArch64: Rename core
...
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
2024-08-08 18:03:38 +02:00
Chris Sidebottom
b26424c6a2
Allow opt into GEMM -> GEMV forwarding
2024-07-31 13:09:14 +01:00
Chris Sidebottom
90eb863d4b
Re-add accidental removal
2024-07-31 13:09:14 +01:00
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
2024-07-31 13:09:14 +01:00
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
2024-07-31 13:09:14 +01:00
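The forwarding idea in this group of commits (treat GEMM as GEMV when one operand is really a vector) can be sketched as below. This is a simplified illustration assuming column-major storage and no transposes; `sgemv_ref` and `sgemm_forwarding` are hypothetical names, and the triple loop merely stands in for the real blocked GEMM driver.

```c
#include <stddef.h>

/* y := alpha*op(A)*x + beta*y for a column-major A (rows x cols);
 * op(A) = A when trans == 0, A^T otherwise. y is not read when beta == 0. */
static void sgemv_ref(int trans, int rows, int cols, float alpha,
                      const float *a, int lda, const float *x, int incx,
                      float beta, float *y, int incy)
{
    int ylen = trans ? cols : rows;
    int xlen = trans ? rows : cols;
    for (int i = 0; i < ylen; i++) {
        float sum = 0.0f;
        for (int j = 0; j < xlen; j++) {
            float aij = trans ? a[j + (size_t)i * lda] : a[i + (size_t)j * lda];
            sum += aij * x[(size_t)j * incx];
        }
        float *yi = &y[(size_t)i * incy];
        *yi = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *yi;
    }
}

/* C := alpha*A*B + beta*C, column-major, A m x k, B k x n, no transposes. */
static void sgemm_forwarding(int m, int n, int k, float alpha,
                             const float *a, int lda, const float *b, int ldb,
                             float beta, float *c, int ldc)
{
    if (n == 1) {
        /* B is one column: C(:,0) = alpha*A*b + beta*C(:,0) */
        sgemv_ref(0, m, k, alpha, a, lda, b, 1, beta, c, 1);
        return;
    }
    if (m == 1) {
        /* A is one row with stride lda: C(0,:) = alpha*B^T*a + beta*C(0,:) */
        sgemv_ref(1, k, n, alpha, b, ldb, a, lda, beta, c, ldc);
        return;
    }
    /* General case: naive loop standing in for the blocked GEMM path. */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float sum = 0.0f;
            for (int l = 0; l < k; l++)
                sum += a[i + (size_t)l * lda] * b[l + (size_t)j * ldb];
            float *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *cij;
        }
}
```

The m == 1 branch shows why forwarding needs strides: a single row of a column-major A is not contiguous, so it is passed to GEMV with increment lda.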
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2024-05-10 11:42:53 +08:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
...
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Martin Kroeker
a3354a7630
Cap the number of parallel threads
2024-03-27 22:00:30 +01:00
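One way to cap the number of threads by problem size is sketched below. The per-thread work threshold and the name `cap_gemm_threads` are hypothetical illustrations and are not taken from this commit.

```c
/* Hypothetical heuristic (not OpenBLAS's actual formula): give each thread at
 * least MIN_FLOPS_PER_THREAD multiply-adds, never use fewer than one thread,
 * and never exceed the configured maximum. */
#define MIN_FLOPS_PER_THREAD 65536.0

static int cap_gemm_threads(int m, int n, int k, int max_threads)
{
    /* Compute in double so large m*n*k cannot overflow an int. */
    double work = (double)m * (double)n * (double)k;
    double want = work / MIN_FLOPS_PER_THREAD;
    if (want < 1.0)
        return 1;
    if (want >= (double)max_threads)
        return max_threads;
    return (int)want;
}
```

Small problems then run single-threaded instead of paying thread start-up and synchronization costs.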
Honglin Zhu
71e4125795
Fix syscall error on non-x86 platform
2023-05-22 21:59:59 +08:00
Honglin Zhu
90f041e348
Invoke the syscall to allow the use of amx tiles
2023-05-19 10:48:18 +08:00
Wangyang Guo
4289cf048d
sbgemm: avoid falling into SGEMM_KERNEL_DIRECT
2021-09-07 21:30:46 +08:00
Wangyang Guo
2e44ca0136
sbgemm: add missing cblas_sbgemm definition
2021-08-30 17:40:30 +08:00
Wangyang Guo
1d83ca4bca
Small Matrix: support BFLOAT16 data type
2021-08-30 17:40:20 +08:00
Wangyang Guo
c17d6dacb2
Small Matrix: skip compile in unimplemented data type
2021-08-05 05:46:13 +00:00
Wangyang Guo
aa50185647
Small Matrix: better handling of the GEMM3M macro
2021-08-05 02:45:53 +00:00
Wangyang Guo
478d1086c1
Small Matrix: support DYNAMIC_ARCH build
2021-08-04 03:12:41 +00:00
Wangyang Guo
5dc7c3c8e5
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune the small matrix case
2021-08-02 07:06:54 +00:00
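A permit hook of this kind decides at call time whether the small-matrix kernel should handle a given problem. The sketch below is hypothetical: the name, the signature and the size threshold are illustrative assumptions, and the real per-target permit kernels may use different arguments and tuned limits.

```c
/* Hypothetical permit check in the spirit of GEMM_SMALL_MATRIX_PERMIT:
 * returns nonzero when the small-matrix kernel should be used.
 * SMALL_MNK_LIMIT is an illustrative value, not a tuned one. */
#define SMALL_MNK_LIMIT (64.0 * 64.0 * 64.0)

static int small_matrix_permit(int transa, int transb, long m, long n, long k)
{
    (void)transa; /* a tuned profile could use different limits per */
    (void)transb; /* transpose combination                          */
    double mnk = (double)m * (double)n * (double)k;
    return mnk <= SMALL_MNK_LIMIT;
}
```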
Xianyi Zhang
6022e5629c
Refs #2587 fix small matrix c/zgemm bug.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
2021-08-02 07:06:54 +00:00
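A "b0" kernel specializes only on beta == 0 and keeps alpha general. Because the beta*C term disappears, C is written without being read. The sketch below assumes column-major storage and no transposes; the name `sgemm_small_b0_nn` is hypothetical.

```c
#include <stddef.h>

/* Sketch of a beta == 0 small-matrix kernel: C := alpha*A*B.
 * C is only written, never read, so its previous contents do not matter.
 * Column-major, A m x k, B k x n, no transposes. */
static void sgemm_small_b0_nn(int m, int n, int k, float alpha,
                              const float *a, int lda,
                              const float *b, int ldb,
                              float *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float sum = 0.0f;
            for (int l = 0; l < k; l++)
                sum += a[i + (size_t)l * lda] * b[l + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * sum; /* no beta*C term */
        }
}
```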
Xianyi Zhang
4271cfcc6f
Fix gemm interface bug for small matrix.
2021-08-02 07:06:51 +00:00
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
...
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker
7bb59fceb7
Clean up some warnings
2021-07-11 16:00:29 +02:00
Gordon Fossum
8b599836db
Add error message token for SBGEMM in gemm.c
2021-05-04 13:55:02 -05:00
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
...
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support DYNAMIC_ARCH naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
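Storing bfloat16 as a 2-byte unsigned short works because a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 float. The helpers below are illustrative: the type name `bf16_t` is hypothetical, the float-to-bf16 direction truncates instead of rounding, and NaN payloads are not treated specially.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned short bf16_t; /* hypothetical name; 2-byte storage */

/* Keep sign, 8-bit exponent and top 7 mantissa bits (truncation). */
static bf16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* type-pun without aliasing issues */
    return (bf16_t)(bits >> 16);
}

/* Widening is exact: the dropped low mantissa bits become zero. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A bf16 GEMM kernel typically widens inputs this way and accumulates in float, which is why its output can be compared against sgemm as the commit's test does.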
Martin Kroeker
8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH
2020-03-26 21:12:56 +01:00
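In a DYNAMIC_ARCH build the choice of kernel cannot rely on compile-time flags, so the AVX512 requirement has to be checked on the running CPU. The probe below is a stand-in that assumes GCC or clang on x86_64, where `__builtin_cpu_supports` exists; OpenBLAS's own detection may work differently, and `have_avx512f` is a hypothetical name.

```c
/* Runtime probe for AVX512F. On non-x86_64 targets or other compilers it
 * conservatively reports "not supported", so callers take the normal path. */
static int have_avx512f(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;
#endif
}
```

A caller would branch once, for example `if (have_avx512f()) { /* direct kernel */ } else { /* regular GEMM */ }`, rather than assuming the build target's capabilities.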
Martin Kroeker
6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX
2020-03-22 14:33:16 +01:00