70 Commits

Author SHA1 Message Date
Chris Sidebottom
e105411460 Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass above tests
2025-07-15 14:48:57 +01:00
Chris Sidebottom
740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.

Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-10 23:23:27 +00:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
pengxu
a978ad3180 Loongarch64: add C functions of zgemm_ncopy_16 2025-05-13 16:09:12 +08:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
Martin Kroeker
d91d4fa6e9 convert the beta=0 branch to a for loop as well 2025-01-09 23:11:26 +01:00
Martin Kroeker
09e75f1588 fix absurd typo 2025-01-09 00:52:14 +01:00
Martin Kroeker
2891fd8d6d Replace while loop with for 2025-01-08 23:17:45 +01:00
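The three commits above rework the loops of a beta-scaling step. As a rough illustration of what such a step does, here is a minimal sketch, not the OpenBLAS kernel itself: the function name `scale_c_by_beta`, its signature, and the column-major layout with leading dimension `ldc` are assumptions made for this example. The `beta == 0` branch stores zeros instead of multiplying, following the usual BLAS convention that C is not read when beta is zero.

```c
#include <stddef.h>

/* Hypothetical helper (not the OpenBLAS kernel): scale an m x n
 * column-major matrix C, with leading dimension ldc, by beta. */
static void scale_c_by_beta(size_t m, size_t n, float beta,
                            float *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        float *col = c + j * ldc;   /* start of column j */
        if (beta == 0.0f) {
            /* beta == 0: overwrite with zeros rather than multiply,
             * so stale NaN/Inf values in C do not propagate. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0f;
        } else {
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```

Only the first `m` entries of each column are touched; any padding between `m` and `ldc` is left as it was.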
Martin Kroeker
ccc23338d7 have the dummy GEMM3M kernel at least forward to regular GEMM 2024-08-07 19:39:02 +02:00
gxw
6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
pengxu
4787a55c64 Optimized cgemm kernel 16x4 LASX for LoongArch 2024-02-21 15:28:47 +08:00
Sergei Lewis
1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
kseniyazaytseva
ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2024-01-18 23:19:52 +03:00
martin-frbg
7976deff80 Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00
Martin Kroeker
cfa0a80664 Restore initialization of data variables 2023-07-13 23:23:12 +02:00
Martin Kroeker
9567305e4c Restore initialization of data01,data02 2023-07-13 23:21:18 +02:00
Sergei Lewis
cb0a70e0e2 dot.c early bail fix 2023-03-02 09:51:10 +00:00
Sergei Lewis
2406958629 * update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2023-02-24 10:45:03 +00:00
Ivan Pribec
802e71bf05 Add const attribute to lsame 2022-08-08 15:15:52 +02:00
Martin Kroeker
ef24712030 Move a conditionally used variable 2021-09-11 14:37:44 +02:00
Wangyang Guo
619588fbab sbgemm: remove unnecessary b0 files 2021-08-30 17:55:01 +08:00
Wangyang Guo
1d83ca4bca Small Matrix: support BFLOAT16 data type 2021-08-30 17:40:20 +08:00
Wangyang Guo
989e6bbdd3 Small Matrix: reduce generic kernel source files 2021-08-13 03:17:38 +00:00
Wangyang Guo
6b58bca18b Small Matrix: disable low performance default kernel 2021-08-03 06:49:03 +00:00
Wangyang Guo
5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrix case 2021-08-02 07:06:54 +00:00
Xianyi Zhang
6022e5629c Refs #2587 fix small matrix c/zgemm bug. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82 Change a1b0 gemm to b0 gemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
be3349405d Add alpha=1.0 beta=0.0 for small gemm. 2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
damonyu
ef8e7d0279 Add the support for RISC-V Vector.
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
2020-10-15 16:09:02 +08:00
Martin Kroeker
756062afa5 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:56:17 +02:00
Qiyu8
60e6c68e38 Adapt ARM architect 2020-09-29 16:36:14 +08:00
Qiyu8
1b1a757f5f Optimize the performance of dot by using universal intrinsics in X86/ARM 2020-09-28 20:36:53 +08:00
Rajalakshmi Srinivasaraghavan
d23419accc powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces new optimized version of SHGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
2020-06-25 22:19:08 -05:00
Rajalakshmi Srinivasaraghavan
a87793e03c Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
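The commit message above notes that bfloat16 falls back to a 2-byte unsigned short on architectures without native support. A minimal sketch of how such a storage type can be converted to and from `float` follows. The names `bf16_t`, `float_to_bf16`, and `bf16_to_float` are hypothetical and are not taken from the OpenBLAS sources, and the round-to-nearest-even step is one common choice rather than a statement of what the patch does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: bfloat16 held in 16 unsigned bits. */
typedef uint16_t bf16_t;

/* Convert float to bfloat16 by keeping the upper 16 bits of the
 * IEEE-754 single-precision pattern, rounding to nearest even. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and force a mantissa bit so it stays NaN. */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* Widen bfloat16 back to float by zero-filling the low 16 bits. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A comparison test in the spirit of the one described above would convert the float inputs with `float_to_bf16`, run both routines, and compare the results within a tolerance, since bfloat16 keeps only 8 bits of significand precision.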
Qiyu8
ff42e68652 Optimize general Gemm Beta 2020-01-20 11:49:42 +08:00
Andrew
1e531701b7 fix small typo 2018-09-09 16:52:25 +02:00
Martin Kroeker
7a7619af6d Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
2018-05-17 11:40:08 +02:00
Andrew
e5cc3d72c0 core.IdenticalExpr clang501 checker 2018-01-19 23:17:43 +01:00
Andrew
9fa986337d add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
Andrew
3eed97f6b9 Initialize values to silence cppcheck 2018-01-12 22:35:00 +01:00
Andrew
d602b99386 LAPACK helpers in C that need care too 2018-01-02 14:38:50 +01:00
Andrew
4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 2018-01-01 20:54:39 +01:00
Andrew
03e5ff0687 initialize potentially uninitialized variables (clang5) 2017-12-26 09:24:24 +01:00
Andrew
47deec2c1a fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Andrew
281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
8213385ab8 Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels 2017-12-02 22:51:58 +01:00
Andrew
441a9c8385 more dead increments clang4 scan-build deadcode.deadstores 2017-11-26 17:24:08 +01:00