83 Commits

Author SHA1 Message Date
Martin Kroeker
0e28b427f3 Add slaed3/dlaed3 to complex builds 2026-02-18 19:09:36 +01:00
Chris Sidebottom
37fc3bbca0 Add Infrastructure for SHGEMV
This adds all the relevant bits and pieces to add a `shgemv` path as
well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b`
interfaces.

I've also fixed a few bits and pieces around `shgemm` which didn't build
in a few situations.
2025-10-07 15:03:24 +00:00
Martin Kroeker
e58f6dc50d Add extensions ?GEMM_BATCH_STRIDED and CBLAS_?GEMM_BATCH_STRIDED (#5458)
* Add ?GEMM_BATCH_STRIDED and CBLAS_?GEMM_BATCH_STRIDED
2025-09-26 14:00:47 +02:00
Martin Kroeker
eb931deb22 Add BLAS interface to ?GEMM_BATCH 2025-09-17 10:23:48 -07:00
Martin Kroeker
965463f177 Include float-bfloat conversion functions in ONLY_CBLAS builds as well 2025-07-24 23:33:20 +02:00
Chris Sidebottom
e105411460 Add infrastructure for bgemv/bscal
- Sets up all the various entrypoints for `bgemv`
- Adds `bscal` for use in the `bgemv` interface
- Adds test cases for comparing `sgemv` and `bgemv`
- Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate
enough to pass above tests
2025-07-15 14:48:57 +01:00
Chris Sidebottom
740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.

Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-10 23:23:27 +00:00
Chris Sidebottom
f95e7b0e32 Add infrastructure for BGEMM
Sets up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.

Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287

Co-authored-by: Ye Tao <ye.tao@arm.com>
2025-07-08 16:22:41 +01:00
Usui, Tetsuzo
14107e37d9 Add parallel laed3 2025-07-01 22:12:27 +09:00
gkdddd
670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0

Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
2025-06-03 20:14:30 +08:00
Martin Kroeker
ff30ac9666 Update Makefile 2025-02-06 19:51:23 +01:00
Martin Kroeker
09414a4187 Ensure that GEMMTR name appears in XERBLA if gemmt was called as such 2025-02-06 18:52:00 +01:00
Martin Kroeker
0cf656fd3e Add copies of GEMMT under its new name GEMMTR 2024-10-30 12:55:14 +01:00
Martin Kroeker
89c7bbcba6 add cblas_?gemm_batch 2024-05-29 15:47:02 +02:00
Martin Kroeker
d4db6a9f16 Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments 2024-02-06 22:23:47 +01:00
Martin Kroeker
47bd064763 Fix names in build rules 2024-01-31 20:49:43 +01:00
Martin Kroeker
b54cda8490 Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds 2024-01-31 16:00:52 +01:00
Martin Kroeker
b926e70ebd Fix typo in build rule of "profiled" sbgemm 2023-09-21 23:07:32 +02:00
Martin Kroeker
912d713b52 redo lost edit 2023-03-28 18:31:04 +02:00
Martin Kroeker
dc15c18efc Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list 2023-03-28 16:33:09 +02:00
H. Vetinari
f2659516ef remove unqualified ifdef's for NO_LAPACK(E) 2023-03-28 19:01:31 +11:00
Martin Kroeker
ab32f832a8 fix stray blank on continuation line 2023-03-21 08:29:05 +01:00
Martin Kroeker
e359787e28 restore C/Z SPMV, SPR, SYR,SYMV 2023-03-21 07:43:03 +01:00
Martin Kroeker
c970717157 fix missing t in xgemmt rule
Co-authored-by: Alexis <35051714+amontoison@users.noreply.github.com>
2022-11-01 13:51:20 +01:00
Martin Kroeker
e7fd8d21a6 Add GEMMT based on looped GEMV 2022-10-26 15:33:58 +02:00
Martin Kroeker
a3e02742f2 Add USE_PERL fallback option for create script used with FUNCTION_PROFILE 2022-05-22 18:32:19 +02:00
Martin Kroeker
d2b5fbf80f Exclude some complex (LAPACK) functions when NO_LAPACK is set 2022-01-27 22:02:08 +01:00
Martin Kroeker
bd906e3410 fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 2021-01-30 16:46:25 +01:00
Martin Kroeker
a8f249458d Build CBLAS interfaces for CROTG and ZROTG as well 2021-01-13 00:29:38 +01:00
Martin Kroeker
ac3e2a3fdd Add CBLAS interfaces for csrot and zdrot 2021-01-12 23:22:00 +01:00
Martin Kroeker
857afcc41d Use ifeq instead of ifdef for user-definable build options 2020-11-22 16:31:44 +01:00
Chen, Guobing
a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
6a1f3e40af Remove debug printout of object list 2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
1e7eb7b7a9 Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker
052f31bc3c Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:02:16 +02:00
Martin Kroeker
0f7d73ff6d Allow supporting only a subset of variable types 2020-10-11 14:53:26 +02:00
Chen, Guobing
deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update the level1 threading helper functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker
fee361ae64 fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00
Martin Kroeker
5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Guillaume Horel
af9ac0898a fix Makefile 2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6 update Makefile 2019-09-08 11:14:49 -04:00
Martin Kroeker
79cfc24a62 Add interface for ?sum (derived from ?asum) 2019-03-30 21:59:18 +01:00
Martin Kroeker
3d1e36d4cb Build CBLAS interfaces for I?MIN and I?MAX 2019-03-30 12:38:41 +01:00
Martin Kroeker
9cf22b7d91 Build cblas_iXamin interfaces 2018-06-23 13:27:30 +02:00
Martin Kroeker
7f546f54fa Add cblas_xerbla 2017-04-26 20:01:34 +02:00
Werner Saar
ae4ac6f984 removed obj-files, that are moved to lapack 3.7.0 2017-01-06 16:14:53 +01:00
Martin Koehler
39cc6b21d3 Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
wernsaar
9e829ce98f enabled cblas gemm3m functions 2014-09-20 17:20:02 +02:00