OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-08 01:15:39 +08:00

Author	SHA1	Message	Date
Martin Kroeker	9bfc3612f9	Merge branch 'OpenMathLib:develop' into issue5414	2025-10-12 09:18:06 -07:00
Martin Kroeker	e40714cabd	Merge pull request #5450 from quic/topic/strmm_direct_sme1 Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API	2025-10-11 15:20:19 -07:00
changjua	644ea07ef9	Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API	2025-10-10 10:48:27 +08:00
Martin Kroeker	20f5ed1a94	Merge branch 'OpenMathLib:develop' into issue5414	2025-10-08 05:27:28 -07:00
Chris Sidebottom	37fc3bbca0	Add Infrastructure for SHGEMV This adds all the relevant bits and pieces to add a `shgemv` path as well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b` interfaces. I've also fixed a few bits and pieces around `shgemm` which didn't build in a few situations.	2025-10-07 15:03:24 +00:00
Martin Kroeker	fc516af155	Merge branch 'develop' into issue5414	2025-10-01 14:12:59 -07:00
Rajendra Prasad Matcha	19268471cc	Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API	2025-09-30 15:05:33 +05:30
Martin Kroeker	e76c39099a	Add sgemm_direct_performant for ARM64	2025-08-18 01:47:17 -07:00
Martin Kroeker	39c90f9859	Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3	2025-07-18 23:23:39 +02:00
Rajendra Prasad Matcha	eae0abfdb6	SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.	2025-07-17 16:14:31 +05:30
Chris Sidebottom	e105411460	Add infrastructure for bgemv/bscal - Sets up all the various entrypoints for `bgemv` - Adds `bscal` for use in the `bgemv` interface - Adds test cases for comparing `sgemv` and `bgemv` - Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate enough to pass above tests	2025-07-15 14:48:57 +01:00
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-08 16:22:41 +01:00
Srangrang	0a967797a1	Add FP16 support for RISCV	2025-05-27 14:34:57 +08:00
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) * Fix ARMV9SME target and add support_sme1 code for MacOS * make sgemm_direct unconditionally available on all arm64 * build a (dummy) sgemm_direct kernel on all arm64 * Update dynamic_arm64.c	2025-05-10 22:39:32 +02:00
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	2025-02-13 14:54:37 +05:30
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	2025-02-13 14:51:21 +05:30
Martin Kroeker	1829ac5b44	Add (dummy) declaration of SBROT_M	2025-01-25 17:32:11 +01:00
tingbo.liao	3c8df6358f	Further rearranged the rotm kernel for the different architectures. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2025-01-22 11:41:12 +08:00
Honglin Zhu	90f041e348	Invoke the syscall to allow the use of amx tiles	2023-05-19 10:48:18 +08:00
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2023-04-19 11:51:54 +02:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds Previously dynamic builds were either using the default SWITCH_RATIO or one from the higher level architecture; this patch ensures the dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Martin Kroeker	75d5e3eaf5	Replace ifdefs and fix conditional definitions for including only selected precisions in DYNAMIC_ARCH	2023-02-23 23:08:33 +01:00
Martin Kroeker	ee44082827	fix DYNAMIC_ARCH builds that use only a subset of precisions	2023-02-22 00:27:18 +01:00
Honglin Zhu	4989e039a5	Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build	2022-10-27 14:10:26 +08:00
Honglin Zhu	b00d5b9746	New sbgemm implementation for Neoverse N2 1. Use UZP instructions but not gather load and scatter store instructions to get lower latency. 2. Padding k to a power of 4.	2022-10-26 15:09:41 +08:00
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	2021-08-30 17:40:20 +08:00
Wangyang Guo	478d1086c1	Small Matrix: support DYNAMIC_ARCH build	2021-08-04 03:12:41 +00:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	cb839575ed	Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme	2020-10-12 14:44:33 +02:00
Martin Kroeker	ca31c32693	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:49:22 +02:00
Martin Kroeker	1c0b03efb4	Merge branch 'develop' into develop	2020-10-11 23:34:14 +02:00
Martin Kroeker	e396ec8b56	Allow building support for only a subset of variable types	2020-10-11 15:11:15 +02:00
Martin Kroeker	c5a32288c6	Work around sgemm_r/dgemm_r not being properly defined with BUILD_COMPLEX/BUILD_COMPLEX16	2020-09-26 23:24:37 +02:00
Martin Kroeker	b886bd672b	add defines for building a subset of types	2020-09-22 23:18:55 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to macros for sgemm_direct to support dynamic_arch naming via common_s,h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan	67cc4b9e16	Fix warnings in clang and export symbol	2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan	a87793e03c	Fix DYNAMIC_ARCH compilation errors	2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan	ac6a22ae78	Update header	2020-04-14 22:58:39 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	5c42287c4f	Add declarations for ?sum and cblas_?sum	2019-03-30 21:58:03 +01:00
Martin Kroeker	7e860acd38	Correct zgeadd_k prototype	2017-11-29 19:57:35 +01:00
Isuru Fernando	ca17b4b75c	Fix complex support for MSVC headers	2017-07-28 11:50:29 +05:30
Zhang Xianyi	69363622a8	Fix DYNAMIC_ARCH=1 bug.	2015-10-27 05:10:40 +08:00
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Martin Koehler	39cc6b21d3	Add ATLAS-style ?geadd function	2015-02-16 13:46:20 +01:00
wernsaar	f1b9a4a1ca	Ref #454: fixed bug in common_param.h	2014-09-23 11:34:29 +02:00
wernsaar	7aae4a62e7	enabled use of GEMM3M functions	2014-09-20 14:27:10 +02:00
wernsaar	125610d23b	allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk	2014-07-24 18:43:31 +02:00

57 Commits