OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-08 01:15:39 +08:00

Author	SHA1	Message	Date
Martin Kroeker	d7b0fccbb4	Enable SME-based kernels for VortexM4 with clang-based compilers only	2025-10-19 13:34:26 -07:00
Martin Kroeker	9bfc3612f9	Merge branch 'OpenMathLib:develop' into issue5414	2025-10-12 09:18:06 -07:00
Martin Kroeker	e40714cabd	Merge pull request #5450 from quic/topic/strmm_direct_sme1 Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API	2025-10-11 15:20:19 -07:00
changjua	644ea07ef9	Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API	2025-10-10 10:48:27 +08:00
Martin Kroeker	fc516af155	Merge branch 'develop' into issue5414	2025-10-01 14:12:59 -07:00
Martin Kroeker	e939c6c315	Merge pull request #5471 from quic/topic/ssymm_direct_sme1 Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API	2025-10-01 06:22:36 -07:00
Rajendra Prasad Matcha	19268471cc	Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API	2025-09-30 15:05:33 +05:30
Chip Kerchner	92f09a6a98	Add BF16 sbgemm on RISCV.	2025-09-22 14:32:43 +00:00
Martin Kroeker	cb6c4392a5	Make GEMM3M parameters available on 32bit X86-GENERIC	2025-09-10 22:44:14 +02:00
Martin Kroeker	202a7a0e2a	Separate VORTEXM4 from VORTEX and ARMV9SME	2025-08-18 01:45:40 -07:00
Chris Sidebottom	114316f361	Optimize SBGEMM / BGEMM for NEOVERSEV1 further This changes the kernels to pack full SVE vectors and reduces the overall complexity of the inner GEMM loop.	2025-08-11 09:25:13 +00:00
Masato Nakagawa	7e29f11396	Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)	2025-07-29 18:54:36 +09:00
Martin Kroeker	c504aedca1	Merge pull request #5400 from Mousius/neoversev2-target Add NEOVERSEV2 target support	2025-07-25 15:47:06 +02:00
Chris Sidebottom	87247daadc	Add NEOVERSEV2 target support Did a quick run around to make `TARGET=NEOVERSEV2` build successfully. Fixes #5385	2025-07-24 12:40:31 +01:00
Chris Sidebottom	ea2faf0c9a	Add optimized BGEMM for NEOVERSEN2 target This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.	2025-07-24 10:59:28 +00:00
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-10 23:23:27 +00:00
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-08 16:22:41 +01:00
Masato Nakagawa	5253c8f165	Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.	2025-06-30 21:35:16 +09:00
h-motoki	bba75d5e45	GEMM_PREFERED_SIZE parameter has been changed for A64FX.	2025-06-27 19:37:36 +09:00
Martin Kroeker	d96daa220d	Merge pull request #5290 from Srangrang/develop Add support for FP16 to openBLAS and shgemm on RISCV	2025-06-24 23:10:15 +02:00
davidz-ampere	aa90ab4142	Add support for Ampere AmpereOne processors	2025-06-24 00:12:34 -04:00
davidz-ampere	be68ef03b4	Add support for Ampere processors	2025-06-15 22:00:40 -04:00
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	2025-06-03 20:14:30 +08:00
Srangrang	0a967797a1	Add FP16 support for RISCV	2025-05-27 14:34:57 +08:00
Martin Kroeker	a34b487f22	Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN	2025-04-09 17:25:46 +02:00
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	2025-02-13 14:54:37 +05:30
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	2025-02-13 14:51:21 +05:30
Ye Tao	c748e6a338	optimized sbgemm kernel for neoverse-v1 (sve-256) Signed-off-by: Ye Tao <ye.tao@arm.com>	2025-02-05 10:06:37 +00:00
Aditya Tewari	4379a6fbe3	* checkpoint sbgemm for SVE-256	2025-02-03 12:49:49 +00:00
Martin Kroeker	926e56e389	Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R	2024-11-14 14:04:25 -08:00
Martin Kroeker	a47b3c8867	Fix unroll parameter selection for MIPS64_GENERIC	2024-10-13 22:54:34 +02:00
Martin Kroeker	7c4f3638fd	switch PPCG4 SGEMM kernel to 4x4	2024-10-03 22:00:15 +02:00
gxw	48698b2b1d	LoongArch64: Rename core Use microarchitecture name instead of meaningless strings to name the core, the legacy core is still retained. 1. Rename LOONGSONGENERIC to LA64_GENERIC 2. Rename LOONGSON3R5 to LA464 3. Rename LOONGSON2K1000 to LA264	2024-09-29 09:35:21 +08:00
Chip Kerchner	b1737698db	Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons in the SBGEMM unit test cannot be exact due to epsilon differences.	2024-08-13 07:01:21 -05:00
Piotr Kubaj	4c12090776	Fix build on FreeBSD/powerpc64*	2024-07-10 22:21:48 +00:00
gxw	6017ad7146	loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6	2024-05-08 10:10:26 +08:00
Usui, Tetsuzo	ca673ca774	Add GEMM_PREFERED_SIZE parameter for Neoverse V1	2024-04-12 17:21:14 +09:00
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	2024-04-10 14:23:31 +02:00
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	2024-04-09 09:03:34 -04:00
Martin Kroeker	ba6d485102	Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE	2024-04-04 18:52:38 +02:00
Martin Kroeker	584e87661d	set SWITCH_RATIO for Cortex-A76	2024-04-02 23:10:45 +02:00
Martin Kroeker	b925f61fb0	Add support for Cortex-A76	2024-04-02 19:44:17 +02:00
Rajalakshmi Srinivasaraghavan	f5b2a877e2	POWER9: Use default param values from POWER8 on AIX AIX uses KERNEL.POWER8 optimization on POWER9 and changing the default GEMM parameters in param.h to use POWER8 values on POWER9.	2024-03-20 10:17:49 -05:00
pengxu	4787a55c64	Optimized cgemm kernel 16x4 LASX for LoongArch	2024-02-21 15:28:47 +08:00
pengxu	fe3da43b7d	Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch	2024-02-06 11:49:01 +08:00
Martin Kroeker	e5d2725e5a	Merge pull request #4185 from XiWeiGu/mips_enable_msa MIPS: Enable MSA	2024-02-05 15:50:16 +01:00
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2024-01-29 11:11:39 +00:00
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics	2024-01-26 13:31:09 +01:00
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics * Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2024-01-18 22:14:32 +03:00
Dirreke	ec89466e14	Add CSKY support	2024-01-16 23:45:06 +08:00

328 Commits