OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-06-08 01:15:39 +08:00

Author	SHA1	Message	Date
Martin Kroeker	669c847ceb	support extra flag for NaN handling	2025-05-23 05:52:48 -07:00
Martin Kroeker	0b0bb9951d	Merge pull request #5265 from guoyuanplct/develop kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B	2025-05-17 05:08:47 -07:00
guoyuanplct	be9f7550b5	Format Code	2025-05-15 18:55:47 +08:00
guoyuanplct	4d213653d8	kernel/riscv64:Added support for omatcopy on riscv64.	2025-05-15 13:29:14 +08:00
Martin Kroeker	8afddc1a81	Merge pull request #5262 from guoyuanplct/develop kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	2025-05-14 02:40:32 -07:00
guoyuanplct	9a7e3f102b	kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:	2025-05-14 00:09:26 +08:00
pengxu	a978ad3180	Loongarch64: add C functions of zgemm_ncopy_16	2025-05-13 16:09:12 +08:00
pengxu	0ccb050583	Loongarch64: fixed cgemm_ncopy_16_lasx	2025-05-13 16:08:33 +08:00
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222): fix ARMV9SME target and add support_sme1 code for MacOS; make sgemm_direct unconditionally available on all arm64; build a (dummy) sgemm_direct kernel on all arm64; update dynamic_arm64.c	2025-05-10 22:39:32 +02:00
Martin Kroeker	151b74284e	Merge pull request #5203 from quic/fix-sgemmdirect-sme1 Add vector registers to clobber list to prevent compiler optimization.	2025-05-09 05:39:47 -07:00
Martin Kroeker	cba32d001a	Merge pull request #5245 from guoyuanplct/develop Optimized RVV_ZVL256B Implementation of zgemv_n	2025-05-01 03:04:38 -07:00
pengxu	f19e72c402	Loongarch64: fixed swap_lasx	2025-04-30 16:42:52 +08:00
pengxu	b471fa337b	Loongarch64: fixed snrm2_lasx	2025-04-30 16:42:36 +08:00
pengxu	57bb46bedf	Loongarch64: fixed rot_lasx	2025-04-30 16:42:22 +08:00
pengxu	6dc4ca2391	Loongarch64: fixed icamax_lasx	2025-04-30 16:42:12 +08:00
pengxu	b528b1b8ea	Loongarch64: fixed iamax_lasx	2025-04-30 16:41:58 +08:00
pengxu	ba9569e382	Loongarch64: fixed dot_lasx	2025-04-30 16:41:48 +08:00
pengxu	dc5fa29851	Loongarch64: fixed cscal_lasx	2025-04-30 16:41:39 +08:00
pengxu	a98dd6d911	Loongarch64: fixed copy_lasx	2025-04-30 16:41:28 +08:00
pengxu	d49319c2d2	Loongarch64: fixed cnrm2_lasx	2025-04-30 16:41:18 +08:00
pengxu	74c97ef814	Loongarch64: fixed cdot_lasx	2025-04-30 16:41:05 +08:00
pengxu	be525521ad	Loongarch64: fixed asum_lasx	2025-04-30 16:40:55 +08:00
pengxu	0cd5ca5527	Loongarch64: fixed amax_lasx	2025-04-30 16:40:44 +08:00
guoyuanplct	11ffc8680e	Format the code	2025-04-25 00:27:27 +08:00
guoyuanplct	7616c42095	Optimized RVV_ZVL256B Implementation of zgemv_n. The RVV_ZVL256B implementation of zgemv_n has been optimized and achieves a 1.5x performance improvement over the previous implementation.	2025-04-25 00:05:15 +08:00
abhishek-fujitsu	9c02cdb073	optimise dot using thread throttling for NEOVERSE V1	2025-04-23 22:35:05 +05:30
Martin Kroeker	d0e8fd6d40	Merge pull request #5239 from annop-w/gemv_n_sve Use SVE kernel for S/DGEMVN for SVE machines	2025-04-22 10:19:49 -07:00
Iha, Taisei	08b5c18d70	fixed a potential out-of-bounds access in gemv.	2025-04-22 19:56:44 +09:00
Annop Wongwathanarat	e11744a411	Use SVE kernel for S/DGEMVN for SVE machines	2025-04-22 09:40:13 +00:00
Martin Kroeker	db0abfa907	Merge pull request #5238 from martin-frbg/revert5125 remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880	2025-04-22 02:12:19 -07:00
Martin Kroeker	7389b6c483	Merge pull request #5237 from martin-frbg/revert5219 Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	2025-04-21 23:36:23 -07:00
Martin Kroeker	4ec62d7f73	remove non-vectorized code path for power8, restoring PR4880	2025-04-21 23:14:10 +02:00
Martin Kroeker	1df8738f27	Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le Explicit unaligned vector load/stores in PPC64LE GEMV kernels	2025-04-21 14:03:56 -07:00
Martin Kroeker	99d9f1ff38	Fix conditional	2025-04-21 22:55:45 +02:00
Martin Kroeker	96d80801bc	Reinstate the CooperLake microkernel	2025-04-21 22:53:26 +02:00
Martin Kroeker	2e4309315c	Merge pull request #5219 from martin-frbg/sbgemvn_cooper Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV	2025-04-20 07:29:20 -07:00
Ubuntu	0cc2485594	Explicit unaligned vector load/stores in PPC64LE GEMV kernels	2025-04-20 08:00:29 +00:00
Martin Kroeker	dd38b4e811	Merge pull request #5225 from annop-w/gemv_n Improve performance for SGEMVN on NEOVERSEN1	2025-04-17 01:54:10 -07:00
Martin Kroeker	0241d516f6	Merge pull request #5220 from iha-taisei/sdgemv_n_unroll Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.	2025-04-16 12:55:55 -07:00
Annop Wongwathanarat	d535728803	Improve performance for SGEMVN on NEOVERSEN1	2025-04-16 09:54:30 +00:00
Usui, Tetsuzo	d711906e3e	Add symv kernels for arm64	2025-04-11 20:39:52 +09:00
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	2025-04-11 20:00:33 +09:00
Martin Kroeker	211dfd0754	disable the CooperLake microkernel as it produces wrong results	2025-04-10 22:21:57 +02:00
Martin Kroeker	b30dc9701f	Merge pull request #5215 from annop-w/gemv_t Use SVE kernel for S/DGEMVT for SVE machines	2025-04-10 13:06:07 -07:00
Martin Kroeker	2893d0add4	Merge pull request #5211 from guoyuanplct/develop Optimizing the Implementation of GEMV on the RISC-V V Extension	2025-04-10 09:43:03 -07:00
Annop Wongwathanarat	ec146157d3	Use SVE kernel for S/DGEMVT for SVE machines	2025-04-09 20:38:14 +00:00
Martin Kroeker	70865a894e	Merge pull request #5180 from ywwry66/openmp_use_cmake CMake: Pass `OpenMP` compiler and linker flags through CMake targets	2025-04-08 13:16:07 -07:00
lglglglgy	1ff303f36e	Optimizing the Implementation of GEMV on the RISC-V V Extension. Specialized some scenarios, performed loop unrolling, and reduced the number of multiplications.	2025-04-08 21:18:00 +08:00
ColumbusAI	7bf848454d	Update zsum.c: fixed a spelling error that prevented compilation; zsum_kernel was used where it should be zasum_kernel. The file will not compile without this fix.	2025-04-05 09:57:53 -07:00
Vaisakh K V	04915be829	Add vector registers to clobber list to prevent compiler optimization. The SME-based SGEMMDIRECT kernel uses the vector (z) registers; adding them to the clobber list tells the compiler that the kernel modifies these registers, so it must not rely on values held in them.	2025-04-03 12:18:43 +05:30

2481 Commits