OpenBLAS

mirror of https://github.com/OpenMathLib/OpenBLAS synced 2026-05-31 00:45:48 +08:00

Author	SHA1	Message	Date
yuanjia	e955736005	Fix: Remove invalid parentheses after endif	2026-02-04 09:57:48 +08:00
Chris Sidebottom	2c3cdaf74e	Optimized BGEMV for NEOVERSEV1 target - Adds bgemv T based off of sbgemv T kernel - Adds bgemv N which is slightly altered to not use Y as an accumulator due to the output being bf16 which results in loss of precision - Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with new kernels	2025-07-23 10:51:41 +01:00
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2025-07-10 23:23:27 +00:00
Srangrang	9f13b2c6ac	style: modify HALF to BFLOAT16 in benchmark folder	2025-06-15 20:57:05 +08:00
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	2025-06-03 20:14:30 +08:00
Martin Kroeker	0f8ff82592	Add build notes for Windows and flang from gh Discussion 5008	2024-12-06 01:35:42 -08:00
gxw	ffaa5765a4	Bench: Add omatcopy	2024-10-18 11:07:52 +08:00
Martin Kroeker	4460d3ee7f	re-enable the sgesdd benchmark	2024-07-26 15:07:52 +02:00
Evgeni Burovski	cd3c167c28	ignore sgesdd failure on codspeed In https://github.com/OpenMathLib/OpenBLAS/issues/4776 we're hitting ** On entry to SLASCL parameter number 4 had an illegal value on codspeed, but not outside (either locally or on github runners)	2024-07-03 12:35:26 +03:00
Evgeni Burovski	28fb95d0be	BENCH: actually add gemv/gbmv f2py wrappers	2024-06-27 12:38:47 +03:00
Evgeni Burovski	11a0c56166	BENCH: add BLAS level 2 gemv and gbmv	2024-06-27 11:14:22 +03:00
Evgeni Burovski	400cf9f63d	restore the problem sizes for codspeed benchmarks	2024-06-24 16:47:20 +03:00
Evgeni Burovski	37a854718b	BENCH: sync codspeed-benchmarks with BLAS-benchmarks	2024-06-24 14:33:06 +03:00
Evgeni Burovski	81cf0db047	DOC: add a readme for benchmarks/pybench	2024-05-18 15:30:00 +03:00
Evgeni Burovski	9f28161837	BENCH: add benchmarks using codspeed.io	2024-05-18 15:25:16 +03:00
Martin Kroeker	3f1ec74fe7	Fix OPENBLAS_LOOPS assignment	2024-03-20 19:22:48 +01:00
Martin Kroeker	fe39c891a6	Fix OPENBLAS_LOOPS assignment	2024-03-20 19:21:37 +01:00
Martin Kroeker	ffcbaca167	Fix OPENBLAS_LOOPS assignment	2024-03-20 19:20:16 +01:00
Martin Kroeker	05d0438c25	Fix OPENBLAS_LOOPS assignment	2024-03-20 19:19:11 +01:00
Sergei Lewis	3ffd6868d7	Merge branch 'develop' into dev/slewis/merge-from-riscv	2024-02-01 11:29:41 +00:00
gxw	3d4dfd0085	Benchmark: Rename the executable file names for {sc/dz}a{min/max} No interface named {c/z}a{min/max}, keeping it would cause ambiguity	2024-01-30 11:33:01 +08:00
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2024-01-29 11:11:39 +00:00
Shiyou Yin	f745f02f35	benchmark: Fix missing colons in outputs of ./strsv.goto	2023-11-24 14:55:18 +08:00
martin-frbg	fec4867748	Fix file permissions (issue 4095)	2023-07-23 20:31:55 +02:00
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2023-04-17 17:38:42 +01:00
Xianyi Zhang	e5313f53d5	Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop	2022-12-03 12:00:52 +08:00
Bart Oldeman	bae45d94d1	scal benchmark: eliminate y, move init/timing out of loop Removing y avoids cache effects (if y is the size of the L1 cache, the main array x is removed from it). Moving init and timing out of the loop makes the scal benchmark behave like the gemm benchmark, and allows higher accuracy for smaller test cases since the loop overhead is much smaller than the timing overhead. Example: OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024 on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core. Before From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 5627.08 MFlops 0.000000 sec 2048 : 5907.34 MFlops 0.000000 sec 3072 : 5553.30 MFlops 0.000001 sec 4096 : 5446.38 MFlops 0.000001 sec 5120 : 5504.61 MFlops 0.000001 sec 6144 : 5501.80 MFlops 0.000001 sec 7168 : 5547.43 MFlops 0.000001 sec 8192 : 5548.46 MFlops 0.000001 sec After From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 6310.28 MFlops 0.000000 sec 2048 : 6396.29 MFlops 0.000000 sec 3072 : 6439.14 MFlops 0.000000 sec 4096 : 6327.14 MFlops 0.000001 sec 5120 : 5628.24 MFlops 0.000001 sec 6144 : 5616.41 MFlops 0.000001 sec 7168 : 5553.13 MFlops 0.000001 sec 8192 : 5600.88 MFlops 0.000001 sec We can see the L1->L2 switchover point is now where it should be, and the number of flops for L1 is more accurate.	2022-11-29 08:02:45 -05:00
HellerZheng	943372bdf5	Merge branch 'develop' into develop	2022-11-18 10:12:46 +08:00
Martin Kroeker	f92dd6e303	change line endings from CRLF to LF	2022-11-17 10:18:36 +01:00
Heller Zheng	bef47917bd	Initial version for riscv sifive x280	2022-11-15 00:06:25 -08:00
Bart Oldeman	9e6b060bf3	Fix comment. It stores the pointer, not an offset (that would be an alternative approach).	2022-10-20 20:11:09 -04:00
Bart Oldeman	9959a60873	Benchmarks: align malloc'ed buffers. Benchmarks should allocate with cacheline (often 64 bytes) alignment to avoid unreliable timings. This technique, storing the offset in the byte before the pointer, doesn't require C11's aligned_alloc for compatibility with older compilers. For example, Glibc's x86_64 malloc returns 16-byte aligned buffers, which is not sufficient for AVX/AVX2 (32-byte preferred) or AVX512 (64-byte).	2022-10-20 13:28:20 -04:00
Marius Hillenbrand	f119e26354	Fix flipped indices in benchmark for gemv Fixes #3439	2021-11-03 12:45:09 +01:00
Martin Kroeker	14e33e0f7e	Handle OPENBLAS_LOOPS in SYR2 benchmark	2021-07-10 21:27:53 +02:00
Martin Kroeker	4ed99c2ce3	Merge pull request #3292 from martin-frbg/syrk_limit Add lower limit for multithreading in xSYRK	2021-07-07 20:46:28 +02:00
Martin Kroeker	a4543e4918	Handle OPENBLAS_LOOP	2021-07-04 16:59:43 +02:00
Martin Kroeker	dcfc5cf714	Handle OPENBLAS_LOOPS for more stable results	2021-07-01 17:39:37 +02:00
Martin Kroeker	06e3b07ecb	Handle OPENBLAS_LOOPS and OPENBLAS_TEST options	2021-07-01 17:38:45 +02:00
Martin Kroeker	1f8bda71b9	Add OPENBLAS_LOOPS support to potrf/potrs/potri benchmark	2021-06-26 23:46:00 +02:00
Martin Kroeker	d57c681a6d	Fix compilation on older OSX versions	2021-03-26 22:29:29 +01:00
Martin Kroeker	38dcf3454b	Support timing Apple M1	2021-03-02 17:50:55 +01:00
Qiyu8	f917c26e83	Refactoring remaining benchmark cases.	2020-10-26 10:25:05 +08:00
Qiyu8	dd6ebdfdab	Refactor the performance measurement system	2020-10-23 10:32:03 +08:00
Martin Kroeker	7ae9e8960e	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:08:29 +02:00
Martin Kroeker	5464eb13ea	Change ifdef linux to __linux for C11 compatibility	2020-09-30 22:59:41 +02:00
Martin Kroeker	6f8fad87c5	Use POSIX2001 clock_gettime for higher resolution	2020-09-05 19:44:01 +02:00
Martin Kroeker	ced49466f0	Use the fortran compiler to link LAPACK-related benchmarks to fix linking problems with (at least) the AMD version of flang that creates dependencies on more than just the fortran runtime.	2020-05-29 13:35:51 +02:00
Martin Kroeker	6e270f91ec	add support for RETURN_BY_STACK semantics, e.g. clang	2020-05-29 13:29:10 +02:00
Rajalakshmi Srinivasaraghavan	ce90e2bd3f	Include shgemm in benchtest This patch is to enable benchtest for half precision gemm when BUILD_HALF is set during make.	2020-05-11 09:57:46 -05:00
l00536773	6b7ef6543a	[OpenBLAS]: benchmark error of potrf [description]: when the matrix size goes higher than 5800 during the cpotrf test, error info, such as "Potrf info = 5679", will be returned on ARM64 and x86 machines. Uplo = L & F. [solution]: changed the func for building the matrix so that the complex Hermitian matrix can stay positive definite during the computation. [dts]:	2020-04-16 10:55:10 +08:00
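
Commits 9959a60873 and 9e6b060bf3 describe allocating benchmark buffers with cache-line alignment without relying on C11's aligned_alloc, by keeping the pointer returned by malloc just before the aligned block. The following is a minimal sketch of that idea, not the code from the repository: the names bench_aligned_malloc, bench_aligned_free and BENCH_ALIGN are hypothetical, and the 64-byte alignment is an assumption taken from the "often 64 bytes" remark in the commit message.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not taken from the OpenBLAS sources. */
#define BENCH_ALIGN 64 /* assumed cache-line size in bytes (power of two) */

/* Return a BENCH_ALIGN-aligned block of at least `size` bytes, or NULL. */
static void *bench_aligned_malloc(size_t size)
{
    /* Extra room: alignment slack plus one slot for the original pointer. */
    size_t extra = (BENCH_ALIGN - 1) + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL; /* the request would overflow size_t */

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip the pointer slot, then round up to the next aligned address. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + BENCH_ALIGN - 1) & ~(uintptr_t)(BENCH_ALIGN - 1);

    /* Remember what malloc() returned, directly below the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from bench_aligned_malloc(); NULL is a no-op. */
static void bench_aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

A benchmark would call bench_aligned_malloc in place of malloc for its vectors and matrices and release them with bench_aligned_free. Storing the full pointer rather than a one-byte offset costs a few more bytes per allocation but puts no upper bound on the alignment.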


189 Commits
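
Commit bae45d94d1 moves initialization and timing out of the benchmark loop so that loop overhead, rather than timer overhead, dominates small test cases, and commit 6f8fad87c5 switches timing to the POSIX clock_gettime call. The sketch below combines the two ideas under stated assumptions: scal_ref is a plain stand-in for the BLAS routine, the helper names are hypothetical, and CLOCK_MONOTONIC is an assumed clock choice that the commit messages do not specify.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* request the clock_gettime declaration */
#endif
#include <stddef.h>
#include <time.h>

/* Seconds between two timespec samples. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *stop)
{
    return (double)(stop->tv_sec - start->tv_sec)
         + (double)(stop->tv_nsec - start->tv_nsec) * 1e-9;
}

/* Stand-in for the routine under test (hypothetical, not OpenBLAS code). */
static void scal_ref(size_t n, double alpha, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

/* Average seconds per call: the clock is read once before and once after
   the whole loop instead of around every iteration. */
static double time_scal_per_call(size_t n, double alpha, double *x, int loops)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int l = 0; l < loops; l++)
        scal_ref(n, alpha, x);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_seconds(&t0, &t1) / (double)loops;
}
```

In the benchmarks the iteration count comes from the OPENBLAS_LOOPS environment variable, as in the `OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024` example quoted in the commit message. Here it is passed as a parameter to keep the sketch self-contained.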