shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
2025-03-18 13:24:05 +05:30
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
2025-03-18 13:14:30 +05:30
shubham.chaudhari
b6cb5ece58
Add thread throttling profile for DGEMV on NEOVERSEV1
2025-03-18 13:14:30 +05:30
Martin Kroeker
7338a473a7
Merge pull request #5150 from Harishmcw/WoA-Experiments
...
Redefined threading logic for GESV and GEMV on WoA
2025-03-03 21:45:53 +01:00
Martin Kroeker
09ba099461
make throttling code conditional on SMP
2025-02-25 12:10:48 +01:00
Harishmcw
030ae1fd97
Redefined threading logic for WoA
2025-02-25 15:40:39 +05:30
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on NEOVERSEV2
2025-02-20 10:28:31 +00:00
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on NEOVERSEV1
2025-01-22 10:50:04 +00:00
Martin Kroeker
d2fc4f3b4d
Increase multithreading threshold by a factor of 50
2024-01-17 20:59:24 +01:00
Martin Kroeker
1dea57ab25
Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64
2021-07-14 20:32:57 +02:00
Martin Kroeker
7bb59fceb7
Clean up some warnings
2021-07-11 16:00:29 +02:00
Martin Kroeker
f0e7345fb8
Add shortcut for small-size gemv_n with increments of one
2021-05-26 22:02:34 +02:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
933896a1d0
Use blasabs to switch between abs and labs as needed for INTERFACE64
2018-08-04 20:06:49 +02:00
Jerome Robert
1fe3aab047
Use GEMM_MULTITHREAD_THRESHOLD as a number of ops
...
...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this does not change anything, but
for other values it makes the GEMM and GEMV thresholds
change in the same way.
Close #742
2016-01-24 11:31:40 +01:00
Jerome Robert
87a2ccc37c
Factorize MAX_STACK_ALLOC code to common_stackalloc.h
...
Ref #727
2016-01-08 16:03:52 +01:00
Jerome Robert
f9890a6452
Fix compilation when MAX_STACK_ALLOC is not set
...
Close #722
2015-12-31 14:43:09 +01:00
Zhang Xianyi
640cccc2b1
Refs #697. Fixed gemv bug for Windows.
...
Thanks to matzeri for the patch.
2015-11-30 15:19:45 -06:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Jerome Robert
ab567d8443
gemv: Ensure stack buffer is large enough to handle memory alignment
...
Ref #478
2015-04-24 10:12:49 +02:00
Zhang Xianyi
847e19c04e
Refs #478, #482. Enable stack alloc for s/dgemv_t (revert 9798491).
2015-04-20 23:22:40 -05:00
Zhang Xianyi
fd9fd42936
Refs #478, #482. Fixed bug in previous commit.
2015-04-13 23:22:27 -05:00
Zhang Xianyi
9798481979
Refs #478, #482. Fix segfault bug for gemv_t with MAX_STACK_ALLOC flag.
...
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
2015-02-11 11:16:57 -06:00
Jerome Robert
b17ccb4c5c
Fix a segfault in gemv when MAX_STACK_ALLOC is set
...
* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* Do the same in ger.c for the same reason, even though the bug
has not been observed there.
2015-01-29 09:55:57 +01:00
Jerome Robert
e9d9a8eae3
Allow gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
wernsaar
f511807fc0
modified multithreading threshold
2014-09-08 12:27:32 +02:00
wernsaar
d1800397f5
optimized interface/gemv.c for multithreading
2014-09-02 17:36:07 +02:00
wernsaar
f4ff889491
updated interface/gemv.c for multithreading
2014-09-02 16:30:04 +02:00
wernsaar
b985cea65d
adjust number of threads for sgemv and dgemv
2014-07-15 16:04:46 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00