Commit Graph

307 Commits

Author SHA1 Message Date
Martin Kroeker
ef27ec6bed Add pragma to limit optimization level 2026-02-22 13:42:41 +01:00
Martin Kroeker
46b963b9a0 Use generic C kernels for SCAL on FreeBSD 2026-02-19 22:46:03 +01:00
Martin Kroeker
601bdde8ec fix stack location of dummy2 flag 2026-01-27 22:40:50 +01:00
Martin Kroeker
d53d2b11a9 fix stack location of dummy2 flag 2026-01-27 22:39:37 +01:00
Amrita H S
b53d18b3ad Fixing warning messages in dgemm and dgemv kernels
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2026-01-06 10:20:56 -06:00
Rajalakshmi Srinivasaraghavan
2283fcbbe7 POWER10: Reduce sgemm loop unrolling
With GCC 14, unnecessary move and lxvp instructions appear when unrolling the inner loop for larger sizes.
Reducing the loop unroll factor restores performance to the level seen with GCC 11.
2026-01-04 17:01:01 -06:00
Martin Kroeker
f7b7296bff Fix compilation with LLVM 2025-11-22 16:07:34 +01:00
Martin Kroeker
0c59ae0b45 Merge pull request #5453 from pratiklp00/dgemm_optimization
Dgemm loop unroll and 4x1, 4x2 dgemv VSX implementation for power10.
2025-10-28 16:51:41 -07:00
pratiklp00
6637352260 remove spacing 2025-10-14 00:06:04 -05:00
pratiklp00
e2399be6d2 add macro 2025-10-08 23:24:41 -05:00
Martin Kroeker
46fc6c0794 fix unspecified array size in clobber list 2025-10-08 08:23:24 +02:00
pratiklp00
d7b11605d1 fix build issue 2025-09-29 02:02:13 -05:00
Dan Horák
f5ec1c4e53 fix typos in Power8 routines
Fixes: https://github.com/OpenMathLib/OpenBLAS/pull/5448
2025-09-26 16:54:03 +02:00
Dan Horák
681af71d95 drop gcc 15 workaround
As the assembler routines have correctly specified parameters, we can drop
the previously applied workaround in https://github.com/OpenMathLib/OpenBLAS/pull/5409.
2025-09-26 16:52:17 +02:00
Martin Kroeker
c92f7f6bb2 Merge pull request #5448 from martin-frbg/issue5372-2
Fix clobber list entries for arrays in POWER kernels that use inline asm
2025-09-26 02:24:50 -07:00
Martin Kroeker
14c9dcaac7 Use generic kernels for SCAL to fix corner cases of Inf/NAN 2025-09-25 20:31:12 +02:00
pratiklp00
16be28af7c dgemm loop unroll and 4x1 4x2 dgemv implementation 2025-09-21 23:00:21 -05:00
Martin Kroeker
1d5279fd29 Fix clobber list entries for arrays in inline asm 2025-09-17 07:02:18 -07:00
Martin Kroeker
a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 2025-07-30 17:05:36 +02:00
Martin Kroeker
cf06250d36 add handling of dummy2 flag 2025-05-24 06:06:24 -07:00
Martin Kroeker
4ec62d7f73 remove non-vectorized code path for power8, restoring PR4880 2025-04-21 23:14:10 +02:00
Ubuntu
0cc2485594 Explicit unaligned vector load/stores in PPC64LE GEMV kernels 2025-04-20 08:00:29 +00:00
Martin Kroeker
77fba0f400 Fix "dummy2" flag handling 2025-02-22 20:09:21 +01:00
Martin Kroeker
81eed868b6 Restore the non-vectorized code from before PR4880 for POWER8 2025-02-12 09:07:20 +01:00
Martin Kroeker
98b5ef929c Restore the non-vectorized code from before PR4880 for POWER8 2025-02-12 09:04:22 +01:00
Martin Kroeker
d7036cfd74 Remove trailing blanks that break the cmake parser 2025-01-27 09:32:17 +01:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
Sergey Fedorov
229efa42ff scal.S: use r11 on 32-bit Darwin on powerpc 2025-01-05 00:31:27 +08:00
Sergey Fedorov
81e1be8d90 Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
2025-01-04 22:54:54 +08:00
Martin Kroeker
9b9c0aa5c9 temporarily disable the default S/DSCAL kernel 2025-01-03 21:36:46 +01:00
Ayappan Perumal
020cce1068 Fix build issues with gcc compiler as well 2024-10-23 04:24:06 -05:00
Ayappan Perumal
b6ec73e77c Fix AIX build 2024-10-21 07:38:03 -05:00
Chip Kerchner
ab71a1edf2 Better VSX. 2024-10-17 08:25:02 -05:00
Chip Kerchner
36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Martin Kroeker
e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum
0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
Martin Kroeker
c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Martin Kroeker
d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Chip Kerchner
1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 2024-08-01 14:59:12 -05:00
Martin Kroeker
f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker
73f8866ffb make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Hong Bo Peng
db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Martin Kroeker
b9bfc8ce09 make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Chip Kerchner
ba47c7f4f3 Vectorize reduction stage of sgemv_t. 2024-07-16 15:57:24 -05:00
Chip Kerchner
cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker
2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker
7f8f037a36 handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker
f1248b849d handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Rajalakshmi Srinivasaraghavan
e112191b54 POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER.
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
aa259b141d Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00