Commit Graph

9327 Commits

Author SHA1 Message Date
davidz-ampere
aa90ab4142 Add support for Ampere AmpereOne processors 2025-06-24 00:12:34 -04:00
davidz-ampere
84730068af reduce duplicate kernel code 2025-06-17 03:05:34 -04:00
davidz-ampere
be68ef03b4 Add support for Ampere processors 2025-06-15 22:00:40 -04:00
Martin Kroeker
f1097d1cba Merge pull request #5306 from martin-frbg/lapack1131
Fix missing initialization leading to bypassing corner cases in C/ZGEQP3RK (Reference-LAPACK PR #1131)
2025-06-15 15:00:05 -07:00
Martin Kroeker
bad47bd024 Fix too strict leading dimensions check in LAPACKE_?gesdd_work (Reference-LAPACK PR #1126) (#5307)
* relax leading dimensions check (Reference-LAPACK PR #1126)
2025-06-15 22:47:14 +02:00
Martin Kroeker
7f3093a0ad Merge pull request #5305 from martin-frbg/lapack1135
Fix 2nd dimension used by LAPACKE_c/zunmlq in NaN check and transposition (Reference-LAPACK PR #1135)
2025-06-15 12:41:03 -07:00
Martin Kroeker
1804ff58d7 fix missing initialization 2025-06-15 19:35:34 +02:00
Martin Kroeker
906b9df316 fix missing initialization 2025-06-15 19:34:01 +02:00
Martin Kroeker
f4e5177050 fix dimension used in nancheck (Reference-LAPACK PR 1135) 2025-06-15 19:16:58 +02:00
Martin Kroeker
2a6beac88f fix dimension used in transposition (Reference-LAPACK PR 1135) 2025-06-15 19:14:53 +02:00
Martin Kroeker
d8a2324699 fix dimension used in nancheck (Reference-LAPACK PR 1135) 2025-06-15 19:13:23 +02:00
Martin Kroeker
874744976c fix dimension used in nancheck (Reference-LAPACK PR 1135) 2025-06-15 19:11:26 +02:00
Martin Kroeker
0ea173ec8c Merge pull request #5304 from martin-frbg/fixgemmtr_if
fix source file used for sbgemmt/sbgemmtr in CMake builds
2025-06-14 22:53:10 -07:00
Martin Kroeker
5e393f207c fix source file used for sbgemmt/sbgemmtr 2025-06-15 00:06:34 +02:00
Martin Kroeker
dbd5643d37 Merge pull request #5302 from martin-frbg/zscal_mips_3
mips64 SICORTEX: temporarily change default C/ZSCAL to the non-asm implementation
2025-06-13 13:27:21 -07:00
Martin Kroeker
e338d34ce1 fix path 2025-06-13 13:37:15 +02:00
Martin Kroeker
d36093d084 temporarily change default C/ZSCAL to the non-asm implementation 2025-06-13 13:32:02 +02:00
Martin Kroeker
cc4b04a684 Merge pull request #5301 from martin-frbg/zscal_mips_2
kernel/mips(64): Fix cscal and zscal
2025-06-13 02:52:59 -07:00
Martin Kroeker
b3c90564d7 resync with the generic arm version for inf/nan handling 2025-06-13 00:54:27 -07:00
Martin Kroeker
6bdc7f9eb7 Merge pull request #5300 from martin-frbg/fixup5296
kernel/riscv64: Fix cscal/zscal for riscv64_generic
2025-06-12 15:02:22 -07:00
Martin Kroeker
63272b6c82 Merge pull request #5299 from martin-frbg/x86_64-ssezscal
Disable the default SSE kernels for x86_64 CSCAL/ZSCAL for now
2025-06-12 15:01:31 -07:00
Martin Kroeker
73af02b89f use dummy2 as Inf/NAN handling flag 2025-06-12 13:33:56 -07:00
Martin Kroeker
549a9f1dbb Disable the default SSE kernels for CSCAL/ZSCAL for now 2025-06-12 18:54:33 +02:00
Martin Kroeker
ca1ce84ee5 Merge pull request #5298 from martin-frbg/fixup5281
Fix PR5281 "kernel/arm64: fix cscal/zscal"
2025-06-12 09:49:55 -07:00
Martin Kroeker
58eeb9041c fix handling of dummy2 2025-06-12 03:03:01 -07:00
Martin Kroeker
7c77537b25 Merge pull request #5297 from martin-frbg/zscal_x86_sparc
kernel/(x86|sparc): Fix cscal and zscal by reverting to the generic C kernels
2025-06-12 01:10:35 -07:00
Martin Kroeker
63287e1855 Merge pull request #5296 from martin-frbg/zscal_riscv
kernel/riscv64: Fix cscal and zscal
2025-06-12 01:10:15 -07:00
Martin Kroeker
d2855d3dab Merge pull request #5285 from martin-frbg/zscal_zarch
kernel/zarch: Fix cscal and zscal
2025-06-12 01:09:52 -07:00
Martin Kroeker
1408be5fe0 Merge pull request #5282 from martin-frbg/zscal_power
kernel/power: Fixed cscal and zscal
2025-06-12 01:04:38 -07:00
Martin Kroeker
1589d0b21e Merge pull request #5281 from martin-frbg/zscal_arm64
kernel/arm64: fixed cscal and zscal
2025-06-12 01:04:18 -07:00
Martin Kroeker
a86419fb66 Merge pull request #5280 from martin-frbg/zscal_x86_64
kernel/x86_64: fixed cscal and zscal
2025-06-12 01:03:55 -07:00
Martin Kroeker
11ff18bb0f Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
2025-06-12 01:03:00 -07:00
Martin Kroeker
2e2691b34b Merge pull request #5078 from XiWeiGu/la64_fixed_cscal_zscal
LoongArch64: fixed cscal and zscal
2025-06-12 01:02:19 -07:00
Martin Kroeker
f4194fc65f Merge branch 'develop' into la64_fixed_cscal_zscal 2025-06-11 14:28:41 -07:00
Martin Kroeker
e12132abd4 Use generic C/ZSCAL kernels to address inf/nan handling for now 2025-06-11 22:12:10 +02:00
Martin Kroeker
1cefbea7ea Use generic SCAL kernels to address inf/nan handling for now 2025-06-11 22:10:46 +02:00
Martin Kroeker
f18b7a46bf add dummy2 flag handling for inf/nan agnostic zeroing 2025-06-11 01:47:43 -07:00
Martin Kroeker
fe220a0d7d Merge pull request #5291 from guoyuanplct/develop
kernel/riscv64:fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
2025-06-09 23:42:04 -07:00
Martin Kroeker
bbdc265798 Merge pull request #5294 from arnej27959/arnej/fix-arm64-register
Accumulate results in output register explicitly
2025-06-09 23:41:12 -07:00
Arne Juul
5442aff218 Accumulate results in output register explicitly 2025-06-09 19:03:22 +00:00
guoyuanplct
83fcab7578 Merge branch 'develop' of https://github.com/guoyuanplct/OpenBLAS into develop 2025-06-05 21:58:13 +08:00
guoyuanplct
2ae019161a fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small 2025-06-05 21:53:03 +08:00
Martin Kroeker
02267d86f5 Merge pull request #5288 from guoyuanplct/develop
kernel/riscv64:Optimized the implementation of axpby on TARGET=RISCV64_ZVL256B.
2025-05-29 07:38:38 -07:00
guoyuanplct
d2003dc886 del lines 2025-05-29 18:38:22 +08:00
guoyuanplct
45fd2d9b07 Optimized the axpby function. 2025-05-29 17:50:44 +08:00
Martin Kroeker
fb8dc8ff5c Add dummy2 flag handling 2025-05-25 14:47:06 -07:00
Martin Kroeker
cf06250d36 add handling of dummy2 flag 2025-05-24 06:06:24 -07:00
Martin Kroeker
28f8fdaf0f support flag for NaN/Inf handling and fix scaling of NaN/Inf values 2025-05-23 14:59:59 +02:00
Martin Kroeker
669c847ceb support extra flag for NaN handling 2025-05-23 05:52:48 -07:00
Martin Kroeker
0163143fdd Merge pull request #5278 from martin-frbg/fixup5276
Fix compilation with pre-C99 compilers
2025-05-22 00:32:29 -07:00