Commit Graph

9063 Commits

Author SHA1 Message Date
Harishmcw
030ae1fd97 Redefined threading logic for WoA 2025-02-25 15:40:39 +05:30
Harish-Gits
daf16b8229 Adjusted GESV threading logic for optimal performance on WoA 2025-02-12 19:25:25 +05:30
Martin Kroeker
f42ce7067f Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
2025-02-09 23:17:20 +01:00
Martin Kroeker
7478c10268 Merge branch 'OpenMathLib:develop' into issue5110 2025-02-09 21:40:02 +01:00
Martin Kroeker
c54f5417cc Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
2025-02-09 21:39:30 +01:00
Martin Kroeker
57208b8bce Disable tests with incx,incy=0 (undefined behavior) 2025-02-09 20:17:29 +01:00
Martin Kroeker
3a4a9b21eb Disable tests with incx,incy=0 (undefined behavior) 2025-02-09 20:16:03 +01:00
Martin Kroeker
60d0be0e97 Update nrm2.c 2025-02-08 23:42:21 +01:00
Martin Kroeker
0fd5448b2c Handle INCX=0 2025-02-08 19:33:05 +01:00
Martin Kroeker
1b85b6a396 Merge pull request #5108 from taoye9/sbgemm_neoversev1
Add SBGEMM for arm neoversev1
2025-02-07 20:30:41 +01:00
Martin Kroeker
cae480683a Merge pull request #5113 from martin-frbg/issue5112
Ensure that GEMMTR name appears in XERBLA if GEMMT was called as such
2025-02-07 09:37:53 +01:00
Martin Kroeker
db7e5f1fa7 Update gemmt.c 2025-02-06 21:26:20 +01:00
Martin Kroeker
ff30ac9666 Update Makefile 2025-02-06 19:51:23 +01:00
Martin Kroeker
7c3e169b67 Update gemmt.c 2025-02-06 19:21:08 +01:00
Martin Kroeker
09414a4187 Ensure that GEMMTR name appears in XERBLA if gemmt was called as such 2025-02-06 18:52:00 +01:00
Ye Tao
c748e6a338 optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
2025-02-05 10:06:37 +00:00
Aditya Tewari
4379a6fbe3 * checkpoint sbgemm for SVE-256 2025-02-03 12:49:49 +00:00
Martin Kroeker
c139b63342 Merge pull request #5107 from jhgit/develop
fix signedness of pointer to integer type passed to blas_lock()
2025-02-02 08:12:45 +01:00
John Hein
6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 2025-02-01 17:22:57 -07:00
Martin Kroeker
5de5072940 Improve flang-new identification and add CI job for it on OSX-x86_64 (#5103)
* AzureCI: Add LLVM/flang-new build on OSX-x86_64
* distinguish classic flang from flang-new in name based recognition
2025-01-30 16:55:26 +01:00
Martin Kroeker
1f74fb9a07 Merge pull request #5101 from martin-frbg/issue5100
Fix CMake build for PPCG4 breaking due to unparsable KERNEL file
2025-01-27 13:19:37 +01:00
Martin Kroeker
d7036cfd74 Remove trailing blanks that break the cmake parser 2025-01-27 09:32:17 +01:00
Martin Kroeker
3375a0c990 Merge pull request #5099 from martin-frbg/issue5097-2
Simplify build instructions for Windows on Arm
2025-01-26 21:00:28 +01:00
Martin Kroeker
7a27e2b00d Simplify build instructions for Windows on Arm 2025-01-26 18:36:58 +01:00
Martin Kroeker
fdeac17237 Merge pull request #5098 from martin-frbg/issue5095
Fix compilation with BUILD_BFLOAT16 enabled
2025-01-25 19:32:19 +01:00
Martin Kroeker
1829ac5b44 Add (dummy) declaration of SBROT_M 2025-01-25 17:32:11 +01:00
Martin Kroeker
53d20a83f3 Merge pull request #5089 from annop-w/gemv_t
Simplify gemv_t_sve_v1x3 kernel
2025-01-25 14:17:26 +01:00
Martin Kroeker
6e393a5599 Merge branch 'develop' into gemv_t 2025-01-25 12:54:04 +01:00
Martin Kroeker
9b11fd5802 Merge pull request #5088 from michalowski-arm/develop
Add thread throttling profile for SGEMV on `NEOVERSEV1`
2025-01-24 21:54:45 +01:00
Martin Kroeker
5930c162ef Merge pull request #5097 from matthew-brett/fix-woa-cmd
Fix Windows on ARM build instructions
2025-01-24 15:31:15 +01:00
Marek Michalowski
838bb57e27 Merge branch 'develop' into develop 2025-01-24 14:19:35 +00:00
Matthew Brett
252c43265d Fix Windows on ARM build instructions
The command as merged uses the compiler target as the compiler path.

I have run and tested a build with this command.

@Mugundanmcw - is this correct?
2025-01-24 12:58:20 +00:00
Martin Kroeker
876ba58e28 Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
2025-01-24 10:59:16 +01:00
Martin Kroeker
a54f9a9c69 Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
2025-01-23 22:42:12 +01:00
Martin Kroeker
9f2319b46d Merge pull request #5094 from martin-frbg/issue5093
Fix "make install" operation when CPP_THREAD_SAFETY_TEST is selected
2025-01-23 19:50:27 +01:00
Martin Kroeker
9faebb3c97 fix lost indentation in the rules for the thread safety test 2025-01-23 17:59:45 +01:00
Martin Kroeker
262018f14c Merge pull request #5092 from XiWeiGu/la64_fixed_cmake
LoongArch64: Fixed cmake
2025-01-23 13:54:27 +01:00
Martin Kroeker
180ba5e7d0 Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
2025-01-23 10:16:43 +01:00
gxw
1ebcbdbab3 LoongArch64: Fixed the issue of using the old-style TARGET in cmake builds 2025-01-23 09:08:42 +00:00
Deeksha Goplani
d1bfa979f7 small gemm kernel packing modifications 2025-01-23 09:41:45 +05:30
Martin Kroeker
1a6a9fb22f add another generator line for rotm 2025-01-23 00:17:04 +01:00
Martin Kroeker
518e376820 Merge pull request #5090 from martin-frbg/cmakeutils
Fix CMake interpretation of KERNEL file variables relevant to WoA
2025-01-22 21:21:38 +01:00
Martin Kroeker
111c9b0733 Add translations for C_COMPILER and OSNAME 2025-01-22 19:51:43 +01:00
Martin Kroeker
4924319c50 fix position of srotm, qrotm 2025-01-22 16:07:35 +01:00
Martin Kroeker
b58cba9eb6 fix qrotm build rules 2025-01-22 15:51:49 +01:00
Marek Michalowski
4d5b13f765 Add thread throttling profile for SGEMV on NEOVERSEV1 2025-01-22 10:50:04 +00:00
tingbo.liao
3c8df6358f Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
2025-01-22 11:41:12 +08:00
Annop Wongwathanarat
c0318cea6e Simplify gemv_t_sve_v1x3 kernel 2025-01-21 13:40:17 +00:00
Martin Kroeker
76db346f7e Merge pull request #5082 from martin-frbg/woa_cpuid
Get ARM64 TARGET information from the registry on Windows
2025-01-18 20:28:31 +01:00
Martin Kroeker
5f7b03a441 Merge pull request #5083 from martin-frbg/fixmips64ci
MIPS64 CI: fix breakage from inadvertent line join in yml file
2025-01-18 20:27:37 +01:00