mirror of
https://github.com/OpenMathLib/OpenBLAS
synced 2026-06-08 01:15:39 +08:00
Update the Changelog for version 0.3.31
116
Changelog.txt
@@ -1,4 +1,120 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.31
15-Jan-2026

general:
  - reverted a matrix partitioning optimization from 0.3.30 that could lead to
    race conditions and subsequent invalid results in GEMM
  - added the bfloat16 extensions BGEMM and BGEMV
  - added a BLAS interface for the ?GEMM_BATCH extensions
  - added the BLAS extensions ?GEMM_BATCH_STRIDED and their CBLAS interface
  - added the basic infrastructure for the half-precision float (FP16) format
    using the SH prefix
  - reimplemented the LAPACK SLAED3/DLAED3 functions using multithreading, thereby
    improving the performance of the SSYEVD/DSYEVD eigensolver for symmetric matrices
    on all platforms
  - limited the number of retries for initial memory allocation to avoid hanging
    indefinitely on low-memory systems
  - fixed a thread lockup situation encountered with Python 3.9 or older and NumPy
  - introduced a problem size threshold for multithreading in STRMV/DTRMV
  - introduced a problem size threshold for multithreading in CHER/CHER2/CHPR/CHPR2
    and ZHER/ZHER2/ZHPR/ZHPR2
  - improved the problem size thresholds for multithreading in SGER/DGER
  - improved autodetection of the Fortran compiler
  - fixed passing of the INTERFACE64=1 option to the flang-new compiler
  - fixed a potential deadlock in multithreaded code after calling fork()
  - fixed builds using CMake on FreeBSD
  - fixed builds using CMake from within Cygwin on Windows
  - fixed builds using CMake and the NVHPC compiler on ARM64
  - fixed CMake build errors caused by misdetected compiler or OpenMP versions
  - improved contents of the CMake-generated OpenBLASConfig.cmake file
  - added support for cross-compilation to RISC-V targets via CMake
  - fixed cross-compilation to x86 targets from non-x86 architectures
  - fixed failure to install cblas.h if NO_CBLAS=0 was specified
  - fixed missing user-defined pre- and postfixes on functions in lapack.h and lapacke.h
  - included fixes from the Reference-LAPACK project:
    - fix ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
    - revert changes in ?GEEV from PR 1129 (Reference-LAPACK PR 1142)
    - fix workspace allocation in LAPACKE_?TRSEN (Reference-LAPACK PR 1144)

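The ?GEMM_BATCH_STRIDED entries in the general section refer to batched matrix multiplication in which the matrices of a batch sit at a fixed stride inside one buffer. The following is only a plain-Python sketch of that idea, not the OpenBLAS interface: the function name `gemm_batch_strided`, its argument order, and the packed row-major layout are all assumptions made for illustration.

```python
def gemm_batch_strided(m, n, k, alpha, a, stride_a, b, stride_b,
                       beta, c, stride_c, batch):
    """Compute C_i = alpha * A_i @ B_i + beta * C_i for i in range(batch).

    Hypothetical reference helper. Assumes packed row-major storage:
    A_i is m x k, B_i is k x n, C_i is m x n, and matrix i of each
    operand starts at offset i * stride inside its flat list.
    """
    for i in range(batch):
        a0, b0, c0 = i * stride_a, i * stride_b, i * stride_c
        for row in range(m):
            for col in range(n):
                acc = 0.0
                for p in range(k):
                    acc += a[a0 + row * k + p] * b[b0 + p * n + col]
                idx = c0 + row * n + col
                c[idx] = alpha * acc + beta * c[idx]
    return c


# Two 2x2 products stored back to back: A_0 @ I and I @ B_1.
a = [1.0, 2.0, 3.0, 4.0, 1.0, 0.0, 0.0, 1.0]
b = [1.0, 0.0, 0.0, 1.0, 5.0, 6.0, 7.0, 8.0]
c = [0.0] * 8
gemm_batch_strided(2, 2, 2, 1.0, a, 4, b, 4, 0.0, c, 4, batch=2)
```

A real strided batch call would hand the whole batch to optimized kernels in one go; the nested loops here only spell out which elements each batch entry reads and writes.
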
riscv:
  - added optimized SBGEMM kernels for ZVL128B and ZVL256B targets
  - added optimized SHGEMM kernels for ZVL128B and ZVL256B targets
  - added optimized SBGEMV and SHGEMV kernels for ZVL128B/ZVL256B
  - improved performance of the GEMV kernel for ZVL256B
  - improved the performance of the CROT and ZROT kernels for ZVL128B and x280
  - improved the detection of RVV1.0 capability
  - improved performance of the matrix packing helper functions for ZVL128B and ZVL256B
  - improved performance of OMATCOPY for ZVL128B and ZVL256B

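The bfloat16 entries (BGEMM, BGEMV and the SB-prefixed kernels) deal with a 16-bit format that keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an IEEE single-precision value. As a rough sketch of what that storage format means, and not of how any OpenBLAS kernel is written, the hypothetical helpers below convert between Python floats and bfloat16 bit patterns. Round-to-nearest-even is an assumption chosen for the illustration, and NaN inputs are not treated specially.

```python
import struct


def float_to_bf16_bits(x):
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even).

    Illustrative helper only. x must fit in float32 range, otherwise
    struct.pack raises OverflowError; NaN payloads are not preserved.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(h):
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


pi_bf16 = bf16_bits_to_float(float_to_bf16_bits(3.141592653589793))
```

Widening is exact because every bfloat16 value is also a float32 value; only the narrowing step loses precision, which is why roughly three significant decimal digits survive the round trip.
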
arm:
  - fixed spurious executable stack in the getarch utility

arm64:
  - fixed spurious executable stack in the getarch utility
  - fixed compiler warnings arising from the timer macro RPCC
  - fixed cache size detection for Qualcomm Oryon under Windows on Arm
  - fixed argument handling in the default SVE kernel for SDOT/DDOT
  - building the BFLOAT16 kernels is now enabled by default
  - improved the overall performance of GEMM, SYMM and HEMM on A64FX
  - improved the performance of SDOT/DDOT on A64FX
  - improved the multithreading performance of SDOT/DDOT on A64FX by
    introducing a throttling table matching thread count to problem size
  - improved the performance of SGER/DGER on A64FX and NEOVERSEV1
  - improved the multithreading performance of GEMM on A64FX and NEOVERSEV1
  - improved the performance of the GEMV kernel for SVE-capable targets
  - improved the multithreading performance of SGEMM on NEOVERSEV1 and V2
  - added optimized SAXPY/DAXPY SVE kernels for A64FX and NEOVERSEV1
  - added optimized BGEMM and BGEMV kernels for NEOVERSEV1
  - added an optimized BGEMM kernel for NEOVERSEN2
  - added support for the NEOVERSEV2 CPU
  - added dedicated support for the Apple M4 CPU as VORTEXM4
  - added optimized SGEMM/SSYMM/STRMM/SSYRK/SSYR2K for SME-capable targets
    (ARMV9SME and VORTEXM4)
  - improved the precision of the SNRM2 kernel
  - added CPU autodetection and compiler settings for Ampere One processors
  - fixed CPU autodetection for Apple M systems running Linux
  - fixed building on macOS with AppleClang, gfortran and Xcode v16 or newer
  - fixed several errors in the C code replacements for the complex and double
    precision complex LAPACK functions that get used (only) when compiling with
    Microsoft C and NOFORTRAN=1 under MS Windows

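Several entries in this release mention problem size thresholds for multithreading, and the A64FX SDOT/DDOT item mentions a throttling table that matches thread count to problem size. The general idea can be sketched as a sorted lookup table. Everything concrete below is an assumption for illustration: the name `threads_for`, the table layout, and in particular the threshold numbers, which are invented and are not the values OpenBLAS uses.

```python
import bisect

# Hypothetical table: (minimum problem size, threads to use).
# The numbers are made up for this sketch, not taken from OpenBLAS.
THROTTLE_TABLE = [(0, 1), (10_000, 2), (100_000, 4), (1_000_000, 8)]
_LIMITS = [limit for limit, _ in THROTTLE_TABLE]


def threads_for(n, max_threads):
    """Pick a thread count for problem size n, capped at max_threads."""
    # Find the last row whose minimum size does not exceed n.
    row = bisect.bisect_right(_LIMITS, n) - 1
    return min(THROTTLE_TABLE[max(row, 0)][1], max_threads)


small = threads_for(500, 8)         # below the first real threshold
capped = threads_for(5_000_000, 4)  # the table asks for 8, caller allows 4
```

The point of such a table is that small problems stay single-threaded, where thread startup and synchronization would cost more than the work itself, while larger problems are given progressively more threads.
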
power:
  - added initial support for the POWER11 architecture
  - improved performance of DGEMM and DGEMV on POWER10
  - fixed the default compiler flags to use "-O3" instead of the possibly unsafe
    "-Ofast"
  - fixed building under macOS (for old G4 Macs) with CMake
  - fixed potential miscompilation of DGEMV and other assembly kernels by GCC 15.1
  - fixed compilation with recent versions of flang

loongarch64:
  - fixed warnings and potential inaccuracies arising from incorrect saving of registers
  - fixed enumeration of logical cores on big NUMA servers
  - fixed building with LLVM and the INTERFACE64=1 option

x86:
  - fixed building the GEMM3M kernels for the GENERIC target
  - fixed several errors in the C code replacements for the complex and double
    precision complex LAPACK functions that get used (only) when compiling with
    Microsoft C and NOFORTRAN=1 under MS Windows

x86_64:
  - added CPU autodetection for Intel Lunar Lake (Core Ultra 200V)
  - changed all ?MIN and ?MAX assembly kernels to use unaligned operations
  - fixed several errors in the C code replacements for the complex and double
    precision complex LAPACK functions that get used (only) when compiling with
    Microsoft C and NOFORTRAN=1 under MS Windows
  - fixed potential crashes in builds for Cooper Lake, Sapphire Rapids or Zen5 CPUs
    under MS Windows

zarch:
  - added support for building with CMake

sparc:
  - fixed a potential crash in the DNRM2 kernel

====================================================================
Version 0.3.30
19-Jun-2025