platform/upstream/openblas - Domain: Machine Learning / ML Framework; Licenses: BSD-3-Clause;

Age	Commit message	Author	Files	Lines
2020-05-12	s390x: Add vectorized sgemm kernel for Z14 and newer	Marius Hillenbrand	1	-1/+1
	Add a new GEMM kernel implementation to exploit the FP32 SIMD operations introduced with z14 and employ it for SGEMM on z14 and newer architectures.

	The SIMD extensions introduced with z13 support operations on double-sized scalars in vector registers. Thus, the existing SGEMM code would extend floats to doubles before operating on them. z14 extended SIMD support to operations on 32-bit floats. By employing these instructions, we can operate on twice the number of scalars per instruction (four floats in each vector register) and avoid the conversion operations.

	The code is written in C with explicit vectorization. In experiments, this kernel improves performance on z14 and z15 by around 2x over the current implementation in assembly. The flexibility of the C code paves the way for adjustments in subsequent commits.

	Tested via make -C test / ctest / utest and by a couple of additional unit tests that exercise blocking (e.g., partial register blocks with fewer than UNROLL_M rows and/or fewer than UNROLL_N columns).

	Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
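	To illustrate what "C with explicit vectorization" and "four floats in each vector register" can look like, here is a minimal sketch of a 4x4 single-precision register block. It is not the kernel from this commit: the function name sgemm_block_4x4, the packed panel layout, and the use of the generic GCC/Clang vector_size extension (instead of any z/Architecture-specific intrinsics) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats in one 16-byte vector (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/*
 * Hypothetical 4x4 micro-kernel: C += alpha * A * B.
 *   a: packed panel of A, k columns of 4 rows   (a[4*p + i] == A(i, p))
 *   b: packed panel of B, k rows of 4 columns   (b[4*p + j] == B(p, j))
 *   c: column-major 4x4 block of C with leading dimension ldc
 * The layout is an assumption for this sketch, not taken from the commit.
 */
void sgemm_block_4x4(size_t k, float alpha, const float *a, const float *b,
                     float *c, size_t ldc)
{
    assert(ldc >= 4);

    const v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    v4sf acc[4];
    for (int j = 0; j < 4; j++)
        acc[j] = zero;

    for (size_t p = 0; p < k; p++) {
        /* Load one column of A: four floats fill one vector register. */
        v4sf a_col;
        memcpy(&a_col, a + 4 * p, sizeof a_col);

        for (int j = 0; j < 4; j++) {
            /* Broadcast B(p, j) and accumulate in FP32, no widening to double. */
            float bj = b[4 * p + j];
            v4sf b_bcast = {bj, bj, bj, bj};
            acc[j] += a_col * b_bcast;
        }
    }

    /* Scale by alpha and add the result into C, one column at a time. */
    const v4sf valpha = {alpha, alpha, alpha, alpha};
    for (int j = 0; j < 4; j++) {
        v4sf c_col;
        memcpy(&c_col, c + j * ldc, sizeof c_col);
        c_col += valpha * acc[j];
        memcpy(c + j * ldc, &c_col, sizeof c_col);
    }
}
```

	Each accumulator holds one column of the 4x4 block, so every multiply-add operates on four floats at once. A production kernel would additionally handle partial blocks (fewer than UNROLL_M rows or UNROLL_N columns), as the commit message notes for its tests.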
2018-08-06	[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization	maamountki	1	-0/+4
2017-01-04	dtrmm and dgemm for z13	Abdurrauf	1	-2/+2

2016-04-15	Init IBM z system (s390x) porting.	Zhang Xianyi	1	-0/+6