author     Renato Golin <rengolin@systemcall.eu>  2018-11-16 15:45:12 +0000
committer  Renato Golin <rengolin@systemcall.eu>  2018-11-19 16:41:49 +0000
commit     310ea55f29f16771438386fb2f1f140e2fd7e397
tree       c7b9016770bf4811a96cd77a6b24d2d21f571cf7 /Makefile.arm64
parent     368d14f8c8b2eb2916d7cd6765f40c5aa31e2184
Simplifying ARMv8 build parameters
ARMv8 builds were a bit mixed up: ThunderX2 code was used in ARMv8 mode (which is not right, because TX2 is ARMv8.1), and the defines required a few redundancies, making it harder to maintain the code and to understand which core has what. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2 and XGene.

Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

Summary:

* Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. The TX2 code has also actually harmed performance on big cores.
* Commoned up the ARMv8 architectures' defines in params.h, so that all of them benefit from the ARMv8 settings in addition to their own.
* Added a few more cores, using ARMv8's include strategy, so that they benefit from compiler optimisations via -mtune. Also updated the cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it is an alias for TX2.
* Auto-detected most of those cores, and also updated the forced compilation in getarch.c, so that the parameters are the same whether compiled natively or with a forced arch.

Benefits:

* The ARMv8 build is now guaranteed to work on all ARMv8 cores.
* Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
* Improved performance for *all* cores compared to the develop branch before TX2's patch (9% ~ 36%).
* ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster than current develop, and 8% faster than develop before the TX2 patches.

Issues:

* Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now.

Comments:

* CortexA57 builds on A57 hardware are unchanged from the develop branch, which makes sense, as that code is untouched.
* CortexA72 builds improve over A57 builds on A72 hardware, even though they use the same includes, due to the new compiler tuning in the Makefile.
Diffstat (limited to 'Makefile.arm64')
-rw-r--r--  Makefile.arm64  33
1 files changed, 24 insertions, 9 deletions
diff --git a/Makefile.arm64 b/Makefile.arm64
index d19e796a5..a529fab80 100644
--- a/Makefile.arm64
+++ b/Makefile.arm64
@@ -4,22 +4,37 @@ CCOMMON_OPT += -march=armv8-a
FCOMMON_OPT += -march=armv8-a
endif
+ifeq ($(CORE), CORTEXA53)
+CCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
+FCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
+endif
+
ifeq ($(CORE), CORTEXA57)
-CCOMMON_OPT += -march=armv8-a+crc+crypto+fp+simd -mtune=cortex-a57
-FCOMMON_OPT += -march=armv8-a+crc+crypto+fp+simd -mtune=cortex-a57
+CCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
+FCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
+endif
+
+ifeq ($(CORE), CORTEXA72)
+CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
+FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
-ifeq ($(CORE), VULCAN)
-CCOMMON_OPT += -mtune=vulcan -mcpu=vulcan
-FCOMMON_OPT += -mtune=vulcan -mcpu=vulcan
+ifeq ($(CORE), CORTEXA73)
+CCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
+FCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
endif
ifeq ($(CORE), THUNDERX)
-CCOMMON_OPT += -mtune=thunderx -mcpu=thunderx
-FCOMMON_OPT += -mtune=thunderx -mcpu=thunderx
+CCOMMON_OPT += -march=armv8-a -mtune=thunderx
+FCOMMON_OPT += -march=armv8-a -mtune=thunderx
+endif
+
+ifeq ($(CORE), FALKOR)
+CCOMMON_OPT += -march=armv8.1-a -mtune=falkor
+FCOMMON_OPT += -march=armv8.1-a -mtune=falkor
endif
ifeq ($(CORE), THUNDERX2T99)
-CCOMMON_OPT += -mtune=thunderx2t99 -mcpu=thunderx2t99
-FCOMMON_OPT += -mtune=thunderx2t99 -mcpu=thunderx2t99
+CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
+FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif