author      wmi <wmi@google.com>                  2017-08-21 15:33:39 -0700
committer   Victor Costan <pwnall@chromium.org>   2017-08-24 16:54:12 -0700
commit      824e6718b5b5a50d32a89124853da0a11828b25c (patch)
tree        c5dc711af74746cfd3f0240c96bb989bb592881d
parent      55924d11095df25ab25c405fadfe93d0a46f82eb (diff)
Add a loop alignment directive to work around a performance regression.
We found that LLVM upstream change rL310792 degraded the zippy benchmark by ~3%. Performance analysis showed that the regression was caused by a side effect of the change: an incidental shift in loop alignment, from 32 bytes to 16 bytes, increased branch mispredictions. The regression was reproducible on several Intel microarchitectures, including Sandy Bridge, Haswell, and Skylake.

Sadly, we still do not have a good understanding of the internals of the Intel branch predictor and cannot explain why branch mispredictions increase when the loop alignment changes, so we cannot make a real fix here. The workaround in this patch is to add a directive that aligns the hot loop to 32 bytes, which restores the performance. This unblocks flipping the default compiler to LLVM.
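For illustration only (not part of the commit): a minimal, self-contained sketch of the same technique, an inline-asm .p2align directive emitted immediately before a hot loop so that Clang on x86-64 starts the loop on a 32-byte boundary. The function name and loop body below are hypothetical.

    #include <cstddef>
    #include <cstdint>

    uint64_t SumBytes(const uint8_t* p, size_t n) {
      uint64_t sum = 0;
    #if defined(__clang__) && defined(__x86_64__)
      // Pad with NOPs until the next instruction falls on a 2^5 = 32 byte
      // boundary. This is a hint: the compiler may still schedule a few
      // instructions between the directive and the loop head.
      asm volatile(".p2align 5");
    #endif
      for (size_t i = 0; i < n; ++i) sum += p[i];
      return sum;
    }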
-rw-r--r--    snappy.cc    7
1 file changed, 7 insertions, 0 deletions
diff --git a/snappy.cc b/snappy.cc
index 23f948f..fd519e5 100644
--- a/snappy.cc
+++ b/snappy.cc
@@ -685,6 +685,13 @@ class SnappyDecompressor {
     }
     MAYBE_REFILL();
+    // Add a loop alignment directive. Without this directive, we observed
+    // significant performance degradation on several Intel architectures
+    // in the snappy benchmark built with LLVM. The degradation was caused
+    // by increased branch misprediction.
+#if defined(__clang__) && defined(__x86_64__)
+    asm volatile (".p2align 5");
+#endif
     for ( ;; ) {
       const unsigned char c = *(reinterpret_cast<const unsigned char*>(ip++));
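.p2align 5 instructs the assembler to insert NOP padding until the next emitted instruction begins at an address that is a multiple of 2^5 = 32 bytes. Because the directive is injected through inline asm rather than a compiler flag, it affects only this loop, and the preprocessor guard restricts it to Clang on x86-64, where the regression was observed. The resulting alignment can be verified by disassembling the object file (for example, objdump -d snappy.o) and checking the address of the loop's first instruction.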