summaryrefslogtreecommitdiff
path: root/Documentation/performance/JitOptimizerTodoAssessment.md
diff options
context:
space:
mode:
authorJoseph Tremoulet <jotrem@microsoft.com>2017-07-20 13:28:53 -0400
committerJoseph Tremoulet <JCTremoulet@gmail.com>2017-07-31 15:52:35 -0400
commit6b38dca32dee8321dafab8be92366d17da2b8bec (patch)
tree57e9afbf7d90e272a6211b4ad3f039fc0aeed22d /Documentation/performance/JitOptimizerTodoAssessment.md
parentf17fae2a1aa1bcc312cf15d6857d30cfef00c2d0 (diff)
downloadcoreclr-6b38dca32dee8321dafab8be92366d17da2b8bec.tar.gz
coreclr-6b38dca32dee8321dafab8be92366d17da2b8bec.tar.bz2
coreclr-6b38dca32dee8321dafab8be92366d17da2b8bec.zip
Add documents about JIT optimization planning
This change adds two documents: - JitOptimizerPlanningGuide.md discusses how we can/do/should go about identifying, prioritizing, and validating optimization improvement opportunities, as well as several ideas for how we might improve the process. - JitOptimizerTodoAssessment.md lists several potential optimization improvements that always come up in planning discussions, with brief notes about each, to capture current thinking.
Diffstat (limited to 'Documentation/performance/JitOptimizerTodoAssessment.md')
-rw-r--r--Documentation/performance/JitOptimizerTodoAssessment.md134
1 files changed, 134 insertions, 0 deletions
diff --git a/Documentation/performance/JitOptimizerTodoAssessment.md b/Documentation/performance/JitOptimizerTodoAssessment.md
new file mode 100644
index 0000000000..7d53dab5f5
--- /dev/null
+++ b/Documentation/performance/JitOptimizerTodoAssessment.md
@@ -0,0 +1,134 @@
+Optimizer Codebase Status/Investments
+=====================================
+
+There are a number of areas in the optimizer that we know we would invest in
+improving if resources were unlimited. This document lists them and some
+thoughts about their current state and prioritization, in an effort to capture
+the thinking about them that comes up in planning discussions.
+
+
+Improved Struct Handling
+------------------------
+
+This is an area that has received recent attention, with the [first-class structs](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md)
+work and the struct promotion improvements that went in for `Span<T>`. Work here
+is expected to continue and can happen incrementally. Possible next steps:
+
+ - Struct promotion stress mode (test mode to improve robustness/reliability)
+ - Promotion of more structs; relax limits on e.g. field count (should generally
+ help performance-sensitive code where structs are increasingly used to avoid
+ heap allocations)
+ - Improve handling of System V struct passing (I think we currently insert
+ some unnecessary round-trips through memory at call boundaries due to
+ internal representation issues)
+ - Implicit byref parameter promotion w/o shadow copy
+
+We don't have specific benchmarks that we know would jump in response to any of
+these. May well be able to find some with some looking, though this may be an
+area where current performance-sensitive code avoids structs.
+
+
+Exception handling
+------------------
+
+This is increasingly important as C# language constructs like async/await and
+certain `foreach` incantations are implemented with EH constructs, making them
+difficult to avoid at source level. The recent work on finally cloning, empty
+finally removal, and empty try removal targeted this. [Writethrough](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/eh-writethru.md)
+is another key optimization enabler here, and we are actively pursuing it. Other
+things we've discussed include inlining methods with EH and computing funclet
+callee-save register usage independently of main function callee-save register
+usage, but I don't think we have any particular data pointing to either as a
+high priority.
+
+
+Loop Optimizations
+------------------
+
+We haven't been targeting benchmarks that spend a lot of time doing compuations
+in an inner loop. Pursuing loop optimizations for the peanut butter effect
+would seem odd. So this simply hasn't bubbled up in priority yet, though it's
+bound to eventually.
+
+
+More Expression Optimizations
+-----------------------------
+
+We again don't have particular benchmarks pointing to key missing cases, and
+balancing the CQ vs TP will be delicate here, so it would really help to have
+an appropriate benchmark suite to evaluate this work against.
+
+
+Forward Substitution
+--------------------
+
+This too needs an appropriate benchmark suite that I don't think we have at
+this time. The tradeoffs against register pressure increase and throughput
+need to be evaluated. This also might make more sense to do if/when we can
+handle SSA renames.
+
+
+Value Number Conservativism
+---------------------------
+
+We have some frustrating phase-ordering issues resulting from this, but the
+opt-repeat experiment indicated that they're not prevalent enough to merit
+pursuing changing this right now. Also, using SSA def as the proxy for value
+number would require handling SSA renaming, so there's a big dependency chained
+to this.
+Maybe it's worth reconsidering the priority based on throughput?
+
+
+High Tier Optimizations
+-----------------------
+
+We don't have that many knobs we can "crank up" (though we do have the tracked
+assertion count and could switch inliner policies), nor do we have any sort of
+benchmarking story set up to validate whether tiering changes are helping or
+hurting. We should get that benchmarking story sorted out and at least hook
+up those two knobs.
+
+
+Low Tier Back-Off
+-----------------
+
+We have some changes we know we want to make here: morph does more than it needs
+to in minopts, and tier 0 should be doing throughput-improving inlines, as
+opposed to minopts which does no inlining. It would be nice to have the
+benchmarking story set up to measure the effect of such changes when they go in,
+we should do that.
+
+
+Async
+-----
+
+We've made note of the prevalence of async/await in modern code (and particularly
+in web server code such as TechEmpower), and have some opportunities listed in
+[#7914](https://github.com/dotnet/coreclr/issues/7914). Some sort of study of
+async peanut butter to find more opportunities is probably in order, but what
+would that look like?
+
+
+Address Mode Building
+---------------------
+
+One opportunity that's frequently visible in asm dumps is that more address
+expressions could be folded into memory operands' address expressions. This
+would likely give a measurable codesize win. Needs some thought about where
+to run in phase list and how aggressive to be about e.g. analyzing across
+statements.
+
+
+If-Conversion (cmov formation)
+------------------------------
+
+This hits big in microbenchmarks where it hits. There's some work in flight
+on this (see #7447 and #10861).
+
+
+Mulshift
+--------
+
+Replacing multiplication by constants with shift/add/lea sequences is a
+classic optimization that keeps coming up in planning. An [analysis](https://gist.github.com/JosephTremoulet/c1246b17ea2803e93e203b9969ee5a25#file-mulshift-md)
+indicates that RyuJIT is already capitalizing on most of the opportunity here.