author | Joseph Tremoulet <jotrem@microsoft.com> | 2017-07-20 13:28:53 -0400 |
---|---|---|
committer | Joseph Tremoulet <JCTremoulet@gmail.com> | 2017-07-31 15:52:35 -0400 |
commit | 6b38dca32dee8321dafab8be92366d17da2b8bec (patch) | |
tree | 57e9afbf7d90e272a6211b4ad3f039fc0aeed22d /Documentation/performance/JitOptimizerTodoAssessment.md | |
parent | f17fae2a1aa1bcc312cf15d6857d30cfef00c2d0 (diff) | |
Add documents about JIT optimization planning
This change adds two documents:
- JitOptimizerPlanningGuide.md discusses how we can/do/should go about
identifying, prioritizing, and validating optimization improvement
opportunities, as well as several ideas for how we might improve the
process.
- JitOptimizerTodoAssessment.md lists several potential optimization
improvements that always come up in planning discussions, with brief
notes about each, to capture current thinking.
Diffstat (limited to 'Documentation/performance/JitOptimizerTodoAssessment.md')
-rw-r--r-- | Documentation/performance/JitOptimizerTodoAssessment.md | 134 |
1 files changed, 134 insertions, 0 deletions
diff --git a/Documentation/performance/JitOptimizerTodoAssessment.md b/Documentation/performance/JitOptimizerTodoAssessment.md
new file mode 100644
index 0000000000..7d53dab5f5
--- /dev/null
+++ b/Documentation/performance/JitOptimizerTodoAssessment.md
@@ -0,0 +1,134 @@
+Optimizer Codebase Status/Investments
+=====================================
+
+There are a number of areas in the optimizer that we know we would invest in
+improving if resources were unlimited. This document lists them and some
+thoughts about their current state and prioritization, in an effort to capture
+the thinking about them that comes up in planning discussions.
+
+
+Improved Struct Handling
+------------------------
+
+This is an area that has received recent attention, with the [first-class structs](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md)
+work and the struct promotion improvements that went in for `Span<T>`. Work here
+is expected to continue and can happen incrementally. Possible next steps:
+
+ - Struct promotion stress mode (test mode to improve robustness/reliability)
+ - Promotion of more structs; relax limits on e.g. field count (should generally
+   help performance-sensitive code where structs are increasingly used to avoid
+   heap allocations)
+ - Improve handling of System V struct passing (I think we currently insert
+   some unnecessary round-trips through memory at call boundaries due to
+   internal representation issues)
+ - Implicit byref parameter promotion w/o shadow copy
+
+We don't have specific benchmarks that we know would jump in response to any of
+these. May well be able to find some with some looking, though this may be an
+area where current performance-sensitive code avoids structs.
+
+
+Exception handling
+------------------
+
+This is increasingly important as C# language constructs like async/await and
+certain `foreach` incantations are implemented with EH constructs, making them
+difficult to avoid at source level. The recent work on finally cloning, empty
+finally removal, and empty try removal targeted this. [Writethrough](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/eh-writethru.md)
+is another key optimization enabler here, and we are actively pursuing it. Other
+things we've discussed include inlining methods with EH and computing funclet
+callee-save register usage independently of main function callee-save register
+usage, but I don't think we have any particular data pointing to either as a
+high priority.
+
+
+Loop Optimizations
+------------------
+
+We haven't been targeting benchmarks that spend a lot of time doing computations
+in an inner loop. Pursuing loop optimizations for the peanut butter effect
+would seem odd. So this simply hasn't bubbled up in priority yet, though it's
+bound to eventually.
+
+
+More Expression Optimizations
+-----------------------------
+
+We again don't have particular benchmarks pointing to key missing cases, and
+balancing the CQ vs TP will be delicate here, so it would really help to have
+an appropriate benchmark suite to evaluate this work against.
+
+
+Forward Substitution
+--------------------
+
+This too needs an appropriate benchmark suite that I don't think we have at
+this time. The tradeoffs against register pressure increase and throughput
+need to be evaluated. This also might make more sense to do if/when we can
+handle SSA renames.
+
+
+Value Number Conservativism
+---------------------------
+
+We have some frustrating phase-ordering issues resulting from this, but the
+opt-repeat experiment indicated that they're not prevalent enough to merit
+pursuing changing this right now. Also, using SSA def as the proxy for value
+number would require handling SSA renaming, so there's a big dependency chained
+to this.
+Maybe it's worth reconsidering the priority based on throughput?
+
+
+High Tier Optimizations
+-----------------------
+
+We don't have that many knobs we can "crank up" (though we do have the tracked
+assertion count and could switch inliner policies), nor do we have any sort of
+benchmarking story set up to validate whether tiering changes are helping or
+hurting. We should get that benchmarking story sorted out and at least hook
+up those two knobs.
+
+
+Low Tier Back-Off
+-----------------
+
+We have some changes we know we want to make here: morph does more than it needs
+to in minopts, and tier 0 should be doing throughput-improving inlines, as
+opposed to minopts which does no inlining. It would be nice to have the
+benchmarking story set up to measure the effect of such changes when they go in;
+we should do that.
+
+
+Async
+-----
+
+We've made note of the prevalence of async/await in modern code (and particularly
+in web server code such as TechEmpower), and have some opportunities listed in
+[#7914](https://github.com/dotnet/coreclr/issues/7914). Some sort of study of
+async peanut butter to find more opportunities is probably in order, but what
+would that look like?
+
+
+Address Mode Building
+---------------------
+
+One opportunity that's frequently visible in asm dumps is that more address
+expressions could be folded into memory operands' address expressions. This
+would likely give a measurable codesize win. Needs some thought about where
+to run in phase list and how aggressive to be about e.g. analyzing across
+statements.
+
+
+If-Conversion (cmov formation)
+------------------------------
+
+This hits big in microbenchmarks where it hits. There's some work in flight
+on this (see #7447 and #10861).
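
As a rough illustration of what if-conversion means (not a description of RyuJIT's implementation or of the work in #7447 and #10861), the sketch below contrasts a branchy select with a branch-free one that computes the same result through data flow only. The function names `select_branchy` and `select_branchless` are hypothetical, and the mask arithmetic merely stands in for what a `cmov` instruction does in hardware.

```python
def select_branchy(cond: bool, a: int, b: int) -> int:
    """Control-flow form: the result depends on which branch is taken."""
    if cond:
        return a
    return b


def select_branchless(cond: bool, a: int, b: int) -> int:
    """Data-flow form: both inputs are available and a mask picks one.

    mask is 0 when cond is False and -1 (all bits set) when cond is True,
    so (a ^ b) & mask is either 0 or a ^ b, and XOR-ing with b yields
    b or a respectively.
    """
    mask = -int(cond)
    return b ^ ((a ^ b) & mask)
```

Both functions return the same value for every input; the second form trades a possibly mispredicted branch for a few extra arithmetic operations, which is the tradeoff that makes if-conversion pay off mainly where branches are hard to predict.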
+
+
+Mulshift
+--------
+
+Replacing multiplication by constants with shift/add/lea sequences is a
+classic optimization that keeps coming up in planning. An [analysis](https://gist.github.com/JosephTremoulet/c1246b17ea2803e93e203b9969ee5a25#file-mulshift-md)
+indicates that RyuJIT is already capitalizing on most of the opportunity here.
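
A minimal sketch of the shift/add decomposition behind the Mulshift item, assuming the simplest possible strategy: one shifted term per set bit of the constant. The names `shift_add_terms` and `mul_by_const` are hypothetical, and this is not how RyuJIT picks its sequences; a real code generator would also weigh `lea` forms and subtraction-based patterns against the cost of a plain multiply.

```python
def shift_add_terms(c: int) -> list[int]:
    """Return the shift amounts s such that x * c == sum(x << s).

    Sketch only: handles non-negative constants by reading off the
    positions of the set bits in c.
    """
    if c < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [i for i in range(c.bit_length()) if (c >> i) & 1]


def mul_by_const(x: int, c: int) -> int:
    """Multiply x by the constant c using only shifts and adds."""
    return sum(x << s for s in shift_add_terms(c))
```

For example, `shift_add_terms(10)` gives `[1, 3]`, so `x * 10` becomes `(x << 1) + (x << 3)`. Constants with many set bits produce long sequences under this naive scheme, which is one reason the choice between a multiply and a shift/add sequence needs a cost model.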