summaryrefslogtreecommitdiff
path: root/Documentation/design-docs
diff options
context:
space:
mode:
authorJiyoung Yun <jy910.yun@samsung.com>2017-06-13 18:47:36 +0900
committerJiyoung Yun <jy910.yun@samsung.com>2017-06-13 18:47:36 +0900
commit61d6a817e39d3bae0f47dbc09838d51db22a5d30 (patch)
treecb37caa1784bc738b976273335d6ed04a7cc80b0 /Documentation/design-docs
parent5b975f8233e8c8d17b215372f89ca713b45d6a0b (diff)
downloadcoreclr-61d6a817e39d3bae0f47dbc09838d51db22a5d30.tar.gz
coreclr-61d6a817e39d3bae0f47dbc09838d51db22a5d30.tar.bz2
coreclr-61d6a817e39d3bae0f47dbc09838d51db22a5d30.zip
Imported Upstream version 2.0.0.11992upstream/2.0.0.11992
Diffstat (limited to 'Documentation/design-docs')
-rw-r--r--Documentation/design-docs/eh-writethru.md100
-rw-r--r--Documentation/design-docs/first-class-structs.md2
2 files changed, 90 insertions, 12 deletions
diff --git a/Documentation/design-docs/eh-writethru.md b/Documentation/design-docs/eh-writethru.md
index 0afa5a7688..311ca2cd91 100644
--- a/Documentation/design-docs/eh-writethru.md
+++ b/Documentation/design-docs/eh-writethru.md
@@ -1,36 +1,113 @@
# Exception Handling Write Through Optimization.
-Write through is an optimization done on local variables that live across exception handling flow like a handler, filter, or finally so that they can be enregistered - treated as a register candidate - throughout a method. For each variable live across one of these constructs, the minimum requirement is that a store to the variables location on the stack is placed between a reaching definition and any point of control flow leading to the handler, as well as a load between any return from a filter or finally and an upward exposed use. Conceptually this maintains the value of the variable on the stack across the exceptional flow which would kill any live registers. This transformation splits a local variable into multiple enregisterable compiler temporaries backed by the local variable on the stack. For local vars that additionally have appearances within a eh construct, a load from the stack local is inserted to a temp that will be enregistered within the handler.
+Write through is an optimization done on local variables that live across
+exception handling flow like a handler, filter, or finally so that they can be
+enregistered - treated as a register candidate - throughout a method. For each
+variable live across one of these constructs, the minimum requirement is that a
+store to the variables location on the stack is placed between a reaching
+definition and any point of control flow leading to the handler, as well as a
+load between any return from a filter or finally and an upward exposed use.
+Conceptually this maintains the value of the variable on the stack across the
+exceptional flow which would kill any live registers. This transformation splits
+a local variable into an enregisterable compiler temporary backed by
+the local variable on the stack. For local vars that additionally have
+appearances within an eh construct, a load from the stack local is inserted to
+a temp that will be enregistered within the handler.
## Motivation
-Historically the JIT has not done this transformation because exception handling was rare and thus the transformation was not worth the compile time. Additionally it was easy to make the recomendation to users to remove EH from performance critical methods since they had control of where the EH appeared. Neither of these points remain true as we increase our focus on cloud workloads. The use of non-blocking async calls are common in performance critical paths for these workloads and async injects exception handling constructs to implement the feature. This in combination with the long standing use of EH in 'foreach' and 'using' statements means that we are seeing EH constructs that are difficult for the user to manage or remove high in the profile (Techempower on Kestrel is a good example). Given these cloud workloads doing the transformation would be a clear benefit.
+Historically the JIT has not done this transformation because exception
+handling was rare and thus the transformation was not worth the compile time.
+Additionally it was easy to make the recomendation to users to remove EH from
+performance critical methods since they had control of where the EH appeared.
+Neither of these points remain true as we increase our focus on cloud
+workloads. The use of non-blocking async calls are common in performance
+critical paths for these workloads and async injects exception handling
+constructs to implement the feature. This in combination with the long
+standing use of EH in 'foreach' and 'using' statements means that we are seeing
+EH constructs that are difficult for the user to manage or remove high in the
+profile (Techempower on Kestrel is a good example). It's also good to consider
+that in MSIL, basic operations can raise semantically meaningful exceptions
+(unlike say C++, where an explicit throw is required to raise an exception) so
+injected handlers can end up pessimizing a number of local variables in the
+method. Given this combination of issues in cloud workloads doing the
+transformation should be a clear benefit.
## Design
-The goal of the design is to preserve the constraints listed above - i.e. preserve a correct value on the stack for any local var that crosses an EH edge in the flow graph. To ensure that the broad set of global optimizations can act on the IR shape produced by this transformation and that phase ordering issues do not block enregistration opportunities the write through phase will be staged just prior to SSA build after morph and it will do a full walk of the IR rewriting appearances to proxies as well as inserting reloads at the appropriate blocks in the flow graph as indicated by EH control flow semantics. To preserve the needed values on the stack a store will also be inserted after every definition to copy the new value in the proxy back to the stack location. This will leave non optimal number of stores (too many) but with the strategy that the more expensive analysis to eliminate/better place stores will be staged as a global optimization in a higher compilation tier.
+The goal of the design is to preserve the constraints listed above - i.e.
+preserve a correct value on the stack for any local var that crosses an EH edge
+in the flow graph. To ensure that the broad set of global optimizations can act
+on the IR shape produced by this transformation and that phase ordering issues
+do not block enregistration opportunities the write through phase will be
+staged just prior to SSA build after morph and it will do a full walk of the
+IR rewriting appearances to proxies as well as inserting reloads at the
+appropriate blocks in the flow graph as indicated by EH control flow semantics.
+To preserve the needed values on the stack a store will also be inserted after
+every definition to copy the new value in the proxy back to the stack location.
+This will leave non optimal number of stores (too many) but with the strategy
+that the more expensive analysis to eliminate/better place stores will be
+staged as a global optimization in a higher compilation tier.
+
+There are a number of wrinkles informing this design based on how the JIT models EH:
+- The jit does not explicitly model the exception flow, so a given block and
+ even a given statement within a block may have multiple exception-raising sites.
+- For statements within protected regions, and for all variables live into any
+ reachable handler, the jit assumes all definitions within the region can
+ potentially reach uses in the handlers, since the exact interleaving of
+ definition points and exception points is not known. Hence every definition
+ is a reaching definition, even both values back from to back stores with no
+ read of the variable in between.
+- The jit does not model which handlers are reachable from a given protected region,
+ so considers a variable live into a handler if it is live into any handler in the method.
+
+It is posible to do better than the "store every definition" approch outlined
+in the design, but the expectation is that this would require posibly
+modifying the model in the JIT and staging more throughput intensive analyses.
+With these considerations this design was selected and further improvements
+left to future optimization.
### Throughput
-To identify EH crossing local vars global liveness is necessary. This comes at the significant cost of the liveness analysis. To mitigate this the write through phase is staged immediately before SSA build for the global optimizer. Since the typical case is that there is no EH, the liveness analysis in write through can be reused directly by SSA build. For the case where EH local vars are present liveness today must be rebuilt for SSA since new local vars have been added, but incremental update to the RyuJIT liveness analysis can be implemented (worklist based live analysis) to improve the throughput. Additionally the write through transformation does a full IR walk - also expensive - to replace EH local var appearances with proxies and insert transfers to and from the stack for EH flow, given this initial implementations may need to be staged as part of AOT (crossgen) compiles until tiering can move the more expensive analysis out of the startup path.
+To identify EH crossing local vars global liveness is necessary. This comes at
+the significant cost of the liveness analysis. To mitigate this the write
+through phase is staged immediately before SSA build for the global optimizer.
+Since the typical case is that there is no EH, the liveness analysis in write
+through can be reused directly by SSA build. For the case where EH local vars
+are present liveness today must be rebuilt for SSA since new local vars have
+been added, but incremental update to the RyuJIT liveness analysis can be
+implemented (worklist based live analysis) to improve the throughput.
+Additionally the write through transformation does a full IR walk - also
+expensive - to replace EH local var appearances with proxies and insert
+transfers to and from the stack for EH flow, given this initial implementations
+may need to be staged as part of AOT (crossgen) compiles until tiering can move
+the more expensive analysis out of the startup path.
### Algorithm
+
On the IR directly before SSA build:
-- Run global liveness to identify local vars that cross EH boundaries (as a byproduct of this these local vars are marked "do not enregister")
+- Run global liveness to identify local vars that cross EH boundaries (as a
+ byproduct of this these local vars are marked "do not enregister")
- Foreach EH local var create a new local var "proxy" that can be enregisterd.
- Iterate each block in the flow graph doing the following:
* Foreach tree in block do a post order traversal and
- Replace all appearances of EH local vars with the defined proxy
- Insert a copy of proxy definition back to the EH local var (on the stack)
- * If EH handler entry block insert reloads from EH local var to proxy at block head
- * If finally or filter exit, insert reloads from EH local var to proxy at successor block heads
-- For method entry block, insert reloads from parameter EH local vars to proxies
+ * If EH handler entry block insert reloads from EH local var to proxy at
+ block head
+ * If finally or filter exit, insert reloads from EH local var to proxy at
+ successor block heads
+- For method entry block, insert reloads from parameter EH local vars to
+ proxies
-At end no proxy should be live across EH flow and all value updates will be written back to the stack location.
+At end no proxy should be live across EH flow and all value updates will be
+written back to the stack location.
## Next steps
-The initial prototype that produced the example bellow is currently being improved to make it production ready. At the same time a more extensive suite of example tests are being developed.
+The initial prototype that produced the example bellow is currently being
+improved to make it production ready. At the same time a more extensive suite
+of example tests are being developed.
- [X] Proof of concept prototype.
- [ ] Production implementation of WriteThru phase.
@@ -42,7 +119,8 @@ The initial prototype that produced the example bellow is currently being improv
## Example
-The following is a simple example that shows enregistration for a local var live, and modified, through a catch.
+The following is a simple example that shows enregistration for a local var
+live, and modified, through a catch.
#### Source code snippet
diff --git a/Documentation/design-docs/first-class-structs.md b/Documentation/design-docs/first-class-structs.md
index fd6a3762c4..f7bd9b6fb4 100644
--- a/Documentation/design-docs/first-class-structs.md
+++ b/Documentation/design-docs/first-class-structs.md
@@ -21,7 +21,7 @@ Struct-Related Issues in RyuJIT
The following issues illustrate some of the motivation for improving the handling of value types
(structs) in RyuJIT:
-* VSO Bug 98404: .NET JIT x86 - poor code generated for value type initialization
+* [\#11407 [RyuJIT] Fully enregister structs that fit into a single register when profitable](https://github.com/dotnet/coreclr/issues/11407), also VSO Bug 98404: .NET JIT x86 - poor code generated for value type initialization
* This is a simple test case that should generate simply `xor eax; ret` on x86 and x64, but
instead generates many unnecessary copies. It is addressed by full enregistration of
structs that fit into a register: