summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorPat Gavlin <pgavlin@gmail.com>2017-07-12 08:14:05 -0700
committerGitHub <noreply@github.com>2017-07-12 08:14:04 -0700
commitb4479a3b6aa77261285debeb50d3a87b42e2082e (patch)
treecfab6ad7a1208ffd8f753e1c7bf6c053df2e19c5 /Documentation
parenteeaac091084e0e582a8a66b5bc0ca7e93115dc00 (diff)
parentf51adbae6eb691f622b45af21433f7a15eac57f0 (diff)
downloadcoreclr-b4479a3b6aa77261285debeb50d3a87b42e2082e.tar.gz
coreclr-b4479a3b6aa77261285debeb50d3a87b42e2082e.tar.bz2
coreclr-b4479a3b6aa77261285debeb50d3a87b42e2082e.zip
Merge pull request #12741 from pgavlin/IRDocs
Update the RyuJIT IR documentation.
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/botr/ryujit-overview.md244
1 files changed, 131 insertions, 113 deletions
diff --git a/Documentation/botr/ryujit-overview.md b/Documentation/botr/ryujit-overview.md
index ffbe350d50..6432d1c478 100644
--- a/Documentation/botr/ryujit-overview.md
+++ b/Documentation/botr/ryujit-overview.md
@@ -26,47 +26,41 @@ RyuJIT provides the just in time compilation service for the .NET runtime. The r
# Internal Representation (IR)
-## Overview of the IR
-
-The RyuJIT IR can be described at a high level as follows:
-
-* The Compiler object is the primary data structure of the JIT. Each method is represented as a doubly-linked list of `BasicBlock` objects. The Compiler object points to the head of this list with the `fgFirstBB` link, as well as having additional pointers to the end of the list, and other distinguished locations.
- * `ICorJitCompiler::CompileMethod()` is invoked for each method, and creates a new Compiler object. Thus, the JIT need not worry about thread synchronization while accessing Compiler state. The EE has the necessary synchronization to ensure there is a single JIT’d copy of a method when two or more threads try to trigger JIT compilation of the same method.
-* `BasicBlock` nodes contain a list of doubly-linked statements with no internal control flow (there is an exception for the case of the qmark/colon operator)
- * The `BasicBlock` also contains the dataflow information, when available.
-* `GenTree` nodes represent the operations and statement of the method being compiled.
- * It includes the type of the node, as well as value number, assertions, and register assignments when available.
-* `LclVarDsc` represents a local variable, argument or JIT-created temp. It has a `gtLclNum` which is the identifier usually associated with the variable in the JIT and its dumps. The `LclVarDsc` contains the type, use count, weighted use count, frame or register assignment etc. These are often referred to simply as “lclVars”. They can be tracked (`lvTracked`), in which case they participate in dataflow analysis, and have a different index (`lvVarIndex`) to allow for the use of dense bit vectors.
+## `Compiler` object
-![RyuJIT IR Overview](../images/ryujit-ir-overview.png)
+The `Compiler` object is the primary data structure of the JIT. While it is not part of the JIT's IR per se, it serves as the root from which the data structures that implement the IR are accessible. For example, the `Compiler` object points to the head of the function's `BasicBlock` list with the `fgFirstBB` field, as well as having additional pointers to the end of the list, and other distinguished locations. `ICorJit`Compiler`::CompileMethod()` is invoked for each method, and creates a new `Compiler` object. Thus, the JIT need not worry about thread synchronization while accessing `Compiler` state. The EE has the necessary synchronization to ensure there is a single JIT’d copy of a method when two or more threads try to trigger JIT compilation of the same method.
-The IR has two modes:
+## Overview of the IR
-* In tree-order mode, non-statement nodes (often described as expression nodes, though they are not always strictly expressions) are linked only via parent-child links (unidirectional). That is, the consuming node has pointers to the nodes that produce its input operands.
-* In linear-order mode, non-statement nodes have both parent-child links as well as execution order links (`gtPrev` and `gtNext`).
- * In the interest of maintaining functionality that depends upon the validity of the tree ordering, the linear mode of the `GenTree` IR has an unusual constraint that the execution order must represent a valid traversal of the parent-child links.
+RyuJIT represents a function as a doubly-linked list of `BasicBlock` values. Each `BasicBlock` has explicit edges to its successors that define the function's non-exceptional control flow. Exceptional control flow is implicit, with protected regions and handlers described in a table of `EHblkDsc` values. At the beginning of a compilation, each `BasicBlock` contains nodes in a high-level, statement- and tree-oriented form (HIR); this form persists throughout the JIT's front end. During the first phase of the back end--the rationalization phase--the HIR for each block is lowered to a linearly-ordered, node-oriented form (LIR). The fundamental distinction between HIR and LIR is in ordering semantics, though there are also some restrictions on the types of nodes that may appear in an HIR or LIR block.
-A separate representation, `insGroup` and `instrDesc`, is used during the actual instruction encoding.
+Both HIR and LIR blocks are composed of `GenTree` nodes that define the operations performed by the block. A `GenTree` node may consume some number of operands and may produce a singly-defined, at-most-singly-used value as a result. These values are referred to interchangably as *SDSU* temps or *tree* temps. Defs of SDSU temps are represented by `GenTree` nodes themselves, and uses are represented by edges from the using node to the defining node. Furthermore, SDSU temps defined in one block may not be used in a different block. In cases where a value must be multiply-defined, multiply-used, or defined in one block and used in another, the IR provides another class of temporary: the local var. Local vars are defined by assignment nodes in HIR or store nodes in LIR, and are used by local var nodes in both forms.
-### Statement Order
+An HIR block is composed of a doubly-linked list of statement nodes (`GenTreeStmt`), each of which references a single expression tree (`gtStmtExpr`). The `GenTree` nodes in this tree execute in "tree order", which is defined as the order produced by a depth-first, left-to-right traversal of the tree, with two notable exceptions:
+- Binary nodes marked with the `GTF_REVERSE_OPS` flag execute their right operand tree (`gtOp2`) before their left operand tree (`gtOp1`)
+- Dynamically-sized block copy nodes where `gtEvalSizeFirst` is `true` execute the `gtDynamicSize` tree before executing their other operand trees
+In addition to tree order, HIR also requires that no SDSU temp is defined in one statement and used in another. In situations where the requirements of tree and statement order prove onerous (e.g. when code must execute at a particular point in a function), HIR provides `GT_COMMA` nodes as an escape valve: these nodes consume and discard the results of their left-hand side while producing a copy of the vaule produced by their right-hand side. This allows the compiler to insert code in the middle of a statement without requiring that the statement be split apart.
-During the “front end” of the JIT compiler (prior to Rationalization), the execution order of the `GenTree` nodes on a statement is fully described by the “tree” order – that is, the links from the top node of a statement (the `gtStmtExpr`) to its children. The order is determined by a depth-first, left-to-right traversal of the tree, with the exception of nodes marked `GTF_REVERSE_OPS` on binary nodes, whose second operand is traversed before its first.
+An LIR block is composed of a doubly-linked list of `GenTree` nodes, each of which describes a single operation in the method. These nodes execute in the order given by the list; there is no relationship between the order in which a node's operands appear and the order in which the operators that produced those operands execute. The only exception to this rule occurs after the register allocator, which may introduce `GT_COPY` and `GT_RELOAD` nodes that execute in "spill order". Spill order is defined as the order in which the register allocator visits a node's operands. For correctness, the code generator must generate code for spills, reloads, and `GT_COPY`/`GT_RELOAD` nodes in this order.
-After rationalization, the execution order can no longer be deduced from the tree order alone. At this point, the dominant ordering becomes “linear order”. This is because at this point any `GT_COMMA` nodes have been replaced by embedded statements, whose position in the execution order can only be determined by the `gtNext` and `gtPrev` links on the tree nodes.
+In addition to HIR and LIR `BasicBlock`s, a separate representation--`insGroup` and `instrDesc`--is used during the actual instruction encoding.
-This modality is captured in the `fgOrder` flag on the Compiler object – it is either `FGOrderTree` or `FGOrderLinear`.
+![RyuJIT IR Overview](../images/ryujit-ir-overview.png)
## GenTree Nodes
-Each operation is represented as a GenTree node, with an opcode (GT_xxx), zero or more child `GenTree` nodes, and additional fields as needed to represent the semantics of that node.
+Each operation is represented as a GenTree node, with an opcode (`GT_xxx`), zero or more child/operand `GenTree` nodes, and additional fields as needed to represent the semantics of that node. Every node includes its type, value number, assertions, register assignments, etc. when available.
-The `GenTree` nodes are doubly-linked in execution order, but the links are not necessarily valid during all phases of the JIT.
-
-The statement nodes utilize the same `GenTree` base type as the operation nodes, though they are not truly related.
+`GenTree` nodes are doubly-linked in execution order, but the links are not necessarily valid during all phases of the JIT. In HIR these links are primarily a convenience, as the order produced by a traversal of the links must match the order produced by a "tree order" traversal (see above for details). In LIR these links define the execution order of the nodes.
+HIR statement nodes utilize the same `GenTree` base type as the operation nodes, though they are not truly related.
* The statement nodes are doubly-linked. The first statement node in a block points to the last node in the block via its `gtPrev` link. Note that the last statement node does *not* point to the first; that is, the list is not fully circular.
* Each statement node contains two `GenTree` links – `gtStmtExpr` points to the top-level node in the statement (i.e. the root of the tree that represents the statement), while `gtStmtList` points to the first node in execution order (again, this link is not always valid).
+## Local var descriptors
+
+A `LclVarDsc` represents a possibly-multiply-defined, possibly-multiply-used temporary. These temporaries may be used to represent user local variables, arguments or JIT-created temps. Each lclVar has a `gtLclNum` which is the identifier usually associated with the variable in the JIT and its dumps. The `LclVarDsc` contains the type, use count, weighted use count, frame or register assignment etc. A local var may be "tracked" (`lvTracked`), in which case it participates in dataflow analysis, and has a secondary name (`lvVarIndex`) that allows for the use of dense bit vectors.
+
### Example of Post-Import IR
For this snippet of code (extracted from [tests/src/JIT/CodeGenBringUpTests/DblRoots.cs](https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/CodeGenBringUpTests/DblRoots.cs)):
@@ -103,22 +97,21 @@ The JIT is primarily concerned with “primitive” types, i.e. integers, refere
## Dataflow Information
-In order to limit throughput impact, the JIT limits the number of lvlVars for which liveness information is computed. These are the tracked lvlVars (`lvTracked` is true), and they are the only candidates for register allocation.
+In order to limit throughput impact, the JIT limits the number of lclVars for which liveness information is computed. These are the tracked lclVars (`lvTracked` is true), and they are the only candidates for register allocation (i.e. only these lclVars may be assigned registers over their entire lifetime). Defs and uses of untracked lclVars are treated as stores and loads to/from the appropriate stack location, and the corresponding nodes act as normal operators during register allocation.
The liveness analysis determines the set of defs, as well as the uses that are upward exposed, for each block. It then propagates the liveness information. The result of the analysis is captured in the following:
* The live-in and live-out sets are captured in the `bbLiveIn` and `bbLiveOut` fields of the `BasicBlock`.
-* The `GTF_VAR_DEF` flag is set on a lvlVar `GenTree` node that is a definition.
+* The `GTF_VAR_DEF` flag is set on a lclVar node (all of which are of type `GenTreeLclVarCommon`) that is a definition.
* The `GTF_VAR_USEASG` flag is set (in addition to the `GTF_VAR_DEF` flag) for the target of an update (e.g. +=).
-* The `GTF_VAR_USEDEF` is set on the target of an assignment of a binary operator with the same lvlVar as an operand.
## SSA
-Static single assignment (SSA) form is constructed in a traditional manner [[1]](#[1]). The SSA names are recorded on the lvlVar references. While SSA form usually retains a pointer or link to the defining reference, RyuJIT currently retains only the `BasicBlock` in which the definition of each SSA name resides.
+Static single assignment (SSA) form is constructed in a traditional manner [[1]](#[1]). The SSA names are recorded on the lclVar references. While SSA form usually retains a pointer or link to the defining reference, RyuJIT currently retains only the `BasicBlock` in which the definition of each SSA name resides.
## Value Numbering
-Value numbering utilizes SSA for lvlVar values, but also performs value numbering of expression trees. It takes advantage of type safety by not invalidating the value number for field references with a heap write, unless the write is to the same field. The IR nodes are annotated with the value numbers, which are indexes into a type-specific value number store. Value numbering traverses the trees, performing symbolic evaluation of many operations.
+Value numbering utilizes SSA for lclVar values, but also performs value numbering of expression trees. It takes advantage of type safety by not invalidating the value number for field references with a heap write, unless the write is to the same field. The IR nodes are annotated with the value numbers, which are indexes into a type-specific value number store. Value numbering traverses the trees, performing symbolic evaluation of many operations.
# Phases of RyuJIT
@@ -129,13 +122,13 @@ The top-level function of interest is `Compiler::compCompile`. It invokes the fo
|[Pre-import](#pre-import)|`Compiler->lvaTable` created and filled in for each user argument and variable. BasicBlock list initialized.|
|[Importation](#importation)|`GenTree` nodes created and linked in to Statements, and Statements into BasicBlocks. Inlining candidates identified.|
|[Inlining](#inlining)|The IR for inlined methods is incorporated into the flowgraph.|
-|[Struct Promotion](#struct-promotion)|New lvlVars are created for each field of a promoted struct.|
-|[Mark Address-Exposed Locals](#mark-addr-exposed)|lvlVars with references occurring in an address-taken context are marked. This must be kept up-to-date.|
+|[Struct Promotion](#struct-promotion)|New lclVars are created for each field of a promoted struct.|
+|[Mark Address-Exposed Locals](#mark-addr-exposed)|lclVars with references occurring in an address-taken context are marked. This must be kept up-to-date.|
|[Morph Blocks](#morph-blocks)|Performs localized transformations, including mandatory normalization as well as simple optimizations.|
|[Eliminate Qmarks](#eliminate-qmarks)|All `GT_QMARK` nodes are eliminated, other than simple ones that do not require control flow.|
|[Flowgraph Analysis](#flowgraph-analysis)|`BasicBlock` predecessors are computed, and must be kept valid. Loops are identified, and normalized, cloned and/or unrolled.|
-|[Normalize IR for Optimization](#normalize-ir)|lvlVar references counts are set, and must be kept valid. Evaluation order of `GenTree` nodes (`gtNext`/`gtPrev`) is determined, and must be kept valid.|
-|[SSA and Value Numbering Optimizations](#ssa-vn)|Computes liveness (`bbLiveIn` and `bbLiveOut` on `BasicBlocks`), and dominators. Builds SSA for tracked lvlVars. Computes value numbers.|
+|[Normalize IR for Optimization](#normalize-ir)|lclVar references counts are set, and must be kept valid. Evaluation order of `GenTree` nodes (`gtNext`/`gtPrev`) is determined, and must be kept valid.|
+|[SSA and Value Numbering Optimizations](#ssa-vn)|Computes liveness (`bbLiveIn` and `bbLiveOut` on `BasicBlocks`), and dominators. Builds SSA for tracked lclVars. Computes value numbers.|
|[Loop Invariant Code Hoisting](#licm)|Hoists expressions out of loops.|
|[Copy Propagation](#copy-propagation)|Copy propagation based on value numbers.|
|[Common Subexpression Elimination (CSE)](#cse)|Elimination of redundant subexressions based on value numbers.|
@@ -173,11 +166,11 @@ Struct promotion (`fgPromoteStructs()`) analyzes the local variables and temps,
Next, it determines whether it is likely to be profitable, based on the number of fields, and whether the fields are individually referenced.
-When a lvlVar is promoted, there are now N+1 lvlVars for the struct, where N is the number of fields. The original struct lvlVar is not considered to be tracked, but its fields may be.
+When a lclVar is promoted, there are now N+1 lclVars for the struct, where N is the number of fields. The original struct lclVar is not considered to be tracked, but its fields may be.
### <a name="mark-addr-exposed"/>Mark Address-Exposed Locals
-This phase traverses the expression trees, propagating the context (e.g. taking the address, indirecting) to determine which lvlVars have their address taken, and which therefore will not be register candidates. If a struct lvlVar has been promoted, and is then found to be address-taken, it will be considered “dependently promoted”, which is an odd way of saying that the fields will still be separately tracked, but they will not be register candidates.
+This phase traverses the expression trees, propagating the context (e.g. taking the address, indirecting) to determine which lclVars have their address taken, and which therefore will not be register candidates. If a struct lclVar has been promoted, and is then found to be address-taken, it will be considered “dependently promoted”, which is an odd way of saying that the fields will still be separately tracked, but they will not be register candidates.
### <a name="morph-blocks"/>Morph Blocks
@@ -201,7 +194,7 @@ At this point, a number of analyses and transformations are done on the flowgrap
At this point, a number of properties are computed on the IR, and must remain valid for the remaining phases. We will call this “normalization”
-* `lvaMarkLocalVars` – set the reference counts (raw and weighted) for lvlVars, sort them, and determine which will be tracked (currently up to 128). Note that after this point any transformation that adds or removes lvlVar references must update the reference counts.
+* `lvaMarkLocalVars` – set the reference counts (raw and weighted) for lclVars, sort them, and determine which will be tracked (currently up to 128). Note that after this point any transformation that adds or removes lclVar references must update the reference counts.
* `optOptimizeBools` – this optimizes Boolean expressions, and may change the flowgraph (why is it not done prior to reachability and dominators?)
* Link the trees in evaluation order (setting `gtNext` and `gtPrev` fields): and `fgFindOperOrder()` and `fgSetBlockOrder()`.
@@ -216,7 +209,7 @@ This phase traverses all the loop nests, in outer-to-inner order (thus hoisting
* A valid CSE candidate
* Has no side-effects
* Does not raise an exception OR occurs in the loop prior to any side-effects
-* Has a valid value number, and it is a lvlVar defined outside the loop, or its children (the value numbers from which it was computed) are invariant.
+* Has a valid value number, and it is a lclVar defined outside the loop, or its children (the value numbers from which it was computed) are invariant.
### <a name="copy-propagation"/>Copy Propagation
@@ -226,7 +219,7 @@ The JIT currently requires that the IR be maintained in conventional SSA form, a
### <a name="cse"/>Common Subexpression Elimination (CSE)
-Utilizes value numbers to identify redundant computations, which are then evaluated to a new temp lvlVar, and then reused.
+Utilizes value numbers to identify redundant computations, which are then evaluated to a new temp lclVar, and then reused.
### <a name="assertion-propagation"/>Assertion Propagation
@@ -243,11 +236,12 @@ As the JIT has evolved, changes have been made to improve the ability to reason
* Elimination of assignment nodes (`GT_ASG`). The assignment node was problematic because the semantics of its destination (left hand side of the assignment) could not be determined without context. For example, a `GT_LCL_VAR` on the left-hand side of an assignment is a definition of the local variable, but on the right-hand side it is a use. Furthermore, since the execution order requires that the children be executed before the parent, it is unnatural that the left-hand side of the assignment appears in execution order before the assignment operator.
* During rationalization, all assignments are replaced by stores, which either represent their destination on the store node itself (e.g. `GT_LCL_VAR`), or by the use of a child address node (e.g. `GT_STORE_IND`).
* Elimination of address nodes (`GT_ADDR`). These are problematic because of the need for parent context to analyze the child.
-* Elimination of “comma” nodes (`GT_COMMA`). These nodes are introduced for convenience during importation, during which a single tree is constructed at a time, and not incorporated into the statement list until it is completed. When it is necessary, for example, to store a partially-constructed tree into a temporary variable, a `GT_COMMA` node is used to link it into the tree. However, in later phases, these comma nodes are an impediment to analysis, and thus are split into separate statements.
+* Elimination of “comma” nodes (`GT_COMMA`). These nodes are introduced for convenience during importation, during which a single tree is constructed at a time, and not incorporated into the statement list until it is completed. When it is necessary, for example, to store a partially-constructed tree into a temporary variable, a `GT_COMMA` node is used to link it into the tree. However, in later phases, these comma nodes are an impediment to analysis, and thus are eliminated.
* In some cases, it is not possible to fully extract the tree into a separate statement, due to execution order dependencies. In these cases, an “embedded” statement is created. While these are conceptually very similar to the `GT_COMMA` nodes, they do not masquerade as expressions.
* Elimination of “QMark” (`GT_QMARK`/`GT_COLON`) nodes is actually done at the end of morphing, long before the current rationalization phase. The presence of these nodes made analyses (especially dataflow) overly complex.
+* Elimination of statements. Without statements, the execution order of a basic block's contents is fully defined by the `gtNext`/`gtPrev` links between `GenTree` nodes.
-For our earlier example (Example of Post-Import IR), here is what the simplified dump looks like just prior to Rationalization (the $ annotations are value numbers). Note that some common subexpressions have been computed into new temporary lvlVars, and that computation has been inserted as a `GT_COMMA` (comma) node in the IR:
+For our earlier example (Example of Post-Import IR), here is what the simplified dump looks like just prior to Rationalization (the $ annotations are value numbers). Note that some common subexpressions have been computed into new temporary lclVars, and that computation has been inserted as a `GT_COMMA` (comma) node in the IR:
▌ stmtExpr void (top level) (IL 0x000...0x026)
│ ┌──▌ lclVar double V07 cse1 $185
@@ -279,38 +273,50 @@ For our earlier example (Example of Post-Import IR), here is what the simplified
└──▌ indir double $186
└──▌ lclVar byref V03 arg3 u:2 (last use) $c0
-After rationalization, the nodes are presented in execution order, and the `GT_COMMA` (comma) and `GT_ASG` (=) nodes have been eliminated:
-
- ▌ stmtExpr void (top level) (IL 0x000... ???)
- │ ┌──▌ lclVar double V01 arg1
- │ ├──▌ lclVar double V01 arg1
- │ ┌──▌ \* double
- │ │ ┌──▌ lclVar double V00 arg0
- │ │ ├──▌ dconst double 4.00
- │ │ ┌──▌ \* double
- │ │ ├──▌ lclVar double V02 arg2
- │ ├──▌ \* double
- │ ┌──▌ - double
- │ ┌──▌ mathFN double sqrt
- └──▌ st.lclVar double V06
-
- ▌ stmtExpr void (top level) (IL 0x000...0x026)
- │ ┌──▌ lclVar double V06
- │ │ ┌──▌ lclVar double V01 arg1
- │ ├──▌ unary - double
- │ ┌──▌ + double
- │ │ { ▌ stmtExpr void (embedded) (IL 0x000... ???)
- │ │ { │ ┌──▌ lclVar double V00 arg0
- │ │ { │ ├──▌ dconst double 2.00
- │ │ { │ ┌──▌ \* double
- │ │ { └──▌ st.lclVar double V07
- │ ├──▌ lclVar double V07
- │ ┌──▌ / double
- │ ├──▌ lclVar byref V03 arg3
- └──▌ storeIndir double
-
-
-Note that the first operand of the first comma has been extracted into a separate statement, but the second comma causes an embedded statement to be created, in order to preserve execution order.
+After rationalization, the nodes are presented in execution order, and the `GT_COMMA` (comma), `GT_ASG` (=), and `GT_STMT` nodes have been eliminated:
+
+ t0 = lclVar double V01 arg1
+ t1 = lclVar double V01 arg1
+ ┌──▌ t0 double
+ ├──▌ t1 double
+ t2 = \* double
+ t3 = lclVar double V00 arg0
+ t4 = dconst double 4.00
+ ┌──▌ t3 double
+ ├──▌ t4 double
+ t5 = \* double
+ t6 = lclVar double V02 arg2
+ ┌──▌ t5 double
+ ├──▌ t6 double
+ t7 = \* double
+ ┌──▌ t2 double
+ ├──▌ t7 double
+ t8 = - double
+ ┌──▌ t8 double
+ t9 = mathFN double sqrt
+ ┌──▌ t9 double
+ st.lclVar double V06
+ t10 = lclVar double V06
+ t11 = lclVar double V01 arg1
+ ┌──▌ t11 double
+ t12 = unary - double
+ ┌──▌ t10 double
+ ├──▌ t12 double
+ t13 = + double
+ t14 = lclVar double V00 arg0
+ t15 = dconst double 2.00
+ ┌──▌ t14 double
+ ├──▌ t15 double
+ t16 = \* double
+ ┌──▌ t16 double
+ st.lclVar double V07
+ t18 = lclVar double V07
+ ┌──▌ t17 double
+ ├──▌ t18 double
+ t19 = / double
+ t20 = lclVar byref V03 arg3
+ ┌──▌ t20 double
+ storeIndir double
## <a name="lowering"/>Lowering
@@ -318,34 +324,49 @@ Lowering is responsible for transforming the IR in such a way that the control f
It accomplishes this in two passes.
-The first pass is a post-order traversal that performs context-dependent transformations such as expanding switch statements (using a switch table or a series of conditional branches), constructing addressing modes, etc. For example, this:
-
- ┌──▌ lclVar ref V00 arg0
- │ ┌──▌ lclVar int V03 loc1
- │ ┌──▌ cast long <- int
- │ ├──▌ const long 2
- ├──▌ << long
- ┌──▌ + byref
- ├──▌ const long 16
- ┌──▌ + byref
- ┌──▌ indir int
+The first pass is an execution-order traversal that performs context-dependent transformations such as expanding switch statements (using a switch table or a series of conditional branches), constructing addressing modes, etc. For example, this:
+
+ t0 = lclVar ref V00 arg0
+ t1 = lclVar int V03 loc1
+ ┌──▌ t1 int
+ t2 = cast long <- int
+ t3 = const long 2
+ ┌──▌ t2 long
+ ├──▌ t3 long
+ t4 = << long
+ ┌──▌ t0 ref
+ ├──▌ t4 long
+ t5 = + byref
+ t6 = const long 16
+ ┌──▌ t5 ref
+ ├──▌ t6 long
+ t7 = + byref
+ ┌──▌ t7 ref
+ t8 = indir int
Is transformed into this, in which the addressing mode is explicit:
- ┌──▌ lclVar ref V00 arg0
- │ ┌──▌ lclVar int V03 loc1
- ├──▌ cast long <- int
- ┌──▌ lea(b+(i*4)+16) byref
- ┌──▌ indir int
+ t0 = lclVar ref V00 arg0
+ t1 = lclVar int V03 loc1
+ ┌──▌ t1 int
+ t2 = cast long <- int
+ ┌──▌ t0 ref
+ ├──▌ t2 long
+ t7 = lea(b+(i*4)+16) byref
+ ┌──▌ t7 ref
+ t8 = indir int
-The next pass annotates the nodes with register requirements, and this is done in an execution order traversal (effectively post-order) in order to ensure that the children are visited prior to the parent. It may also do some transformations that do not require the parent context, such as determining the code generation strategy for block assignments (e.g. `GT_COPYBLK`) which may become helper calls, unrolled loops, or an instruction like rep stos.
+The next pass annotates the nodes with register requirements. This is done in an execution order traversal in order to ensure that each node's operands are visited prior to the node itself. This pass may also do some transformations that do not require the parent context, such as determining the code generation strategy for block assignments (e.g. `GT_COPYBLK`) which may become helper calls, unrolled loops, or an instruction like `rep stos`.
-The register requirements are expressed in the `TreeNodeInfo` (`gtLsraInfo`) for each node. For example, for the `copyBlk` node in this snippet:
+The register requirements of a node are expressed by its `TreeNodeInfo` (`gtLsraInfo`) value. For example, for the `copyBlk` node in this snippet:
- Source │ ┌──▌ const(h) long 0xCA4000 static
- Destination │ ├──▌ &lclVar byref V04 loc4
- │ ├──▌ const int 34
- └──▌ copyBlk void
+ t0 = const(h) long 0xCA4000 static
+ t1 = &lclVar byref V04 loc4
+ t2 = const int 34
+ ┌──▌ t0 long
+ ├──▌ t1 byref
+ ├──▌ t2 int
+ copyBlk void
The `TreeNodeInfo` would be as follows:
@@ -372,7 +393,7 @@ Pre-conditions:
Allocation proceeds in 4 phases:
* Determine the order in which the `BasicBlocks` will be allocated, and which predecessor of each block will be used to determine the starting location for variables live-in to the `BasicBlock`.
-* Construct Intervals for each tracked lvlVar, then walk the `BasicBlocks` in the determined order building `RefPositions` for each register use, def, or kill.
+* Construct Intervals for each tracked lclVar, then walk the `BasicBlocks` in the determined order building `RefPositions` for each register use, def, or kill.
* Allocate the registers by traversing the `RefPositions`.
* Write back the register assignments, and perform any necessary moves at block boundaries where the allocations don’t match.
@@ -380,8 +401,8 @@ Post-conditions:
* The `gtRegNum` property of all `GenTree` nodes that require a register has been set to a valid register number.
* The `gtRsvdRegs` field (a set/mask of registers) has the requested number of registers specified for internal use.
-* All spilled values (lvlVar or expression) are marked with `GTF_SPILL` at their definition. For lvlVars, they are also marked with `GTF_SPILLED` at any use at which the value must be reloaded.
-* For all lvlVars that are register candidates:
+* All spilled values (lclVar or expression) are marked with `GTF_SPILL` at their definition. For lclVars, they are also marked with `GTF_SPILLED` at any use at which the value must be reloaded.
+* For all lclVars that are register candidates:
* `lvRegNum` = initial register location (or `REG_STK`)
* `lvRegister` flag set if it always lives in the same register
* `lvSpilled` flag is set if it is ever spilled
@@ -391,12 +412,12 @@ Post-conditions:
The process of code generation is relatively straightforward, as Lowering has done some of the work already. Code generation proceeds roughly as follows:
-* Determine the frame layout – allocating space on the frame for any lvlVars that are not fully enregistered, as well as any spill temps required for spilling non-lvlVar expressions.
+* Determine the frame layout – allocating space on the frame for any lclVars that are not fully enregistered, as well as any spill temps required for spilling non-lclVar expressions.
* For each `BasicBlock`, in layout order, and each `GenTree` node in the block, in execution order:
* If the node is “contained” (i.e. its operation is subsumed by a parent node), do nothing.
* Otherwise, “consume” all the register operands of the node.
- * This updates the liveness information (i.e. marking a lvlVar as dead if this is the last use), and performs any needed copies.
- * This must be done in correct execution order, obeying any reverse flags (GTF_REVERSE_OPS) on the operands, so that register conflicts are handled properly.
+ * This updates the liveness information (i.e. marking a lclVar as dead if this is the last use), and performs any needed copies.
+ * This must be done in "spill order" so that any spill/restore code inserted by the register allocator to resolve register conflicts is generated in the correct order. "
* Track the live variables in registers, as well as the live stack variables that contain GC refs.
* Produce the `instrDesc(s)` for the operation, with the current live GC references.
* Update the scope information (debug info) at block boundaries.
@@ -415,14 +436,14 @@ There are several properties of the IR that are valid only during (or after) spe
* Computes edge weights, if profile information is available.
* Identifies and normalizes loops. These may be invalidated, but must be marked as such.
* Normalization
- * The lvlVar reference counts are set by `lvaMarkLocalVars()`.
+ * The lclVar reference counts are set by `lvaMarkLocalVars()`.
* Statement ordering is determined by `fgSetBlockOrder()`. Execution order is a depth-first preorder traversal of the nodes, with the operands usually executed in order. The exceptions are:
- * Commutative operators, which can have the `GTF_REVERSE_OPS` flag set to indicate that op2 should be evaluated before op1.
- * Assignments, which can also have the `GTF_REVERSE_OPS` flag set to indicate that the rhs (op2) should be evaluated before the target address (if any) on the lhs (op1) is evaluated. This can only be done if there are no side-effects in the expression for the lhs.
+ * Binary operators, which can have the `GTF_REVERSE_OPS` flag set to indicate that the RHS (`gtOp2`) should be evaluated before the LHS (`gtOp1`).
+ * Dynamically-sized block copy nodes, which can have `gtEvalSizeFirst` set to `true` to indicate that their `gtDynamicSize` tree should be evaluated before executing their other operands.
* Rationalization
- * All `GT_COMMA` nodes are split into separate statements, which may be embedded in other statements in execution order.
* All `GT_ASG` trees are transformed into `GT_STORE` variants (e.g. `GT_STORE_LCL_VAR`).
* All `GT_ADDR` nodes are eliminated (e.g. with `GT_LCL_VAR_ADDR`).
+ * All `GT_COMMA` and `GT_STMT` nodes are removed and their constituent nodes linked into execution order.
* Lowering
* `GenTree` nodes are split or transformed as needed to expose all of their register requirements and any necessary `flowgraph` changes (e.g., for switch statements).
@@ -436,13 +457,10 @@ Ordering:
* After normalization the `gtStmtList` of the containing statement points to the first node to be executed.
* Prior to normalization, the `gtNext` and `gtPrev` pointers on the expression (non-statement) `GenTree` nodes are invalid. The expression nodes are only traversed via the links from parent to child (e.g. `node->gtGetOp1()`, or `node->gtOp.gtOp1`). The `gtNext/gtPrev` links are set by `fgSetBlockOrder()`.
* After normalization, and prior to rationalization, the parent/child links remain the primary traversal mechanism. The evaluation order of any nested expression-statements (usually assignments) is enforced by the `GT_COMMA` in which they are contained.
-* After rationalization, all `GT_COMMA` nodes are eliminated, and the primary traversal mechanism becomes the `gtNext/gtPrev` links. Statements may be embedded within other statements, but the nodes of each statement preserve the valid traversal order.
+* After rationalization, all `GT_COMMA` nodes are eliminated, statements are flattened, and the primary traversal mechanism becomes the `gtNext/gtPrev` links which define the execution order.
* In tree ordering:
* The `gtPrev` of the first node (`gtStmtList`) is always null.
* The `gtNext` of the last node (`gtStmtExpr`) is always null.
-* In linear ordering:
- * The nodes of each statement are ordered such that `gtStmtList` is encountered first, and `gtStmtExpr` is encountered last.
- * The nodes of an embedded statement S2 (starting with `S2->gtStmtList`) appear in the ordering after a node from the “containing” statement S1, and no other node from S1 will appear in the list prior to the `gtStmtExpr` of S2. However, there may be multiple levels of nesting of embedded statements.
TreeNodeInfo:
@@ -450,7 +468,7 @@ TreeNodeInfo:
## LclVar phase-dependent properties
-Prior to normalization, the reference counts (`lvRefCnt` and `lvRefCntWtd`) are not valid. After normalization they must be updated when lvlVar references are added or removed.
+Prior to normalization, the reference counts (`lvRefCnt` and `lvRefCntWtd`) are not valid. After normalization they must be updated when lclVar references are added or removed.
# Supporting technologies and components
@@ -465,8 +483,8 @@ Instruction encoding is performed by the emitter ([emit.h](https://github.com/do
Reporting of live GC references is done in two ways:
-* For stack locations that are not tracked (these could be spill locations or lvlVars – local variables or temps – that are not register candidates), they are initialized to null in the prolog, and reported as live for the entire method.
-* For lvlVars with tracked lifetimes, or for expression involving GC references, we report the range over which the reference is live. This is done by the emitter, which adds this information to the instruction group, and which terminates instruction groups when the GC info changes.
+* For stack locations that are not tracked (these could be spill locations or lclVars – local variables or temps – that are not register candidates), they are initialized to null in the prolog, and reported as live for the entire method.
+* For lclVars with tracked lifetimes, or for expression involving GC references, we report the range over which the reference is live. This is done by the emitter, which adds this information to the instruction group, and which terminates instruction groups when the GC info changes.
The tracking of GC reference lifetimes is done via the `GCInfo` class in the JIT. It is declared in [src/jit/jitgcinfo.h](https://github.com/dotnet/coreclr/blob/master/src/jit/jitgcinfo.h) (to differentiate it from [src/inc/gcinfo.h](https://github.com/dotnet/coreclr/blob/master/src/inc/gcinfo.h)), and implemented in [src/jit/gcinfo.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/gcinfo.cpp).
@@ -478,7 +496,7 @@ Debug info consists primarily of two types of information in the JIT:
* Mapping of IL offsets to native code offsets. This is accomplished via:
* the `gtStmtILoffsx` on the statement nodes (`GenTreeStmt`)
- * the `gtLclILoffs` on lvlVar references (`GenTreeLclVar`)
+ * the `gtLclILoffs` on lclVar references (`GenTreeLclVar`)
* The IL offsets are captured during CodeGen by calling `CodeGen::genIPmappingAdd()`, and then written to debug tables by `CodeGen::genIPmappingGen()`.
* Mapping of user locals to location (register or stack). This is accomplished via:
* Struct `siVarLoc` (in [compiler.h](https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h)) captures the location