## Removing Embedded Statements From the RyuJIT Backend IR

Pat Gavlin ([*pagavlin@microsoft.com*](mailto:pagavlin@microsoft.com))

July 2016

### Abstract

RyuJIT’s IR comes in two forms: that produced and manipulated in the front end and that expected and manipulated by the
back end. The boundary between these two forms of IR consists of rationalization and lowering, both of which
transform the front-end IR into back-end IR. For the purposes of this paper, the relevant differences between the two
IRs revolve around ordering constructs within basic blocks: top-level statements, trees, comma nodes, and embedded
statements. The latter two constructs are used to represent arbitrary (often side-effecting) code that executes at a
specific point in a tree but does not otherwise participate in the tree’s dataflow. Unfortunately, representational
challenges with embedded statements make them difficult to understand and error-prone to manipulate. This paper proposes
that we remove all statements--embedded and otherwise--from the backend IR by chaining the last linearly-threaded node
within a statement to the first linearly-threaded node within its successor (and vice versa) and by removing certain
constraints on the shape of the nodes within a block.

### Review: IR ordering semantics

As previously mentioned, RyuJIT uses two forms of IR: the front-end IR (referred to hereafter as HIR) and the back-end
IR (referred to hereafter as LIR). Aside from using different representations for operations such as stores, HIR and LIR
differ in their ordering constructs within basic blocks.

Within a basic block, the HIR is ordered first by statements.
Each statement consists of a single tree that performs the
computation associated with that statement. The nodes of that tree are executed in the order produced by a left-to-right
post-order visit of the tree’s nodes (with the exception of binary operator nodes that have the GTF\_REVERSE\_OPS flag
set, which reverses the order of their operand trees). As such, the edges in a tree represent both ordering and edges in
a dataflow graph in which every definition has a single use, with some exceptions:

- Edges to nodes that represent defs of storage locations (e.g. edges to nodes on the LHS of assignment operators)
  often represent neither ordering nor a use-to-def relationship: the def of the location happens as part of the
  execution of its parent, and the edge does not represent a use of an SDSU temp.

- Edges to unused values do not represent a use-to-def relationship: these edges exist only for ordering.

The primary source of the latter is comma nodes, which are used in the HIR specifically to insert arbitrary code into a
tree’s execution that does not otherwise participate in its SDSU dataflow.

Similar to the HIR, the LIR is ordered first by statements. Each statement consists of a single tree and a linear
threading of nodes that represent, respectively, the SDSU dataflow and the computation associated with that
statement. The linear threading of nodes must contain each node that is present in the tree, and all nodes that make up
the tree must be present in the linear threading in the same order in which they would have been executed by the HIR’s
tree ordering. Additional nodes may be present in the linear threading in the form of embedded statements, which are
sub-statements whose nodes form a contiguous subsequence of their containing statement’s execution order but do not
participate in the containing statement’s SDSU dataflow (i.e. the embedded statement’s nodes are not present in the
containing statement’s tree).
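These threading constraints can be illustrated with a small toy model. The node names and list-based representation below are purely hypothetical illustrations, not RyuJIT’s actual GenTree types:

```python
# Toy model of one LIR statement whose linear threading interleaves an
# embedded statement's nodes. Names are hypothetical, not RyuJIT's IR.

# Containing statement's tree: t3 = op(t1, t2); tree-execution order is t1, t2, t3.
tree_nodes = ["t1", "t2", "t3"]

# Embedded statement: its nodes e1, e2 appear in the containing statement's
# linear threading but are absent from the containing statement's tree.
embedded_nodes = ["e1", "e2"]

# Linear threading: the embedded nodes form a contiguous subsequence, and the
# tree's nodes appear in the same order as in the tree-execution order.
threading = ["t1", "e1", "e2", "t2", "t3"]

def embedded_is_contiguous(threading, embedded):
    """The embedded statement's nodes must be adjacent in execution order."""
    first = threading.index(embedded[0])
    return threading[first:first + len(embedded)] == embedded

def tree_order_preserved(threading, tree_nodes):
    """Tree nodes must appear in the threading in tree-execution order."""
    return [n for n in threading if n in tree_nodes] == tree_nodes
```

Under the proposal in this paper, both constraints modeled here would disappear: with statements removed and the tree-order invariant lifted, the linear threading alone defines execution order.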
Embedded statements otherwise have the same ordering semantics as top-level statements,
and they are used for the same purpose as comma nodes in the HIR: they allow the compiler to insert arbitrary code into a
statement’s execution that does not otherwise participate in its SDSU dataflow.

As mentioned, comma nodes and embedded statements are used for the same purpose in the HIR and LIR, respectively.
Each construct represents a contiguous sequence of code that executes within the context of a tree but that does not
participate in its SDSU dataflow. Of the two, however, embedded statements are generally more difficult to work with:
since they are not present in their containing statement’s tree, the developer must take particular care when processing
LIR in order to avoid omitting the nodes that comprise embedded statements from analyses such as those required for code
motion (e.g. the analysis required to make an addressing mode “contained”) and to avoid violating the tree-order constraint
placed on the LIR’s linear threading.

The problems with manipulating embedded statements have two relatively clear solutions: either replace embedded
statements with a construct that is represented in its parent statement’s tree, or remove the tree-order constraint and
move the LIR to a linear ordering that is constrained only by the requirements of dataflow and side-effect consistency.
We believe that the latter is preferable to the former, as it is consistent with the overall direction that the LIR has
taken thus far and reduces the number of concepts bound up in tree edges, which would thereafter represent only the SDSU
dataflow for a node. This would clarify the meaning of the IR as well as the analysis required to perform the
transformations required by the backend.

### Approach

The approach that we propose is outlined below.

#### 1. Add utilities for working with LIR.
Efficiently working with the new IR shape will require the creation of utilities for manipulating, analyzing,
validating, and displaying linearly-ordered LIR. Of these, validation and display of the LIR are particularly
interesting. Validation is likely to check at least the following properties:

1. The linear threading is loop-free
2. All SDSU temps (i.e. temps represented by edges) are in fact singly-used
3. All defs of SDSU temps occur before the corresponding use
4. All SDSU defs that are used are present in the linear IR

In the short term, we propose that the LIR be displayed using the linear dump that was recently added to the JIT. These
dumps would be configured such that all of the information typically present in today’s tree-style dumps (e.g. node
flags, value numbers, etc.) is also present in the linear dump.

#### 2. Stop generating embedded statements in the rationalizer.

This can be approached in a few different ways:

1. Remove commas from the IR before rationalizing. There is already infrastructure in flight to perform this
   transformation, so this is attractive from a dev-hours point of view. Unfortunately, this approach carries both
   throughput and code quality risks due to the addition of another pass and the creation of additional local vars.

2. Change the rationalizer such that it simply does not produce embedded statements. This requires that the
   rationalizer either be able to operate on both HIR and LIR or simply linearize commas as it goes.

3. Remove embedded statements from the IR with a linearization pass between the rationalizer and lowering.

We will move forward with option 2, as it is the most attractive from a throughput and code quality standpoint:
it does not add additional passes (as does option 3), nor does it introduce additional local vars (as does option
1).

#### 3. Refactor decomposition and lowering to work with linear LIR.
The bulk of the work in this step involves moving these passes from statement- and tree-ordered walks to linear walks
and refactoring any code that uses the parent stack to instead use a helper that can calculate the def-to-use edge for a
node. It will also be necessary to replace embedded statement insertion with simple linear IR insertion, which is
straightforward.

##### 3.i. Decomposition

Transitioning decomposition should be rather simple. This pass walks the nodes of each statement in execution order,
decomposing 64-bit operations into equivalent sequences of 32-bit operations as it goes. Critically, the rewrites
performed by decomposition are universally expansions of a single operator into a contiguous sequence of operators whose
results are combined to form the ultimate result. This sort of transformation is easily performed on linear IR by
inserting the new nodes before the node being replaced. Furthermore, because nodes will appear in the linear walk in the
same relative order in which they appear in the execution-order tree walk, rewrites performed in linear order will occur in
the same order in which they were originally performed.

##### 3.ii. Lowering

The picture for lowering is much the same as the picture for decomposition. Like decomposition, all rewrites are
performed in execution order, and most rewrites are simple expansions of a single node into multiple nodes. There is one
notable exception, however: the lowering of add nodes for the x86 architecture examines the parent stack provided by
tree walks in order to defer lowering until it may be possible to form an addressing mode. In this case, the use of the
parent stack can be replaced with a helper that finds the node that uses the def produced by the add node.

##### 3.iii. General issues

Both decomposition and lowering make use of a utility that, when necessary, fixes up the per-call-node side table that
tracks extra information about the call’s arguments. This helper can be replaced by instead fixing up the
side table entries when the call is visited by each pass.

#### 4. Remove uses of statements and the tree-order invariant from LSRA.

LSRA currently depends on the tree-order invariant so that it can build a stack to track data associated with the defs
consumed by an operator. Removing the tree-order invariant requires that LSRA be able to accommodate situations in which
the contents of this stack as produced by a linear walk would no longer contain correct information due to the insertion
of new nodes that produce values that are live across the operator and some subset of its operands. Analysis indicates
that a simple map from defs to the necessary information should be sufficient.

#### 5. Remove uses of statements from the rest of the backend.

The rest of the backend--i.e. codegen--has no known dependencies on tree order, so there are no concerns with respect to
the change in ordering semantics. However, statement nodes are used to derive IL offsets for debug information. There
are a number of alternative methods of tracking IL offsets to consider:

1. Insert “IL offset” nodes into the linearly-ordered LIR. These nodes would then be noted by code generation and used
   to emit the necessary IP-mapping entries in the same way that statements are today. This has the disadvantage of
   requiring additional work when performing code motion on LIR if the IL offset for a particular node needs to be
   kept up to date. However, today’s backend does not perform this sort of code motion, and even if it did, optimized
   debugging is not currently a scenario, so a loss of debugging fidelity may be acceptable.

2. Use a side table to map from each node to its IL offset.
   Although this increases the total working set of the
   backend, this has the advantage of making it easy to keep IL offsets correct in the face of code motion.

3. Add IL offset information directly to each node. This has the disadvantage of increasing the working set of the
   entire compiler.

Unless we expect to require correct debug information in the face of code motion in the future, our recommendation is
for the first option, which comes at minimal size and implementation cost.

#### 6. Remove contained nodes from the LIR’s execution order.

Once statements are removed from the LIR, there is one major issue left to address: the presence of contained nodes in
the LIR’s execution order. The execution of these nodes logically happens as part of the node in which they are
contained, but they remain physically present in the IR at their original locations. As a result, the backend
must often take care to skip contained nodes when manipulating the LIR in ways that must take execution order into
account. Instead of leaving such nodes in the LIR, these nodes should be removed from execution order and instead be
represented as (probably unordered) trees referenced only by the containing node.

### Conclusion and Future Directions

The changes suggested in this paper move RyuJIT’s LIR from a linear view of tree-ordered nodes with certain nodes
represented only in execution order to a linearly ordered sequence of nodes. Furthermore, with this change, tree edges
in LIR would represent either uses of SDSU temps that have a place in execution order (where the edge points from the
use of the temp to the def) or uses of unordered expression trees that execute as part of the parent node. This form
should be easier to work with due to the removal of the tree-order constraint and embedded statements, and due to its
similarity to other linear IR designs.
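As a closing illustration, the validation properties proposed in step 1 can be sketched on a toy linear IR. Everything here is a hypothetical model under the assumption that nodes carry a `next` threading link and a list of consumed defs; it is not RyuJIT’s actual LIR utility code:

```python
# Toy sketch of the LIR validation checks proposed in step 1.
# Node, thread, and validate are hypothetical names, not RyuJIT APIs.

class Node:
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)  # defs (SDSU temps) consumed by this node
        self.next = None                # linear threading link

def thread(*nodes):
    """Link nodes into a linear threading and return its head."""
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes[0]

def validate(head):
    """Check the proposed LIR invariants over the linear threading."""
    # 1. The linear threading is loop-free (Floyd's cycle detection).
    slow = fast = head
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
        if slow is fast:
            return False
    # Collect the threading in execution order.
    order, n = [], head
    while n:
        order.append(n)
        n = n.next
    # 2. All SDSU temps (edges) are singly-used.
    uses = [op for n in order for op in n.operands]
    if len(uses) != len(set(uses)):
        return False
    # 3./4. Every def that is used is present in the threading, before its use.
    defined = set()
    for n in order:
        if any(op not in defined for op in n.operands):
            return False
        defined.add(n)
    return True
```

For example, a threading `a, b, c` where `c` consumes `a` and `b` validates, while a threading that places `c` before its operands, or that uses one def twice, does not.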