diff options
Diffstat (limited to 'Documentation/design-docs/lsra-throughput.md')
-rw-r--r-- | Documentation/design-docs/lsra-throughput.md | 74 |
1 files changed, 74 insertions, 0 deletions
diff --git a/Documentation/design-docs/lsra-throughput.md b/Documentation/design-docs/lsra-throughput.md new file mode 100644 index 0000000000..4fd704c763 --- /dev/null +++ b/Documentation/design-docs/lsra-throughput.md @@ -0,0 +1,74 @@ +Improving LSRA Throughput +========================= + +There are a number of ways in which the current implementation of linear scan register allocation (LSRA) is sub-optimal: +* I'm not certain that the extra pass that enumerates the nodes before the current `TreeNodeInfoInit` pass must be separate. + Further investigation is needed. +* The identification of opportunities for "containment" (i.e. where the computation of a node's result can be folded into the parent, +such as a load or store) is done during `Lowering` and communicated to the register allocator via a `gtLsraInfo` field on the node, that +is otherwise unused, and is basically duplicated when the `RefPosition`s are built for the node. + * A more efficient representation of "containment" could allow this to remain in `Lowering`, where existing transformations already + take into account the parent context. + * This would also have the additional benefit of simplifying the containment check, which is done at least once for each node + (at the beginning of `CodeGen::genCodeForTreeNode()`), and then additionally when considering whether the operands of the current + node are contained. + * Alternatively, the containment analysis could be done during the building of `RefPosition`s, though see below. +* Similarly, the specification of register requirements is done during the final pass of `Lowering`, and fundamentally requires more + space (it must specify register masks for sources, destination and any internal registers). In addition, the requirement for a new + register definition (the destination of the node, or any internal registers) is independent of the parent, so this could be done in + `LinearScan::buildRefPositionsForNode()` without having to do a dual traversal, unlike the identification of contained nodes. +* After building `RefPositions`, they are traversed in order to set the last use bits. + This is done separately because there are currently inconsistencies between the gtNext/gtPrev links and the actual order of codegen. + Once this is resolved, the lastUse bits should be set prior to register allocation by the liveness pass (#7256). +* The `RefPosition`s are all created prior to beginning the register allocation pass. However, they are only really needed in advance + for the lclVars, which, unlike the "tree temps", have multiple definitions and may be live across basic blocks. + The `RefPositions`s for the tree temps could potentially be allocated on-the-fly, saving memory and probably improving locality (#7257). +* The loop over all the candidate registers in `LinearScan::tryAllocateFreeReg()` and in `LinearScan::allocateBusyReg()` could be + short-circuited when a register is found that has the best possible score. Additionally, in the case of MinOpts, it could potentially + short-circuit as soon as a suitable candidate is found, though one would want to weight the throughput benefit against the code + quality impact. + +Representing Containedness +========================== +My original plan for this was to combine all of the functionality of the current `TreeNodeInfoInit` pass with the building of `RefPositions`, +and eliminate `gtLsraInfo`. +The idea was to later consider pulling the containment analysis back into the first phase of `Lowering`. +However, after beginning down that path (extracting the `TreeNodeInfoInit` methods into separate lsra{arch}.cpp files), I realized that +there would be a great deal of throw-away work to put the containment analysis into `LinearScan`, only to potentially pull it out later. + +Furthermore, the representation of containedness is not very clean: +* `Lowering` first communicates this as a combination of implicit knowledge of a node's behavior and its `gtLsraInfo.dstCount`. +* Later, during `CodeGen`, it is determined by combining similar implicit node characteristics with the presence or absence of a register. + +I propose instead to do the following: +* Add a flag to each node to indicate whether or not it is a tree root. + * To free up such a flag, I propose to eliminate `GTF_REG_VAL` for the non-`LEGACY_BACKEND`. Doing so will require some additional cleanup, + but in the process a number of hacks can be eliminated that are currently there to workaround the fact that the emitter was designed + to work with a code generator that dynamically assigned registers, and set that flag when the code had been generated to put it in a + register, unlike the RyuJIT backend, which assigns the registers before generating code. +* Define new register values: + * `REG_UNK` is assigned by `Lowering` when a register is required. + * `REG_OPT` is assigned by `Lowering` when a register is optional at both definition and use. + * `REG_OPT_USE` is assigned by `Lowering` when a register is required at the definition, but optional at the use. + * I don't know if we need `REG_OPT_DEF`, but that could be added as well. +* Having done this, we can greatly simplify `IsContained()`. + +It may be more effective to use the extra bit for an actual `GTF_CONTAINED` flag, and that is something we might want to consider +eventually, but initially it is easier to simplify the containedness check using `GTF_TREE_ROOT` without having to change all the +places that currently mark nodes as contained. + +Combining Containedness Analysis with Lowering +============================================== +Once we've changed containedness to use the above representation, we can move the code to set it into the first pass of `Lowering`. +There are likely to be some phase ordering challenges, but I don't think they will be prohibitive. + +Eliminating gtLsraInfo +====================== +Issue #7225. + +After the containedness changes above, all that remains to communicate via `gtLsraInfo` is the register requirements. +This step would still use the `TreeNodeInfo` data structure and the `TreeNodeInfoInit()` methods, but they would be called as +each node is handled by `LinearScan::buildRefPositionsForNode()`. + + + |