Age | Commit message (Collapse) | Author | Files | Lines |
|
Passing CompAllocator objects by value is advantageous because it no longer needs to be dynamically allocated and cached. CompAllocator instances can now be freely created, copied and stored, which makes adding new CompMemKind values easier.
Together with other cleanup this also improves memory allocation performance by removing some extra levels of indirection that were previously required - jitstd::allocator had a pointer to CompAllocator, CompAllocator had a pointer to Compiler and Compiler finally had a pointer to ArenaAllocator. Without MEASURE_MEM_ALLOC enabled, both jitstd::allocator and CompAllocator now just contain a pointer to ArenaAllocator. When MEASURE_MEM_ALLOC is enabled CompAllocator also contains a pointer but to a MemStatsAllocator object that holds the relevant memory kind. This way CompAllocator is always pointer sized so that enabling MEASURE_MEM_ALLOC does not result in increased memory usage due to objects that store a CompAllocator instance.
In order to implement this, 2 additional signficant changes have been made:
* MemStats has been moved to ArenaAllocator, it's after all the allocator's job to maintain statistics. This also fixes some issues related to memory statistics, such as not tracking the memory allocated by the inlinee compiler (since that one used its own MemStats instance).
* Extract the arena page pooling logic out of the allocator. It doesn't make sense to pool an allocator, it has very little state that can actually be reused and everyting else (including MemStats) needs to be reset on reuse. What really needs to be pooled is just a page of memory.
Since this was touching allocation code the opportunity has been used to perform additional cleanup:
* Remove unnecessary LSRA ListElementAllocator
* Remove compGetMem and compGetMemArray
* Make CompAllocator and HostAllocator more like the std allocator
* Update HashTable to use CompAllocator
* Update ArrayStack to use CompAllocator
* Move CompAllocator & friends to alloc.h
|
|
|
|
- Add default key infos for int/unsigned
- Call compGetMem through a forward-declared wrapper to avoid circular
dependency problems if Compiler itself (or one of its dependencies)
uses a SmallHashTable
|
|
These changes implement the design for RyuJIT's LIR described in https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/removing-embedded-statements.md.
The following passes required changes:
Rationalize, which has been almost completely rewritten
Long decomposition
Target-independent lowering
Target-dependent lowering
LSRA
Liveness
Flowgraph optimization
Codegen
For the most part, these changes are confined to the backend. Common code that needed to be updated included liveness, flowgraph optimization, and a few miscellaneous utilities.
The utilities used to analyze and manipulate LIR live (almost) entirely in src/jit/lir.{cpp,h}. The core concepts that are unique to LIR are LIR::Use and LIR::Range. The latter is a tuple that captures an SDSU def (i.e. an LIR node) and its corresponding use->def edge and user. The former serves to abstract a self-contained sequence of LIR nodes that make up e.g. the contents of a basic block.
Testing indicates that neither JIT throughput nor code quality are significantly impacted by these changes.
|
|
This change is the result of running clang-tidy and clang-format on jit
sources.
|
|
LSRA currently uses a stack to find the `LocationInfo` for each register consumed
by a node. The changes in this stack's contents given a particular node are
governed by three factors:
- The number of registers the node consumes (`gtLsraInfo.srcCount`)
- The number of registers the node produces (`gtLstaInfo.dstCount`)
- Whether or not the node produces an unused value (`gtLsraInfo.isLocalDefUse`)
In all cases, `gtLsraInfo.srcCount` values are popped off of the stack in the
order in which they were pushed (i.e. in FIFO rather than LIFO order). If the
node produces a value that will be used, `gtLsraInfo.dstCount` values are
then pushed onto the stack. If the node produces an unused value, nothing is
pushed onto the stack.
Naively, it would appear that the number of registers a node consumes would be
at least the count of the node's non-leaf operands (to put it differently, one
might assume that any non-leaf operator that produces a value would define at
least one register). However, contained nodes complicate the situation: because
a contained node's execution is subsumed by its user, the contained node's
sources become sources for its user and the contained node does not define any
registers. As a result, both the number of registers consumed and the number of
registers produced by a contained node are 0. Thus, contained nodes do not
update the stack, and the node's parent (if it is not also contained) will
pop the values produced by the contained node's operands. Logically speaking,
it is as if a contained node defines the transitive closure of the registers
defined by its own non-contained operands.
The use of the stack relies on the property that even in linear order the
JIT's IR is still tree ordered. That is to say, given an operator and its
operands, any nodes that execute between any two operands do not produce
SDSU temps that are consumed after the second operand. IR with such a
shape would unbalance the stack.
The planned move to the LIR design given in #6366 removes the tree order
constraint in order to simplify understanding and manipulating the IR in the
backend. Because LIR may not be tree ordered, LSRA may no longer use a stack
to find the `LocationInfo` for a node's operands. This change replaces the
stack with a map from nodes to lists of `LocationInfo` values, each of
which describes a register that is logically defined (if not physically
defined) by that node. Only contained nodes logically define registers that
they do not physically define: contained nodes map to the list of
`LocationInfo` values logically defined by their operands. All non-contained
nodes map to the list of `LocationInfo` values that they physically define.
Non-contained nodes that do not define any registers are not inserted into
the map.
|