Age | Commit message | Author | Files | Lines |
|
Currently, all frame types place saved FP/LR at low addresses on the
frame, below the GS cookie. If a function has localloc, the dynamically
allocated, unsafe buffer will be lower than the saved FP/LR, and
the GS cookie won't properly protect the saved FP/LR.
This change introduces new frame types, used only for functions needing
a GS cookie and using localloc, saving FP/LR along with the rest of
the callee-saved registers at the top (highest addresses) of the frame,
above the GS cookie.
|
|
This change improves detection of allocators with side effects.
Allocators can cause side effects if the allocated object may have a finalizer.
This change adds a pHasSideEffects parameter to the getNewHelper JitEE interface
method. It's used by the jit to check for allocator side effects instead of
guessing from helper ids.
Fixes #21530.
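
As a rough illustration of the idea (not the actual JitEE signature), the sketch below shows a VM-side query reporting side effects through an optional out-parameter, and a JIT-side check that relies on it instead of inspecting the helper id. All names ending in `Sketch`, and `CanRemoveUnusedAllocation`, are hypothetical.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are not the real JitEE types.
enum class NewHelperSketch
{
    NewFast,
    NewFinalizable
};

struct ClassInfoSketch
{
    bool hasFinalizer;
};

// VM side: it knows whether the type may have a finalizer, so it reports
// the side effect explicitly through the optional out-parameter.
NewHelperSketch GetNewHelperSketch(const ClassInfoSketch& cls, bool* pHasSideEffects = nullptr)
{
    if (pHasSideEffects != nullptr)
    {
        *pHasSideEffects = cls.hasFinalizer;
    }
    return cls.hasFinalizer ? NewHelperSketch::NewFinalizable : NewHelperSketch::NewFast;
}

// JIT side: an unused allocation may be dropped only when the allocator
// is known to have no side effects.
bool CanRemoveUnusedAllocation(const ClassInfoSketch& cls)
{
    bool hasSideEffects = true; // conservative default
    GetNewHelperSketch(cls, &hasSideEffects);
    return !hasSideEffects;
}
```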
|
|
Passing CompAllocator objects by value is advantageous because it no longer needs to be dynamically allocated and cached. CompAllocator instances can now be freely created, copied and stored, which makes adding new CompMemKind values easier.
Together with other cleanup this also improves memory allocation performance by removing some extra levels of indirection that were previously required - jitstd::allocator had a pointer to CompAllocator, CompAllocator had a pointer to Compiler and Compiler finally had a pointer to ArenaAllocator. Without MEASURE_MEM_ALLOC enabled, both jitstd::allocator and CompAllocator now just contain a pointer to ArenaAllocator. When MEASURE_MEM_ALLOC is enabled CompAllocator also contains a pointer but to a MemStatsAllocator object that holds the relevant memory kind. This way CompAllocator is always pointer sized so that enabling MEASURE_MEM_ALLOC does not result in increased memory usage due to objects that store a CompAllocator instance.
In order to implement this, two additional significant changes have been made:
* MemStats has been moved to ArenaAllocator, it's after all the allocator's job to maintain statistics. This also fixes some issues related to memory statistics, such as not tracking the memory allocated by the inlinee compiler (since that one used its own MemStats instance).
* Extract the arena page pooling logic out of the allocator. It doesn't make sense to pool an allocator, it has very little state that can actually be reused and everything else (including MemStats) needs to be reset on reuse. What really needs to be pooled is just a page of memory.
Since this was touching allocation code the opportunity has been used to perform additional cleanup:
* Remove unnecessary LSRA ListElementAllocator
* Remove compGetMem and compGetMemArray
* Make CompAllocator and HostAllocator more like the std allocator
* Update HashTable to use CompAllocator
* Update ArrayStack to use CompAllocator
* Move CompAllocator & friends to alloc.h
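
A minimal sketch of the by-value design described above, assuming a simple bump-pointer arena. `ArenaAllocatorSketch` and `CompAllocatorSketch` are hypothetical stand-ins, not the JIT's real classes; the point is only that the allocator handle is a single pointer that can be copied freely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical arena: bump-allocates from fixed-size pages and never frees
// individual blocks (memory is released when the arena is destroyed).
class ArenaAllocatorSketch
{
    static constexpr std::size_t kPageSize = 4096;
    std::vector<std::unique_ptr<unsigned char[]>> m_pages;
    std::size_t m_used = kPageSize; // forces a new page on first allocation

public:
    void* allocateMemory(std::size_t size)
    {
        // Round the request up so every block stays maximally aligned.
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) & ~(align - 1);
        assert(size <= kPageSize); // sketch only: no oversized allocations

        if (m_used + size > kPageSize)
        {
            m_pages.push_back(std::make_unique<unsigned char[]>(kPageSize));
            m_used = 0;
        }
        void* result = m_pages.back().get() + m_used;
        m_used += size;
        return result;
    }

    std::size_t pageCount() const { return m_pages.size(); }
};

// Hypothetical by-value allocator handle: just a pointer to the arena.
class CompAllocatorSketch
{
    ArenaAllocatorSketch* m_arena;

public:
    explicit CompAllocatorSketch(ArenaAllocatorSketch* arena) : m_arena(arena) {}

    template <typename T>
    T* allocate(std::size_t count)
    {
        return static_cast<T*>(m_arena->allocateMemory(count * sizeof(T)));
    }
};

static_assert(sizeof(CompAllocatorSketch) == sizeof(void*), "handle stays pointer sized");
```

Copies of the handle share the same arena, so storing one inside a container costs no more than storing a pointer.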
|
|
Remove JIT LEGACY_BACKEND code
All code related to the LEGACY_BACKEND JIT is removed. This includes all code related to x87 floating-point code generation. Almost 50,000 lines of code have been removed.
Remove legacyjit/legacynonjit directories
Remove reg pairs
Remove tiny instruction descriptors
Remove compCanUseSSE2 (it's always true)
Remove unused FEATURE_FP_REGALLOC
|
|
This ensures that it gets a low LclVar number so that we don't hit
the IMPL_LIMITATION associated with offsets > 255 for LclVar numbers above 32767
|
|
These need to be modified to work directly with JIT's allocator(s) instead of going through IAllocator. It may also be useful to adjust these to account for the fact that the JIT never releases memory.
Besides, the JIT is the primary user of these classes: ExpandArray(Stack) isn't used anywhere else, and SimplerHashTable's only other user is the gcinfo library.
Renamed headers and classes to avoid potential conflicts with the old ones.
Also made the JIT's hash table behavior the default to avoid the need to specify it in hash table instantiations.
|
|
Makes it easier to find unnecessary uses of IAllocator
|
|
It's only used for VarScopeListNode and doesn't offer any useful functionality to begin with, yet it requires a forward declaration for CompAllocator.
|
|
|
|
|
|
For fixed out args ABIs, the jit currently computes the size of the
out arg area during morph, as it encounters calls that must pass arguments
via the stack.
Subsequently, the optimizer may delete calls, either because they are
pure and the results are unused, or because they are conditional and the
guarding predicates are resolved by the optimizer in such a way that
the call is no longer reachable.
In particular if all the calls seen by morph are subsequently removed by
the optimizer, the jit may generate a stack frame that it never uses and
doesn't need. If only some calls are removed then the stack frame may end
up larger than necessary.
One motivating example is the inlining of a shared generic method that
ignores its generic context parameter. The caller typically must invoke a pure
helper to determine the proper argument to pass. Post inline the call's result
is unused, and the helper call is deleted by the optimizer.
This change defers the outgoing arg size computation until fgSimpleLowering,
which runs after optimization. The code in morph now simply records the
needed size for each call in the associated fgArgInfo.
As before, the outgoing arg size computation ignores fast tail calls, since
they can use the caller-supplied scratch area for memory arguments.
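
The deferred computation can be pictured roughly as follows. This is a simplified sketch: `CallSketch` and `ComputeOutgoingArgSpaceSize` are hypothetical names, and ABI details such as minimum area size and alignment are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a call; the size is what morph recorded.
struct CallSketch
{
    unsigned outArgSize;     // bytes of stack arguments recorded during morph
    bool     isFastTailCall; // uses the caller-supplied area instead
};

// Runs after optimization, so 'calls' holds only the calls that survived.
// The out arg area must be large enough for the most demanding of them.
unsigned ComputeOutgoingArgSpaceSize(const std::vector<CallSketch>& calls)
{
    unsigned maxSize = 0;
    for (const CallSketch& call : calls)
    {
        if (call.isFastTailCall)
        {
            continue; // ignored, as before
        }
        maxSize = std::max(maxSize, call.outArgSize);
    }
    return maxSize;
}
```

If the optimizer removes every call, the list is empty and the computed size is zero, so no out arg area is reserved.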
The jit may introduce helper calls after optimization which could seemingly
invalidate the out args area size. The presumption here is that these calls
are carefully designed not to use the caller-supplied scratch area. The
current code makes the same assumption.
This change introduces some unanticipated diffs. Optcse estimates the
frame size to decide how aggressive it should be about cses of local
variable references. This size estimate included the out args area, which
appears to be something of an oversight; there are no cse opportunities
in this area and the offsets of the other local variables are not impacted
by the size of this area. With this change the out args area size is seen
as zero during optimization and so the cse strategy for a method may be
different.
|
|
the math intrinsics.
|
|
This change is the result of running clang-tidy and clang-format on jit
sources.
|
|
LSRA currently uses a stack to find the `LocationInfo` for each register consumed
by a node. The changes in this stack's contents given a particular node are
governed by three factors:
- The number of registers the node consumes (`gtLsraInfo.srcCount`)
- The number of registers the node produces (`gtLsraInfo.dstCount`)
- Whether or not the node produces an unused value (`gtLsraInfo.isLocalDefUse`)
In all cases, `gtLsraInfo.srcCount` values are popped off of the stack in the
order in which they were pushed (i.e. in FIFO rather than LIFO order). If the
node produces a value that will be used, `gtLsraInfo.dstCount` values are
then pushed onto the stack. If the node produces an unused value, nothing is
pushed onto the stack.
Naively, it would appear that the number of registers a node consumes would be
at least the count of the node's non-leaf operands (to put it differently, one
might assume that any non-leaf operator that produces a value would define at
least one register). However, contained nodes complicate the situation: because
a contained node's execution is subsumed by its user, the contained node's
sources become sources for its user and the contained node does not define any
registers. As a result, both the number of registers consumed and the number of
registers produced by a contained node are 0. Thus, contained nodes do not
update the stack, and the node's parent (if it is not also contained) will
pop the values produced by the contained node's operands. Logically speaking,
it is as if a contained node defines the transitive closure of the registers
defined by its own non-contained operands.
The use of the stack relies on the property that even in linear order the
JIT's IR is still tree ordered. That is to say, given an operator and its
operands, any nodes that execute between any two operands do not produce
SDSU temps that are consumed after the second operand. IR with such a
shape would unbalance the stack.
The planned move to the LIR design given in #6366 removes the tree order
constraint in order to simplify understanding and manipulating the IR in the
backend. Because LIR may not be tree ordered, LSRA may no longer use a stack
to find the `LocationInfo` for a node's operands. This change replaces the
stack with a map from nodes to lists of `LocationInfo` values, each of
which describes a register that is logically defined (if not physically
defined) by that node. Only contained nodes logically define registers that
they do not physically define: contained nodes map to the list of
`LocationInfo` values logically defined by their operands. All non-contained
nodes map to the list of `LocationInfo` values that they physically define.
Non-contained nodes that do not define any registers are not inserted into
the map.
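
The map-based scheme can be sketched as below. This is an illustration under simplifying assumptions, not LSRA's actual code: `NodeSketch`, `LocationInfoSketch`, `RecordDefs` and `GatherSources` are hypothetical, and registers are represented by plain integers.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct LocationInfoSketch
{
    int reg; // stand-in for the real location description
};

struct NodeSketch
{
    bool isContained;
    int  dstCount; // number of registers physically defined
    int  firstReg; // hypothetical id of the first defined register
    std::vector<const NodeSketch*> operands;
};

using OperandMap = std::unordered_map<const NodeSketch*, std::vector<LocationInfoSketch>>;

// Collect the locations a node consumes by looking up each operand in the map.
// Operands that define nothing have no entry and contribute nothing.
std::vector<LocationInfoSketch> GatherSources(const NodeSketch& node, const OperandMap& map)
{
    std::vector<LocationInfoSketch> sources;
    for (const NodeSketch* operand : node.operands)
    {
        auto it = map.find(operand);
        if (it != map.end())
        {
            sources.insert(sources.end(), it->second.begin(), it->second.end());
        }
    }
    return sources;
}

// Called for each node in execution order, after its operands were recorded.
void RecordDefs(const NodeSketch& node, OperandMap& map)
{
    if (node.isContained)
    {
        // A contained node logically defines whatever its operands define.
        map[&node] = GatherSources(node, map);
    }
    else if (node.dstCount > 0)
    {
        std::vector<LocationInfoSketch> defs;
        for (int i = 0; i < node.dstCount; i++)
        {
            defs.push_back({node.firstReg + i});
        }
        map[&node] = defs;
    }
    // Non-contained nodes that define no registers are not inserted.
}
```

Because lookups are keyed by node rather than by stack position, the result does not depend on the nodes being tree ordered.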
|
|
* Adding some basic System.Math performance tests.
* Renaming 'floatnative' to 'floatdouble'.
* Removing outdated workarounds in the floatdouble interop code.
* Renaming 'finite.cpp' to 'math.cpp'
* Updating the double-precision math tests.
* Updating PAL_EPSILON to be more precise.
|
|
Move CritSecObject into util.h, and use it to lock around reading
and writing inline Xml. Introduce CritSecHolder for RAII management
of the locks.
Add a simple file position cache for methods to speed up replay
when the inline xml file is large.
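
For readers unfamiliar with the RAII pattern mentioned above, here is a minimal sketch. It assumes `std::mutex` as a portable stand-in for the underlying critical section; the `...Sketch` classes and `AppendInlineXml` are hypothetical and only mirror the shape of a holder type.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Stand-in for CritSecObject; std::mutex replaces the OS critical section
// here (an assumption made for portability of the sketch).
class CritSecObjectSketch
{
    std::mutex m_mutex;

public:
    void Lock() { m_mutex.lock(); }
    void Unlock() { m_mutex.unlock(); }
};

// RAII holder: acquires the lock on construction, releases it on destruction.
class CritSecHolderSketch
{
    CritSecObjectSketch& m_critSec;

public:
    explicit CritSecHolderSketch(CritSecObjectSketch& critSec) : m_critSec(critSec)
    {
        m_critSec.Lock();
    }
    ~CritSecHolderSketch() { m_critSec.Unlock(); }

    CritSecHolderSketch(const CritSecHolderSketch&) = delete;
    CritSecHolderSketch& operator=(const CritSecHolderSketch&) = delete;
};

// Hypothetical writer: appends one record to a shared Xml buffer under the lock.
void AppendInlineXml(CritSecObjectSketch& lock, std::string& xml, const std::string& record)
{
    CritSecHolderSketch holder(lock);
    xml += record;
} // the lock is released here, including on early return or exception
```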
|
|
Rework and cleanup ConfigMethodRange. Give it a configurable size. Do
some error detection when parsing the range string. Update comments
and add notes at existing usage sites about the behavior when the
range string is not specified.
Use ConfigMethodRange to implement a JitNoInlineRange option that
suppresses inlining in a specified set of methods. Set this up so
inlining still happens if the range string is empty. Choose capacity so
it can hold the entire range string's set of ranges. Add a new
observation for the cases where enabling this JIT option disables
inlines.
Make compMethodHash available for INLINE_DATA builds. Report this hash
in the Xml inline dumps, so it can provide hash values to feed into
JitNoInlineRange strings.
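
A rough sketch of a range-string parser with error detection follows. The syntax is an assumption made for illustration (space-separated decimal values or `lo-hi` pairs); the real ConfigMethodRange format may differ, and `MethodRangeSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct RangeSketch
{
    unsigned lo;
    unsigned hi;
};

// Assumed syntax: "10-20 42" means hashes 10..20 inclusive, plus 42.
class MethodRangeSketch
{
    std::vector<RangeSketch> m_ranges;
    bool                     m_badInput = false;

    // Parses a run of decimal digits; fails if there is none.
    // Overflow is not checked in this sketch.
    static bool ParseNumber(const std::string& text, std::size_t& i, unsigned& value)
    {
        const std::size_t start = i;
        value = 0;
        while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
        {
            value = value * 10 + static_cast<unsigned>(text[i] - '0');
            i++;
        }
        return i > start;
    }

public:
    explicit MethodRangeSketch(const std::string& text)
    {
        std::size_t i = 0;
        while (i < text.size() && !m_badInput)
        {
            if (text[i] == ' ')
            {
                i++;
                continue;
            }
            unsigned lo = 0;
            unsigned hi = 0;
            bool ok = ParseNumber(text, i, lo);
            hi = lo;
            if (ok && i < text.size() && text[i] == '-')
            {
                i++;
                ok = ParseNumber(text, i, hi);
            }
            if (!ok || hi < lo)
            {
                m_badInput = true; // malformed token or inverted range
            }
            else
            {
                m_ranges.push_back({lo, hi});
            }
        }
        if (m_badInput)
        {
            m_ranges.clear();
        }
    }

    bool Error() const { return m_badInput; }
    bool IsEmpty() const { return m_ranges.empty(); }

    // An empty range contains nothing, so an unspecified string suppresses no inlines.
    bool Contains(unsigned hash) const
    {
        for (const RangeSketch& r : m_ranges)
        {
            if (hash >= r.lo && hash <= r.hi)
            {
                return true;
            }
        }
        return false;
    }
};
```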
|
|
Math.Round is implemented as a target intrinsic for Arm64.
The JIT emitted `frinta`, which rounds ties away from zero.
As described in MSDN
https://msdn.microsoft.com/en-us/library/system.math.round(v=vs.110).aspx#Round2_Example,
we should round ties to even.
So I changed the intrinsic expansion to emit `frintn` instead.
I've also identified that `EvalMathFuncUnary` incorrectly folds the round
operation when the argument is a compile-time constant.
The folding logic does not follow the semantics described in MSDN above.
I made a separate helper function which is essentially a duplicate of
the classlib one. This is not an Arm64-specific fix; it applies to all other targets too.
So I just fixed it while I was here.
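
The difference between the two rounding modes can be illustrated in portable C++. This is only a sketch of the semantics, not the helper added by the change; `RoundTiesAway` and `RoundTiesToEven` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Ties away from zero: the behaviour of `frinta` (and of std::round).
double RoundTiesAway(double x)
{
    return std::round(x);
}

// Ties to even: the behaviour Math.Round requires, matching `frintn`.
double RoundTiesToEven(double x)
{
    double result = std::round(x);
    // The fractional part of a double is exactly representable, so this
    // comparison detects exact ties only.
    if (std::fabs(x - std::trunc(x)) == 0.5)
    {
        // Halving, rounding and doubling always lands on an even integer.
        result = 2.0 * std::round(x / 2.0);
    }
    return result;
}
```

For example, `RoundTiesAway(2.5)` yields 3.0 while `RoundTiesToEven(2.5)` yields 2.0; both agree on inputs that are not exact ties.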
|
|
A small amount of JIT code was using `_ASSERTE` instead of `assert`.
Change that code to use `assert` instead, which operates correctly
w.r.t. the JIT host.
|
|
Replace these with the corresponding `char*` type.
|
|
These behaviors override the default out-of-memory handling
s.t. it is appropriate for the JIT.
|
|
|
|
SuperPMI has been updated to accommodate this change, and should no
longer crash when run using a JIT built with these changes.
[tfs-changeset: 1582144]
|
|
[tfs-changeset: 1578925]
|
|
These APIs accommodate the retrieval of config values using the JIT
interface rather than the utilcode library. All configuration options
are now initialized upon the first call to compileMethod. The values
of configuration options are available off of an ambient JitConfig
object.
This also changed `JitHost::get*ConfigValue` to use the
`EEConfig_default` policy instead of `REGUTIL_default` in order to
avoid breaking a small set of JIT config options available in release
builds that were using the former. This change is exceedingly
unlikely to adversely affect the behavior of other JIT config options
that were originally fetched using `REGUTIL_default`, since values
for these options should not be present in any locations searched
by `EEConfig_default` that are not searched by
`REGUTIL_default` (namely config files).
[tfs-changeset: 1578859]
|
|
|
|
Successfully builds all binaries except sos.dll & x64 binaries
|
|
[tfs-changeset: 1407945]
|