Age | Commit message | Author | Files | Lines |
|
* Fix contained LEA handling
This adds an LEA case to both `LinearScan::BuildOperandUses` and `CodeGen::genConsumeRegs`.
Fix #25039
|
|
* Fix Arm64 UpperVector save/restore
Change the general handling of end-of-block restores so that we always have a RefPosition on which to allocate the register needed on Arm64.
Fix #23885
|
|
* Support for Arm64 Vector ABI
Extend HFA support to support vectors as well as floating point types.
This requires that the JIT recognize vector types even during crossgen,
so that the ABI is supported consistently.
Also, fix and re-enable the disabled Arm64 Simd tests.
Fix #16022
|
|
Improve Upper Vector Save/Restore
In order to avoid saving and restoring the upper half of large vectors around every
call even if they are not used, separately model the upper half of large vector
lclVars, and track whether the large vector lclVar is partially-spilled, in which
case its upper half resides in its upper half Interval's location.
Fix #18144
|
|
* LSRA cleanup
These are zero-diff changes. Some cleanup, some in preparation for improvements to save/restore of upper vectors.
|
|
* Handle addressing modes for HW intrinsics
Also, eliminate some places where the code size estimates were over-estimating.
Contribute to #19550
Fix #19521
|
|
* Fix min-opts spill of tree temp large vectors
Even if we're not enregistering local vars, we may have large vectors live across a call that need to be spilled.
Fix #22200
|
|
Add a new marker instruction that we emit once we've enabled preemptive GC in
the inline pinvoke method prolog. Use that to kill off callee saves registers
with GC references, instead of waiting until the call.
This closes a window of vulnerability we see in GC stress where if a stress
interrupt happens between the point at which we enable preemptive GC and
the point at which we make the call, we may report callee saves as GC live
when they're actually dead.
Closes #19211.
|
|
* Spill tree temp large vectors around calls
The code was there to handle lclVars, but there was an implicit assumption that non-lclVar large vector tree temps would never be live across a call.
Fixes the GitHub_20657 failure in #22253
|
|
* Propagate preferences
Instead of selecting a single relatedInterval for preferencing,
follow the chain of relatedIntervals (preferenced intervals).
Change when preferences are propagated to the relatedInterval;
do it in allocateRegisters, so that we can update it when we see a last use.
Also tune when and how intervals are preferenced, including allowing multiple
uses on an RMW node to have the target as their preference.
Fixes #11463
Contributes to #16359
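A minimal sketch of what following the chain of relatedIntervals could look like. The `Interval` struct, the `RegMask` alias, the `chainPreferences` helper, and the step bound are all hypothetical simplifications, not the JIT's actual types or policy. The idea shown is to narrow the candidate set with each interval's preferences in turn, skipping any link that would leave no candidates.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint32_t; // one bit per register (hypothetical encoding)

// Hypothetical, heavily simplified interval: only the fields needed here.
struct Interval
{
    RegMask   preferences;     // registers this interval would like to get
    Interval* relatedInterval; // next interval in the preference chain, or nullptr
};

// Walk the chain starting at 'interval', intersecting preferences as we go.
// A link whose preferences have nothing in common with the current set is
// skipped rather than allowed to empty the set. The step bound is an assumed
// guard against cycles in the chain.
RegMask chainPreferences(const Interval* interval, RegMask candidates, int maxSteps = 8)
{
    RegMask prefs = candidates;
    int     steps = 0;
    for (const Interval* cur = interval; cur != nullptr && steps < maxSteps;
         cur = cur->relatedInterval, steps++)
    {
        RegMask common = prefs & cur->preferences;
        if (common != 0)
        {
            prefs = common;
        }
    }
    return prefs;
}
```

With a chain a -> b -> c whose preferences are 0x6, 0x4 and 0x8, starting from candidates 0xF, the sketch narrows to 0x4 and ignores c, whose preference conflicts with the registers already chosen.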
|
|
* Fix the issue.
* Fix the comment.
* update the comment
|
|
* Add a repro test.
* Fix the issue.
* Add comments
|
|
* Fix the strange `ifdef ` placement.
* Fix comments/refactoring of `LinearScan::BuildReturn`.
* Delete `FixupIfSIMDLocal`.
Do not change `LCL_FLD(long)` back to `LCL_VAR(simd8)`,
|
|
A multireg COPY will only have a valid register for indices that require copying. Thus, the `GetRegCount` method must return the highest index that has a valid register.
Fix #20063
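To illustrate the rule, here is a small sketch under stated assumptions: `MultiRegCopy`, `REG_NA`, and `MAX_RET_REGS` are stand-ins defined here, not the JIT's definitions. The count is taken to be one past the highest index with a valid register, so that a caller iterating over `[0, count)` reaches every index that may need a copy even when lower indices have no register.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins; these are not the JIT's actual definitions.
constexpr int REG_NA       = -1; // "no register assigned"
constexpr int MAX_RET_REGS = 4;

struct MultiRegCopy
{
    // regs[i] is valid only if the i-th value actually needs copying.
    std::array<int, MAX_RET_REGS> regs{{REG_NA, REG_NA, REG_NA, REG_NA}};

    // Returns one past the highest index that holds a valid register.
    // Indices below that may still be REG_NA and must be checked by callers.
    int GetRegCount() const
    {
        for (int i = MAX_RET_REGS - 1; i >= 0; i--)
        {
            if (regs[i] != REG_NA)
            {
                return i + 1;
            }
        }
        return 0;
    }
};
```

Counting only the valid entries would be wrong here: a copy that needs a register only at index 2 must still report a count of 3.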
|
|
Fix #19448
|
|
On x86, `MUL_LONG` wasn't considered a multi-reg node, as it should be, so that when it gets spilled or copied, the additional register will be correctly handled.
Also, the ARM and X86 versions of genStoreLongLclVar should be identical and shared (neither version was handling the copy of a `MUL_LONG`).
Finally, fix the LSRA dumping of multi-reg nodes.
Fix #19397
|
|
We use the following format when printing the BasicBlock number: bbNum
This define is used with string concatenation to put this in printf format strings
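A short sketch of the string-concatenation technique. The macro name `FMT_BB`, its expansion, and the `describeEdge` helper are assumptions made for illustration; the commit message does not spell them out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumed definition for illustration: "BB" followed by a zero-padded number.
#define FMT_BB "BB%02u"

// Adjacent string literals are concatenated by the compiler, so the macro
// can be spliced directly into a larger printf-style format string.
std::string describeEdge(unsigned from, unsigned to)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "edge " FMT_BB " -> " FMT_BB, from, to);
    return std::string(buf);
}
```

Keeping the format in one define means a later change to how block numbers are printed only has to be made in one place.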
|
|
This is a preparatory change for auditing and controlling how local
variable ref counts are observed and manipulated.
See #18969 for context.
No diffs seen locally. No TP impact expected.
There is a small chance we may see some asserts in broader testing
as there were places in original code where local ref counts were
incremented without checking for possible overflows. The new APIs
will assert for overflow cases.
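As a rough sketch of the kind of accessor described, assuming a 16-bit counter: `LocalVarDesc`, `refCnt`, and `incRefCnt` are hypothetical names used only for illustration and are not taken from the JIT sources.

```cpp
#include <cassert>
#include <limits>

// Hypothetical local-variable descriptor; names are illustrative only.
class LocalVarDesc
{
    unsigned short m_refCnt = 0; // kept private so all updates go through the API

public:
    unsigned short refCnt() const
    {
        return m_refCnt;
    }

    void incRefCnt(unsigned short delta)
    {
        // Do the addition in a wider type so overflow can be detected.
        unsigned int newCnt = static_cast<unsigned int>(m_refCnt) + delta;
        // Assert instead of silently wrapping around.
        assert(newCnt <= std::numeric_limits<unsigned short>::max());
        m_refCnt = static_cast<unsigned short>(newCnt);
    }
};
```

Routing every read and write through accessors like these is what makes it possible to audit where the counts are observed and changed.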
|
|
RCX must be explicitly killed. Otherwise, if there's a def/use conflict, as in this test case where the shift amount is defined by a divide that must go in RAX, it won't be explicitly assigned to RCX.
Also, the handling of conflicts must not use the register assignment of the def on the use if it conflicts with the use register requirements, and vice versa.
Fix #18884
|
|
`BuildUse` was setting the regNumber for all of the uses of a multi-reg node to the main/first regNum. This was missed because this results in a def/use conflict on that reg (the def was set correctly), which is generally resolved in favor of the def. The exception is when there is a kill of the register in between, in which case the use tries to allocate the register it's been assigned, causing the `farthestRefPhysRegRecord != nullptr` assert (aka "a register can't be found to spill").
This fixes the specific issue, and adds additional asserts to identify future/additional such issues.
The new asserts depend upon all the regNums being appropriately set when/if any are set, which wasn't always the case prior to register allocation.
Fix #18153
|
|
Passing CompAllocator objects by value is advantageous because it no longer needs to be dynamically allocated and cached. CompAllocator instances can now be freely created, copied and stored, which makes adding new CompMemKind values easier.
Together with other cleanup this also improves memory allocation performance by removing some extra levels of indirection that were previously required - jitstd::allocator had a pointer to CompAllocator, CompAllocator had a pointer to Compiler and Compiler finally had a pointer to ArenaAllocator. Without MEASURE_MEM_ALLOC enabled, both jitstd::allocator and CompAllocator now just contain a pointer to ArenaAllocator. When MEASURE_MEM_ALLOC is enabled CompAllocator also contains a pointer but to a MemStatsAllocator object that holds the relevant memory kind. This way CompAllocator is always pointer sized so that enabling MEASURE_MEM_ALLOC does not result in increased memory usage due to objects that store a CompAllocator instance.
In order to implement this, 2 additional significant changes have been made:
* MemStats has been moved to ArenaAllocator; it is, after all, the allocator's job to maintain statistics. This also fixes some issues related to memory statistics, such as not tracking the memory allocated by the inlinee compiler (since that one used its own MemStats instance).
* Extract the arena page pooling logic out of the allocator. It doesn't make sense to pool an allocator: it has very little state that can actually be reused, and everything else (including MemStats) needs to be reset on reuse. What really needs to be pooled is just a page of memory.
Since this was touching allocation code the opportunity has been used to perform additional cleanup:
* Remove unnecessary LSRA ListElementAllocator
* Remove compGetMem and compGetMemArray
* Make CompAllocator and HostAllocator more like the std allocator
* Update HashTable to use CompAllocator
* Update ArrayStack to use CompAllocator
* Move CompAllocator & friends to alloc.h
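The pass-by-value design can be sketched roughly as follows. This is a simplified model that reuses the names `ArenaAllocator` and `CompAllocator` from the message but not their real implementations: the page size, the bump-allocation details, and the `allocate<T>` signature are assumptions, and memory statistics and page pooling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Simplified bump allocator standing in for the arena (assumed design).
class ArenaAllocator
{
    std::vector<char*> m_pages;
    char*              m_next = nullptr;
    std::size_t        m_left = 0;

    static constexpr std::size_t PageSize = 4096; // assumed page size

public:
    ArenaAllocator() = default;
    ArenaAllocator(const ArenaAllocator&) = delete;
    ArenaAllocator& operator=(const ArenaAllocator&) = delete;

    ~ArenaAllocator()
    {
        for (char* page : m_pages)
        {
            ::operator delete(page);
        }
    }

    void* allocate(std::size_t size)
    {
        // Round up so every allocation stays suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) & ~(align - 1);
        if (size > m_left)
        {
            // Start a new page; the tail of the old page is simply abandoned.
            std::size_t pageSize = (size > PageSize) ? size : PageSize;
            m_next = static_cast<char*>(::operator new(pageSize));
            m_pages.push_back(m_next);
            m_left = pageSize;
        }
        void* result = m_next;
        m_next += size;
        m_left -= size;
        return result;
    }
};

// A cheap handle that can be created, copied and stored freely.
class CompAllocator
{
    ArenaAllocator* m_arena;

public:
    explicit CompAllocator(ArenaAllocator* arena) : m_arena(arena)
    {
    }

    template <typename T>
    T* allocate(std::size_t count)
    {
        return static_cast<T*>(m_arena->allocate(count * sizeof(T)));
    }
};

static_assert(sizeof(CompAllocator) == sizeof(void*), "CompAllocator should stay pointer sized");
```

Because the handle holds a single pointer, copying it costs the same as copying a raw pointer, and all copies draw from the same arena.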
|
|
Need to build a use for each reg.
Also, dump the defList if it's not empty at end of block.
|
|
* An UnusedValue still requires a target reg
The BuildSimple method wasn't creating a def for an unused value. Although (in this case) the code is dead, the code generator must still be able to generate code for it.
* Add test case for #18295 to arm/arm64 tests.lst
|
|
* Unify struct arg handling
Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication.
Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid
requiring a copy and/or marking `lvDoNotEnregister` for those cases.
Unify the specification of multi-reg args:
- numRegs now indicates the actual number of reg args (not the size in pointer-size units)
- regNums contains all the arg register numbers
|
|
* Create RefPositions without TreeNodeInfo
* Remove all references to TreeNodeInfo
* Fix function header comments
|
|
Remove JIT LEGACY_BACKEND code
All code related to the LEGACY_BACKEND JIT is removed. This includes all code related to x87 floating-point code generation. Almost 50,000 lines of code have been removed.
Remove legacyjit/legacynonjit directories
Remove reg pairs
Remove tiny instruction descriptors
Remove compCanUseSSE2 (it's always true)
Remove unused FEATURE_FP_REGALLOC
|
|
It should really only be a fixed reference, not a kill, but if the reference is changed by `LinearScan::resolveConflictingDefAndUse()` it can fail to cause the value in EDI to be killed.
Fix #17634
|
|
Eliminate `FEATURE_UNIX_AMD64_STRUCT_PASSING` and replace it with `UNIX_AMD64_ABI` when used alone. Both are currently defined; it is highly unlikely the latter will work alone; and it significantly clutters up the code, especially the JIT.
Also, fix the altjit support (now `UNIX_AMD64_ABI_ITF`) to *not* call `ClassifyEightBytes` if the struct is too large. Otherwise it asserts.
|
|
LSRA: remove last uses only at use point
|
|
LSRA maintains liveness within a block to determine what's live across a call. It uses the last use bits on lclVar nodes to remove them from the set. However, this should be done at the point of use rather than at the point where the lclVar is encountered in the execution stream.
Fix #17389
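The distinction can be modelled with a toy walk over a block, sketched below. The node kinds, the `Node` struct, and `liveAcrossCalls` are hypothetical and far simpler than LSRA's real data structures. The point shown is that a lclVar stays in the live set when its node is encountered and is only removed when the value is actually consumed.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical node kinds for a toy execution stream.
enum class Kind
{
    LclVar,  // the lclVar node is encountered (its value is not consumed yet)
    Call,    // a call; everything live here is live across it
    Consume  // the point where the lclVar's value is actually used
};

struct Node
{
    Kind kind;
    int  lclNum;  // meaningful for LclVar and Consume
    bool lastUse; // last-use bit carried from the lclVar node
};

// Returns, for each call in order, the set of locals live across it.
std::vector<std::set<int>> liveAcrossCalls(const std::vector<Node>& nodes, std::set<int> live)
{
    std::vector<std::set<int>> result;
    for (const Node& node : nodes)
    {
        switch (node.kind)
        {
            case Kind::LclVar:
                // Encountering the node does not end the lifetime, even if
                // the last-use bit is set.
                break;
            case Kind::Call:
                result.push_back(live);
                break;
            case Kind::Consume:
                if (node.lastUse)
                {
                    live.erase(node.lclNum); // remove at the point of use
                }
                break;
        }
    }
    return result;
}
```

For the stream "lclVar V1 (last use), call, consume V1, call", V1 is reported live across the first call and not across the second; removing it when the lclVar node is encountered would wrongly drop it from the first call's set.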
|
|
The JIT write barrier helpers have a custom calling convention that
avoids killing most registers. The JIT was not taking advantage of
this, and thus was killing unnecessary registers when a write barrier
was necessary. In particular, some integer callee-trash registers
are unaffected by the write barriers, and no floating-point register
is affected.
Also, I got rid of the `FEATURE_WRITE_BARRIER` define, which is always
set. I also put some code under `LEGACY_BACKEND` for easier cleanup
later. I removed some unused defines in target.h for some platforms.
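A sketch of the idea, with made-up register masks: all of the `RBM_*` constants, the `HelperKind` enum, and `getKillSetForHelper` below are hypothetical and do not reflect any real target's register set. It only shows a helper-specific kill set that is smaller than the default callee-trash set and contains no floating-point registers.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint32_t;

// Hypothetical register bits and masks, chosen only for illustration.
constexpr RegMask RBM_R0 = 1u << 0;
constexpr RegMask RBM_R1 = 1u << 1;
constexpr RegMask RBM_R2 = 1u << 2;
constexpr RegMask RBM_R3 = 1u << 3;
constexpr RegMask RBM_F0 = 1u << 16;
constexpr RegMask RBM_F1 = 1u << 17;

constexpr RegMask RBM_INT_CALLEE_TRASH = RBM_R0 | RBM_R1 | RBM_R2 | RBM_R3;
constexpr RegMask RBM_FLT_CALLEE_TRASH = RBM_F0 | RBM_F1;
constexpr RegMask RBM_CALLEE_TRASH     = RBM_INT_CALLEE_TRASH | RBM_FLT_CALLEE_TRASH;

// Assumed for this sketch: the barrier helper trashes two integer registers
// and no floating-point registers.
constexpr RegMask RBM_WRITE_BARRIER_TRASH = RBM_R0 | RBM_R1;

enum class HelperKind
{
    Ordinary,
    WriteBarrier
};

// Pick the kill set based on the helper's calling convention.
constexpr RegMask getKillSetForHelper(HelperKind kind)
{
    return (kind == HelperKind::WriteBarrier) ? RBM_WRITE_BARRIER_TRASH : RBM_CALLEE_TRASH;
}
```

Any value held in a register outside the smaller set can stay in place across the barrier, which avoids a spill and reload.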
|
|
Consider specialPutArgs for jitStressRegs
|
|
LoadAlignedVector128 as contained.
|
|
When we have a lclVar that is being kept alive between its `PUTARG_REG` and the call, we need to take that into account in determining the minimum register requirement for a node.
|
|
Move code for building `RefPosition`s and `Interval`s out of lsra.cpp into lsrabuild.cpp. Also, move common code from lsraarm*.cpp and lsraxarch.cpp to lsrabuild.cpp.
Maintain the `ListNodePool` on the `LinearScan` object to be used by all the building methods.
Rename `TreeNodeInfoInit` methods to `Build`, to more accurately reflect the next round of changes where they will build the `RefPosition`s directly.
|