|
* Fix MEASURE_NODE_SIZE and naming mistakes.
* The additional fields were deleted in #14582 (~1.5 years ago).
* Fix GT_INDEX_ADDR def.
We created them as `new (this, GT_INDEX_ADDR) GenTreeIndexAddr` but used the smaller `GenTreeIndex` as the necessary size.
* Use LargeOpOpcode instead of GT_CALL.
|
|
When a struct field is imported, its type needs to be normalized.
Also, the LHS of a struct init, even if a SIMD type, should not be transformed to a non-block node,
except in the case of a SIMD local, in which case it must be transformed to a simple assignment.
Also, add an assert to catch this kind of bug in liveness.
Fix #24336
|
|
|
|
- A GT_LCL_VAR may have a zeroOffset field
- Add an assert to prevent building field sequences with duplicates
- Fix fgMorphField when we have a zero offset field
Improve fgAddFieldSeqForZeroOffset
- Add JitDump info
- Handle GT_LCL_FLD
Changing the sign of an int constant also removes any field sequence information.
Added method header comment for fgAddFieldSeqForZeroOffset
Changed when we call fgAddFieldSeqForZeroOffset to be before the call to fgMorphSmpOp.
Prefer calling fgAddFieldSeqForZeroOffset() to GetZeroOffsetFieldMap()->Set()
|
|
Previously, implicit byrefs were tracked by setting lvIsParam and lvIsTemp.
This change explicitly adds a flag for implicitByRef instead of overloading.
In addition, it fixes the decision to copy an implicitByRef for arm64 varargs.
Temporarily bump weight on byref params to match old behavior and avoid codegen
diffs.
Re-enabled various tests and parts of tests.
Closes #20046
Closes #19860
|
|
* Support for Arm64 Vector ABI
Extend HFA support to support vectors as well as floating point types.
This requires that the JIT recognize vector types even during crossgen,
so that the ABI is supported consistently.
Also, fix and re-enable the disabled Arm64 Simd tests.
Fix #16022
|
|
* Extract `impAppendStmt` and `impExtractLastStmt`.
* Delete `BEG_STMTS` fake stmt.
Use new functions to keep the list updated.
* Retype `impTreeList` and `impTreeLast` as statements.
Rename `impTreeList` and `impTreeLast` to show that they are statements.
* Fix fields that have to be stmt.
* Start using GenTreeStmt.
Change `optVNAssertionPropCurStmt` to use GenTreeStmt.
Replace `GenTree* stmt = block->bbTreeList` with `GenTreeStmt* stmt = block->firstStmt()`.
Save results of `FirstNonPhiDef` as `GenTreeStmt`.
* Replace do-while with for loop.
* Change type inside VNAssertionPropVisitorInfo.
* Delete unused args from `optVNConstantPropOnTree`.
* Update fields to be stmt.
Update optVNConstantPropCurStmt to use Stmt.
Change `lvDefStmt` to stmt.
Update LoopCloning structs.
Update `optDebugLogLoopCloning`.
Make `compCurStmt` a statement.
Update declaration name in `BuildNode`.
* Clean simple cpp files.
Clean valuenum.
Clean ssabuilder.
Clean simd.
Clean optcse.
Clean loopcloning.
Clean copyprop.
Clean optimizer part1.
* Start cleaning importer, morph, flowgraph, gentree.
* Continue cleaning functions.
Clean assertionprop.
Clean morph.
Clean gentree.
Clean flowgraph.
Clean compiler.
Clean rangecheck.
Clean indirectcalltransformer.
Clean others.
* Create some temp stmt.
* Delete unnecessary noway_assert and casts.
* Init `impStmtList` and `impLastStmt` in release.
* Respond to review 1.
|
|
|
|
When transferring a Zero offset from one GenTree node to another, we need to check if there already is a FieldSeq and append to it.
Added a third parameter 'kind' to JitHashTable::Set, and added enum SetKind.
Only allow Set to overwrite an existing entry when kind is set to Overwrite.
Added validation for all calls to JitHashTable::Set
asserting that we don't expect the key to already exist or that we passed Overwrite indicating that we expect to handle it properly.
Added two test cases for Issue 21231
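
The `Set` contract described above can be sketched as follows. This is a minimal illustration built on `std::unordered_map`, not the JIT's `JitHashTable`; the class name `SketchHashTable`, the `Lookup` helper and the enumerator `None` are assumptions made for the example (the message itself only names `SetKind` and `Overwrite`).

```cpp
#include <cassert>
#include <unordered_map>

// Only SetKind and Overwrite are named in the commit message; None is an assumed name.
enum class SetKind
{
    None,     // caller expects the key to be absent
    Overwrite // caller knows the key may exist and accepts replacing it
};

// Hypothetical stand-in for the hash table, for illustration only.
template <typename Key, typename Value>
class SketchHashTable
{
    std::unordered_map<Key, Value> m_map;

public:
    // Returns true if an existing entry was replaced.
    bool Set(const Key& key, const Value& value, SetKind kind = SetKind::None)
    {
        auto it = m_map.find(key);
        if (it != m_map.end())
        {
            // Replacing an entry is only legal when the caller opted in.
            assert(kind == SetKind::Overwrite);
            it->second = value;
            return true;
        }
        m_map.emplace(key, value);
        return false;
    }

    // Copies the stored value to *out and returns true if the key exists.
    bool Lookup(const Key& key, Value* out) const
    {
        auto it = m_map.find(key);
        if (it == m_map.end())
        {
            return false;
        }
        *out = it->second;
        return true;
    }
};
```

With this shape, a call site that does not pass `SetKind::Overwrite` documents that it expects the key to be new, and a debug build catches the case where that expectation is wrong.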
|
|
Add a new marker instruction that we emit once we've enabled preemptive gc in
the inline pinvoke method prolog. Use that to kill off callee saves registers
with GC references, instead of waiting until the call.
This closes a window of vulnerability we see in GC stress where if a stress
interrupt happens between the point at which we enable preemptive GC and
the point at which we make the call, we may report callee saves as GC live
when they're actually dead.
Closes #19211.
|
|
IsVarAddr was checking GTF_ADDR_ONSTACK to determine if
the GT_ADDR node is an address of a local. This change removes both
GTF_ADDR_ONSTACK and IsVarAddr and uses IsLocalAddrExpr instead.
IsLocalAddrExpr uses opcodes to determine if GT_ADDR node is
a local address.
GTF_ADDR_ONSTACK flag is ancient, added before 2002 so I couldn't find
the checkin that introduced it.
I changed the assert to a check and an assignment since simplifications
inside fgMorphArgs between
https://github.com/dotnet/coreclr/blob/1a1e4c4d5a8030cb8d82a2e5b06c2ab357b92534/src/jit/morph.cpp#L3709
(which causes https://github.com/dotnet/coreclr/blob/1a1e4c4d5a8030cb8d82a2e5b06c2ab357b92534/src/jit/morph.cpp#L3057)
and
https://github.com/dotnet/coreclr/blob/1a1e4c4d5a8030cb8d82a2e5b06c2ab357b92534/src/jit/morph.cpp#L3790
may result in more GT_ADDR nodes recognized by IsLocalAddrExpr.
x86 and x64 pmi frameworks had no code diffs and some gcinfo reductions
(15 methods with gcinfo diffs in x86).
Fixes #22190.
|
|
Some IR cleanup
|
|
Add methods that answer the general question of whether or not
the jit is optimizing the code it produces.
Use this to replace composite checks for minopts and debug
codegen (the two modes where the jit is not optimizing).
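
A minimal sketch of what such query methods might look like. The struct, its fields and the method names are assumptions for illustration; the message only states that minopts and debug codegen are the two non-optimizing modes.

```cpp
#include <cassert>

// Hypothetical options struct; field and method names are illustrative.
struct SketchJitOptions
{
    bool minOpts   = false; // minimal-optimization mode
    bool debugCode = false; // debuggable codegen mode

    // True in either of the two modes where the jit is not optimizing.
    bool OptimizationDisabled() const
    {
        return minOpts || debugCode;
    }

    bool OptimizationEnabled() const
    {
        return !OptimizationDisabled();
    }
};
```

Call sites that previously tested both modes can then ask the single question they actually care about.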
|
|
This change improves detection of allocators with side effects.
Allocators can cause side effects if the allocated object may have a finalizer.
This change adds a pHasSideEffects parameter to getNewHelper JitEE interface
method. It's used by the jit to check for allocator side effects instead of
guessing from helper ids.
Fixes #21530.
|
|
Fix arm32 local variable references
|
|
This change modifies the importer to create a GenTreeAllocObj node for
box and newobj instead of a helper call in R2R mode. ObjectAllocator phase
decides whether the object can be allocated on the stack or has to be created
on the heap via a helper call.
To trigger object stack allocation COMPlus_JitObjectStackAllocation has
to be set (it's not set by default).
|
|
Issue #18201 / Hackathon
|
|
Arm32 has different addressing mode offset ranges for floating-point
and integer instructions. In addition, the ranges aren't too large.
So in functions with a frame pointer, we try to access some variables
using the frame pointer and some with the stack pointer, to expand the
total number of variables we can access without allocating a "reserved
register" just used for constructing large offsets.
This calculation was incorrect for struct variables that contained floats,
as float fields require calculating using the floating point range, but we
were calculating using the variable type (struct), instead of the instruction
type (floating-point). In addition, we were not correctly calculating the
frame pointer range using the actual variable offset plus "within variable"
offset (struct member offset).
Added a test that covers some of these cases.
Fixes #19537
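
The corrected calculation can be sketched roughly as below: the range is chosen by the kind of instruction that performs the access, and the offset tested is the variable offset plus the offset within the variable. The limits are illustrative assumptions modeled on Thumb-2 immediate encodings, and every name here is hypothetical rather than taken from the JIT.

```cpp
#include <cassert>

enum class AccessKind
{
    Integer,
    FloatingPoint
};

// Illustrative limits only (assumed, not taken from the JIT sources).
constexpr int kIntMaxPositive = 4095; // integer load/store, positive immediate
constexpr int kIntMaxNegative = -255; // integer load/store, negative immediate
constexpr int kFpMaxMagnitude = 1020; // floating-point load/store, multiple of 4

// Decide encodability from the instruction kind and the full offset,
// i.e. variable offset plus struct member offset.
inline bool OffsetIsEncodable(AccessKind kind, int varOffset, int memberOffset)
{
    int offset = varOffset + memberOffset;
    if (kind == AccessKind::FloatingPoint)
    {
        return (offset % 4 == 0) && (offset >= -kFpMaxMagnitude) && (offset <= kFpMaxMagnitude);
    }
    return (offset >= kIntMaxNegative) && (offset <= kIntMaxPositive);
}
```

Under these assumed limits, a float field at member offset 24 of a struct at offset 1000 is out of range for a floating-point access even though an integer access at the same address would be fine, which is the kind of case the old struct-typed check missed.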
|
|
With this change, the JIT will recognize a call to BinaryPrimitives.ReverseEndianness and will emit a bswap instruction.
This logic is currently only hooked up for x86 and x64; ARM still uses fallback logic.
If the JIT can't emit a bswap instruction (for example, trying to emit a 64-bit bswap in a 32-bit process), it will fall back to a software implementation, so the APIs will work across all architectures.
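
For reference, a software byte swap of the kind a fallback path could use looks like the sketch below. It is written in C++ for illustration and is not the actual fallback code; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap using shifts and masks.
constexpr uint32_t SketchReverseEndianness32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// 64-bit swap composed from two 32-bit swaps, the way a 32-bit target
// without a 64-bit bswap could handle it.
constexpr uint64_t SketchReverseEndianness64(uint64_t v)
{
    return (static_cast<uint64_t>(SketchReverseEndianness32(static_cast<uint32_t>(v))) << 32) |
           SketchReverseEndianness32(static_cast<uint32_t>(v >> 32));
}
```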
|
|
This is only used to hold a pointer to a BasicBlock for GenTreeBoundsChk and GenTreeIndexAddr. This doesn't serve any purpose and it does not behave like a real operand (e.g. it's not included in the linear order).
|
|
It is always defined and disabling it isn't really an option - making everything "large" would require more memory and some code (e.g. gtCloneExpr) seems to be broken without SMALL_TREE_NODES.
|
|
When compInitMem is true long-lifetime structs (i.e., the ones with lvIsTemp set to false)
are zero-initialized in the prolog: https://github.com/dotnet/coreclr/blob/c8a63947382b0db428db602238199ca81badbe8e/src/jit/codegencommon.cpp#L4765
Therefore, these structs don't need an explicit zero-initialization in blocks that are not in a loop.
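
The condition can be written as a small predicate. This is a sketch of the reasoning only; the struct and function names are hypothetical, and the real decision in the JIT may involve further checks.

```cpp
#include <cassert>

// Hypothetical inputs; field names mirror the flags mentioned in the message.
struct SketchZeroInitQuery
{
    bool compInitMem; // method requires locals to be zero-initialized
    bool lvIsTemp;    // short-lifetime temp, not zeroed in the prolog
    bool blockInLoop; // the initializing block may execute more than once
};

// An explicit zero-init is redundant only when the prolog already zeroed
// the struct and the block cannot run again after the struct was modified.
inline bool NeedsExplicitZeroInit(const SketchZeroInitQuery& q)
{
    bool zeroedInProlog = q.compInitMem && !q.lvIsTemp;
    return !(zeroedInProlog && !q.blockInLoop);
}
```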
|
|
Closes #20651.
Also fix up some "near miss" cases for GenTreeField and GenTreeBoundsCheck,
where we get lucky and the importer currently splits trees with temps so the
currently ignored child nodes have no interesting side effects.
Revise GenTreeField a bit to pull more of the initialization work into the
constructor. Add a missing R2R field propagation for field nodes in GtClone
(evidently also never hit in practice).
|
|
Compiler temps are created with a "reason" that is dumped in JitDump.
Save the reason and display it in the local variable table dump.
This is useful when trying to find a particular temp and see what
code has been generated using it.
|
|
* Delete dead code
optFindLocalInit and related functions (optIsTrackedLocal, lvaLclVarRefs, lvaLclVarRefsAccumIntoRes, lvaLclVarRefsAccum) are not used anywhere.
Also delete a bunch of undefined function declarations.
* Cleanup DataFlow callback comment
|
|
GTF_IND_ARR_LEN was set by the importer in minopts/debug mode and used only by value numbering, which does not run in minopts/debug mode.
GTF_FLD_NULLCHECK was also set by the importer and not used anywhere. fgMorphField has its own opinion about when an explicit null check is needed.
|
|
|
|
This function is relatively expensive due to the many checks it does. Adding an LclVarDsc "in SSA" bit that is set during SSA construction by calling fgExcludeFromSsa only once per variable results in a 0.35% drop in instructions retired.
Most of the checks done in fgExcludeFromSsa are implied by lvTracked and they could probably be converted to asserts. But lvOverlappingFields is not implied by lvTracked so even if all redundant checks are converted to asserts fgExcludeFromSsa still needs 2 checks rather than just one.
Incidentally, this difference between tracked variables and SSA variables results in SSA and value numbers being assigned to some variables that are actually excluded from SSA - SsaBuilder::RenameVariables and fgValueNumber assign numbers to all live in fgFirstBB variables that require initialization without checking fgExcludeFromSsa first. Structs with overlapping fields are not common but properly excluding them is still enough to save 0.15% memory when compiling corelib.
- Replace calls to fgExcludeFromSsa with calls to lvaInSsa (the old name is kind of weird, it has nothing to do with the flow graph and "exclude" results in double negation)
- Move fgExcludeFromSsa logic to SsaBuilder::IncludeInSsa and use it to initialize LclVarDsc::lvInSsa for all variables
- Change RenameVariables and fgValueNumber to call lvaInSsa before assigning numbers to live in fgFirstBB variables
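
The caching idea can be sketched as follows: compute the answer once per variable during SSA construction and let every later query read a single bit. The type and function names are hypothetical, and the predicate keeps only the two checks the message says remain necessary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily reduced local variable descriptor.
struct SketchVar
{
    bool tracked           = false;
    bool overlappingFields = false;
    bool inSsa             = false; // cached answer, set once during SSA construction
};

// The expensive question, asked once per variable.
inline bool SketchIncludeInSsa(const SketchVar& v)
{
    return v.tracked && !v.overlappingFields;
}

// Run once when SSA is built; later queries only read v.inSsa.
inline void SketchInitInSsaBits(std::vector<SketchVar>& vars)
{
    for (SketchVar& v : vars)
    {
        v.inSsa = SketchIncludeInSsa(v);
    }
}
```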
|
|
Eliminate duplicate SSA number bookkeeping
|
|
Remove almost all of the code in the jit that tries to maintain local ref
counts incrementally. Also remove `lvaSortAgain` and related machinery.
Explicitly sort locals before post-lower-liveness when optimizing to get the
best set of tracked locals.
Explicitly recount after post-lower liveness to get accurate counts after
dead stores. This can lead to tracked unreferenced arguments; tolerate this
during codegen.
|
|
If the jit has started normal ref counting and is in minopts or debug,
set all new temps to be implicitly referenced by default.
Closes #19346.
|
|
For minopts and debug codegen, consider all locals to be implicitly
referenced.
This is set up during `lvaMarkLocalVars` and maintained after that
by having `incRefCnts` set the implicit reference bit and not doing
anything in `decRefCnts` for minopts / debug.
Likewise suppress local var sorting, as we don't have accurate counts
to go by.
|
|
Consolidate various compiler globals used when setting local var ref
counts by folding them into the visitor:
* lvaMarkRefsCurBlock
* lvaMarkRefsCurStmt
* lvaMarkRefsWeight
Remove the largely vestigial `lvPrefReg` and associated methods to set
or modify this field. Haven't verified but this is likely a remnant of
the legacy backend.
In the one remaining use (lcl var sorting predicates), swap in `lvIsRegArg`
instead, which gets most of the same cases.
|
|
Introduce a notion of state for local var ref counts and weighted ref counts.
Accesses and current state must agree.
State is invalid initially, enabled for an early period around bits of morph,
invalid again for a time, and then enabled normally once lvaMarkRefs is called.
Accesses normally specify RCS_NORMAL as the desired state, but in the accesses
of selected ref counts in morph, specify RCS_EARLY.
Revise how we decide if normal ref counting is active by changing
`lvaLocalVarRefCounted` into a method.
Update `gtIsLikelyRegVar` to not access ref counts when they're not in a valid
state.
Change weight APIs over to use `weight_t`.
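
A minimal sketch of state-checked ref count access, assuming the current state is passed in explicitly. `RCS_EARLY` and `RCS_NORMAL` come from the message; `RCS_INVALID`, the class name and the method signatures are assumptions for illustration.

```cpp
#include <cassert>

// RCS_EARLY and RCS_NORMAL are named in the commit message; RCS_INVALID is an assumed name.
enum RefCountState
{
    RCS_INVALID,
    RCS_EARLY,
    RCS_NORMAL
};

// Hypothetical local variable with guarded ref count accessors.
class SketchLocal
{
    unsigned m_refCnt = 0;

    // The access must name the state the compiler is actually in.
    static void CheckState(RefCountState current, RefCountState expected)
    {
        assert(current != RCS_INVALID);
        assert(current == expected);
    }

public:
    unsigned RefCnt(RefCountState current, RefCountState expected = RCS_NORMAL) const
    {
        CheckState(current, expected);
        return m_refCnt;
    }

    void IncRefCnt(RefCountState current, RefCountState expected = RCS_NORMAL)
    {
        CheckState(current, expected);
        m_refCnt++;
    }
};
```

Defaulting the expected state to `RCS_NORMAL` keeps ordinary call sites short, while the few early accesses in morph have to say `RCS_EARLY` explicitly.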
|
|
Instead of relying on ref count bumps, add a new attribute bit to local
vars to indicate that they may have implicit references (prolog, epilog,
gc, eh) and may not have any IR references.
Use this attribute bit to ensure that the ref count and weighted ref count for
such variables are never reported as zero, and as a result that these variables
end up being allocated and reportable.
This is another preparatory step for #18969 and frees the jit to recompute
explicit ref counts via an IR scan without having to special case the counts
for these variables.
The jit can no longer describe implicit counts other than 1 and implicit weights
other than BB_UNITY_WEIGHT, but that currently doesn't seem to be very important.
The new bit fits into an existing padding void so LclVarDsc remains at 128 bytes
(for windows x64).
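
The effect of the attribute bit on reported counts can be sketched like this. The struct, its field names and the numeric value used for the unity weight are assumptions for illustration only.

```cpp
#include <cassert>

// Assumed value for illustration; the message only names BB_UNITY_WEIGHT.
constexpr unsigned kSketchUnityWeight = 100;

// Hypothetical, reduced local variable descriptor.
struct SketchLclVar
{
    unsigned short refCnt               = 0;     // explicit IR references
    unsigned       refCntWtd            = 0;     // weighted explicit IR references
    bool           implicitlyReferenced = false; // prolog, epilog, gc, eh

    // Never report zero for a local that may have implicit references.
    unsigned short RefCnt() const
    {
        return (implicitlyReferenced && refCnt == 0) ? static_cast<unsigned short>(1) : refCnt;
    }

    unsigned RefCntWtd() const
    {
        return (implicitlyReferenced && refCntWtd == 0) ? kSketchUnityWeight : refCntWtd;
    }
};
```

Because the stored counts are left untouched, an IR scan can recompute them from scratch without special-casing these variables.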
|
|
During SSA construction SsaRenameState keeps a definition count for each variable in an array. But each variable has a lvPerSsaData array that does almost the same thing, count the definitions. "Almost" because lvPerSsaData is a JitExpandArray that tracks only the array size and not the number of elements actually stored in the array.
Replace JitExpandArray with a purpose-built "array" that is in charge of allocating new SSA numbers and handles their intricacies - RESERVED_SSA_NUM, UNINIT_SSA_NUM and FIRST_SSA_NUM.
This also allows the removal of the allocator from the array. Allocating new SSA numbers happens only during SSA construction and it's reasonable to pass the allocator to AllocSsaNum rather than increasing the size of PerSsaArray and LclVarDsc.
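
A rough sketch of an array that owns SSA number allocation is shown below. It uses `std::vector` for storage instead of taking an allocator in `AllocSsaNum`, and the constant values, the class name, the accessor names and the choice to hand out `UNINIT_SSA_NUM` first are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Assumed values for illustration; the message only names the constants.
constexpr unsigned RESERVED_SSA_NUM = 0;
constexpr unsigned UNINIT_SSA_NUM   = 1;
constexpr unsigned FIRST_SSA_NUM    = 2;

// Hypothetical per-variable array that hands out SSA numbers itself.
template <typename T>
class SketchSsaArray
{
    std::vector<T> m_defs; // slot i holds the data for SSA number (i + UNINIT_SSA_NUM)

public:
    // Number of definitions actually stored, not a capacity.
    unsigned GetCount() const
    {
        return static_cast<unsigned>(m_defs.size());
    }

    // Appends a slot and returns the SSA number that refers to it.
    unsigned AllocSsaNum()
    {
        m_defs.emplace_back();
        return GetCount() - 1 + UNINIT_SSA_NUM;
    }

    T& GetSsaDef(unsigned ssaNum)
    {
        assert(ssaNum != RESERVED_SSA_NUM);
        assert(ssaNum - UNINIT_SSA_NUM < m_defs.size());
        return m_defs[ssaNum - UNINIT_SSA_NUM];
    }
};
```

Since the array itself is the definition counter, a separate per-variable count kept by the renamer becomes redundant.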
|
|
This is a preparatory change for auditing and controlling how local
variable ref counts are observed and manipulated.
See #18969 for context.
No diffs seen locally. No TP impact expected.
There is a small chance we may see some asserts in broader testing
as there were places in original code where local ref counts were
incremented without checking for possible overflows. The new APIs
will assert for overflow cases.
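
An overflow-checked increment of the kind described might look like the following sketch. The 16-bit counter width and all names are assumptions for illustration.

```cpp
#include <cassert>
#include <limits>

// Hypothetical counter wrapper; the 16-bit width is an assumption.
class SketchRefCount
{
    unsigned short m_count = 0;

public:
    unsigned short Get() const
    {
        return m_count;
    }

    // Widen before adding so the overflow check itself cannot wrap.
    void Inc(unsigned short delta = 1)
    {
        unsigned newCount = static_cast<unsigned>(m_count) + delta;
        assert(newCount <= std::numeric_limits<unsigned short>::max());
        m_count = static_cast<unsigned short>(newCount);
    }
};
```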
|
|
* Enable genFnCalleeRegArgs for Arm64 Varargs
Previously, the method would early out and incorrectly expect the usage
of all incoming arguments to be their homed stack slots. It is
instead possible for incoming arguments to be homed to different
integer registers.
The change will mangle the float types for vararg cases in the same
way that is done during lvaInitUserArgs and fgMorphArgs.
* Apply format patch
* Account for softfp case
* Address feedback
* Apply format patch
* Use standard function header for mangleVarArgsType
* Remove confusing comment
|
|
Temporaries are only used during register allocation and code generation. They waste space (136 bytes) in the compiler object during inlining.
|
|
Passing CompAllocator objects by value is advantageous because it no longer needs to be dynamically allocated and cached. CompAllocator instances can now be freely created, copied and stored, which makes adding new CompMemKind values easier.
Together with other cleanup this also improves memory allocation performance by removing some extra levels of indirection that were previously required - jitstd::allocator had a pointer to CompAllocator, CompAllocator had a pointer to Compiler and Compiler finally had a pointer to ArenaAllocator. Without MEASURE_MEM_ALLOC enabled, both jitstd::allocator and CompAllocator now just contain a pointer to ArenaAllocator. When MEASURE_MEM_ALLOC is enabled CompAllocator also contains a pointer but to a MemStatsAllocator object that holds the relevant memory kind. This way CompAllocator is always pointer sized so that enabling MEASURE_MEM_ALLOC does not result in increased memory usage due to objects that store a CompAllocator instance.
In order to implement this, 2 additional significant changes have been made:
* MemStats has been moved to ArenaAllocator, it's after all the allocator's job to maintain statistics. This also fixes some issues related to memory statistics, such as not tracking the memory allocated by the inlinee compiler (since that one used its own MemStats instance).
* Extract the arena page pooling logic out of the allocator. It doesn't make sense to pool an allocator; it has very little state that can actually be reused and everything else (including MemStats) needs to be reset on reuse. What really needs to be pooled is just a page of memory.
Since this was touching allocation code the opportunity has been used to perform additional cleanup:
* Remove unnecessary LSRA ListElementAllocator
* Remove compGetMem and compGetMemArray
* Make CompAllocator and HostAllocator more like the std allocator
* Update HashTable to use CompAllocator
* Update ArrayStack to use CompAllocator
* Move CompAllocator & friends to alloc.h
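
The by-value allocator design can be sketched as a pointer-sized handle over an arena. This is an illustration only: `SketchArena` is a trivial stand-in that allocates one block per request and frees everything on destruction, and none of the names or signatures below are the JIT's.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an arena: memory lives until the arena is destroyed.
class SketchArena
{
    std::vector<std::unique_ptr<unsigned char[]>> m_blocks;
    size_t                                        m_bytes = 0;

public:
    void* Allocate(size_t size)
    {
        m_blocks.emplace_back(new unsigned char[size]);
        m_bytes += size;
        return m_blocks.back().get();
    }

    size_t BytesAllocated() const
    {
        return m_bytes;
    }
};

// Pointer-sized handle that is cheap to create, copy and store by value.
class SketchAllocator
{
    SketchArena* m_arena;

public:
    explicit SketchAllocator(SketchArena* arena) : m_arena(arena)
    {
    }

    template <typename T>
    T* allocate(size_t count)
    {
        return static_cast<T*>(m_arena->Allocate(count * sizeof(T)));
    }
};

static_assert(sizeof(SketchAllocator) == sizeof(void*), "handle stays pointer sized");
```

Copies of the handle all refer to the same arena, so containers can store one by value without any extra indirection or caching.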
|
|
When addressing a local with negative offsets from R11 (if it can't
be reached from SP), allow the full range of negative offsets allowed
in the instructions. Floating-point load/store especially has a much
bigger range than what was previously allowed.
|
|
Use sp-based offset only if r10 reserved or offset is lower than
encoding limit.
|
|
* [ARM64|Windows|Vararg] Add FEATURE_ARG_SPLIT
Enable splitting >8 byte <= 16 byte structs for arm64 varargs
between x7 and virtual stack slot 0.
* Force notHfa for vararg methods
* Correctly pass isVararg
* Correct var name
|
|
* Fix passing HFA of two floats to vararg methods
Previously, the type would be reported as HFA and enregistered; however,
this is not correct, as arm64 varargs abi requires passing using
int registers.
* Address linux build issue
* Apply final format patch
* Add _TARGET_WINDOWS_
|
|
Remove JIT LEGACY_BACKEND code
All code related to the LEGACY_BACKEND JIT is removed. This includes all code related to x87 floating-point code generation. Almost 50,000 lines of code have been removed.
Remove legacyjit/legacynonjit directories
Remove reg pairs
Remove tiny instruction descriptors
Remove compCanUseSSE2 (it's always true)
Remove unused FEATURE_FP_REGALLOC
|
|
* move compUpdateLifeVar and compUpdateTreeLife to separate files for legacy and non-legacy case
* create TreeLifeUpdater class
|
|
Eliminate `FEATURE_UNIX_AMD64_STRUCT_PASSING` and replace it with `UNIX_AMD64_ABI` when used alone. Both are currently defined; it is highly unlikely the latter will work alone; and it significantly clutters up the code, especially the JIT.
Also, fix the altjit support (now `UNIX_AMD64_ABI_ITF`) to *not* call `ClassifyEightBytes` if the struct is too large. Otherwise it asserts.
|
|
|
|
* Add FEATURE_CROSSBITNESS in crosscomponents.cmake
* Exclude mscordaccore mscordbi sos from CLR_CROSS_COMPONENTS_LIST when FEATURE_CROSSBITNESS is defined in crosscomponents.cmake
* Introduce target_size_t in src/jit/target.h
* Use size_t value in genMov32RelocatableImmediate in src/jit/codegen.h src/jit/codegencommon.cpp
* Fix definition/declaration inconsistency for emitter::emitIns_R_I in emitarm.cpp
* Zero HiVal when GenTree::SetOper GenTreeLngCon->GenTreeIntCon in src/jit/compiler.hpp
* Explicitly specify roundUp(expr, TARGET_POINTER_SIZE)
* Use target_size_t* target in emitOutputDataSec in src/jit/emit.cpp
|
|
Fix inconsistent handling of zero extending casts
|