|
* Fix MEASURE_NODE_SIZE and naming mistakes.
* The additional fields were deleted in #14582 (~1.5 years ago).
* Fix GT_INDEX_ADDR def.
We created them with `new (this, GT_INDEX_ADDR) GenTreeIndexAddr` but used the size of the smaller `GenTreeIndex` as the necessary size.
* Use LargeOpOpcode instead of GT_CALL.
|
|
* Extract `impAppendStmt` and `impExtractLastStmt`.
* Delete `BEG_STMTS` fake stmt.
Use new functions to keep the list updated.
* Retype `impTreeList` and `impTreeLast` as statements.
Rename `impTreeList` and `impTreeLast` to show that they are statements.
* Fix fields that have to be stmt.
* Start using GenTreeStmt.
Change `optVNAssertionPropCurStmt` to use GenTreeStmt.
Replace `GenTree* stmt = block->bbTreeList` with `GenTreeStmt* stmt = block->firstStmt()`.
Save results of `FirstNonPhiDef` as `GenTreeStmt`.
* Replace do-while with for loop.
* Change type inside VNAssertionPropVisitorInfo.
* Delete unused args from `optVNConstantPropOnTree`.
* Update fields to be stmt.
Update optVNConstantPropCurStmt to use Stmt.
Change `lvDefStmt` to stmt.
Update LoopCloning structs.
Update `optDebugLogLoopCloning`.
Make `compCurStmt` a statement.
Update declaration name in `BuildNode`.
* Clean simple cpp files.
Clean valuenum.
Clean ssabuilder.
Clean simd.
Clean optcse.
Clean loopcloning.
Clean copyprop.
Clean optimizer part1.
* Start cleaning importer, morph, flowgraph, gentree.
* Continue cleaning functions.
Clean assertionprop.
Clean morph.
Clean gentree.
Clean flowgraph.
Clean compiler.
Clean rangecheck.
Clean indirectcalltransformer.
Clean others.
* Create some temp stmt.
* Delete unnecessary noway_assert and casts.
* Init `impStmtList` and `impLastStmt` in release.
* Respond to review feedback (round 1).
|
|
* fix implicit constructor call
* extern c
format patch
* multi-line
* Remove direct constructor call
* Conversion
* Need parentheses
* Return value on resize
* declspec(Thread)
* Ignore warnings for GCC
* Formatting issues
* Move cast to constant
|
|
Add a new marker instruction that we emit once we've enabled preemptive GC in
the inline pinvoke method prolog. Use that to kill off callee saves registers
with GC references, instead of waiting until the call.
This closes a window of vulnerability we see in GC stress where if a stress
interrupt happens between the point at which we enable preemptive GC and
the point at which we make the call, we may report callee saves as GC live
when they're actually dead.
Closes #19211.
|
|
Some IR cleanup
|
|
Issue #18201 / Hackathon
|
|
With this change, the JIT will recognize a call to BinaryPrimitives.ReverseEndianness and will emit a bswap instruction.
This logic is currently only hooked up for x86 and x64; ARM still uses fallback logic.
If the JIT can't emit a bswap instruction (for example, trying to emit a 64-bit bswap in a 32-bit process), it will fall back to a software implementation, so the APIs will work across all architectures.
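The software fallback can be sketched as a straightforward byte rotation; this is an illustrative version for the 32-bit case (not the actual CoreLib implementation) that a compiler or, per this change, the JIT can recognize and collapse into a single bswap:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative software byte-swap fallback: reverse the byte order of a
// 32-bit value using only shifts and masks, no bswap instruction required.
static uint32_t ReverseEndianness32(uint32_t value)
{
    return ((value & 0x000000FFu) << 24) | // byte 0 -> byte 3
           ((value & 0x0000FF00u) << 8)  | // byte 1 -> byte 2
           ((value & 0x00FF0000u) >> 8)  | // byte 2 -> byte 1
           ((value & 0xFF000000u) >> 24);  // byte 3 -> byte 0
}
```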
|
|
This is only used to hold a pointer to a BasicBlock for GenTreeBoundsChk and GenTreeIndexAddr. This doesn't serve any purpose and it does not behave like a real operand (e.g. it's not included in the linear order).
|
|
On x86, `MUL_LONG` wasn't considered a multi-reg node, as it should be; treating it as one ensures that when it gets spilled or copied, the additional register is correctly handled.
Also, the ARM and x86 versions of genStoreLongLclVar should be identical and shared (neither version was handling the copy of a `MUL_LONG`).
Finally, fix the LSRA dumping of multi-reg nodes.
Fix #19397
|
|
* [ARM64|Windows|Vararg] Add FEATURE_ARG_SPLIT
Enable splitting >8 byte <= 16 byte structs for arm64 varargs
between x7 and virtual stack slot 0.
* Force notHfa for vararg methods
* Correctly pass isVararg
* Correct var name
|
|
LOCKADD nodes are generated rather early and there's no reason for that:
* The CORINFO_INTRINSIC_InterlockedAdd32/64 intrinsics are not actually used. Even if they were used, we could still import them as XADD nodes and rely on lowering to generate LOCKADD when needed.
* gtExtractSideEffList transforms XADD into LOCKADD but this can be done in lowering. LOCKADD is an XARCH specific optimization after all.
Additionally:
* Avoid the need for special handling in LSRA by making GT_LOCKADD a "no value" oper.
* Split LOCKADD codegen from XADD/XCHG codegen, attempting to use the same code for all 3 just makes things more complex.
* The address is always in a register so there's no real need to create an indir node on the fly, the relevant emitter functions can be called directly.
The last point above is actually a CQ issue - we always generate `add [reg], imm`, more complex address modes are not used. Unfortunately this problem starts early, when the importer spills the address to a local variable. If that ever gets fixed then we could probably generate a contained LEA in lowering.
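The XADD/LOCKADD distinction maps onto familiar atomic semantics. As an illustrative C++ sketch (not JIT code): `fetch_add` with a used result corresponds to XADD (value-producing), while a discarded result lets the compiler emit a plain `lock add` — the "no value" shape this change gives GT_LOCKADD:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// XADD semantics: atomically add and return the previous value.
static int32_t AtomicAddReturningOld(std::atomic<int32_t>& counter, int32_t delta)
{
    return counter.fetch_add(delta); // value-producing, like GT_XADD
}

// LOCKADD semantics: atomically add, result discarded. A compiler is free
// to emit `lock add [mem], imm` here instead of xadd.
static void AtomicAddNoResult(std::atomic<int32_t>& counter, int32_t delta)
{
    counter.fetch_add(delta); // result unused, like GT_LOCKADD
}
```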
|
|
Remove JIT LEGACY_BACKEND code
All code related to the LEGACY_BACKEND JIT is removed. This includes all code related to x87 floating-point code generation. Almost 50,000 lines of code have been removed.
Remove legacyjit/legacynonjit directories
Remove reg pairs
Remove tiny instruction descriptors
Remove compCanUseSSE2 (it's always true)
Remove unused FEATURE_FP_REGALLOC
|
|
And add a new range-check IR for x86 imm-intrinsics.
|
|
1. Fix `LEGACY_BACKEND`
2. `#if FEATURE_HW_INTRINSICS` => `#ifdef FEATURE_HW_INTRINSICS`
[tfs-changeset: 1686599]
|
|
Nothing should be defined as GenTreeUnOp; use GenTreeOp instead.
|
|
* Ifdef out legacy uses of GT_ASG_op
GT_ASG_op nodes are only generated when the legacy backend is used.
* Address feedback
* Cleanup gtOverflow/gtOverflowEx
|
|
Cleanup of Lowering & LsraInfo
|
|
These are preparatory changes for eliminating gtLsraInfo.
Register requirements should never be set on contained nodes. This includes setting isDelayFree and restricting to byte registers for x86.
- This results in net positive diffs for the framework (eliminating incorrect setting of hasDelayFreeSrc), though a net regression for the tests on x86 (including many instances of effectively the same code).
- The regressions are largely related to issue #11274.
Improve consistency of IsValue():
- Any node that can be contained should produce a value, and have a type (e.g. GT_FIELD_LIST).
- Some value nodes (GTK_NOVALUE isn't set) are allowed to have TYP_VOID, in which case IsValue() should return false.
- This simplifies IsValue().
- Any node that can be assigned a register should return true for IsValue() (e.g. GT_LOCKADD).
- PUTARG_STK doesn't produce a value; get type from its operand.
- This requires some fixing up of SIMD12 operands.
- Unused GT_LONG nodes shouldn't define any registers
Eliminate isNoRegCompare, by setting type of JTRUE operand to TYP_VOID
- Set GTF_SET_FLAGS on the operand to ensure it is not eliminated as dead code.
|
|
* GT_DIV_HI and GT_MOD_HI are not used anywhere
* genCodeForBinary doesn't handle GT_MUL_LONG
* OperIsHigh is not used anywhere
|
|
* JIT: Wrap some runtime lookups in new node type
Work based on the plan outlined in #14305.
Introduce a new unary node type GT_RUNTIMELOOKUP that wraps existing
runtime lookup trees created in `impLookupToTree` and holds onto the
handle that inspired the lookup.
Note there are other importer paths that create lookups directly that
we might also want to wrap, though wrapping is not a requirement for
correctness.
Keep this node type around through morph, then unwrap and just use
the wrapped node after that.
* JIT: More enhancements to type equality testing
The jit is now able to optimize some type equality checks in shared
method instances where one or both of the types require runtime lookup,
for instance comparing `T` to a value type, or comparing two different
value types `V1<T>` and `V2<T>`.
Add two new jit interface methods, one for testing for type equality and
the other for type casting.
These return Must/MustNot/Maybe results depending on whether or not
the equality or cast can be resolved at jit time.
Implement the equality check. Use this to enhance the type equality opts
in the jit for both direct comparison and the checking done by unbox.any.
|
|
Create new node type GT_JCMP to represent a
fused Relop + JTrue which does not set flags
Add lowering code to create GT_JCMP when
Arm64 could use cbz, cbnz, tbz, or tbnz
|
|
[RyuJIT/armarch] Put arguments with GT_BITCAST
|
|
Put arguments with GT_BITCAST instead of GT_COPY for arm32/arm64
Fix #14008
|
|
mark argplace node as no_lir
|
|
* show the problem with contained arg_place
We set contained on PUTARG_REG, but it doesn't pass the IsContained check.
* Fix problem with gtControlExpr
* fix problem with ARGPLACE
* additional improvement 1
We should never have a contained node that is the last node in the
execution order.
* additional improvement 2 for xarch.
It is redundant; there is no need to set it as contained.
* additional improvement 2 for arm
`GenTree* ctrlExpr = call->gtControlExpr;` was unused.
* additional improvement3: unify CheckLir.
|
|
This restores the `GT_INDEX_ADDR` changes.
|
|
This reverts commit a7ffdeca6fed927dbd457293d97b07237db95e82, reversing
changes made to f5f622db2a00d7687f256c0d1cdda5e6f6da7ad4.
|
|
We currently expand `GT_INDEX` nodes during morph into an explicit
bounds check followed by a load. For example, this tree:
```
[000059] ------------ /--* LCL_VAR int V09 loc6
[000060] R--XG------- /--* INDEX ref
[000058] ------------ | \--* LCL_VAR ref V00 arg0
[000062] -A-XG------- * ASG ref
[000061] D------N---- \--* LCL_VAR ref V10 loc7
```
is expanded into this tree:
```
[000060] R--XG+------ /--* IND ref
[000491] -----+------ | | /--* CNS_INT long 16 Fseq[#FirstElem]
[000492] -----+------ | \--* ADD byref
[000488] -----+-N---- | | /--* CNS_INT long 3
[000489] -----+------ | | /--* LSH long
[000487] -----+------ | | | \--* CAST long <- int
[000484] i----+------ | | | \--* LCL_VAR int V09 loc6
[000490] -----+------ | \--* ADD byref
[000483] -----+------ | \--* LCL_VAR ref V00 arg0
[000493] ---XG+------ /--* COMMA ref
[000486] ---X-+------ | \--* ARR_BOUNDS_CHECK_Rng void
[000059] -----+------ | +--* LCL_VAR int V09 loc6
[000485] ---X-+------ | \--* ARR_LENGTH int
[000058] -----+------ | \--* LCL_VAR ref V00 arg0
[000062] -A-XG+------ * ASG ref
[000061] D----+-N---- \--* LCL_VAR ref V10 loc7
```
Even in this simple case where both the array object and the index are
lclVars, this represents a rather large increase in the size of the IR.
In the worst case, the JIT introduces an additional lclVar for both the
array object and the index, adding several additional nodes to the tree.
When optimizing, exposing the structure of the array access may be
helpful, as it may allow the compiler to better analyze the program.
When we are not optimizing, however, the expansion serves little purpose
besides constraining the IR shapes that must be handled by the backend.
Due to its need for lclVars in the worst case, this expansion may even
bloat the size of the generated code, as all lclVar references are
generated as loads/stores from/to the stack when we are not optimizing.
In the case above, the expanded tree generates the following x64
assembly:
```
IN0018: 000092 mov rdi, gword ptr [V00 rbp-10H]
IN0019: 000096 mov edi, dword ptr [rdi+8]
IN001a: 000099 cmp dword ptr [V09 rbp-48H], edi
IN001b: 00009C jae G_M5106_IG38
IN001c: 0000A2 mov rdi, gword ptr [V00 rbp-10H]
IN001d: 0000A6 mov esi, dword ptr [V09 rbp-48H]
IN001e: 0000A9 movsxd rsi, esi
IN001f: 0000AC mov rdi, gword ptr [rdi+8*rsi+16]
IN0020: 0000B1 mov gword ptr [V10 rbp-50H], rdi
```
Inspired by other recent experiments (e.g. #13188), this change
introduces a new node that replaces the above expansion in MinOpts. This
node, `GT_INDEX_ADDR`, represents the bounds check and address
computation involved in an array access, and returns the address of the
element that is to be loaded or stored. Using this node, the example
tree given above expands to the following:
```
[000489] a--XG+------ /--* IND ref
[000059] -----+------ | | /--* LCL_VAR int V09 loc6
[000060] R--XG+--R--- | \--* INDEX_ADDR byref
[000058] -----+------ | \--* LCL_VAR ref V00 arg0
[000062] -A-XG+------ * ASG ref
[000061] D----+-N---- \--* LCL_VAR ref V10 loc7
```
This expansion requires only the addition of the `GT_IND` node that
represents the memory access itself. This savings in IR size translates
to about a 2% decrease in instructions retired during non-optimizing
compilation. Furthermore, this expansion tends to generate smaller
code; for example, the tree given above is generated in 29 rather than
35 bytes:
```
IN0018: 000092 mov edi, dword ptr [V09 rbp-48H]
IN0019: 000095 mov rsi, gword ptr [V00 rbp-10H]
IN001a: 000099 cmp rdi, qword ptr [rsi+8]
IN001b: 00009D jae G_M5106_IG38
IN001c: 0000A3 lea rsi, bword ptr [rsi+8*rdi+16]
IN001d: 0000A8 mov rdi, gword ptr [rsi]
IN001e: 0000AB mov gword ptr [V10 rbp-50H], rdi
```
|
|
SIMD8 values need to be converted to longs under a small number of
situations on x64/Windows:
- SIMD8 values are passed and returned as LONGs
- SIMD8 values may be stored to a LONG lclVar
Currently, LSRA performs some gymnastics when building use positions in
order to ensure that registers are properly allocated. This change is a
stab at a different approach: rather than pushing this work onto the RA,
lowering inserts `GT_BITCAST` nodes where necessary to indicate
that the source long- or SIMD8-typed value should be reinterpreted as
a SIMD8- or long-typed value as necessary. The RA performs one specific
optimization wherein it retypes stores of `GT_BITCAST` nodes to
non-register-candidate local vars with the type of the cast's operand
and preferences the cast to its source interval.
This approach trades slightly larger IR in some functions that
manipulate SIMD8 values for tighter code in buildRefPositions.
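The effect of a `GT_BITCAST` can be illustrated in portable C++ (a sketch of the concept, not the JIT's representation): the bits move unchanged between types, with no numeric conversion — analogous to moving a value between a floating-point and a general-purpose register:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit integer, without conversion.
// std::memcpy is the portable C++ idiom for a bit-level cast.
static uint64_t BitCastToLong(double d)
{
    static_assert(sizeof(uint64_t) == sizeof(double), "sizes must match");
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof(bits));
    return bits;
}

// The reverse direction: reinterpret 64 bits as a double.
static double BitCastToDouble(uint64_t bits)
{
    double d;
    std::memcpy(&d, &bits, sizeof(d));
    return d;
}
```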
|
|
- Fix for putting `double` arguments between Lowering and Codegen phase
- Rename GenTreeMulLong to GenTreeMultiRegOp
GT_PUTARG_REG could be GenTreeMultiRegOp on RyuJIT/arm
Fix #12293
|
|
* [RyuJIT/ARM32] Enable passing large split struct
This enables passing split structs larger than 16 bytes.
To support split structs, it defines a new GenTree type - GenTreePutArgSplit.
GenTreePutArgSplit is similar to GenTreePutArgStk,
but it is used for split structs only
and it has an additional field to save register information.
The GenTreePutArgSplit node is generated in the Lowering phase.
* Apply reviews: split struct argument passing
- Fix some comments:
genPutArgSplit, GenTreePutArgStk, GenTreePutArgSplit, NewPutArg, ArgComplete
- Add assertion check in genPutArgSplit, genCallInstruction
- Rename variable: baseReg
- Change flag for GenTreePutArgSplit: _TARGET_ARM && !LEGACY_BACKEND
- Change type of gtOtherRegs in GenTreePutArgSplit
- Remove duplicated code: NewPutArg
- Implement spill & restore flag for GenTreePutArgSplit
* Apply reviews
- Rebase
- Update managing spillFlag for split struct
- Implement spill & restore code generation
- Fix typos and rename variables
- Fix bug related to printing gentree for split structs
* Fix bug and comments
- Fix bug in regset.cpp
- Add comments in morph.cpp's NYI_ARM
- Fix typos in comments in lsraarmarch.cpp
|
|
Replace all uses of NodeName with OpName so we get proper names instead of symbols (e.g. ADD instead of +). Names stand out better, especially in JIT dumps where we use various symbols to draw tree lines.
|
|
This changes isContained to terminate early if it is not possible for the node in question
to ever be contained. There are many nodes that are statically non-containable: any node
that is not a value may not be contained along with a small number of value nodes. To avoid
the need for a `switch` to identify the latter, their node kinds have been marked with a
new flag, `GTK_NOCONTAIN`. The checks for identifying non-containable nodes are encapsulated
within a new function, `canBeContained`.
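A hypothetical sketch of the early-exit shape described above (the names mirror the text, but this is not the actual JIT implementation):

```cpp
#include <cassert>

// Per-node-kind flags; these values are illustrative, not the JIT's.
enum NodeKindFlags : unsigned
{
    GTK_NOVALUE   = 0x1, // node does not produce a value
    GTK_NOCONTAIN = 0x2, // value node that is statically non-containable
};

// Early exit for containment: a node that produces no value can never be
// contained, and neither can a value node marked GTK_NOCONTAIN. Checking
// the flags avoids a switch over the statically non-containable opers.
static bool canBeContained(unsigned kindFlags)
{
    return (kindFlags & (GTK_NOVALUE | GTK_NOCONTAIN)) == 0;
}
```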
|
|
GT_PUTARG_STK doesn't produce a value, so it should have the GTK_NOVALUE flag set.
Although the dstCount was being set to zero by the parent call, localDefUse was also being set, causing a register to be allocated. Fixing this produces a number of improvements due to reuse of constant registers that were otherwise unnecessarily "overwritten" by the localDefUse.
Also on x86, GT_LONG shouldn't be used to pass a long, since GT_LONG should always be a value-producing node. Instead, use the existing GT_FIELD_LIST approach.
|
|
Remove many of the restrictions on structs that were added to preserve behavior of the old IR form.
Change the init block representation to not require changing the parent when a copy block is changed to an init.
In addition, fix the bug that was causing the corefx ValueTuple tests to fail. This was a bug in assertion prop where it was not invalidating the assertions about the parent struct when a field is modified. Add a repro test case for that bug.
Incorporate the fix to #7954, which was encountered after the earlier version of these changes. This was a case where we reused a struct temp, and then it wound up eventually getting assigned the value it originally contained, so we had V09 = V09. This hit the assert in liveness where it wasn't expecting to find a struct assignment with the same lclVar on the lhs and rhs. This assert should have been eliminated with the IR change to struct assignments, but when this situation arises we may run into issues if we call a helper that doesn't expect the lhs and rhs to be the same. So, change morph to eliminate such assignments.
|
|
Remove many of the restrictions on structs that were added to preserve behavior of the old IR form.
Change the init block representation to not require changing the parent when a copy block is changed to an init.
|
|
Clean up GenTreeXxxx struct size dump; fix 32-bit build when MEASURE_NODE_SIZE is enabled.
|
|
Add support for GT_OBJ for x86, and allow them to be transformed into a list
of fields (in morph) if it is a promoted struct. Add a new list type for
this (GT_FIELD_LIST) with the type and offset, and use it for the multireg
arg passing as well for consistency.
Also refactor fgMorphArgs so that there is a positive check for reMorphing,
rather than relying on gtCallLateArgs, which can be null if there are no
register args.
In codegenxarch, modify the struct passing (genPutStructArgStk) to work for
both the x64/ux and x86 case, including the option of pushing fields onto
the stack.
Eliminate the redundant INS_movs_ptr, and replace with the pre-existing
INS_movsp.
|
|
This change adds support for shifting by a GT_CNS_INT without going
through a helper. If the shiftOp is a GT_CNS_INT we do several
transformations based on the shift amount:
If the shift amount is 0, the shift is a nop, so we just put together the
hi and lo ops as a GT_LONG.
If the shift amount is < 32, we generate a shl/shld pattern, a shr/shrd
pattern or a sar/shrd pattern, depending on the oper. The first operand of
the shrd/shld is a GT_LONG, which we crack in codegen, using it
essentially as two int operands, rather than creating a tri op GenTree
node (essentially so that we can have 3 operands, instead of the normal
two).
If the shift amount is 32, it differs between shifting left and shifting
right. For GT_LSH, we move the loOp into the hiResult and set the loResult
to 0. For GT_RSZ, we move the hiOp into the loResult, and set the hiResult
to 0. For GT_RSH, we move the hiOp into the loResult, and set the hiResult
to a 31 bit signed shift of the hiOp to sign extend.
If the shift amount is less than 64, but larger than 32: for GT_LSH, the
hiResult is a shift of the loOp by shift amount - 32 (the move from lo into hi is
the 32 bit shift). We set the loResult to 0. For GT_RSH and GT_RSZ, the
loResult is a right shift (signed for GT_RSH) of the hiOp by shift amount
- 32. The hiResult is 0 for GT_RSZ, and a 31 bit signed shift of hiOp1 for
GT_RSH.
If the shift amount is >= 64, we set both hiResult and loResult to 0 for
GT_LSH and GT_RSZ, and do a sign extend shift to set hiResult and loResult
to the sign of the original hiOp for GT_RSH.
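The constant-shift cases above can be sketched for GT_LSH in C++ (illustrative only; `lo`, `hi`, and `amount` are hypothetical names, not the JIT's operands):

```cpp
#include <cassert>
#include <cstdint>

// Decompose a 64-bit logical left shift into 32-bit operations, mirroring
// the constant-shift cases described above for GT_LSH.
static uint64_t ShiftLeftLong(uint32_t lo, uint32_t hi, unsigned amount)
{
    uint32_t loResult, hiResult;
    if (amount == 0)
    {
        loResult = lo; // nop: just recombine hi and lo
        hiResult = hi;
    }
    else if (amount < 32)
    {
        loResult = lo << amount;                            // shl
        hiResult = (hi << amount) | (lo >> (32 - amount));  // shld pattern
    }
    else if (amount == 32)
    {
        loResult = 0;  // loOp moves into hiResult, loResult becomes 0
        hiResult = lo;
    }
    else if (amount < 64)
    {
        loResult = 0;  // shift loOp by (amount - 32); the "move" is the
        hiResult = lo << (amount - 32); // implicit 32-bit shift
    }
    else
    {
        loResult = 0;  // amount >= 64: everything shifted out
        hiResult = 0;
    }
    return ((uint64_t)hiResult << 32) | loResult;
}
```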
|
|
* Add option (off by default) to report GenTree operator bashing stats.
|