Age | Commit message | Author | Files | Lines |
|
Also added an IL version of this test for regression repro on CoreCLR
|
|
See #14784 for details.
|
|
This optimization is not valid for unsigned LT/LE/GT/GE relops. Using the Carry flag this way indicates that the operation overflowed, not that the result is less than 0, which is impossible for unsigned integers.
This is also not valid for signed LT/LE/GT/GE relops due to integer overflow.
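A minimal sketch of both points, modelling a 32-bit add and its flags in Python. The helper names are illustrative and are not JIT code, and it assumes the optimization reused the flags of a preceding arithmetic operation to answer a compare against zero.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def add32_flags(a, b):
    """Add two 32-bit values; return (wrapped result, carry flag, sign flag)."""
    full = (a & MASK32) + (b & MASK32)
    result = full & MASK32
    carry = full > MASK32
    sign = bool(result & 0x80000000)
    return result, carry, sign

# Unsigned: 0xFFFFFFFF + 1 wraps to 0. Carry is set, but it reports
# wraparound; an unsigned result can never be less than 0.
result, carry, _ = add32_flags(0xFFFFFFFF, 1)
unsigned_carry_case = (result, carry)

# Signed: INT32_MAX + 1 is mathematically positive, but the wrapped result
# has its sign bit set, so the sign flag alone gives the wrong answer.
result2, _, sign2 = add32_flags(0x7FFFFFFF, 1)
signed_overflow_case = (to_signed32(result2), sign2)
```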
|
|
Handle GT_SIMD in GenTree::Compare
|
|
Fix #14028
|
|
newarr expects int32 or native int on the stack. The x64 JIT happily allows an int64 as well, but the x86 JIT asserts.
Also change the constant to 0xFFFFFFF1FFFFFFFF so OverflowException is thrown on 32 bit as well.
|
|
Fixed return constant value cache on 32 bit VMs
|
|
Don't early-propagate negative array lengths
|
|
[RyuJIT/ARM32] Remove LEA[b+0]
|
|
* extract CheckLclVarSemantics from CheckLIR.
* add a test that shows the silent bad execution.
* fix the checker.
* add the test to the exclude list.
* rename consumed to used
|
|
add repro for DevDiv_495792.
fix the issue.
|
|
There's no need for that and if the negative array length is not representable in 32 bit we'll end up producing a GT_CNS_INT node that has TYP_INT and a 64 bit value.
That's because the original type (always TYP_INT) of the GT_ARR_LENGTH is preserved when changing the node to GT_CNS_INT.
|
|
LEA[b+0] was not eliminated on ARM, which caused an assertion during code generation.
|
|
Allow a RET_EXPR as a BYREF operand for SIMD intrinsics.
|
|
Fix #13568
|
|
The loop nest computation expects to be able to test block numbers for
lexical inclusion, so do a renumbering pass first if blocks have been
moved.
Fixes #13919.
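A rough sketch of why the renumbering matters, using a toy `Block` class and helper names that are hypothetical (the real JIT data structures differ). The lexical test only compares block numbers, so stale numbers after a move give the wrong answer.

```python
class Block:
    def __init__(self, name, num):
        self.name = name
        self.num = num  # stands in for the block number used in lexical tests

def renumber(layout):
    """Assign block numbers that match the current layout order."""
    for i, block in enumerate(layout, start=1):
        block.num = i

def lexically_inside(block, first, last):
    """True if block's number falls within the range first..last."""
    return first.num <= block.num <= last.num

# Original layout A B C D, numbered 1..4.
a, b, c, d = (Block(name, i) for i, name in enumerate("ABCD", start=1))

# D is moved between B and C, but its number is not updated.
layout = [a, b, d, c]
stale_answer = lexically_inside(d, b, c)   # uses D's old number 4

renumber(layout)
fresh_answer = lexically_inside(d, b, c)   # numbers now match the layout
```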
|
|
Remove redundant zero-initialization of struct temps with GC fields.
Structs with GC pointer fields are fully zero-initialized in the prolog if compInitMem is true.
Therefore, we don't need to insert zero-initialization for the result of newobj or for inlinee locals
when they are structs with GC pointer fields and the basic block is not in a loop.
|
|
The former is only necessary if it is set on any of the call's
arguments; the latter is necessary if the call may throw or if it is set
on any of the call's arguments.
Fixes DevDiv 491211.
|
|
Fix VSO 487701 and 487702.
|
|
Do not remove NOPs used by calls.
|
|
Both of these issues stem from attempting to unassign a copied interval
from the copied-to register and then reassigning the interval to the
same. This corrupts some of the bookkeeping necessary to track whether
or not the interval in the register needs to be spilled, and the RA ends
up attempting to spill the interval being allocated even though it is
not assigned a register.
|
|
* separate sources into 2 files: for 32 and for 64.
|
|
Repro test. Fix and additional assert.
|
|
Tests for #13056
|
|
Loop construction has a check for the case that an in-loop block has a
`bbNext` block that is a new block but not visited in the loop flow
walk; make sure that check fires for `entry` as well as other loop
blocks.
Fixes #13507.
|
|
fgMorphCast thinks that casting an i1 value to i2 via conv.ovf.i2.un is a widening conversion and removes the overflow check. But this is in fact a narrowing conversion because i1 is implicitly sign extended to i4 and then i4 is treated as u4. Going from i4 to u4 overflows for negative values, so we can't treat the source type of the cast as i1; it has to be u4.
Of course, the existing code works fine if the source type is unsigned. Going from u1 to i4 and then to u4 never overflows so it's safe to treat the source type as u1.
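The arithmetic can be checked with a small Python model. The function names are illustrative only; the model assumes the usual IL behavior that small integers are widened to i4 on the evaluation stack and that conv.ovf.i2.un reads its operand as unsigned.

```python
def sign_extend_to_i4(bits, width):
    """Widen a signed value of `width` bits to i4, as a load of i1/i2 does."""
    bits &= (1 << width) - 1
    if bits & (1 << (width - 1)):
        bits -= 1 << width
    return bits

def conv_ovf_i2_un(stack_i4):
    """Model of conv.ovf.i2.un: read the i4 stack value as u4, range-check for i2."""
    as_u4 = stack_i4 & 0xFFFFFFFF
    if as_u4 > 0x7FFF:
        raise OverflowError("value does not fit in i2")
    return as_u4

def overflows(stack_i4):
    try:
        conv_ovf_i2_un(stack_i4)
        return False
    except OverflowError:
        return True

# i1 source holding -1: sign extension gives i4 -1, which reads as u4 0xFFFFFFFF.
negative_i1_overflows = overflows(sign_extend_to_i4(0xFF, 8))

# u1 source holding 255: zero extension gives 255, which fits in i2.
u1_overflows = overflows(255)
```

So the check can only be dropped when the source is unsigned; for an i1 source it must stay.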
|
|
Reprioritize tests to improve inner-loop throughput.
|
|
I'm seeing the affected code take the `impDevirtualizeCall` code path
with `CT_INDIRECT` calls. `gtCallMethHnd` is a `GenTreePtr` in that case
(it's a union) and passing that as a `CORINFO_METHOD_HANDLE` leads
to bad things.
|
|
Fix incorrect switch temp lcl type
|
|
Rearrange basic blocks during loop identification so that loop bodies
are kept contiguous when possible. Blocks in the lexical range of the
loop which do not participate in the flow cycle (which typically
correspond to code associated with early exits using `break` or
`return`) are moved out below the loop when possible without breaking EH
region nesting. The target insertion point, when possible, is chosen to
be the first spot below the loop that will not break up fall-through.
Layout can significantly affect the performance of loops, particularly
small search loops, by avoiding the taken branch on the hot path,
improving the locality of the code fetched while iterating the loop, and
potentially aiding loop stream detection.
Resolves #9692.
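A simplified sketch of the rearrangement in Python. The function `compact_loop` is hypothetical. It ignores EH region nesting, fall-through preservation, and the jump fix-ups the real transformation has to handle, and it always inserts directly below the loop.

```python
def compact_loop(layout, loop_blocks):
    """Move blocks in the loop's lexical range that are not part of the
    flow cycle to just below the loop, keeping relative order."""
    positions = [i for i, blk in enumerate(layout) if blk in loop_blocks]
    top, bottom = positions[0], positions[-1]
    lexical_range = layout[top:bottom + 1]
    body = [blk for blk in lexical_range if blk in loop_blocks]
    moved = [blk for blk in lexical_range if blk not in loop_blocks]
    return layout[:top] + body + moved + layout[bottom + 1:]

# "early_exit" sits between the loop head and latch but is not in the cycle.
layout = ["entry", "head", "early_exit", "latch", "after"]
new_layout = compact_loop(layout, {"head", "latch"})
```

After the move the loop body `head`, `latch` is contiguous, which is the property the layout change is after.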
|
|
The check in CSE is supposed to leave code alone that constant prop
(done by VN-based Assertion Prop) is going to handle, but since that
constant prop code only propagates based on conservative VN, the check
in CSE needs to likewise use conservative VN to determine what to skip,
or else neither phase will eliminate the redundancy.
Fixes #6234.
|
|
Fix VSO 462269.
|
|
When decomposing a long compare on 32-bit platforms, the operands to the
decomposed compare must be sign- or zero-extended appropriately.
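One possible reading, sketched in Python. It assumes a 32-bit operand is widened to a (lo, hi) pair before the long compare is split into halves; the helper names are made up for illustration. Picking the wrong extension for the high half changes the result.

```python
MASK32 = 0xFFFFFFFF

def signed32(x):
    """Reinterpret a 32-bit pattern as a signed value."""
    return x - (1 << 32) if x & 0x80000000 else x

def widen(value32, signed):
    """Widen a 32-bit pattern to (lo, hi): sign- or zero-extend the high half."""
    lo = value32 & MASK32
    hi = MASK32 if signed and (lo & 0x80000000) else 0
    return lo, hi

def less_than_signed64(a, b):
    """Signed 64-bit '<' on (lo, hi) pairs: high halves compare as signed,
    low halves as unsigned."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_hi != b_hi:
        return signed32(a_hi) < signed32(b_hi)
    return a_lo < b_lo

minus_one = 0xFFFFFFFF  # bit pattern of the int value -1

# Correct: sign-extend the signed operand, so -1 < 1 holds.
correct = less_than_signed64(widen(minus_one, signed=True), widen(1, signed=True))

# Wrong extension: zero-extending turns -1 into 4294967295, so the compare flips.
wrong = less_than_signed64(widen(minus_one, signed=False), widen(1, signed=True))
```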
|
|
If we remove a NOP during rationalize that is unused, we need to ensure
that its operand is also marked as an unused value.
|
|
Boxing a value type produces a non-null result. If the result of the box is
only used to feed a compare against null, the jit tries to optimize the box
away entirely since the result of the comparison is known. Such idiomatic
expressions arise fairly often in generics instantiated over value types.
In the current implementation the box expands into two parts. The first is
an upstream statement to allocate a boxed object and assign a reference to
the boxed object to a local var known as the "box temp". The second is an
expression tree whose value is the box temp that also contains an
encapsulated copy from the value being boxed to the payload section of the
boxed object. The box node also contains a pointer back to the first
statement (more on this later).
In the examples being discussed here this second tree is a child of a compare
node whose other child is a null pointer. When the optimization fires, the
upstream allocation statement is located via the pointer in the box node and
removed, and the entire compare is replaced with a constant 0 or 1 as
appropriate. Unfortunately the encapsulated copy in the box subtree may
include side effects that should be preserved, and so this transformation is
unsafe.
Note that the copy subtree as a whole will always contain side effects, since
the copy is storing values into the heap, and that copy now will not happen.
But the side effects that happen when producing the value to box must remain.
In the initial example from #12949 the side effects in question were
introduced by the jit's optimizer to capture a CSE definition. #13016 gives
several other examples where the side effects are present in the initial user
code. For instance the value being boxed might come from an array, in which
case the encapsulated copy in the box expression tree would contain the array
null check and bounds check. So removing the entire tree can alter behavior.
This fix attempts to carefully preserve the important side effects by
reworking how a box is imported. The copy is now moved out from under the box
into a second upstream statement. The box itself is then just a trivial
side-effect-free reference to the box temp. To ensure proper ordering of side
effects the jit spills the evaluation stack before appending the copy
statement.
When the optimization fires the jit removes the upstream heap allocation
as before, as well as the now-trivial compare tree. It analyzes the source
side of the upstream copy. If it is side effect free, the copy is removed
entirely. If not, the jit modifies the copy into a minimal load of the
boxed value, and this load should reproduce the necessary side effects.
The optimization is only performed when the tree shape of the copy matches
expected patterns.
There are some expected cases where the tree won't match, for instance if the
optimization is invoked while the jit is inlining. Because this optimization
runs at several points the jit can catch these cases once inlining completes.
There is one case that is not handled that could be -- if the assignment part
of the copy is itself a subtree of a comma. This doesn't happen often.
The optimization is now also extended to handle the case where the comparison
operation is `cgt.un`. This doesn't catch any new cases but causes the
optimization to happen earlier, typically during importation, which should
reduce jit time slightly.
Generally the split of the box into two upstream statements reduces code size,
especially when the box expression is incorporated into a larger tree -- for
example a call. However in some cases where the value being boxed comes from
an array, preserving the array bounds check now causes loop cloning to kick
in and increase code size. Hence the overall size impact on the jit-diff set is
essentially zero.
Added a number of new test cases showing the variety of situations that must
be handled and the need to spill before appending the copy statement.
Fixes #12949.
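The array case can be illustrated with a small Python analogy. This is not jit code: `Box` and the three functions are hypothetical stand-ins, and Python's `IndexError` plays the role of the array bounds check.

```python
class Box:
    """Stand-in for a heap-allocated box holding a copied value."""
    def __init__(self, payload):
        self.payload = payload

def original(arr, i):
    boxed = Box(arr[i])          # loading arr[i] may fail the bounds check
    return boxed is not None     # always True once the box exists

def unsafe_optimized(arr, i):
    return True                  # whole tree removed: bounds check lost

def safe_optimized(arr, i):
    arr[i]                       # minimal load kept only for its side effects
    return True                  # allocation and compare are gone

def raises_index_error(fn, arr, i):
    try:
        fn(arr, i)
        return False
    except IndexError:
        return True

data = [10, 20]
behaviour = {
    "original": raises_index_error(original, data, 5),
    "unsafe": raises_index_error(unsafe_optimized, data, 5),
    "safe": raises_index_error(safe_optimized, data, 5),
}
```

Only the version that keeps the load matches the original behavior on an out-of-range index.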
|
|
Treat byref-typed uses of int-typed lclVars as type int in LB.
|
|
This is consistent with the behavior of both JIT32 and RyuJIT. This
resolves an assertion originating from the following scenario:
1. The input IL contains a lclVar of type `Foo*`, which the JIT imports as
`TYP_I_IMPL` (which is `TYP_INT` in this case).
2. This lclVar is used as the `this` argument to a number of method calls.
This is legal as per ECMA-335 section III.3.19 ("Correct CIL also allows
a native int to be passed as a byref (&); in which case following the
store the value will be tracked by garbage collection.")
3. All of the method calls to which this lclVar is passed as a byref are
inlined. This produces many uses of the lclVar as a byref (i.e. we see
nodes like `lclVar V06 byref` even though V06's varDsc has type int).
4. The lclVar is assigned a register `r`. At its first appearance--which is
the first occasion in which it is loaded into this register--it is used
as `TYP_BYREF`. When the code generator processes this appearance, it
first sets the appropriate bit for `r` in `gcInfo.gcRegByrefSetCur`
(`gcInfo.gcMarkRegPtrVal`, called by `genCodeForTree_REG_VAR1`) and then
sets the appropriate bit for `r` in `regSet.rsMaskVars`
(`genUpdateLife`).
5. The lclVar is used as `TYP_INT` as the operand to a `GT_RETURN` node.
When the code generator processes this appearance, it attempts to update
the GC-ness of `r` by calling `gcInfo.gcMarkRegPtrVal` (again via
`genCodeForTree_REG_VAR1`). This function, though, explicitly excludes
registers that contain live variables from its update, so `r` remains in
`gcInfo.gcRegByrefSetCur` after this call. After calling
`gcMarkRegPtrVal`, `genCodeForTree_REG_VAR1` calls `genUpdateLife`,
which removes the lclVar from `regSet.rsMaskVars`.
6. An assertion intended to verify that the only registers that are live
after processing a statement are registers that contain lclVars fires,
as `gcRegByrefSetCur` still contains `r`.
Fixes VSO 468732.
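Steps 4 to 6 can be replayed with a toy Python model. The class and its methods are a simplification written for this note, not the code generator's implementation; the two sets stand in for `gcInfo.gcRegByrefSetCur` and `regSet.rsMaskVars`.

```python
class CodeGenModel:
    def __init__(self):
        self.gc_reg_byref_set = set()  # stands in for gcInfo.gcRegByrefSetCur
        self.mask_vars = set()         # stands in for regSet.rsMaskVars

    def mark_reg_ptr_val(self, reg, use_type):
        # Registers that hold live variables are excluded from the update.
        if reg in self.mask_vars:
            return
        if use_type == "byref":
            self.gc_reg_byref_set.add(reg)
        else:
            self.gc_reg_byref_set.discard(reg)

    def use_lclvar(self, reg, use_type, becomes_live=False, dies=False):
        self.mark_reg_ptr_val(reg, use_type)   # GC-ness update comes first
        if becomes_live:
            self.mask_vars.add(reg)            # then the liveness update
        if dies:
            self.mask_vars.discard(reg)

    def stray_gc_regs(self):
        """GC-tracked registers that no longer hold a live lclVar."""
        return self.gc_reg_byref_set - self.mask_vars

def run(treat_byref_use_as_int):
    cg = CodeGenModel()
    first_use_type = "int" if treat_byref_use_as_int else "byref"
    cg.use_lclvar("r", first_use_type, becomes_live=True)  # step 4
    cg.use_lclvar("r", "int", dies=True)                   # step 5
    return cg.stray_gc_regs()                              # checked in step 6

before_fix = run(treat_byref_use_as_int=False)
after_fix = run(treat_byref_use_as_int=True)
```

With the byref-typed use treated as int, `r` never enters the byref set, so nothing is left over for the assertion to trip on.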
|
|
P-DEP local vars are logically independent locals, but physically embedded
in some structure with fixed layout. So they cannot be made larger.
We already had safeguards for this in place for x86 so extend these to kick
in for x64 too.
Also update Lowering's checker to account for the fact that some SIMD12s
can persist in x64.
Fixes #12950.
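A small Python illustration of the layout hazard, assuming a hypothetical struct made of a 12-byte, three-float field followed by one more float. Storing the first field as if it were 16 bytes wide clobbers its neighbor.

```python
import struct

# Hypothetical fixed layout: three floats (12 bytes) followed by one float.
original = bytearray(struct.pack("<4f", 1.0, 2.0, 3.0, 9.0))

# Store to the first field at its true size: the neighbor is untouched.
exact = bytearray(original)
struct.pack_into("<3f", exact, 0, 4.0, 5.0, 6.0)
neighbor_after_exact_store = struct.unpack_from("<f", exact, 12)[0]

# Store to the first field as if it had been widened to 16 bytes:
# the fourth lane overwrites the neighboring field.
widened = bytearray(original)
struct.pack_into("<4f", widened, 0, 4.0, 5.0, 6.0, 0.0)
neighbor_after_widened_store = struct.unpack_from("<f", widened, 12)[0]
```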
|
|
The bug was repro-ing on a dynamic method produced by Reflection::Emit.
It's not possible to repro the bug on normal C# code because of
C# definite assignment rules.
This change adds a simple IL regression test.
|
|
Fixes VSO 462274.
|
|
Tail recursion elimination transforms a tail call into a loop.
If compInitMem is set, we may need to zero-initialize some locals. Normally it's done in the prolog
but the loop we are creating can't include the prolog. The fix is to insert zero-initialization
for all non-parameter non-temp locals in the loop. Liveness phase will remove unnecessary initializations.
We never hit this case with normal C# code since C# definite assignment rules ensure that there are
no uninitialized locals in the generated MSIL. In the repro the method with tail recursion is a dynamic method
and it has an uninitialized local.
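A Python analogy of the problem and the fix. The functions are illustrative only: the explicit `t = 0` lines model the prolog's zero-initialization, and the read of `t` before any store models the uninitialized local in the dynamic method.

```python
def recursive(n, acc=0):
    t = 0                 # prolog zero-init: every call starts with t == 0
    acc = acc + t         # local is read before any explicit store
    t = 7
    if n == 0:
        return acc
    return recursive(n - 1, acc)   # the tail call

def as_loop(n, acc=0, rezero_locals=True):
    t = 0                 # the prolog runs once, outside the loop
    while True:
        if rezero_locals:
            t = 0         # zero-init inserted in the loop by the fix
        acc = acc + t
        t = 7
        if n == 0:
            return acc
        n = n - 1         # argument update replaces the tail call

expected = recursive(3)
with_fix = as_loop(3, rezero_locals=True)
without_fix = as_loop(3, rezero_locals=False)
```

Without the inserted zero-initialization, the value stored to `t` in one iteration leaks into the next, which a real recursive call would never observe.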
|