|
Re-enable Avx.PermuteVar tests
|
|
|
|
Fix two cases of FP-relative immediate offsets possibly not encodable
|
|
* Move more SSE2 tests to the template
* Improve Insert test template to involve more codegen situations
* Fix imm-operand encoding for SSE/AVX instructions
|
|
Related issue: https://github.com/dotnet/coreclr/issues/21698
|
|
encodability
For ARM32/ARM64, the immediate offsets in addressing modes have a
limited range that varies by instruction. A couple of cases were not
checking for that range, leading to potentially un-encodable
instructions being generated.
In particular, the test case demonstrates a function with a very large frame
and a stored generic context that would fail on ARM64.
There are no code diffs from this change for ARM64, except we sometimes get
better assembly comments where the local variable referenced is annotated on
the store instruction. For ARM32, the "secret stub param" is now stored using
SP-relative addressing instead of FP-relative when possible; we generally prefer
SP-relative addressing in main function bodies.
|
|
This change enables object stack allocation for more cases.
1. Objects with gc fields can now be stack-allocated.
2. Object stack allocation is enabled for x86.
ObjectAllocator updates the types of trees containing references
to possibly-stack-allocated objects to TYP_BYREF or TYP_I_IMPL as appropriate.
That allows us to remove the hacks in gcencode.cpp and refine reporting of pointers:
the pointer is not reported when we can prove that it always points to a stack-allocated object or is null (typed as TYP_I_IMPL);
the pointer is reported as an interior pointer when it may point to either a stack-allocated object or a heap-allocated object (typed as TYP_BYREF);
the pointer is reported as a normal pointer when it points to a heap-allocated object (typed as TYP_REF).
ObjectAllocator also adds flags to indirections:
GTF_IND_TGTANYWHERE when the indirection may refer to the heap or the stack
(which results in checked write barriers for writes),
or the new GTF_IND_TGT_NOT_HEAP when the indirection refers to null or stack memory
(which results in no write barrier for writes).
|
|
Fix issue with devirtualization and tailcalls
|
|
Don't optimize MultiplyNoFlags away
|
|
Range is -128 to 127, not -127 to 128.
|
|
Change test priority to 0
|
|
|
|
|
|
Fix CRC32 instruction encoding for the containment form
|
|
(#21914)
|
|
Check the optimize settings of JIT test projects.
|
|
|
|
* Copy address-taken SIMD intrinsic
This occurs, for example, when boxing the result of a SIMD intrinsic. This was being handled for the HW intrinsic case, but not for the SIMD Vector intrinsics. Also, eliminate `OperIsSimdHWIntrisic`, since it redundantly checks for the case of a SIMD result even though it was always called in a context where the result is known to be a struct type.
Fix #21854
|
|
the appropriate size (#21641)
* Fixing ContainCheckHWIntrinsic to ensure that scalar integer operands are the appropriate size
* Adding a regression test for issue 21625
* Fixing IsContainableHWIntrinsicOp to use the containing node type (rather than the simd base type) for Scalar intrinsics
* Fixing the containment check for `Sse41.Insert(V128<float>, V128<float>, byte)`
* Cleaning up the isContainableHWIntrinsicOp logic in lowerxarch.cpp
* Restrict containment to genActualType(baseType)
* Formatting
* Removing some comments and simplifying the supportsContainment checks for various HWIntrinsics that take a scalar operand
* Applying formatting patch
|
|
This test was doing in-place modification of a managed string using
unsafe code which breaks string interning.
|
|
|
|
This PR will output the projects that have mismatched optimize settings
but won't fix them. This is part of the tasks in #19166.
|
|
* Add tests for the BMI1/2 intrinsics
* Implement the remaining BMI1/2 intrinsics
* Fix emitDispIns for BMI instructions
|
|
Besides the fact that volatile indirections aren't supposed to be removed, doing so in this case results in incorrect code being generated.
The GT_IND node has GTF_DONT_CSE set and this gets copied to the GT_LCL_VAR node. Later this prevents the insertion of a normalization cast because GTF_DONT_CSE on a GT_LCL_VAR node is assumed to mean that the variable address is being taken.
|
|
Transform SIMD8 to FIELD_LIST if promoted
|
|
Fix #21546
|
|
Test for #20657 SIMD byref bug
|
|
Embed the following test files as resources in their corresponding test assemblies:
* regexdna-input25.txt regexdna-input25000.txt in BenchmarksGame/regex-redux
* knucleotide-input.txt knucleotide-input-big.txt in BenchmarksGame/k-nucleotide
* revcomp-input25.txt revcomp-input25000.txt in BenchmarksGame/reverse-complement
|
|
|
|
Fix arm32 local variable references
|
|
This change modifies the importer to create a GenTreeAllocObj node for
box and newobj instead of a helper call in R2R mode. The ObjectAllocator phase
decides whether the object can be allocated on the stack or has to be created
on the heap via a helper call.
To trigger object stack allocation COMPlus_JitObjectStackAllocation has
to be set (it's not set by default).
|
|
|
|
Improve hardware intrinsic tests
|
|
Arm32 has different addressing-mode offset ranges for floating-point
and integer instructions, and neither range is very large.
So in functions with a frame pointer, we try to access some variables
using the frame pointer and some using the stack pointer, to expand the
total number of variables we can access without allocating a "reserved
register" just for constructing large offsets.
This calculation was incorrect for struct variables that contain floats:
float fields must be checked against the floating-point range, but we
were checking using the variable type (struct) instead of the instruction
type (floating-point). In addition, we were not correctly calculating the
frame-pointer range using the actual variable offset plus the "within variable"
offset (struct member offset).
Added a test that covers some of these cases.
Fixes #19537
|
|
Fix lclvar "cloning" in const division lowering
|
|
`codegenarm64`. (#21344)
* Add a repro.
* Exclude the failing test.
* fix formatting
|
|
|
|
|
|
Implement 64-bit-only hardware intrinsic
|
|
For quite a while now the jit has been propagating return types from
callees to the return spill temp. However this is only safe when the
callee has a single return site (or all return sites return the same
type).
Because return spill temps often end up getting assigned to still more
temps, we haven't seen this overly aggressive type propagation lead to
bugs, but now that we're tracking single-def temps and doing more type
propagation during the late devirtualization callback, the fact that
these types are wrong has been exposed and can lead to incorrect
devirtualization.
The fix is to only consider the return spill temp as single def if the
callee has a single return site, and to check that the return spill temp
is single def before trying to propagate the type.
Fixes #21295.
|
|
The existing code was creating new uses of existing variables using the type of the division operation instead of the type of the existing variable use. There is one rare case where this does not work properly:
* the variable is "normalize on load" (which implies it is also a small-int-typed variable),
* and the variable was previously assigned such that a subrange assertion is generated,
* and morph decides not to insert the load-normalization cast due to the subrange assertion,
* and the variable is not enregistered.
Then the assignment stores 2 bytes to the stack location but the uses generated by the division lowering code read 4 bytes because they have TYP_INT.
|
|
|
|
Fix the capitalization of the COMPlus_JITMinOpts variable, which
is case-sensitive on Linux.
Fixes #21266
|
|
Fix normalize on load handling in assertion propagation
|
|
|
|
We can't assume objects don't escape in helpers marked as pure, for the following reasons:
1. The helpers may call user code that may make objects escape, e.g.,
https://github.com/dotnet/coreclr/blob/c94d8e68222d931d4bb1c4eb9a52b4b056e85f12/src/vm/jithelpers.cpp#L2371
2. The helpers usually protect gc pointers with GCPROTECT_BEGIN() so the pointers are reported as normal pointers to the gc.
Pointers to stack-allocated objects need to be reported as interior so they have to be protected with
GCPROTECT_BEGININTERIOR().
3. The helpers may cause these asserts in ValidateInner on stack-allocated objects:
https://github.com/dotnet/coreclr/blob/c94d8e68222d931d4bb1c4eb9a52b4b056e85f12/src/vm/object.cpp#L723
https://github.com/dotnet/coreclr/blob/c94d8e68222d931d4bb1c4eb9a52b4b056e85f12/src/vm/object.cpp#L730
|
|
ObjectStackAllocationTests use GC.GetAllocatedBytesForCurrentThread to
verify object stack allocations. Under GCStress the vm may initiate additional
heap allocations in GCHeap::StressHeap (see the call to 'pGenGCHeap->allocate' below).
This change re-enables ObjectStackAllocationTests and updates them to not verify stack allocations under GCStress.
It's useful to run the tests even without the verification to catch crashes, gc asserts, etc.
```
if (Interlocked::Increment(&OneAtATime) == 0 &&
    !TrackAllocations()) // Messing with object sizes can confuse the profiler (see ICorProfilerInfo::GetObjectSize)
{
    StringObject* str;

    // If the current string is used up
    if (HndFetchHandle(m_StressObjs[m_CurStressObj]) == 0)
    {
        // Populate handles with strings
        int i = m_CurStressObj;
        while (HndFetchHandle(m_StressObjs[i]) == 0)
        {
            _ASSERTE(m_StressObjs[i] != 0);
            unsigned strLen = (LARGE_OBJECT_SIZE - 32) / sizeof(WCHAR);
            unsigned strSize = PtrAlign(StringObject::GetSize(strLen));

            // update the cached type handle before allocating
            SetTypeHandleOnThreadForAlloc(TypeHandle(g_pStringClass));
            str = (StringObject*) pGenGCHeap->allocate (strSize, acontext);
            if (str)
            {
                str->SetMethodTable (g_pStringClass);
                str->SetStringLength (strLen);
                HndAssignHandle(m_StressObjs[i], ObjectToOBJECTREF(str));
            }
            i = (i + 1) % NUM_HEAP_STRESS_OBJS;
            if (i == m_CurStressObj) break;
        }

        // advance the current handle to the next string
        m_CurStressObj = (m_CurStressObj + 1) % NUM_HEAP_STRESS_OBJS;
    }

    // Get the current string
    str = (StringObject*) OBJECTREFToObject(HndFetchHandle(m_StressObjs[m_CurStressObj]));
    if (str)
    {
        // Chop off the end of the string and form a new object out of it.
        // This will 'free' an object at the beginning of the heap, which will
        // force data movement. Note that we can only do this so many times
        // before we have to move on to the next string.
        unsigned sizeOfNewObj = (unsigned)Align(min_obj_size * 31);
        if (str->GetStringLength() > sizeOfNewObj / sizeof(WCHAR))
        {
            unsigned sizeToNextObj = (unsigned)Align(size(str));
            uint8_t* freeObj = ((uint8_t*) str) + sizeToNextObj - sizeOfNewObj;
            pGenGCHeap->make_unused_array (freeObj, sizeOfNewObj);
            str->SetStringLength(str->GetStringLength() - (sizeOfNewObj / sizeof(WCHAR)));
        }
        else
        {
            // Let the string itself become garbage.
            // It will be realloced next time around.
            HndAssignHandle(m_StressObjs[m_CurStressObj], 0);
        }
    }
}
Interlocked::Decrement(&OneAtATime);
```
|
|
When a method returns a multi-reg struct, we set GTF_DONT_CSE on return local: https://github.com/dotnet/coreclr/blob/497419bf8f19c649d821295da7e225e55581cce9/src/jit/importer.cpp#L8783
Setting GTF_DONT_CSE blocks assertion propagation here:
https://github.com/dotnet/coreclr/blob/9d49bf1ec6f102b89e5c2885e8f9d3d77f2ec144/src/jit/assertionprop.cpp#L2845-L2848
In the test we have a synchronized method, so we change the return node to
return a comma that includes a call to HELPER.CORINFO_HELP_MON_EXIT.
If the rightmost comma expression doesn't have GTF_DONT_CSE,
assertion propagation is not blocked and we end up with this tree:
```
[000040] -----+------ /--* CNS_INT struct 0
[000043] --C-G+------ /--* COMMA struct
[000036] --C-G+------ | \--* CALL help void HELPER.CORINFO_HELP_MON_EXIT
[000032] L----+------ arg1 in x1 | +--* ADDR long
[000031] ----G+-N---- | | \--* LCL_VAR ubyte (AX) V03 tmp1
[000033] -----+------ arg0 in x0 | \--* LCL_VAR ref V00 this
[000041] -AC-G+------ * COMMA struct
[000006] -----+-N---- | /--* LCL_VAR struct V01 loc0
[000039] -A---+------ \--* ASG struct (copy)
[000037] D----+-N---- \--* LCL_VAR struct V05 tmp3
```
Downstream phases can't handle struct zero return expressed as
```
[000040] -----+------ /--* CNS_INT struct 0
```
The fix is to propagate GTF_DONT_CSE to the rightmost comma expression
to block bad assertion propagation.
Fixes #21011.
|
|
* Updating the VectorNotSupportedTest to throw on failure
* Updating the HWIntrinsic test templates to check success on a per-scenario basis
* Regenerating the templated HWIntrinsic tests
|
|
Also, add a test for #20958
|