|
Byref pointers need to point within their "host" object -- thus
the alternate name "interior pointers". If the JIT creates and
reports a pointer as a "byref", but it points outside the host
object, and a GC occurs that moves the host object, the byref
pointer will not be updated. If a subsequent calculation puts
the byref "back" into the host object, it will actually be pointing
to garbage, since the host object has moved.
This occurred on ARM with array index calculations, in particular
because ARM doesn't have a single-instruction "base + scale*index + offset"
addressing mode. Thus, for the `ProcessJagged3DArray()` function in the
jaggedarr_cs_do test case, we were generating:
```
// r0 = array object, r6 = computed index offset. We mark r4 as a byref.
add r4, r0, r6
// r4 - 32 is the address of the array element we care about. Then we load the array element.
// In this case, the loaded element is a gcref, so r4 becomes a gcref.
ldr r4, [r4-32]
```
We get this math because the user code uses `a[i - 10]`, which is
essentially `a + (i - 10) * 4 + 8` for element size 4. This is optimized
to `a + i * 4 - 32`. In the above code, `r6` is `i * 4`. In this case,
after the first instruction, `r4` can point beyond the array.
If a GC happens, `r4` isn't updated, and the second instruction loads garbage.
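To make the folded arithmetic concrete, here is a sketch in C++ (the function and variable names are made up for illustration; the layout constants 8 and 4 come from the text above):
```cpp
#include <cstdint>

// Sketch of the folded address computation for &a[i - 10]:
//   a + 8 + (i - 10) * 4  ==  a + i * 4 - 32
uintptr_t ElementAddress(uintptr_t arrayObj, int i)
{
    // "add r4, r0, r6": this intermediate value can point past the array
    uintptr_t intermediate = arrayObj + (uintptr_t)i * 4;
    // only this final value is a valid interior pointer to the element
    return intermediate - 32;
}
```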
There are several fixes:
1. Change array morphing in `fgMorphArrayIndex()` to rearrange the array index
IR node creation to only create a byref pointer that is precise; don't create
"intermediate" byref pointers that don't represent the actual array element
address being computed. The tree matching code that annotates the generated tree
with field sequences needs to be updated to match the new form.
2. Change `fgMoveOpsLeft()` to prevent the left-weighted reassociation optimization
`[byref]+ (ref, [int]+ (int, int)) => [byref]+ ([byref]+ (ref, int), int)`. This
optimization creates "incorrect" byrefs that don't necessarily point within
the host object.
3. Add an additional condition to the `Fold "((x+icon1)+icon2) to (x+(icon1+icon2))"`
morph optimization to prevent merging of constant TYP_REF nodes, which now were
being recognized due to different tree shapes. This was probably always a problem,
but the particular tree shape wasn't seen before.
These fixes are all-platform. However, to reduce risk at this point, they are
enabled for ARM only, under the `FEATURE_PREVENT_BAD_BYREFS` `#ifdef`.
Fixes #17517.
There are many, many diffs.
For ARM32 ngen-based desktop asm diffs, it is a 0.30% improvement across all
framework assemblies. A lot of the diffs seem to be because we CSE the entire
array address offset expression, not just the index expression.
|
|
The JIT write barrier helpers have a custom calling convention that
avoids killing most registers. The JIT was not taking advantage of
this, and thus was killing unnecessary registers when a write barrier
was necessary. In particular, some integer callee-trash registers
are unaffected by the write barriers, and no floating-point register
is affected.
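A minimal sketch of the resulting kill-set shape (the mask names follow the JIT's `RBM_` convention, but the values here are illustrative, not the real target definitions):
```cpp
#include <cstdint>

typedef std::uint64_t regMaskTP;

// Illustrative values only; the real masks are target-specific.
const regMaskTP RBM_CALLEE_TRASH              = 0x000000FF; // what an ordinary call kills
const regMaskTP RBM_CALLEE_TRASH_WRITEBARRIER = 0x00000007; // subset a write barrier kills

// Kill set for a store that needs a write barrier: a strict subset of the
// ordinary call kill set, with no floating-point registers at all.
regMaskTP WriteBarrierKillSet()
{
    return RBM_CALLEE_TRASH_WRITEBARRIER; // no RBM_FLT_* bits included
}
```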
Also, I got rid of the `FEATURE_WRITE_BARRIER` define, which is always
set. I also put some code under `LEGACY_BACKEND` for easier cleanup
later. I removed some unused defines in target.h for some platforms.
|
|
* [arm32] Fixed RBM_PROFILER_*
* Changed trash registers to RBM_NONE
|
|
* Add FEATURE_CROSSBITNESS in crosscomponents.cmake
* Exclude mscordaccore, mscordbi, and sos from CLR_CROSS_COMPONENTS_LIST when FEATURE_CROSSBITNESS is defined in crosscomponents.cmake
* Introduce target_size_t in src/jit/target.h (see the sketch after this list)
* Use a size_t value in genMov32RelocatableImmediate in src/jit/codegen.h and src/jit/codegencommon.cpp
* Fix definition/declaration inconsistency for emitter::emitIns_R_I in emitarm.cpp
* Zero HiVal when GenTree::SetOper converts GenTreeLngCon->GenTreeIntCon in src/jit/compiler.hpp
* Explicitly specify roundUp(expr, TARGET_POINTER_SIZE)
* Use target_size_t* target in emitOutputDataSec in src/jit/emit.cpp
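A sketch of the idea behind `target_size_t` (the definition here is an assumption modeled on the commit's description, not the verbatim code from src/jit/target.h): an unsigned type whose width follows the target's pointer size rather than the host's, which is exactly what a cross-bitness build needs.
```cpp
// Assumed sketch: size_t is a *host* type, so a 32-bit hosted JIT targeting
// a 64-bit platform (or vice versa) must size target-pointer quantities
// with a type keyed to the target instead.
#if defined(_TARGET_64BIT_)
typedef unsigned long long target_size_t; // 64-bit target pointer size
#else
typedef unsigned int       target_size_t; // 32-bit target pointer size
#endif
```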
|
|
When unrolling a STOREOBJ, we can generate multiple consecutive
byref helper calls. This helper has a unique calling convention
where the dst and src addresses are modified by adding pointer
size to their original value (thus allowing consecutive helper
calls without reloading the dst/src addresses). So, for liveness
purposes, the helper call kills the dst/src values. However, for
GC purposes, it does not, as the registers still contain byref
pointers. We were, in the ARM case, reporting the r0/r1 registers
dead after the first call, so a GC didn't update them, and a
second call updated garbage.
In fixing this, I cleaned up the helper call kill handling a bit.
I also fixed and improved RyuJIT/x86 write barrier kill modeling.
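A C model of that calling convention (a sketch under assumed semantics, not the actual asm helper):
```cpp
#include <cstring>

// Models the byref-assign helper: copy one pointer-sized value, then leave
// dst/src advanced so a consecutive call needs no address reloads. After the
// call the registers' *values* have changed (killed for liveness), but they
// still hold valid byref pointers (not killed for GC).
void AssignByrefModel(char*& dst, const char*& src)
{
    std::memcpy(dst, src, sizeof(void*)); // the real helper does a barriered store
    dst += sizeof(void*);
    src += sizeof(void*);
}
```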
|
|
The GS cookie check was using r2/r3 registers, after they had been
reloaded as outgoing argument registers for the JMP call, thus
trashing them. Change the temp regs used to r12/lr, the only
non-argument, non-callee-saved registers available on arm32.
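A quick mask computation showing why r12/lr are the only candidates (the bit encodings are illustrative, not the JIT's actual register masks):
```cpp
#include <cstdint>

int main()
{
    // One bit per register r0-r15 (illustrative encoding).
    const std::uint32_t allRegs     = 0xFFFF;
    const std::uint32_t argRegs     = 0x000F; // r0-r3: reloaded for the JMP call
    const std::uint32_t calleeSaved = 0x0FF0; // r4-r11: restored before the check
    const std::uint32_t special     = 0xA000; // sp (r13) and pc (r15)

    std::uint32_t avail = allRegs & ~(argRegs | calleeSaved | special);
    // avail == 0x5000: only bit 12 (r12) and bit 14 (lr) remain.
    return (avail == 0x5000) ? 0 : 1;
}
```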
Partially fixes #14862
|
|
On ARM64, IP0 and IP1 are not in the register selection order, though there are some cases where they must be allocated (see #14607), so we may see them as free when looking for a register to spill.
Also, V15 was missing from the selection order (V16 appeared in the order twice).
Fix #14626
|
|
Revert to prior behavior.
|
|
* [Arm64] SIMD simple defines
* Fix #else
|
|
LSRA Arm64 consistent reg sets
|
|
* Codegen for fast tail call
Codegen call and epilog for fast tail call
* Implementation for GT_START_NONGC
This implementation removes two NYI_ARM
Code generation for GT_START_NONGC which is used to prevent GC in fast tail call
* Define fast tail call target register and mask on ARMARCH
Define REG_FASTTAILCALL_TARGET and RBM_FASTTAILCALL_TARGET on ARMARCH
Modify LSRA init and codegen to use these definitions
* Merge genFnEpilog
Merge genFnEpilog for ARM32 and ARM64
* Fix bug in getFirstArgWithStackSlot
Fix bug in getFirstArgWithStackSlot: AMD64 and X86
|
|
tryAllocateFreeReg() uses the RegOrder array to iterate over the available registers. This needs to be consistent with the available registers of the given type. Otherwise, allocateBusyReg() will assert when it finds a free register that should have been allocated in tryAllocateFreeReg().
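A sketch of the invariant being restored (types and names are illustrative, not LSRA's actual code):
```cpp
#include <cassert>
#include <cstdint>

// Every register allocatable for a given type must appear in that type's
// selection order; otherwise allocateBusyReg() can find a "free" register
// that tryAllocateFreeReg() never tried, and assert.
void CheckOrderCoversAvailable(const int* regOrder, int count, std::uint64_t available)
{
    std::uint64_t tried = 0;
    for (int i = 0; i < count; i++)
    {
        tried |= 1ull << regOrder[i]; // registers tryAllocateFreeReg() will consider
    }
    assert((available & ~tried) == 0); // nothing allocatable may be missing
}
```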
Fix #14591
|
|
|
|
|
|
[Arm64] Add emitters for cbz, cbnz, tbz, and tbnz
|
|
|
|
hqueue/arm/ryujit/issue_12614_enable_unrolling_for_cpblk
[RyuJIT/ARM32] enable unrolling for cpblk
|
|
- Define INITBLK_UNROLL_LIMIT and CPBLK_UNROLL_LIMIT for ARM32
- Introduce the same codegen helpers to ARM32 from ARM64,
  i.e. genCodeForLoadOffset() and genCodeForStoreOffset()
- Introduce genCodeForCpBlkUnroll() to ARM32 from ARM64
- Enable unrolling for cpblk (see the sketch below)
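A sketch of what the unrolled form means (the 16-byte size is an assumed example; the real threshold is CPBLK_UNROLL_LIMIT):
```cpp
#include <cstdint>

// Instead of a memcpy helper call, a small cpblk becomes straight-line
// load/store pairs; e.g. a 16-byte copy:
void CpBlkUnroll16(std::uint8_t* dst, const std::uint8_t* src)
{
    *(std::uint32_t*)(dst + 0)  = *(const std::uint32_t*)(src + 0);  // ldr/str pair
    *(std::uint32_t*)(dst + 4)  = *(const std::uint32_t*)(src + 4);  // ldr/str pair
    *(std::uint32_t*)(dst + 8)  = *(const std::uint32_t*)(src + 8);  // ldr/str pair
    *(std::uint32_t*)(dst + 12) = *(const std::uint32_t*)(src + 12); // ldr/str pair
}
```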
|
|
* Implement profiler ELT callbacks for AMD64 Linux
* Some formatting fixes
* Fixed profiler
* Added a frame-aligning option
* Added stack alignment for quad-value stores
|
|
* Make REG_VIRTUAL_STUB_PARAM depend on the ABI.
Add VirtualStubParamInfo, which is part of the compiler and must be initialized in
compCompile.
|
|
* Delete DECLARE_TYPED_ENUM
Delete the workaround for g++ C++11, which was fixed in gcc 4.4.1 many
years ago.
The workaround makes the code dirty, and sometimes we get typos like:
};
END_DECLARE_TYPED_ENUM(insFlags,unsigned)
or
END_DECLARE_TYPED_ENUM(ChunkExtraAttribs, BYTE);
with a doubled `;;`
* jit-format
|
|
Eliminate the use of GTF_REG_VAL in RyuJIT and reuse the
bit to mark nodes as contained.
Abstract GTF_REG_VAL for legacy backend.
|
|
Make RBM_CALLEE_TRASH consistent with the CORINFO_HELP_STOP_FOR_GC helper,
which is JIT_RareDisableHelper, defined in vm/arm/asmhelpers.S
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
|
|
The CORINFO_HELP_STOP_FOR_GC helper preserves the integer and double return registers.
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
|
|
[Ryujit/ARM32] LSRA support for arm32 floating-point register allocation
|
|
* Implement lowering for GT_STORE_OBJ
In #10657, I commented that NYI messages about GT_STORE_OBJ were printed when running the CodeGenBringUpTests.
The `Lowering::LowerBlockStore(GenTreeBlk* blkNode)` implementation is simply copied.
After the lowering phase, code generation in codegenarm.cpp uses the code below; lowering picks one of:
```cpp
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll;
// or
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindHelper;
```
```cpp
void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
{
    if (blkOp->gtBlkOpGcUnsafe)
    {
        getEmitter()->emitDisableGC();
    }

    bool isCopyBlk = blkOp->OperIsCopyBlkOp();

    switch (blkOp->gtBlkOpKind)
    {
        case GenTreeBlk::BlkOpKindHelper:
            if (isCopyBlk)
            {
                genCodeForCpBlk(blkOp);
            }
            else
            {
                genCodeForInitBlk(blkOp);
            }
            break;

        case GenTreeBlk::BlkOpKindUnroll:
            if (isCopyBlk)
            {
                genCodeForCpBlkUnroll(blkOp);
            }
            else
            {
                genCodeForInitBlkUnroll(blkOp);
            }
            break;

        default:
            unreached();
    }

    if (blkOp->gtBlkOpGcUnsafe)
    {
        getEmitter()->emitEnableGC();
    }
}
```
`genCodeForCpBlk` and `genCodeForInitBlk` are implemented on ARM/ARM64 via the MEMCPY/MEMSET helpers,
but `genCodeForCpBlkUnroll` and `genCodeForInitBlkUnroll` are implemented on neither ARM nor ARM64.
Therefore those need to be implemented.
* Implement NYI: GT_STORE_OBJ needs a write barrier implementation
The code was copied from ARM64, with the GC write barrier parts removed, since they are not supported on ARM.
* Implement CodeGen::genCodeForCpObj
* Refactor some codes
* Use the INS_OPTS_LDST_POST_INC option for post-indexing
When a structure was copied, the generated asm was wrong: the offset never advanced.
```
IN0013: 000048 ldr r3, [r1+4]
IN0014: 00004A str r3, [r0+4]
IN0015: 00004C ldr r3, [r1+4]
IN0016: 00004E str r3, [r0+4]
```
Either the index must increment, or post-indexed addressing must be used, so I emit
the instruction with the INS_OPTS_LDST_POST_INC option (see the sketch after this list).
* Fix conflicts
* Fix conflicts and apply #11219
I wanted to merge genCodeForCpObj into codegenarmarch.cpp, but the function was modified in #11219,
so I decided to keep the code separate for now.
If the ARM version also needs modification in the future, it can be merged then.
* Fix conflicts
* Remove NYI
* Fix genCountBits assertion
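As referenced in the post-indexing item above, here is a C model of what INS_OPTS_LDST_POST_INC selects (a sketch; the register comments map to the asm listing):
```cpp
#include <cstdint>

// "ldr r3, [r1], #4" loads through r1 and then advances r1 by 4, so the
// copy walks forward instead of re-copying the same word.
void CopyWordsPostInc(std::uint32_t*& dst, const std::uint32_t*& src, int words)
{
    for (int i = 0; i < words; i++)
    {
        std::uint32_t tmp = *src++; // ldr r3, [r1], #4
        *dst++ = tmp;               // str r3, [r0], #4
    }
}
```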
|
|
[Arm64] Enable FEATURE_TAILCALL_OPT
|
|
* [Arm64/Unix] Enable FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP
* [Arm64/Unix] Enable FEATURE_MANUALLY_MANAGED_CARD_BUNDLES
|
|
|
|
* When computing RegRecords at BasicBlock entry, account for
overlapping float and double registers.
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
|
|
[RyuJIT/ARM32] [ReadyToRun] Fix target register for invocation to Thunk
|
|
Change _TARGET_ARM_ and _TARGET_ARM64_ to _TARGET_ARMARCH_
|
|
- Use register r4 to pass the indirection cell from code generated by R2R
- Define REG_R2R_INDIRECT_PARAM on ARM32 to merge with the ARM64 routine
|
|
|
|
Enable Windows hosted, Linux target amd64 altjit
With this change, we build a JIT that runs on Windows amd64
and targets Linux amd64, as an altjit named linuxnonjit.dll.
This is useful for debugging, or generating asm code or diffs.
You can even easily create Windows/non-Windows asm diffs
(either to compare the asm, or compare the generated code size).
For this to work, the JIT-EE interface method
getSystemVAmd64PassStructInRegisterDescriptor() was changed
to always be built in, by defining `FEATURE_UNIX_AMD64_STRUCT_PASSING_ITF`
in all AMD64 builds. The `_ITF` suffix indicates that this is
functionality specific to implementing the JIT-EE interface
contract. There were many places in the VM that used this
interchangeably with `FEATURE_UNIX_AMD64_STRUCT_PASSING`. Now,
`FEATURE_UNIX_AMD64_STRUCT_PASSING` means code in the VM needed
to implement this feature, but not required to implement the
JIT-EE interface contract. In particular, MethodTables compute
and cache the "eightbyte" info of structs when loading a type.
This is not done when only `FEATURE_UNIX_AMD64_STRUCT_PASSING_ITF`
is set, to avoid altering MethodTable behavior on non-Unix
AMD64 builds. Instead, if `getSystemVAmd64PassStructInRegisterDescriptor()`
is called on a non-Unix build (by the altjit), the `ClassifyEightBytes()`
function is called, and nothing is cached. Hopefully (though it was
hard for me to guarantee by observation), calling `ClassifyEightBytes()`
does not have any side effects on MethodTables. It doesn't matter much
in any case: when this path is hit for the altjit, we don't care much about runtime behavior.
The previously used `PLATFORM_UNIX` define is now insufficient.
I introduced the `#define` macros `_HOST_UNIX_` to indicate the
JIT being built will run on Unix, and `_TARGET_UNIX_` to indicate
the JIT is generating code targeting Unix. Some things were
converted to use the `UNIX_AMD64_ABI` define, which makes more
sense.
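A sketch of how the split is meant to be used (the macro names are from the commit; the fragments themselves are illustrative, not code from the source):
```cpp
// The two questions are now independent: a Windows-hosted altjit targeting
// Linux defines _TARGET_UNIX_ but not _HOST_UNIX_.
#if defined(_TARGET_UNIX_)
// The JIT is *generating code* for a Unix target
// (e.g. System V AMD64 struct passing, UNIX_AMD64_ABI).
#endif

#if defined(_HOST_UNIX_)
// The JIT binary itself *runs* on a Unix host (PAL, host headers, etc.).
#endif
```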
|
|
|
|
[RyuJIT/ARM32] Fix helper kill mask and call target consuming
|
|
|
|
The arrays were being initialized with register numbers, not
register masks. The arrays are only used on non-Windows amd64
for homing circular incoming argument conflicts. Perhaps we
never saw this happen?
|
|
|
|
Change the JIT code to align the stack to 16 bytes, as modern compilers do
|
|
* [x86/Linux] (Partially) Enable FEATURE_EH_FUNCLETS
* Update CLR ABI Document
* Add TODO (for Funclet Prolog/Epilog Gen)
|
|
Add the UNIX_X86_ABI definition for Unix/Linux-specific ABI parts
The first use will be the 16-byte stack alignment code
|
|
|
|
There is no native load/store instruction for Vector3/TYP_SIMD12,
so we need to break this type down into two loads or two stores,
with an additional instruction to put the values together in the
xmm target register. AMD64 SIMD support already implements most of
this. For RyuJIT/x86, we need to implement stack argument support
(both incoming and outgoing), which is different from the AMD64 ABI.
In addition, this change implements accurate alignment-sensitive
codegen for all SIMD types. For RyuJIT/x86, the stack is only 4
byte aligned (unless we have double alignment), so SIMD locals are
not known to be aligned (TYP_SIMD8 could be with double alignment).
For AMD64, we were unnecessarily pessimizing alignment information,
and were always generating unaligned moves when on AVX2 hardware.
Now, all SIMD types are given their preferred alignment in
getSIMDTypeAlignment() and alignment determination in
isSIMDTypeLocalAligned() takes stack alignment into account (it still
needs support for x86 dynamic stack alignment of SIMD locals).
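For example, a 12-byte Vector3 load can be assembled from an 8-byte and a 4-byte load (a sketch using SSE2 intrinsics; the JIT's actual instruction selection may differ):
```cpp
#include <emmintrin.h>

// Load 12 bytes without reading past the end of the value: two loads, plus
// one instruction to combine them in the xmm target register.
__m128 LoadVector3(const float* p)
{
    __m128 xy = _mm_castpd_ps(_mm_load_sd((const double*)p)); // x, y, 0, 0
    __m128 z  = _mm_load_ss(p + 2);                           // z, 0, 0, 0
    return _mm_movelh_ps(xy, z);                              // x, y, z, 0
}
```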
Fixes #7863
|
|
On x86, the P/Invoke cookie (when required) is passed on the stack
after all other stack arguments (if any).
|
|
The "JIT flags" currently passed between the EE and the JIT have traditionally
been bit flags in a 32-bit word. Recently, a second 32-bit word was added to
accommodate additional flags, but that set of flags is definitely second-class:
it is not universally passed, and requires using a separate set of bit
definitions and comparing those bits against the proper, second word.
This change replaces all uses of bare DWORD or 'unsigned int' types
representing flags with CORJIT_FLAGS, which is now an opaque type. All
flag names were renamed from CORJIT_FLG_* to CORJIT_FLAG_* to ensure all
cases were changed to use the new names, which are also scoped within the
CORJIT_FLAGS type itself.
Another motivation to do this, besides cleaner code, is to allow enabling the
SSE/AVX flags for x86. For x86, we had fewer bits available in the "first
word", so would have to either put them in the "second word", which, as
stated, was very much 2nd class and not plumbed through many usages, or
we could move other bits to the "second word", with the same issues. Neither
was a good option.
RyuJIT compiles with both COR_JIT_EE_VERSION > 460 and <= 460. I introduced
a JitFlags adapter class in jitee.h to handle both JIT flag types. All JIT
code uses this JitFlags type, which operates identically to the new
CORJIT_FLAGS type.
In addition to introducing the new CORJIT_FLAGS type, the SSE/AVX flags are
enabled for x86.
The JIT-EE interface GUID is changed, as this is a breaking change.
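The shape of such an opaque flags type, as a minimal sketch (the real CORJIT_FLAGS API differs in details; this just shows scoped flag names over a single wide word):
```cpp
class JitFlagsSketch
{
public:
    enum Flag
    {
        FLAG_SPEED_OPT = 0,
        FLAG_SIZE_OPT  = 1,
        // ... one scoped name per flag; no bare bit constants to misuse
    };

    void Set(Flag f)         { bits |= 1ULL << f; }
    void Clear(Flag f)       { bits &= ~(1ULL << f); }
    bool IsSet(Flag f) const { return (bits & (1ULL << f)) != 0; }

private:
    unsigned long long bits = 0; // one word wide enough for all flags
};
```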
|
|
|
|
|
|
Direct call instruction: (bl ±imm24)
Indirect call instruction: (movw, movt, blx reg)
It is pretty hard to choose between direct and indirect call instructions in the general case.
However, for recursive calls we can safely estimate the jump distance,
since we know the relative distance between the jump source and destination.
So this change generates direct call instructions for recursive calls
when the jump distance is close enough.
Plus, the call jumps directly to the function, bypassing the prestub.
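A sketch of the reachability check (the reach follows from the imm24 encoding named above, a signed 24-bit word offset; the helper name is made up and the PC-relative bias is ignored):
```cpp
#include <cstdint>

// A signed 24-bit word offset gives roughly +/-2^25 bytes (+/-32MB) of
// reach from the call site; only then can the recursive call be direct.
bool FitsInBlImm24(std::intptr_t callSite, std::intptr_t target)
{
    std::ptrdiff_t dist = target - callSite;
    return (dist >= -(1 << 25)) && (dist < (1 << 25));
}
```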
For #7002
|