path: root/src/jit/target.h
Age | Commit message | Author | Files | Lines
2018-04-14 | Avoid creating illegal byref pointers (#17524) | Bruce Forstall | 1 file, +15/-0
Byref pointers need to point within their "host" object -- thus the alternate name "interior pointers". If the JIT creates and reports a pointer as a "byref", but it points outside the host object, and a GC occurs that moves the host object, the byref pointer will not be updated. If a subsequent calculation puts the byref "back" into the host object, it will actually be pointing to garbage, since the host object has moved.

This occurred on ARM with array index calculations, in particular because ARM doesn't have a single-instruction "base + scale*index + offset" addressing mode. Thus, for the `ProcessJagged3DArray()` function in the jaggedarr_cs_do test case, we were generating:

```
// r0 = array object, r6 = computed index offset. We mark r4 as a byref.
add r4, r0, r6
// r4 - 32 is the offset of the object we care about. Then we load the array
// element. In this case, the loaded element is a gcref, so r4 becomes a gcref.
ldr r4, [r4-32]
```

We get this math because the user code uses `a[i - 10]`, which is essentially `a + (i - 10) * 4 + 8` for element size 4. This is optimized to `a + i * 4 - 32`. In the above code, `r6` is `i * 4`. After the first instruction, `r4` can point beyond the array. If a GC happens, `r4` isn't updated, and the second instruction loads garbage.

There are several fixes:

1. Change array morphing in `fgMorphArrayIndex()` to rearrange the array index IR node creation to only create a byref pointer that is precise; don't create "intermediate" byref pointers that don't represent the actual array element address being computed. The tree matching code that annotates the generated tree with field sequences needs to be updated to match the new form.
2. Change `fgMoveOpsLeft()` to prevent the left-weighted reassociation optimization `[byref]+ (ref, [int]+ (int, int)) => [byref]+ ([byref]+ (ref, int), int)`. This optimization creates "incorrect" byrefs that don't necessarily point within the host object.
3. Add an additional condition to the `Fold "((x+icon1)+icon2) to (x+(icon1+icon2))"` morph optimization to prevent merging of constant TYP_REF nodes, which were now being recognized due to different tree shapes. This was probably always a problem, but the particular tree shape wasn't seen before.

These fixes are all-platform. However, to reduce risk at this point, they are enabled for ARM only, under the `FEATURE_PREVENT_BAD_BYREFS` `#ifdef`.

Fixes #17517.

There are many, many diffs. For ARM32 ngen-based desktop asm diffs, it is a 0.30% improvement across all framework assemblies. A lot of the diffs seem to be because we CSE the entire array address offset expression, not just the index expression.
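The offset arithmetic above is easier to follow spelled out; here is a small illustrative C++ sketch (not JIT code; layout constants as described in the message, with the first element at offset 8 and element size 4):

```cpp
#include <cstdint>

// a[i - 10]: a + (i - 10)*4 + 8 == a + i*4 - 32.
uint8_t* preciseElementAddr(uint8_t* a, int i)
{
    return a + (i * 4 - 32); // one precise interior sum, as fix #1 produces
}

uint8_t* twoStepElementAddr(uint8_t* a, int i)
{
    uint8_t* t = a + i * 4; // "intermediate byref": may point past the object
    return t - 32;          // only now back inside; unsafe if a GC moved 'a'
}
```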
2018-03-30 | Tighten arm32/arm64 write barrier kill reg sets | Bruce Forstall | 1 file, +94/-58
The JIT write barrier helpers have a custom calling convention that avoids killing most registers. The JIT was not taking advantage of this, and thus was killing more registers than necessary when a write barrier was needed. In particular, some integer callee-trash registers are unaffected by the write barriers, and no floating-point register is affected.

Also, I got rid of the `FEATURE_WRITE_BARRIER` define, which is always set. I also put some code under `LEGACY_BACKEND` for easier cleanup later, and removed some unused defines in target.h for some platforms.
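As a hedged sketch of what a "tighter kill set" looks like in target.h terms (mask names follow the file's RBM_* convention, but the exact registers here are illustrative, not the committed values):

```cpp
#if defined(_TARGET_ARM_)
// Registers whose values the ARM32 write barrier actually trashes
// (illustrative): far fewer than the full RBM_CALLEE_TRASH set.
#define RBM_CALLEE_TRASH_WRITEBARRIER     (RBM_R0 | RBM_R3 | RBM_LR)
// No floating-point registers are affected by the write barrier helpers.
#define RBM_FLT_CALLEE_TRASH_WRITEBARRIER RBM_NONE
#endif
```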
2018-03-30 | [arm32] Fixed RBM_PROFILER_* (#17291) | sergey ignatov | 1 file, +5/-0
* [arm32] Fixed RBM_PROFILER_*
* Changed trash registers to RBM_NONE
2018-03-28 | Add crossbitness support to ClrJit | Egor Chesakov | 1 file, +8/-0
* Add FEATURE_CROSSBITNESS in crosscomponents.cmake
* Exclude mscordaccore, mscordbi, and sos from CLR_CROSS_COMPONENTS_LIST when FEATURE_CROSSBITNESS is defined in crosscomponents.cmake
* Introduce target_size_t in src/jit/target.h
* Use a size_t value in genMov32RelocatableImmediate in src/jit/codegen.h and src/jit/codegencommon.cpp
* Fix definition/declaration inconsistency for emitter::emitIns_R_I in emitarm.cpp
* Zero HiVal when GenTree::SetOper changes GenTreeLngCon to GenTreeIntCon in src/jit/compiler.hpp
* Explicitly specify roundUp(expr, TARGET_POINTER_SIZE)
* Use target_size_t* target in emitOutputDataSec in src/jit/emit.cpp
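A minimal sketch of the target_size_t idea (the exact typedef in the tree may differ): the host JIT may be 64-bit while compiling for a 32-bit target, so pointer-sized target quantities get their own type instead of host size_t:

```cpp
// Size of a pointer on the *target*, independent of the host bitness.
#if defined(_TARGET_64BIT_)
typedef unsigned __int64 target_size_t;
#else
typedef unsigned int     target_size_t; // 32-bit target, possibly 64-bit host
#endif
```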
2018-01-10 | Fix ARM GCStress hole with byref write barrier helper | Bruce Forstall | 1 file, +12/-6
When unrolling a STOREOBJ, we can generate multiple consecutive byref helper calls. This helper has a unique calling convention where the dst and src addresses are modified by adding pointer size to their original value (thus allowing consecutive helper calls without reloading the dst/src addresses). So, for liveness purposes, the helper call kills the dst/src values. However, for GC purposes, it does not, as the registers still contain byref pointers. We were, in the ARM case, reporting the r0/r1 registers dead after the first call, so a GC didn't update them, and a second call updated garbage. In fixing this, I cleaned up the helper call kill handling a bit. I also fixed and improved RyuJIT/x86 write barrier kill modeling.
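The byref helper's convention is worth spelling out; a hedged sketch (helper name and register assignments as described for ARM; the comments state the contract, this is not runnable JIT code):

```cpp
// JIT_ByRefWriteBarrier contract on ARM (sketch): copies *src to *dst and
// post-increments both by one pointer size, so unrolled STOREOBJ copies can
// chain calls without reloading addresses:
//
//     bl JIT_ByRefWriteBarrier   // r0 = dst, r1 = src; afterwards r0 += 4, r1 += 4
//     bl JIT_ByRefWriteBarrier   // next field, same registers
//
// For liveness, the call redefines r0/r1 (their values change); for GC
// reporting, r0/r1 must stay live as byrefs, or a GC between the two calls
// leaves stale pointers behind -- exactly the hole this commit fixes.
```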
2017-11-17 | Fix RyuJIT/arm32 GS cookie check before JMP call | Bruce Forstall | 1 file, +10/-0
The GS cookie check was using r2/r3 registers, after they had been reloaded as outgoing argument registers for the JMP call, thus trashing them. Change the temp regs used to r12/lr, the only non-argument, non-callee-saved registers available on arm32. Partially fixes #14862
2017-11-06 | ARM64: Fix two register selection issues | Carol Eidt | 1 file, +1/-1
On ARM64 IP0 and IP1 are not in the register selection order, though there are some cases where they must be allocated. See #14607. So we may see them as free when looking for a register to spill. Also, V15 was missing from the selection order (V16 was in the order twice). Fix #14626
2017-10-25 | Avoid allocating IP0 and IP1 | Carol Eidt | 1 file, +0/-1
Revert to prior behavior.
2017-10-23 | [Arm64] SIMD simple defines (#14628) | Steve MacLean | 1 file, +5/-0
* [Arm64] SIMD simple defines
* Fix #else
2017-10-20 | Merge pull request #14606 from CarolEidt/Fix14591 | Carol Eidt | 1 file, +1/-0
LSRA Arm64 consistent reg sets
2017-10-20 | [RyuJIT/ARM32] Fast tail call: code generation (#14445) | Hyeongseok Oh | 1 file, +4/-0
* Codegen for fast tail call
  Codegen call and epilog for fast tail call.
* Implementation for GT_START_NONGC
  This implementation removes two NYI_ARM. Code generation for GT_START_NONGC, which is used to prevent GC in fast tail calls.
* Define fast tail call target register and mask on ARMARCH
  Define REG_FASTTAILCALL_TARGET and RBM_FASTTAILCALL_TARGET on ARMARCH. Modify LSRA init and codegen to use these definitions.
* Merge genFnEpilog
  Merge genFnEpilog for ARM32 and ARM64.
* Fix bug in getFirstArgWithStackSlot
  Fix bug in getFirstArgWithStackSlot for AMD64 and X86.
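A hedged sketch of the shape of those defines (the exact register picked by the commit may differ; r12 is at least a plausible ARM32 choice, being neither an argument register nor callee-saved, so it survives epilog restoration and outgoing-argument setup):

```cpp
#if defined(_TARGET_ARM_)
// Register holding the fast tail call target address (illustrative choice).
#define REG_FASTTAILCALL_TARGET REG_R12
#define RBM_FASTTAILCALL_TARGET RBM_R12
#endif
```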
2017-10-19 | LSRA Arm64 consistent reg sets | Carol Eidt | 1 file, +1/-0
tryAllocateFreeReg() uses the RegOrder array to iterate over the available registers. This needs to be consistent with the available registers of the given type. Otherwise, allocateBusyReg() will assert when it finds a free register that should have been allocated in tryAllocateFreeReg(). Fix #14591
2017-10-03 | remove FEATURE_AVX_SUPPORT flag | Fei Peng | 1 file, +2/-2
2017-09-25 | [Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS | Steve MacLean | 1 file, +1/-1
2017-09-22 | Merge pull request #14139 from sdmaclea/PR-ARM64-EMIT-CBxZ-TBxZ | Brian Sullivan | 1 file, +3/-0
[Arm64] Add emitters for cbz, cbnz, tbz, and tbnz
2017-09-22 | [Arm64] Add emitters for cbz, cbnz, tbz, or tbnz | Steve MacLean | 1 file, +3/-0
2017-09-20 | Merge pull request #13541 from hqueue/arm/ryujit/issue_12614_enable_unrolling_for_cpblk | Bruce Forstall | 1 file, +4/-0
[RyuJIT/ARM32] enable unrolling for cpblk
2017-09-13 | [RyuJIT/ARM32] Enable unrolling for cpblk | Hyung-Kyu Choi | 1 file, +4/-0
- Define INITBLK_UNROLL_LIMIT and CPBLK_UNROLL_LIMIT for ARM32
- Introduce the same codegen helpers to ARM32 from ARM64, i.e. genCodeForLoadOffset() and genCodeForStoreOffset()
- Introduce genCodeForCpBlkUnroll() to ARM32 from ARM64
- Enable unrolling for cpblk
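For reference, a sketch of the shape of these defines (the byte limits actually chosen for ARM32 may differ from the illustrative values here):

```cpp
// Block sizes at or below these limits are expanded into inline load/store
// sequences instead of calls to memset/memcpy helpers (illustrative values).
#define INITBLK_UNROLL_LIMIT 64
#define CPBLK_UNROLL_LIMIT   64
```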
2017-09-05 | implementing profiler ELT callbacks for AMD64 Linux (#12603) | sergey ignatov | 1 file, +8/-1
* implement profiler ELT callbacks for AMD64 Linux
* Some formatting fixes
* Fixed profiler
* Added aligning frame option
* Added aligning stack for quad value stores
2017-06-12 | make REG_VIRTUAL_STUB_PARAM depend on the ABI (#12209) | Sergey Andreenko | 1 file, +0/-20
* Make REG_VIRTUAL_STUB_PARAM depend on the ABI. Add VirtualStubParamInfo, which is part of the compiler and must be initialized in compCompile.
2017-06-09 | delete DECLARE_TYPED_ENUM (#12177) | Sergey Andreenko | 1 file, +41/-41
* delete DECLARE_TYPED_ENUM
  Delete the workaround for g++ C++11, which was fixed in gcc 4.4.1 many years ago. The workaround makes the code dirty, and sometimes we have typos like `}; END_DECLARE_TYPED_ENUM(insFlags,unsigned)` or `END_DECLARE_TYPED_ENUM(ChunkExtraAttribs, BYTE);` with a double `;;`.
* jit-format
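To make the cleanup concrete, a before/after sketch (enumerators illustrative):

```cpp
// Before: a macro pair, needed only to give enums an underlying type on old g++.
//
//   DECLARE_TYPED_ENUM(insFlags, unsigned)
//   {
//       INS_FLAGS_NOT_SET,
//       INS_FLAGS_SET,
//   }
//   END_DECLARE_TYPED_ENUM(insFlags, unsigned)
//
// After: plain C++11, with no closing macro to get wrong.
enum insFlags : unsigned
{
    INS_FLAGS_NOT_SET,
    INS_FLAGS_SET,
};
```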
2017-06-07 | Make containedness explicit | Carol Eidt | 1 file, +11/-2
Eliminate the use of GTF_REG_VAL in RyuJIT and reuse the bit to mark nodes as contained. Abstract GTF_REG_VAL for legacy backend.
2017-05-26 | [RyuJIT/ARM32] Update RBM for helper function | Hyung-Kyu Choi | 1 file, +2/-2
Make RBM_CALLEE_TRASH consistent with the CORINFO_HELP_STOP_FOR_GC helper, which is JIT_RareDisableHelper, defined in vm/arm/asmhelpers.S.

Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
2017-05-23 | [RyuJIT/ARM][LSRA] Update register mask for GC helper | Hyung-Kyu Choi | 1 file, +1/-1
The CORINFO_HELP_STOP_FOR_GC helper preserves the integer and double return registers.

Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
2017-05-12 | Merge pull request #10972 from hqueue/arm/ryujit/lsra | Carol Eidt | 1 file, +2/-2
[Ryujit/ARM32] LSRA support for arm32 floating-point register allocation
2017-05-11 | [RyuJIT/ARM32] Implement for GT_STORE_OBJ (#10721) | Sujin Kim | 1 file, +7/-0
* Implement lowering for GT_STORE_OBJ

  In #10657, I commented that NYI messages were printed for GT_STORE_OBJ when running the CodeGenBringUpTests. The 'Lowering::LowerBlockStore(GenTreeBlk* blkNode)' method implementation is just copied, but after the lowering phase, in code generation (codegenarm.cpp), the following would be used:

  ```cpp
  blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll;
  ```

  ```cpp
  blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindHelper;
  ```

  ```cpp
  void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
  {
      if (blkOp->gtBlkOpGcUnsafe)
      {
          getEmitter()->emitDisableGC();
      }
      bool isCopyBlk = blkOp->OperIsCopyBlkOp();

      switch (blkOp->gtBlkOpKind)
      {
          case GenTreeBlk::BlkOpKindHelper:
              if (isCopyBlk)
              {
                  genCodeForCpBlk(blkOp);
              }
              else
              {
                  genCodeForInitBlk(blkOp);
              }
              break;
          case GenTreeBlk::BlkOpKindUnroll:
              if (isCopyBlk)
              {
                  genCodeForCpBlkUnroll(blkOp);
              }
              else
              {
                  genCodeForInitBlkUnroll(blkOp);
              }
              break;
          default:
              unreached();
      }

      if (blkOp->gtBlkOpGcUnsafe)
      {
          getEmitter()->emitEnableGC();
      }
  }
  ```

  'genCodeForCpBlk' and 'genCodeForInitBlk' are implemented on ARM/ARM64 via MEMCPY/MEMSET, but 'genCodeForCpBlkUnroll' and 'genCodeForInitBlkUnroll' are not implemented on either ARM or ARM64, so those need to be implemented.

* Implement NYI: GT_STORE_OBJ needs a write barrier implementation

  Copied from ARM64, with the gc write barrier code removed, since it is not supported on ARM.

* Implement CodeGen::genCodeForCpObj

* Refactor some code

* Use the INS_OPTS_LDST_POST_INC option for post-indexing

  When a structure was copied, the generated asm was wrong:

  ```
  IN0013: 000048  ldr r3, [r1+4]
  IN0014: 00004A  str r3, [r0+4]
  IN0015: 00004C  ldr r3, [r1+4]
  IN0016: 00004E  str r3, [r0+4]
  ```

  The index needs to be incremented, or post-indexing used. So I use the INS_OPTS_LDST_POST_INC option for post-indexing when the instruction is emitted.

* Fix conflicts

* Fix conflicts and apply #11219

  I want to merge genCodeForCpObj into codegenarmarch.cpp, but the function was modified in #11219, so I decided to keep the code divided for now. If modifying the function on ARM is also needed in the future, it can be merged then.

* Fix conflicts

* Remove NYI

* Fix genCountBits assertion
2017-05-09 | Merge pull request #11406 from sdmaclea/PR-ARM64-ENABLE-FEATURE_TAILCALL_OPT | Bruce Forstall | 1 file, +1/-1
[Arm64] Enable FEATURE_TAILCALL_OPT
2017-05-05 | [Arm64/Unix] Enable FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP (#11375) | Steve MacLean | 1 file, +1/-1
* [Arm64/Unix] Enable FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP
* [Arm64/Unix] Enable FEATURE_MANUALLY_MANAGED_CARD_BUNDLES
2017-05-04 | [Arm64] Enable FEATURE_TAILCALL_OPT | Steve MacLean, Qualcomm Datacenter Technologies, Inc | 1 file, +1/-1
2017-04-14 | [Ryujit/ARM32] LSRA compute RegRecords for double register | Hyung-Kyu Choi | 1 file, +2/-2
* When computing RegRecords at BasicBlock entry, consider overlapping floating and double registers.

Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
2017-04-07 | Merge pull request #10656 from hseok-oh/ryujit/fix_10654 | Bruce Forstall | 1 file, +4/-0
[RyuJIT/ARM32] [ReadyToRun] Fix target register for invocation to Thunk
2017-04-07 | Use _TARGET_ARMARCH_ | Hyeongseok Oh | 1 file, +0/-1
Change _TARGET_ARM_ and _TARGET_ARM64_ to _TARGET_ARMARCH_
2017-04-06 | Modify THUNK_PARAM generated in SaveWork | Hyeongseok Oh | 1 file, +5/-0
- Use register r4 to pass the indirection from code generated by R2R
- Define REG_R2R_INDIRECT_PARAM in ARM32 to merge with the ARM64 routine
2017-04-05 | Remove unused PREDICT_REG_RER_INDIRECT_PARAM define | Bruce Forstall | 1 file, +0/-1
2017-03-13 | Build Linux altjit for x86 and amd64 (#10120) | Bruce Forstall | 1 file, +3/-17
Enable Windows hosted, Linux target amd64 altjit. With this change, we build a JIT that runs on Windows amd64 and targets Linux amd64, as an altjit named linuxnonjit.dll. This is useful for debugging, or generating asm code or diffs. You can even easily create Windows/non-Windows asm diffs (either to compare the asm, or compare the generated code size).

For this to work, the JIT-EE interface method getSystemVAmd64PassStructInRegisterDescriptor() was changed to always be built in, by defining `FEATURE_UNIX_AMD64_STRUCT_PASSING_ITF` in all AMD64 builds. The `_ITF` suffix indicates that this is functionality specific to implementing the JIT-EE interface contract. There were many places in the VM that used this interchangeably with `FEATURE_UNIX_AMD64_STRUCT_PASSING`. Now, `FEATURE_UNIX_AMD64_STRUCT_PASSING` means code in the VM needed to implement this feature, but not required to implement the JIT-EE interface contract.

In particular, MethodTables compute and cache the "eightbyte" info of structs when loading a type. This is not done when only `FEATURE_UNIX_AMD64_STRUCT_PASSING_ITF` is set, to avoid altering MethodTable behavior on non-Unix AMD64 builds. Instead, if `getSystemVAmd64PassStructInRegisterDescriptor()` is called on a non-Unix build (by the altjit), the `ClassifyEightBytes()` function is called, and nothing is cached. Hopefully (though it was hard for me to guarantee by observation), calling `ClassifyEightBytes()` does not have any side effects on MethodTables. It doesn't really matter, since if called for altjit, we don't care too much about running.

The previously used `PLATFORM_UNIX` define is now insufficient. I introduced the `#define` macros `_HOST_UNIX_` to indicate the JIT being built will run on Unix, and `_TARGET_UNIX_` to indicate the JIT is generating code targeting Unix. Some things were converted to use the `UNIX_AMD64_ABI` define, which makes more sense.
2017-03-01 | RyuJIT/ARM32: Enable P/Invoke lowering. | Mikhail Skvortcov | 1 file, +1/-1
2017-02-23 | Merge pull request #9681 from mskvortsov/ryujit-arm32-reload | Bruce Forstall | 1 file, +4/-0
[RyuJIT/ARM32] Fix helper kill mask and call target consuming
2017-02-23 | RyuJIT/ARM32: Fix helper kill mask and call target consuming | Mikhail Skvortcov | 1 file, +4/-0
2017-02-21 | Fix non-Windows amd64 register mask initialization | Bruce Forstall | 1 file, +6/-6
The arrays were being initialized with register numbers, not register masks. The arrays are only used on non-Windows amd64 for homing circular incoming argument conflicts. Perhaps we never saw this happen?
2017-02-09 | RyuJIT/ARM32: enable DecomposeLongs phase | Mikhail Skvortcov | 1 file, +8/-0
2017-02-07 | [x86/Linux] Stack align 16 bytes for JIT code | SaeHie Park | 1 file, +6/-0
Change the JIT to align the stack to 16 bytes, as assumed by modern compilers.
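A hedged sketch of what this looks like in target.h terms (STACK_ALIGN follows the file's naming convention; the non-Unix value shown is illustrative):

```cpp
#if defined(UNIX_X86_ABI)
#define STACK_ALIGN       16 // modern compilers assume 16-byte call-site alignment
#define STACK_ALIGN_SHIFT 4  // log2(STACK_ALIGN)
#else
#define STACK_ALIGN       4  // classic Windows x86: 4-byte stack alignment
#endif
```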
2017-01-23 | [x86/Linux] Enable FEATURE_EH_FUNCLETS (#8889) | Jonghyun Park | 1 file, +4/-0
* [x86/Linux] (Partially) Enable FEATURE_EH_FUNCLETS
* Update CLR ABI Document
* Add TODO (for Funclet Prolog/Epilog Gen)
2017-01-11 | [x86/Linux] Introduce UNIX_X86_ABI definition (#8863) | SaeHie Park | 1 file, +7/-0
Add the UNIX_X86_ABI definition for Unix/Linux-specific ABI parts. The first use will be for the 16-byte stack alignment code.
2016-12-22 | ARM: A step towards the RyuJIT/ARM32 backend. | Mikhail Skvortcov | 1 file, +3/-0
2016-12-02 | RyuJIT/x86: Implement TYP_SIMD12 support | Bruce Forstall | 1 file, +3/-0
There is no native load/store instruction for Vector3/TYP_SIMD12, so we need to break this type down into two loads or two stores, with an additional instruction to put the values together in the xmm target register. AMD64 SIMD support already implements most of this. For RyuJIT/x86, we need to implement stack argument support (both incoming and outgoing), which is different from the AMD64 ABI.

In addition, this change implements accurate alignment-sensitive codegen for all SIMD types. For RyuJIT/x86, the stack is only 4 byte aligned (unless we have double alignment), so SIMD locals are not known to be aligned (TYP_SIMD8 could be with double alignment). For AMD64, we were unnecessarily pessimizing alignment information, and were always generating unaligned moves when on AVX2 hardware. Now, all SIMD types are given their preferred alignment in getSIMDTypeAlignment(), and alignment determination in isSIMDTypeLocalAligned() takes into account stack alignment. X86 still needs to consider dynamic stack alignment for SIMD locals.

Fixes #7863
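A hedged sketch of the 12-byte split described above, using intrinsics to stand in for the generated code (this is not the JIT implementation itself): a Vector3 has no single 12-byte move, so it is loaded as 8 + 4 bytes and recombined in one xmm register.

```cpp
#include <emmintrin.h>

__m128 loadSimd12(const float* p)
{
    __m128 lo = _mm_castpd_ps(_mm_load_sd((const double*)p)); // 8 bytes: x, y
    __m128 hi = _mm_load_ss(p + 2);                           // 4 bytes: z
    return _mm_movelh_ps(lo, hi);                             // [x, y, z, 0]
}
```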
2016-10-30 | Fix P/Invoke cookie passing on x86. | Pat Gavlin | 1 file, +3/-0
On x86, the P/Invoke cookie (when required) is passed on the stack after all other stack arguments (if any).
2016-10-27 | Introduce new CORJIT_FLAGS type | Bruce Forstall | 1 file, +4/-4
The "JIT flags" currently passed between the EE and the JIT have traditionally been bit flags in a 32-bit word. Recently, a second 32-bit word was added to accommodate additional flags, but that set of flags is definitely "2nd class": they are not universally passed, and require using a separate set of bit definitions, and comparing those bits against the proper, 2nd word. This change replaces all uses of bare DWORD or 'unsigned int' types representing flags with CORJIT_FLAGS, which is now an opaque type. All flag names were renamed from CORJIT_FLG_* to CORJIT_FLAG_* to ensure all cases were changed to use the new names, which are also scoped within the CORJIT_FLAGS type itself. Another motivation to do this, besides cleaner code, is to allow enabling the SSE/AVX flags for x86. For x86, we had fewer bits available in the "first word", so would have to either put them in the "second word", which, as stated, was very much 2nd class and not plumbed through many usages, or we could move other bits to the "second word", with the same issues. Neither was a good option. RyuJIT compiles with both COR_JIT_EE_VERSION > 460 and <= 460. I introduced a JitFlags adapter class in jitee.h to handle both JIT flag types. All JIT code uses this JitFlags type, which operates identically to the new CORJIT_FLAGS type. In addition to introducing the new CORJIT_FLAGS type, the SSE/AVX flags are enabled for x86. The JIT-EE interface GUID is changed, as this is a breaking change.
2016-10-23 | Delete _TARGET_SET_ macro | Jan Kotas | 1 file, +0/-5
2016-10-19 | Enable Enter/Leave/Tailcall hooks for RyuJIT/x86 | Bruce Forstall | 1 file, +10/-3
2016-09-28 | [ARM] Generate direct call instructions for recursive calls | Hanjoung Lee | 1 file, +3/-0
Direct call instruction: bl +-imm24
Indirect call instruction: movw, movt, blx reg

It is pretty hard to determine direct/indirect instructions for the general case. However, for recursive calls we can safely estimate the jump distance, since we know the relative distance between the jump source and destination. So this change generates direct call instructions for recursive calls when the jump distance is close enough. Plus, it jumps directly to the function without the prestub.

For #7002
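A sketch of the two call shapes (ARM32 assembly in comments; ranges approximate):

```cpp
// Direct: one PC-relative instruction, range roughly +/-16 MB -- safe here
// because a recursive call's distance within the method is known to be small,
// and it bypasses the prestub:
//
//     bl   <pc-relative offset to method start>
//
// Indirect: materialize a full 32-bit address, then branch through the register:
//
//     movw r12, #:lower16:target
//     movt r12, #:upper16:target
//     blx  r12
```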