Age | Commit message (Collapse) | Author | Files | Lines |
|
* [WIP] Struct & SIMD improvements
- Enable CSE of struct values when handle is available (and add code to get the handle of HW SIMD types)
- Don't require block nodes for SIMD assignments
- Don't set `GTF_GLOB_REF` on `GT_OBJ` if it is local
- Set `lvRegStruct` on promoted SIMD fields
- Add tests for #19910 (fixed with this PR) and #3539 & #19438 (fixed with #21314)
- Additional cleanup
Fix #19910
|
|
* Handle addressing modes for HW intrinsics
Also, eliminate some places where the code size estimates were over-estimating.
Contribute to #19550
Fix #19521
|
|
* Clean `assertionprop.cpp`.
* Clean `codegencommon.cpp`.
* Clean `codegenlinear.cpp`.
* Clean `compiler.cpp`.
* Clean `earlyprop.cpp`.
* Clean `gschecks.cpp`.
* Clean `lclvars.cpp`.
* Clean `jiteh.cpp`.
* Clean `liveness.cpp`.
* Clean `hashbv.cpp`.
* Clean `gcinfo.cpp`.
* Clean `optimizer.cpp`.
|
|
* Make genStackLevel accessible from CodeGenInterface
* Initializing stackLevel before first BasicBlock's code is generated
* Typo
* Removing extra line on comments
|
|
* Flag USING_SCOPE_INFO definition
* Enclosing uses of siScope-related functions in a USING_SCOPE_INFO flag definition check
* Encapsulating genSetScopeInfo when using siVarScope
* Moving comment inside flag defined block
* Include siScope/psiScope functions only when flag USING_SCOPE_INFO is defined
* Disable scope info
* Typo
* Adding comment flag name on #endif
* Remove redundant access levels/flags
* Repeating last accessibility level in case flag is disabled
* Setting use of siScope and psiScope as default way of reporting variable homes
|
|
|
|
* fix implicit constructor call
* extern c
format patch
* multi-line
* Remove direct constructor call
* Conversion
* Need parentheses
* Return value on resize
* declspec(Thread)
* Ignore warnings for GCC
* Formatting issues
* Move cast to constant
|
|
* Add an assert to compNoGCHelperCallKillSet.
The assert checks that registers that lose GC or byref values are also in the set returned by compHelperCallKillSet.
* Move compNoGCHelperCallKillSet from compiler to emitter.
* Rename `compNoGCHelperCallKillSet` to `emitGetGCRegsKilledByNoGCCall`.
* Fix GCRegsKill sets for arm CORINFO_HELP_PROF_FCN_ENTER and CORINFO_HELP_PROF_FCN_LEAVE.
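The assert described in the first item amounts to a subset check between two register masks. A minimal sketch, assuming a 64-bit mask type; `RegMask` and `isSubsetOf` are hypothetical names and are not taken from the JIT sources:

```cpp
#include <cstdint>

// Hypothetical stand-in for the JIT's register mask type: one bit per register.
using RegMask = std::uint64_t;

// Returns true if every register in `subset` is also present in `superset`.
constexpr bool isSubsetOf(RegMask subset, RegMask superset)
{
    return (subset & ~superset) == 0;
}

// Example: registers 0 and 1 lose their GC values, and the helper call
// kills registers 0, 1 and 2, so the invariant holds.
static_assert(isSubsetOf(0x3, 0x7), "GC-killed registers must also be call-killed");
```

If a register lost its GC value without appearing in the overall kill set, the check would return false and the assert would fire.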
|
|
* Clean up some arm64 prolog/epilog code
1. For frame types which establish a frame pointer before the final
SP adjustment, there is no need to report the SP adjustment in unwind
data, as it is unused.
2. Added some comments, esp. related to frameType 1, explaining the
totalFrameSize limit.
3. Fixed frameType 3 #outsz check to > 504 versus >= 504; 504 is a legal
offset for STP/LDP.
4. Fix frameType 3 epilogs to always restore SP from FP. Should give more
prolog/epilog unwind code matching, and simplifies the code.
Closes #22056, #22058, #22057, #22476, #22478.
* Formatting
* Update comment
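Item 3 follows from the A64 encoding of the signed-offset form of STP/LDP for 64-bit registers: the immediate is a signed 7-bit value scaled by 8, so byte offsets from -512 to 504 in multiples of 8 are encodable. A minimal sketch of such a range check; the function name is hypothetical and is not the JIT's own code:

```cpp
// Hypothetical helper (not from the JIT): returns true if `offset` can be
// encoded in the signed-offset form of a 64-bit STP/LDP, whose imm7 field
// is a signed 7-bit value scaled by the 8-byte register size.
constexpr bool isValidStpLdpOffset64(int offset)
{
    return (offset % 8 == 0) && (offset >= -64 * 8) && (offset <= 63 * 8);
}

// 504 == 63 * 8 is the largest encodable offset, so a size check should
// reject only values strictly greater than 504.
static_assert(isValidStpLdpOffset64(504), "504 is encodable");
static_assert(!isValidStpLdpOffset64(512), "512 is out of range");
```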
|
|
* Moving siVarLoc and siVarLocType from compiler.h to CodeGenInterface.h
* Encapsulating siVarLoc construction with siScope and LclVarDsc
* Encapsulating siVarLoc construction from psiScope
* Adding some argument description on genSetScopeInfo header
* Changing const siVarLoc& to const siVarLoc* on eeSetLVInfo
* Changing siVarLoc& to siVarLoc* in genSetScopeInfo arguments
* Rename var in genSetScopeInfo header
|
|
|
|
Currently, all frame types place saved FP/LR at low addresses on the
frame, below the GS cookie. If a function has localloc, the dynamically
allocated, unsafe buffer will be lower than the saved FP/LR, and
the GS cookie won't properly protect the saved FP/LR.
This change introduces new frame types, used only for functions needing
a GS cookie and using localloc, saving FP/LR along with the rest of
the callee-saved registers at the top (highest addresses) of the frame,
above the GS cookie.
|
|
* Fix the scratch register used for arm64 stack probing
In some circumstances, genFnProlog might pick a callee-saved
register to be the initReg. This doesn't work for stack probing
since we might need a scratch register before we save the
callee-save regs. So, always use R9 (REG_SCRATCH) instead.
Fixes #22284
* Formatting
|
|
|
|
|
|
encodability
For ARM32/ARM64, the immediate offsets in addressing modes have
limited range that varies by instruction. A couple cases were not
checking for that range, potentially generating
un-encodable instructions.
In particular, the test case shows a case where a very large frame in a function
with a stored generic context would fail on ARM64.
There are no code diffs from this change for ARM64, except we sometimes get
better assembly comments where the local variable referenced is annotated on
the store instruction. For ARM32, the "secret stub param" is now stored using
SP-relative addressing, not FP-relative, if possible (which we generally prefer
in main function bodies).
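As an illustration of the kind of range check involved, here is a sketch for ARM64 single-register loads and stores. It assumes the two standard immediate forms: LDR/STR with an unsigned 12-bit offset scaled by the access size, and LDUR/STUR with an unscaled signed 9-bit offset. The function names are hypothetical and this is not the JIT's actual check:

```cpp
// Scaled form (LDR/STR unsigned offset): non-negative, a multiple of the
// access size, and at most 4095 units of that size.
constexpr bool fitsScaledUnsigned12(long long offset, int size)
{
    return (offset >= 0) && (offset % size == 0) && (offset / size <= 4095);
}

// Unscaled form (LDUR/STUR): any byte offset in [-256, 255].
constexpr bool fitsUnscaledSigned9(long long offset)
{
    return (offset >= -256) && (offset <= 255);
}

// Hypothetical combined check: if neither form fits, the address has to be
// computed into a scratch register first.
constexpr bool canEncodeLdrStrOffset(long long offset, int size)
{
    return fitsScaledUnsigned12(offset, size) || fitsUnscaledSigned9(offset);
}
```

For an 8-byte store, the largest directly encodable offset under these rules is 4095 * 8 = 32760, so a local placed beyond that in a very large frame needs a different code sequence.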
|
|
Expand GT_JCC/SETCC condition support
|
|
Some IR cleanup
|
|
* Use `saveNext` opcode on arm64.
* Support use of `save_next` on int/float border.
* Delete the extra requirement that an epilog sequence can't start from `save_next`.
* Respond to feedback
|
|
Issue #18201 / Hackathon
|
|
|
|
Some upcoming changes to reduce tiering overhead require that directly
invoked virtual methods be called indirectly via their slot, so that the
method body can be updated and callers patched up by patching the method
table slot.
Existing code for x64 implicitly assumes that a GT_JMP indirect target address
is near enough to the call site that a 32 bit RIP-relative displacement will
work. We can ensure this is true by always generating a reloc (and hence
potentially a jump stub) -- unless the target happens to fit in 32 bits and
so can be addressed absolutely.
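The two reachability conditions mentioned above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the absolute-address check conservatively accepts only the low 2 GB, which is an assumption rather than the JIT's exact rule.

```cpp
#include <cstdint>

// A RIP-relative operand encodes a signed 32-bit displacement measured from
// the address of the next instruction.
constexpr bool fitsInRel32(std::uint64_t nextInstrAddr, std::uint64_t target)
{
    return static_cast<std::int64_t>(target - nextInstrAddr) >= INT32_MIN &&
           static_cast<std::int64_t>(target - nextInstrAddr) <= INT32_MAX;
}

// Conservative assumption: treat only addresses in the low 2 GB as
// addressable with an absolute 32-bit displacement.
constexpr bool fitsInAbs32(std::uint64_t target)
{
    return target <= static_cast<std::uint64_t>(INT32_MAX);
}
```

When neither condition can be guaranteed at JIT time, recording a relocation lets the runtime insert a jump stub if the final target turns out to be too far away.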
|
|
|
|
|
|
The actual checking had gotten lost between JIT32 and RyuJIT.
I fixed the "on return from function" case for x86/x64, and
the "around every call site" case for x86.
I removed the arm64 case because it's not easy to store SP to a
stack local or directly compare SP against a stack local without
a temporary. Also, for the fixed outgoing arg space ABIs (all but x86),
these checks don't seem too useful anyway, so I also removed the
arm case.
|
|
This is only used to hold a pointer to a BasicBlock for GenTreeBoundsChk and GenTreeIndexAddr. This doesn't serve any purpose and it does not behave like a real operand (e.g. it's not included in the linear order).
|
|
This special local variable is only needed on x86 when a function
contains localloc.
|
|
* Print Tier-0 or Tier-1 to JIT dump output
Make it very obvious we've been asked to generate Tier-0 code.
* Print Tier-0 or Tier-1 in assembly output, as appropriate
|
|
|
|
* delete isProfLeaveCB from arm signature
The previous implementation was done many years ago and I do not know why it was done that way.
* extract GetSavedSet
* add isNoGCHelper
* delete isNoGC arg
* move declarations closer to their uses
* delete isGc from genEmitCall
* delete unused method declaration.
* add emitNoGChelper that accepts CORINFO_METHOD_HANDLE
* fix missed switch cases
* add function headers
* Fix feedback
* Fix feedback2
|
|
|
|
* Fix callKillSet for CORINFO_HELP_ASSIGN_BYREF on x64.
* Fix typos.
|
|
* call fgCheckArgCnt only from stackLevelSetter
* delete changing fgPtrArgCntMax from codegencommon
* delete fgPtrArgCntCur
* reset write phase only once
* delete gtStkDepth
* add headers for the new functions
* fix comments
|
|
* Fix warnings due to "strlen return type is size_t" in src/jit/emitarm.cpp src/jit/unwindarm.cpp
* Use ptrdiff_t disp in emitter::emitOutputInstr in src/jit/emitarm.cpp
* Compiler::gtHashValue should depend on host-bitness in src/jit/gentree.cpp
* Simplify checking using ImmedValNeedsReloc() in src/jit/lowerarmarch.cpp
* Use target_ssize_t immVal in Lowering::IsContainableImmed in src/jit/lowerarmarch.cpp
* Remove int offs and use BYTE* addr and %p specifier in emitter::emitDispInsHelp in IF_T2_J3 case in src/jit/emitarm.cpp
* Cast gtIconVal to target_size_t in CodeGen::genLclHeap in src/jit/codegenarm.cpp
* Use int argSize in CodeGen::genEmitCall in src/jit/codegen.h src/jit/codegenlinear.cpp
* Use ssize_t disp in emitter::emitIns_Call in src/jit/emitarm.cpp src/jit/emitarm.h
* Use int argSize in emitter::emitIns_Call in src/jit/emitarm.cpp src/jit/emitarm.h
* Use target_size_t return type in Compiler::eeGetPageSize Compiler::getVeryLargeFrameSize in src/jit/codegencommon.cpp src/jit/compiler.h
* Cast gtIconVal to unsigned in CodeGen::genCodeForShift CodeGen::genCodeForShiftLong in src/jit/codegenarm.cpp src/jit/codegenarmarch.cpp
* Cast gtIconVal to unsigned in DecomposeLongs::DecomposeRotate in src/jit/decomposelongs.cpp
* Use unsigned size in CodeGen::genConsumePutStructArgStk in src/jit/codegenlinear.cpp
* Use target_ssize_t stmImm in cast in CodeGen::genZeroInitFrame in src/jit/codegencommon.cpp
* Cast to target_ssize_t in Compiler::gtSetEvalOrder in src/jit/gentree.cpp
* Address PR feedback - use dspPtr(addr) in src/jit/emitarm.cpp
|
|
We use the following format when printing the BasicBlock number: bbNum
This define is used with string concatenation to put this in printf format strings
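A sketch of how such a define composes with adjacent string literals. The macro name `FMT_BB`, its `"BB%02u"` value, and the `describeEdge` helper are assumptions made for illustration; the commit message above does not preserve the actual format string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Assumed for illustration: a format fragment for a basic block number.
#define FMT_BB "BB%02u"

// Hypothetical helper: adjacent string literals are concatenated by the
// compiler, so FMT_BB can be spliced into a larger printf format string.
std::string describeEdge(unsigned from, unsigned to)
{
    char buf[64];
    int  len = std::snprintf(buf, sizeof(buf), "edge " FMT_BB " -> " FMT_BB, from, to);
    assert(len > 0 && static_cast<std::size_t>(len) < sizeof(buf));
    return std::string(buf);
}
```

Keeping the fragment in one define means a later change to how block numbers are printed only has to be made in a single place.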
|
|
|
|
(#19544)
|
|
Remove almost all of the code in the jit that tries to maintain local ref
counts incrementally. Also remove `lvaSortAgain` and related machinery.
Explicitly sort locals before post-lower-liveness when optimizing to get the
best set of tracked locals.
Explicitly recount after post-lower liveness to get accurate counts after
dead stores. This can lead to tracked unreferenced arguments; tolerate this
during codegen.
|
|
* Add support to use an indirected address for JMP instructions to ARM64
- Merge logic between ARM and ARM64
|
|
* Add IF_T2_N3 instruction form and make this a specific case of IF_T2_N when EA_IS_RELOC(attr) is true
* Move "movw/movt reg,relocatableImm" case to function emitIns_MovRelocatableImmediate
* Introduce new instruction descriptor instrDescReloc
* Delete unused CnsVal from ARM32 and ARM64 emitters
* Introduce target_ssize_t and use this type for non-relocatable constants
|
|
This is a preparatory change for auditing and controlling how local
variable ref counts are observed and manipulated.
See #18969 for context.
No diffs seen locally. No TP impact expected.
There is a small chance we may see some asserts in broader testing
as there were places in original code where local ref counts were
incremented without checking for possible overflows. The new APIs
will assert for overflow cases.
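The idea of routing all ref count updates through accessors that assert on overflow might look like the sketch below. The class and method names are hypothetical and the 16-bit counter width is an assumption; this is not the JIT's actual API.

```cpp
#include <cassert>
#include <limits>

// Hypothetical wrapper: the counter is private, so every read and update
// goes through a method that can be audited and can check its invariants.
class RefCount
{
    unsigned short m_count = 0;

public:
    unsigned short get() const
    {
        return m_count;
    }

    void increment()
    {
        // Assert instead of silently wrapping around to zero.
        assert(m_count < std::numeric_limits<unsigned short>::max() && "ref count overflow");
        m_count++;
    }

    void decrement()
    {
        assert(m_count > 0 && "ref count underflow");
        m_count--;
    }
};
```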
|
|
* Enable genFnCalleeRegArgs for Arm64 Varargs
Before the method would early out and incorrectly expect the usage
of all incoming arguments to be their homed stack slots. It is
instead possible for incoming arguments to be homed to different
integer registers.
The change will mangle the float types for vararg cases in the same
way that is done during lvaInitUserArgs and fgMorphArgs.
* Apply format patch
* Account for softfp case
* Address feedback
* Apply format patch
* Use standard function header for mangleVarArgsType
* Remove confusing comment
|
|
Temporaries are only used during register allocation and code generation. They waste space (136 bytes) in the compiler object during inlining.
|
|
Passing CompAllocator objects by value is advantageous because it no longer needs to be dynamically allocated and cached. CompAllocator instances can now be freely created, copied and stored, which makes adding new CompMemKind values easier.
Together with other cleanup this also improves memory allocation performance by removing some extra levels of indirection that were previously required - jitstd::allocator had a pointer to CompAllocator, CompAllocator had a pointer to Compiler and Compiler finally had a pointer to ArenaAllocator. Without MEASURE_MEM_ALLOC enabled, both jitstd::allocator and CompAllocator now just contain a pointer to ArenaAllocator. When MEASURE_MEM_ALLOC is enabled CompAllocator also contains a pointer but to a MemStatsAllocator object that holds the relevant memory kind. This way CompAllocator is always pointer sized so that enabling MEASURE_MEM_ALLOC does not result in increased memory usage due to objects that store a CompAllocator instance.
In order to implement this, 2 additional significant changes have been made:
* MemStats has been moved to ArenaAllocator, it's after all the allocator's job to maintain statistics. This also fixes some issues related to memory statistics, such as not tracking the memory allocated by the inlinee compiler (since that one used its own MemStats instance).
* Extract the arena page pooling logic out of the allocator. It doesn't make sense to pool an allocator, it has very little state that can actually be reused and everything else (including MemStats) needs to be reset on reuse. What really needs to be pooled is just a page of memory.
Since this was touching allocation code the opportunity has been used to perform additional cleanup:
* Remove unnecessary LSRA ListElementAllocator
* Remove compGetMem and compGetMemArray
* Make CompAllocator and HostAllocator more like the std allocator
* Update HashTable to use CompAllocator
* Update ArrayStack to use CompAllocator
* Move CompAllocator & friends to alloc.h
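The "pointer-sized allocator handle passed by value" design described above can be sketched as follows. The class names mirror those in the commit message, but the bodies are simplified assumptions for illustration (a fixed 4096-byte page size, no memory statistics, no page pooling) and are not the JIT's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kPageSize = 4096; // assumed page size for this sketch

// Simplified bump-pointer arena: memory is released only when the arena dies.
class ArenaAllocator
{
    std::vector<std::unique_ptr<unsigned char[]>> m_pages;
    unsigned char* m_next = nullptr;
    unsigned char* m_end  = nullptr;

public:
    void* allocate(std::size_t size)
    {
        // Round up so every allocation stays suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) & ~(align - 1);
        if (static_cast<std::size_t>(m_end - m_next) < size)
        {
            const std::size_t pageSize = (size > kPageSize) ? size : kPageSize;
            m_pages.push_back(std::unique_ptr<unsigned char[]>(new unsigned char[pageSize]));
            m_next = m_pages.back().get();
            m_end  = m_next + pageSize;
        }
        void* result = m_next;
        m_next += size;
        return result;
    }
};

// Value-type handle: holds only a pointer to the arena, so it can be freely
// created, copied and stored.
class CompAllocator
{
    ArenaAllocator* m_arena;

public:
    explicit CompAllocator(ArenaAllocator* arena) : m_arena(arena)
    {
        assert(arena != nullptr);
    }

    template <typename T>
    T* allocate(std::size_t count)
    {
        return static_cast<T*>(m_arena->allocate(count * sizeof(T)));
    }
};

static_assert(sizeof(CompAllocator) == sizeof(void*), "handle stays pointer sized");
```

Copies of a `CompAllocator` share the same arena, so containers can store the handle by value without any extra indirection or caching.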
|
|
|
|
* Separate sections READONLY_VCHUNKS and READONLY_DICTIONARY
* Remove relocations for second-level indirection of Vtable in case FEATURE_NGEN_RELOCS_OPTIMIZATIONS is enabled.
Introduce FEATURE_NGEN_RELOCS_OPTIMIZATIONS, under which NGEN specific relocations optimizations are enabled
* Replace push/pop of R11 in stubs with
- str/ldr of R4 in space reserved in epilog for non-tail calls
- usage of R4 with hybrid-tail calls (same as for EmitShuffleThunk)
* Replace push/pop of R11 for function epilog with usage of LR as helper register right before its restore from stack
|
|
* [ARM64|Windows|Vararg] Add FEATURE_ARG_SPLIT
Enable splitting >8 byte <= 16 byte structs for arm64 varargs
between x7 and virtual stack slot 0.
* Force notHfa for vararg methods
* Correctly pass isVararg
* Correct var name
|
|
* Fix passing HFA of two floats to vararg methods
Previously, the type would be reported as HFA and enregistered; however,
this is not correct, as arm64 varargs abi requires passing using
int registers.
* Address linux build issue
* Apply final format patch
* Add _TARGET_WINDOWS_
|
|
* Unify struct arg handling
Eliminate unnecessary struct copies, especially on Linux, and reduce code duplication.
Across all targets, use GT_FIELD_LIST to pass promoted structs on stack, and avoid
requiring a copy and/or marking `lvDoNotEnregister` for those cases.
Unify the specification of multi-reg args:
- numRegs now indicates the actual number of reg args (not the size in pointer-size units)
- regNums contains all the arg register numbers
|
|
* Cleanup and remove unused parameters from genCreateAddrMode, fixes #18177
|