|
Use DispatchToken::CreateDispatchToken to get token to resolve
virtual method in case of non interface MT.
|
|
Patch vtable slots and similar when tiering is enabled
For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
- An indirect call through the `MethodTable`'s backpatchable vtable slot
- A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
- For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled
Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
- A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
- The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
- The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
- Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
- After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
- Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
- A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
- The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
- A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`
Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`
Synchronization between entry point changes and backpatching slots:
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples
Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retrieval. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch and that in turn means that recording slots is necessary for versionability.
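The record-and-backpatch flow above can be sketched as a minimal model. This is a hedged Python illustration, not the actual CoreCLR data structures: `do_backpatch` and `backpatch_entry_point_slots` stand in for the runtime's `DoBackpatch()` and `BackpatchEntryPointSlots()`, and entry points are represented as plain strings.

```python
class Method:
    """Minimal model of entry point slot backpatching; names are
    illustrative, not the actual CoreCLR data structures."""

    def __init__(self, prestub):
        self.temporary_entry_point = prestub  # always points to the prestub, never backpatched
        self.entry_point = prestub
        self.recorded_slots = []              # (vtable, index) pairs discovered via the prestub

    def do_backpatch(self, vtable, index):
        # Prestub path: record the slot the call came through, then move it
        # from the temporary entry point to the current entry point.
        self.recorded_slots.append((vtable, index))
        vtable[index] = self.entry_point

    def backpatch_entry_point_slots(self, new_entry_point):
        # Any later entry point change rewrites every recorded slot.
        self.entry_point = new_entry_point
        for vtable, index in self.recorded_slots:
            vtable[index] = new_entry_point


PRESTUB = "prestub"
method = Method(PRESTUB)
base_vtable = [PRESTUB]     # slot initially holds the temporary entry point
derived_vtable = [PRESTUB]  # inherited slot is not copied from the parent

method.entry_point = "tier0_code"       # first JIT produced tier 0 code
method.do_backpatch(base_vtable, 0)     # a call through the base type discovered this slot
method.do_backpatch(derived_vtable, 0)  # a call through the derived type discovered its slot

method.backpatch_entry_point_slots("tier1_code")  # promotion rewrites both recorded slots
```

Because the temporary entry point is never backpatched, a call through a slot the runtime has not yet seen still reaches the prestub, which is how new slots keep getting discovered and recorded.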
|
|
|
|
- Replace JitHelpers.UnsafeCastToStackPtr with Unsafe.AsPointer
- Delete PinningHelper that was a duplicate of the RawData helper class
|
|
|
|
* Implementation of R2R vtable call thunks. These thunks will fetch the target code pointer from the vtable of the input thisPtr, and jump to that address.
This is especially helpful with generics, since we can avoid a generic dictionary lookup cost for a simple vtable call.
Overall, these thunks cause the CPU to have fewer branch mispredictions, and give a small performance boost to vtable calls.
These stubs are under VirtualCallStubManager so that the managed debugger can handle stepping through them.
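In spirit, such a thunk does one indirect load from the receiver's vtable and then jumps to the target. A rough Python analogue of that shape (hypothetical names, not the generated native stub):

```python
def make_vtable_call_thunk(slot_index):
    """Model of an R2R vtable call thunk for a fixed slot: fetch the
    target from the receiver's vtable and tail-call it. Illustrative only."""
    def thunk(this, *args):
        target = this.vtable[slot_index]  # one load from the receiver's vtable...
        return target(this, *args)        # ...then jump to the target code
    return thunk


class Animal:
    def __init__(self, vtable):
        self.vtable = vtable  # list of function pointers, like a native vtable


def speak_impl(this):
    return "woof"


dog = Animal([speak_impl])
speak_thunk = make_vtable_call_thunk(0)  # one shared thunk per slot index
```

In the native stubs the load and jump compile down to a few instructions, which is what avoids the generic dictionary lookup for these calls.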
|
|
|
|
* Remove IsRemotingIntercepted methods that always return false.
* Remove GetOptionalMembersAllocationSize parameters that are always false.
* Remove references to context statics.
Remove references in comments and method names.
* Remove RemotingVtsInfo.
|
|
* Remove context statics stuff part 1
This change removes all context statics stuff from the runtime since
context statics are not supported and this code was obsolete.
* Remove context statics stuff from the debugger code
|
|
Add MethodImplOptions.AggressiveOptimization and use it for tiering
Part of fix for https://github.com/dotnet/corefx/issues/32235
Workaround for https://github.com/dotnet/coreclr/issues/19751
- Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization
- For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting
- When tiering is disabled, a method flagged with AggressiveOptimization does not use R2R-pregenerated code
- R2R crossgen does not generate code for a method flagged with AggressiveOptimization
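The resulting policy can be summarized as a small decision table. This is a hedged sketch of the behavior described above, with hypothetical names rather than the runtime's actual control flow:

```python
def choose_initial_code(aggressive_optimization, tiering_enabled, has_r2r_code):
    """Summarizes the AggressiveOptimization policy described above."""
    if aggressive_optimization:
        if tiering_enabled:
            # Foreground tier 1 JIT on first call; no tier 0, no call counting.
            return "tier1_jit"
        return "jit"  # R2R-pregenerated code is not used
    if tiering_enabled:
        return "tier0_with_call_counting"
    return "r2r" if has_r2r_code else "jit"
```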
|
|
(#19427)
- Sealed RuntimeType makes `is RuntimeType` and similar checks faster. These checks are fairly common in reflection.
- Delete support for introspection-only loads from the runtime. We do not plan to use it in .NET Core. The support for introspection loads inherited from RuntimeType, and thus it is incompatible with a sealed RuntimeType.
|
|
* Partial R2R IBC fixes
- Log method code access in all cases, not just when the method is JITed
- Add workaround for CONTRACT_VIOLATION that shows up in checked builds when collecting IBC data
- Make /ReadyToRun switch work for CoreLib
|
|
Apply tiering's call counting delay more broadly
Issues
- When some time passes between process startup and first significant use of the app, startup perf with tiering can be slower because the call counting delay is no longer in effect
- This is especially true when the process is affinitized to one cpu
Fixes
- Initiate and prolong the call counting delay upon tier 0 activity (jitting or r2r code lookup for a new method)
- Stop call counting for a called method when the delay is in effect
- Stop (and don't start) tier 1 jitting when the delay is in effect
- After the delay resume call counting and tier 1 jitting
- If the process is affinitized to one cpu at process startup, multiply the delay by 10
No change in benchmarks.
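The delay behavior above can be modeled with explicit timestamps. This is a hedged sketch with hypothetical names; the real heuristic lives in the tiered compilation manager:

```python
class CallCountingDelay:
    """Models the fix: tier 0 activity initiates or prolongs the delay,
    and call counting / tier 1 jitting stay stopped while it is in effect."""

    def __init__(self, delay_ms, affinitized_to_one_cpu=False):
        # Single-CPU affinity at process startup multiplies the delay by 10.
        self.delay_ms = delay_ms * 10 if affinitized_to_one_cpu else delay_ms
        self.last_tier0_activity_ms = None

    def on_tier0_activity(self, now_ms):
        # Jitting or R2R code lookup for a new method prolongs the delay.
        self.last_tier0_activity_ms = now_ms

    def counting_allowed(self, now_ms):
        # Counting and tier 1 jitting resume only after the delay window
        # passes with no further tier 0 activity.
        if self.last_tier0_activity_ms is None:
            return True
        return now_ms - self.last_tier0_activity_ms >= self.delay_ms
```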
|
|
* Enable genFnCalleeRegArgs for Arm64 Varargs
Before this change, the method would early out and incorrectly expect
all incoming arguments to use their homed stack slots. It is instead
possible for incoming arguments to be homed in different integer
registers.
The change will mangle the float types for vararg cases in the same
way that is done during lvaInitUserArgs and fgMorphArgs.
* Apply format patch
* Account for softfp case
* Address feedback
* Apply format patch
* Use standard function header for mangleVarArgsType
* Remove confusing comment
|
|
* Ajusted -> Adjusted
* alot -> a lot
* Ambigous -> Ambiguous
* amoun -> amount
* amoung -> among
* Amperstand -> Ampersand
* Anbody -> Anybody
* anddoens't -> and doesn't
* anme -> name
* annoations -> annotations
* annother -> another
* anothr -> another
* ansynchronous -> asynchronous
* anticpation -> anticipation
* anway -> anyway
* aother -> another
* Apparant -> Apparent
* appartment -> apartment
* appdmomain -> appdomain
* Appdomian -> Appdomain
* appdomin -> appdomain
* approproiate -> appropriate
* approprate -> appropriate
* approp -> appropriate
* appened -> appended
* appropiately -> appropriately
* appropraitely -> appropriately
* Apperantly -> Apparently
* approp. -> appropriate
* Approriate -> Appropriate
|
|
This change addresses races that cause spurious failures when running
GC stress on multithreaded applications.
* Instruction update race
Threads that hit a gc cover interrupt where gc is not safe can race to
overwrite the interrupt instruction and change it back to the original
instruction.
This can cause confusion when handling stress exceptions as the exception code
raised by the kernel may be determined by disassembling the instruction that
caused the fault, and this instruction may now change between the time the
fault is raised and the instruction is disassembled. When this happens the
kernel may report an ACCESS_VIOLATION where there was actually an attempt to
execute a privileged instruction.
x86 already had a tolerance mechanism here: when gc stress was active
and the exception status was ACCESS_VIOLATION, the faulting instruction would
be retried to see if it faults the same way again. In this change we extend
this tolerance to cover x64 and also enable it regardless of the gc mode.
We use the exception information to further screen as these spurious AVs look
like reads from address 0xFF..FF.
* Instrumentation vs execution race
The second race happens when one thread is jitting a method and another is
about to call the method. The first thread finishes jitting and publishes the
method code, then starts instrumenting the method for gc coverage. While this
instrumentation is ongoing, the second thread then calls the method and hits
a gc interrupt instruction. The code that recognizes the fault as a gc coverage
interrupt gets confused as the instrumentation is not yet complete -- in
particular the m_GcCover member of the MethodDesc is not yet set. So the second
thread triggers an assert.
The fix for this is to instrument for GcCoverage before publishing the code.
Since multiple threads can be jitting a method concurrently, the instrument and
publish steps are done under a lock to ensure that the instrumentation and code
are consistent (come from the same thread).
With this lock in place we have removed the secondary locking done in
SetupGcCoverage as it is no longer needed; only one thread can be instrumenting
a given jitted method for GcCoverage.
However we retain a bailout clause that first looks to see if m_GcCover is
set and if so skips instrumentation, as there are prejit and rejit cases where we
will retry instrumentation.
* Instruction cache flushes
In some cases when replacing the interrupt instruction with the original the
instruction cache was either not flushed or not flushed with sufficient length.
This possibly leads to an increased frequency of the above races.
No impact expected for non-gc stress scenarios, though some of the code changes
are in common code paths.
Addresses the spurious GC stress failures seen in #17027 and #17610.
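The screening step for the spurious AVs can be sketched as a predicate. The constant values and parameter names here are illustrative, not the exact runtime check:

```python
STATUS_ACCESS_VIOLATION = 0xC0000005
ALL_ONES_ADDRESS = 0xFFFFFFFFFFFFFFFF  # spurious AVs look like reads from 0xFF..FF


def should_retry_instruction(gcstress_active, exception_code, is_read, fault_address):
    """Returns True when the fault may be a stale view of a racing gc cover
    instruction update, so the faulting instruction should be retried
    rather than reported as a real failure. Illustrative sketch."""
    return (gcstress_active
            and exception_code == STATUS_ACCESS_VIOLATION
            and is_read
            and fault_address == ALL_ONES_ADDRESS)
```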
|
|
* Fix x86 steady state tiered compilation performance
Also included - a few tiered compilation only test hooks + small logging fix for JitBench
Tiered compilation wasn't correctly implementing the MayHavePrecode and RequiresStableEntryPoint policy functions. On x64 this was a non-issue, but due to compact entrypoints on x86 it led to methods allocating both FuncPtrStubs and Precodes. The FuncPtrStubs would never get backpatched, which caused never-ending invocations of the Prestub for some methods. Although such code still runs correctly, it is much slower than it needs to be. On MusicStore x86 I am seeing a 20% improvement in steady state RPS after this fix, bringing us in line with what I've seen on x64.
|
|
Fix trigger for tier 1 call counting delay
The trigger was taking into account all non-tier-1 JIT invocations to delay call counting, even for those methods that are not eligible for tiering. In the AllReady benchmark, some dynamic methods were being jitted frequently enough to not allow tier 1 call counting to begin. Fixed to count only eligible methods jitted at tier 0, such that methods not eligible for tiering don't interfere with the tiering heuristics.
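The corrected trigger amounts to a filter on which JIT events prolong the delay. A sketch with hypothetical parameter names:

```python
def prolongs_call_counting_delay(tier, eligible_for_tiering):
    """Before the fix, any non-tier-1 JIT prolonged the delay; after it,
    only tier 0 JITs of methods eligible for tiering do. Illustrative."""
    return tier == 0 and eligible_for_tiering


# A frequently-jitted dynamic method (not eligible for tiering) no longer
# blocks tier 1 call counting from starting:
dynamic_method_counts = prolongs_call_counting_delay(0, eligible_for_tiering=False)
```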
|
|
* [armel tizen] Fixed dynamic code allocation for ARM
* Added comment
* Used ThumbCodeToDataPointer
* Changed to PCODEToPINSTR
|
|
I missed updating one of the callers of `ReadyToRunInfo::GetEntryPoint`
in #15801. Fix by making the last arg explicit so this kind of type
confusion error is less likely, and updating the missed call site.
Closes #16177.
|
|
Add flags to track the presence of ReadyToRun codegen in an assembly
and module.
Add jitting flags to indicate when a method is jitted because the
precompiled code was rejected, either by a profiler or by ReadyToRun
dependence tracking.
Together these can be used to distinguish between
* methods jitted because their assemblies were not precompiled
* methods jitted because they were not precompiled in an otherwise
precompiled assembly
|
|
Enable tiered jitting for R2R methods
- Included R2R methods and generics over value types in CoreLib for tiered jitting. Tier 0 for R2R methods is the precompiled code if available, and tier 1 is selectively scheduled based on call counting.
- Added a delay before starting to count calls for tier 1 promotion. The delay is a short duration after frequent tier 0 jitting stops (current heuristic for identifying startup).
- Startup time and steady-state performance have improved on JitBench. There is a regression shortly following startup due to call counting and tier 1 jitting, for a short duration before steady-state performance stabilizes.
- Added two new config values, one for configuring the call count threshold for promoting to tier 1, and another for specifying the delay from the last tier 0 JIT invocation before starting to count calls
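Putting the pieces together: tier 0 is the precompiled code when available, and a configurable call count threshold triggers tier 1 jitting. A minimal model, where the class name, field names, and default threshold are all hypothetical:

```python
class TieredR2RMethod:
    """Models the tier selection described above: precompiled code as
    tier 0, promotion to tier 1 once the call count threshold is reached."""

    def __init__(self, has_r2r_code, call_count_threshold=30):
        # Tier 0 for R2R methods is the precompiled code if available.
        self.active_code = "r2r" if has_r2r_code else "tier0_jit"
        self.call_count_threshold = call_count_threshold
        self.calls = 0

    def on_call(self):
        self.calls += 1
        if self.active_code != "tier1_jit" and self.calls >= self.call_count_threshold:
            self.active_code = "tier1_jit"  # selectively promote based on call counting
        return self.active_code
```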
|
|
|
|
(#13933)
* Implement optimization case for CreateDictionaryLookupHelper
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
* Reenable mainv1/mainv2 tests
|
|
* Change jit notifications so that they pass the native code address. This fixes !bpmd so that it will set the correct breakpoint on tiered jitted methods.
* code review feedback
* don't handle OnCodeGenerated
|
|
* [RyuJIT/ARM32] Implement CreateDictionaryLookupHelper only via run-time helper
Implement CreateDictionaryLookupHelper only via run-time helper
* Add assertion for checking CORINFO_USEHELPER
|
|
* Support GDBJIT on NI/IL_STUBS
* Move tls_isSymReaderInProgress into gdbjit.cpp
|
|
Partially remove relocations from SECTION_Readonly
|
|
|
|
Resolve FEATURE_GDBJIT/FEATURE_INTERPRETER conflict
|
|
|
|
|
|
|
|
Fixes github issue 13019.
|
|
This makes tiered compilation work properly with profiler ReJIT, and positions the runtime to integrate other versioning related features together in the future. See the newly added code-versioning design-doc in this commit for more information.
Breaking changes for profilers: See code-versioning-profiler-breaking-changes.md for more details.
|
|
|
|
|
|
helpers (#12369)
* Remove direct usage of type handle in JIT_NewArr1, with the exception of retrieving the template method table.
* Assert that array type descriptor is loaded when array object's method table is set.
* Pass template method tables instead of array type descriptors to array allocation helpers.
|
|
accessed from jit code for Linux ARM (#11963)
|
|
Fixes #9321 and deletes CleanupToDoList.cs
Delete unmanaged security implementation
|
|
|
|
These flags provide a hook to change the JIT policy in the future and diverge tier0/tier1 compilation from min_opt/speed_opt respectively.
|
|
Tiered compilation is a new feature we are experimenting with that aims to improve startup times. Initially we jit methods non-optimized, then switch to an optimized version once the method has been called a number of times. More details about the current feature operation are in the comments of TieredCompilation.cpp.
This is only the first step in a longer process building the feature. The primary goal for now is to avoid regressing any runtime behavior in the shipping configuration in which the COMPlus variable is OFF, while putting enough code in place that we can measure performance in the daily builds and make incremental progress visible to collaborators and reviewers. The design of the TieredCompilationManager is likely to change substantively, and the call counter may also change.
|
|
|
|
|
|
Remove more dead native defines
|
|
|
|
The hidden argument should always be passed last for x86
|
|
Fix parameter order in UnboxingILStub for Shared Generic
|
|
Two changes:
a) R2R code wasn't being reported to the Rejit Manager when it was used, this is a simple fix in prestub.cpp. This makes the ReJit API work.
b) The bulk of the changes handle adding support for an inlining table to R2R so that ICorProfilerInfo6::EnumNgenMethodsInliningThisMethod can supply that information to profilers.
This was only tested on Windows thus far, but there is no apparent reason this change would be OS specific.
|