Age | Commit message (Collapse) | Author | Files | Lines |
|
FNV images
|
|
* Implement instantiating and unboxing through portable stublinker code
- Handle only the cases with register to register moves
- Shares abi processing logic with delegate shuffle thunk creation
- Architecture specific logic is relatively simple
- Do not permit use of HELPERREG in computed instantiating stubs
- Fix GetArgLoc such that it works on all architectures and OS combinations
Add a JIT stress test case for testing all of the various combinations
- Use the same calling convention test architecture that was used as part of tail call work
Rename secure delegates to wrapper delegates
- Secure delegates are no longer a feature of the runtime
- But the wrapper delegate lives on as a workaround for a weird detail of the ARM32 abi
|
|
This fix is to update usages of SetupGcCoverage() under
FEATURE_PREJIT aligned to the signature change in #25261.
|
|
processing during shutdown (#27241)
* Protect against a rare invalid lock acquision attempt during etw processing during abrupt shutdown
Targeted and partial fix for https://github.com/dotnet/coreclr/issues/27129
- This is not a generic fix for the issue above, it is only a very targeted fix for an issue seen (a new issue introduced in 3.x). For a generic fix and more details, see the fix in 5.0: https://github.com/dotnet/coreclr/pull/27238.
- This change avoids taking a lock during process detach - a point in time when all other threads have already been abruptly shut down by the OS and locks may have been orphaned.
- The issue leads to a hang during shutdown when ETW tracing is enabled and the .NET process being traced begins the shutdown sequence at an unfortunate time - this is a probably rare timing issue. It would take the shutdown sequence to begin at just the point when a thread holds a particular lock and is terminated by the OS while holding the lock, then the OS sends the process detach event to the CLR, work during which then tries to acquire the lock and cannot because it is orphaned.
- The generic fix has broader consequences and is unlikely to be a reasonable change to make so late in the cycle, such a change needs some bake time and feedback. Hence this targeted fix for 3.x.
* Report tier as unknown when it cannot be determined
* Return unknown only on process detach
|
|
(#25261)
- SOS changes are in https://github.com/dotnet/diagnostics/pull/369
- Fixes https://github.com/dotnet/coreclr/issues/17646
|
|
* Add DOTNET_TRACE_CONTEXT and change macros to use DOTNET_TRACE_CONTEXT instead of MCGEN_TRACE_CONTEXT
* Fixing macro definitions
* eventing codegen scripts now generates EventPipe trace contexts
* Fix macros to use the EVENTPIPE_TRACE_Context
* Fix linux build
* Fix windows build
* Update Eventpipe provider context at EtwCallbackComon
* break in switch
* Update rundown provider context manually
* PR feedback
* Eventpipe->EventPipe
* cleanup in codegen script
|
|
* Basic support for precompiled pinvoke stubs
* Generate R2R file with multiple references to same IL stub (one per method which the IL stub is associated with)
* Not all il stubs are p/invokes. Don't fail when they aren't.
* Consistently use IsDynamicScope and GetModule to avoid unsafe memory access in IL stub compilation paths
* Enable full p/invoke il stubs when compiling System.Private.Corelib
* Disable IL Stub generation in crossgen for ARM32.
- The cross bitness logic is not correct for IL Stub generation
|
|
Add optimization tiers to the Linux perf maps for perfcollect
Fixes https://github.com/dotnet/coreclr/issues/23222:
- It looks like module unloads are currently not taken into account. Once they would be taken into account, Although we have method JIT events from `lttng` with the code address and optimization tier, samples can only be associated with method JIT events by associating the time range when the module is loaded with times of samples, and the event times from `lttng` would not necessarily correspond with times from samples taken by `perf`.
- Updated to include the optimization tier in the perf map for each jitted or R2R method code address
- Refactored common code between eventtrace and perfmap for getting jit tiers
|
|
Add some perf events/data for tiered compilation
New events:
- `Settings` - Sent when TC is enabled
- `Flags` - Currently indicates whether QuickJit and QuickJitForLoops are enabled
- `Pause` - Sent when TC is paused (due to a new method being called for the first time)
- `Resume` - Sent when TC resumes
- `NewMethodCount` - Number of methods called for the first time while tiering was paused
- `BackgroundJitStart` - Sent when starting to JIT methods in the background
- `PendingMethodCount` - Number of methods currently scheduled for background JIT
- `BackgroundJitStop` - Sent when background jitting stops
- `PendingMethodCount` - Same as above. When 0, background jitting has completed.
- `JittedMethodCount` - Number of methods jitted in the background since the previous BackgroundJitStart event on the same thread
Miscellaneous:
- Updated method JIT events to include the optimization tier
- Added a couple more cases where tiered compilation is disabled for methods that have JIT optimization disabled for some reason
- Renamed `Duration` field of the new version of the `ContentionEnd` to `DurationNs` to indicate the units of time
- Added `OptimizationTierOptimized` to `NativeCodeVersion::OptimizationTier` to distinguish it from `OptimizationTier1`. `OptimizationTierOptimized` is now used for methods that QuickJit is disabled for, and does not send the tier 1 flag.
- For info about the code being generated by the JIT, added info to `PrepareCodeConfig` and stored a pointer to it on the thread object for the current JIT invocation. Info is updated in `PrepareCodeConfig` and used for updating the tier on the code version and for sending the ETL event.
- If the JIT decides to use MinOpt when `MethodDesc::IsJitOptimizationDisabled()` is false, the info is not stored. The runtime method event will reflect the JIT's choice, the rundown event will not.
- Updated to show optimization tiers in SOS similarly to PerfView
|
|
versioning is in use (#24422)
Fix ETL event rejit IDs and sending of the IL to native map event when code versioning is in use
Fixes https://github.com/dotnet/coreclr/issues/22904
Fixes https://github.com/dotnet/coreclr/issues/22908
- Method events now always send the native code ID for the rejit ID, and the IL to native map event continues to send the IL code ID
- Took code versioning into account when sending rundown events for a method including the IL to native map
|
|
* Remove more MDA support code
* PR Feedback
|
|
Fix incorrect tier reported by SOS
- The tier of the initial code version was being assumed to be 0
- Whether call counting is enabled for a method needed to be available to the DAC
- Some small renames / cleanup to simplify code
|
|
default (#24252)
When QuickJit is enabled, disable it for methods that contain loops by default
Fixes https://github.com/dotnet/coreclr/issues/19751 by default when QuickJit is enabled
- Added config variable TC_QuickJitForLoops. When disabled (the default), the JIT identifies loops and explicit tail calls and switches to tier 1 JIT.
- This would prevent the possibility of spending too long in QuickJit code, but may decrease startup time a bit when QuickJit is enabled
- Removed TC_StartupTier_OptimizeCode, as now that there is TC_QuickJit, I didn't see a good use for it
- Removed references to "StartupTier" in config variables because we had previously decided not to call it that.
- When QuickJit is disabled, avoid creating native code slots for methods in non-R2R'ed modules, as tiering would be disabled for those anyway
- Marked TC_QuickJit config var as external
|
|
This refactoring is preparation for disabling fragile NGen support in the runtime. It keeps fragile-NGen specific code under FEATURE_PREJIT and moves the code required to support R2R to be outside FEATURE_PREJIT.
The eventual goal is to compile the runtime without FEATURE_PREJIT defined to avoid fragile-NGen specific overhead.
|
|
(#23599)
Disable tier 0 JIT (quick JIT) by default, rename config option
- Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit
- Disabled quick JIT by default, the current plan is to do that for preview 4
- Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important
- The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases
- This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced.
Fixes https://github.com/dotnet/coreclr/issues/22998
Fixes https://github.com/dotnet/coreclr/issues/19751
|
|
`ConvToJitSig` is not expecting to see modifiers for return types in sigs,
so don't bother preserving them when creating instantiation stubs.
This comes up for instantiation stubs for methods of `ReadOnlySpan<T>`.
Also, if we are preserving modifiers in a sig, make sure to prefix their type
handles with `ELEM_TYPE_INTERNAL`.
Fixes #23136.
|
|
(#22285)
* GCHeapHash
- Hashtable implementation for runtime use
- Implementation written in C++
- Data storage in managed heap memory
- Based on SHash design, but using managed memory
CrossLoaderAllocatorHash
- Hash for c++ Pointer to C++ pointer where the lifetimes are controlled by different loader allocators
- Support for add/remove/visit all entries of 1 key/visit all entries/ remove all entries of 1 key
- Supports holding data which is unmanaged, but data items themselves can be of any size (key/value are templated types)
* Swap MethodDescBackpatchInfo to use the CrossLoaderAllocatorHash
* The MethodDescBackpatchCrst needs to be around an allocation
- Adjust the Crst so that it can safely be used around code which allocates
- Required moving its use out from within the EESuspend logic used in rejit
|
|
Use DispatchToken::CreateDispatchToken to get token to resolve
virtual method in case of non interface MT.
|
|
Add config option to disable tier 0 JIT
Fixes https://github.com/dotnet/coreclr/issues/21856
- For methods that don't have pregenerated code, using tier 0 JIT can improve startup perf, and disabling tier 0 JIT can be useful to sacrifice some startup time to avoid issues of running tier 0 code for too long. In some cases, it may also be desirable to avoid tiering up much later.
- A fixed value for the call count indicates that tier 0 call counting is disabled. When disabled, the method starts at tier 1.
- Also modified call counting to start from a predetermined threshold and count down to zero, as it simplifies some things, allows for methods to have different thresholds, and likely is what we would want eventually anyway
- Took a small step towards eliminating knowledge of specific tier levels in code that should not care, though more is to be done there
|
|
This allows sos bpmd breakpoints to work on prejitted methods.
Closes #22265.
|
|
|
|
Patch vtable slots and similar when tiering is enabled
For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
- An indirect call through the `MethodTable`'s backpatchable vtable slot
- A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
- For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled
Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
- A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
- The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
- The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
- Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
- After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
- Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
- A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
- The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
- A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`
Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`
Synchronization between entry point changes and backpatching slots
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples
Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retreival. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch and that in turn means that recording slots is necessary for versionability.
|
|
|
|
- Replace JitHelpers.UnsafeCastToStackPtr with Unsafe.AsPointer
- Delete PinningHelper that was duplicate of RawData helper class
|
|
|
|
* Implementation of R2R vtable call thunks. These thunks will fetch the target code pointer from the vtable of the input thisPtr, and jump to that address.
This is especially helpful with generics, since we can avoid a generic dictionary lookup cost for a simple vtable call.
Overall, these thunks cause the CPU to have less branch mispredictions, and give a small performance boost to vtable calls.
These stubs are under VirtualCallStubManager so that the managed debugger can handle stepping through them.
|
|
|
|
* Remove IsRemotingIntercepted methods that always return false.
* Remove GetOptionalMembersAllocationSize parameters that are always false.
* Remove references to context static.
Remove references in comments and methodnames.
* Remove RemotingVtsInfo.
|
|
* Remove context statics stuff part 1
This change removes all context statics stuff from the runtime since
context statics are not supported and this code was obsolete.
* Remove context statics stuff from the debugger code
|
|
Add MethodImplOptions.AggressiveOptimization and use it for tiering
Part of fix for https://github.com/dotnet/corefx/issues/32235
Workaround for https://github.com/dotnet/coreclr/issues/19751
- Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization
- For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting
- When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code
- R2r crossgen does not generate code for a method flagged with AggressiveOptimization
|
|
(#19427)
- Sealed Runtime makes `is RuntimeType` and similar checks faster. These checks are fairly common in reflection.
- Delete support for introspection only loads from the runtime. We do not plan to use in .NET Core. The support for introspection loads inherited from RuntimeType and thus it is incompatible with sealed RuntimeType.
|
|
* Partial R2R IBC fixes
- Log use method code access in all cases, not just when the method is JITed
- Add workaround for CONTRACT_VIOLATION that shows up in checked builds when collecting IBC data
- Make /ReadyToRun switch work for CoreLib
|
|
Apply tiering's call counting delay more broadly
Issues
- When some time passes between process startup and first significant use of the app, startup perf with tiering can be slower because the call counting delay is no longer in effect
- This is especially true when the process is affinitized to one cpu
Fixes
- Initiate and prolong the call counting delay upon tier 0 activity (jitting or r2r code lookup for a new method)
- Stop call counting for a called method when the delay is in effect
- Stop (and don't start) tier 1 jitting when the delay is in effect
- After the delay resume call counting and tier 1 jitting
- If the process is affinitized to one cpu at process startup, multiply the delay by 10
No change in benchmarks.
|
|
* Enable genFnCalleeRegArgs for Arm64 Varargs
Before the method would early out and incorrectly expect the usage
of all incoming arguments to be their homed stack slots. It is
instead possible for incoming arguments to be homed to different
integer registers.
The change will mangle the float types for vararg cases in the same
way that is done during lvaInitUserArgs and fgMorphArgs.
* Apply format patch
* Account for softfp case
* Address feedback
* Apply format patch
* Use standard function header for mangleVarArgsType
* Remove confusing comment
|
|
* Ajusted -> Adjusted
* alot -> a lot
* Ambigous -> Ambiguous
* amoun -> amount
* amoung -> among
* Amperstand -> Ampersand
* Anbody -> Anybody
* anddoens't -> and doesn't
* anme -> name
* annoations -> annotations
* annother -> another
* anothr -> another
* ansynchronous -> asynchronous
* anticpation -> anticipation
* anway -> anyway
* aother -> another
* Apparant -> Apparent
* appartment -> apartment
* appdmomain -> appdomain
* Appdomian -> Appdomain
* appdomin -> appdomain
* approproiate -> appropriate
* approprate -> appropriate
* approp -> appropriate
* appened -> appended
* appropiately -> appropriately
* appropraitely -> appropriately
* Apperantly -> Apparently
* approp. -> appropriate
* Approriate -> Appropriate
|
|
This change addresses races that cause spurious failures in when running
GC stress on multithreaded applications.
* Instruction update race
Threads that hit a gc cover interrupt where gc is not safe can race to
overrwrite the interrupt instruction and change it back to the original
instruction.
This can cause confusion when handling stress exceptions as the exception code
raised by the kernel may be determined by disassembling the instruction that
caused the fault, and this instruction may now change between the time the
fault is raised and the instruction is disassembled. When this happens the
kernel may report an ACCESS_VIOLATION where there was actually an attempt to
execute a priveledged instruction.
x86 already had a tolerance mechanism here where when gc stress was active
and the exception status was ACCESS_VIOLATION the faulting instruction would
be retried to see if it faults the same way again. In this change we extend
this to tolerance to cover x64 and also enable it regardless of the gc mode.
We use the exception information to further screen as these spurious AVs look
like reads from address 0xFF..FF.
* Instrumentation vs execution race
The second race happens when one thread is jitting a method and another is
about to call the method. The first thread finishes jitting and publishes the
method code, then starts instrumenting the method for gc coverage. While this
instrumentation is ongoing, the second thread then calls the method and hits
a gc interrupt instruction. The code that recognizes the fault as a gc coverage
interrupt gets confused as the instrumentation is not yet complete -- in
particular the m_GcCover member of the MethodDesc is not yet set. So the second
thread triggers an assert.
The fix for this is to instrument for GcCoverage before publishing the code.
Since multiple threads can be jitting a method concurrently the instrument and
public steps are done under a lock to ensure that the instrumentation and code
are consistent (come from the same thread).
With this lock in place we have removed the secondary locking done in
SetupGcCoverage as it is no longer needed; only one thread can be instrumenting
a given jitted method for GcCoverage.
However we retain a bailout` clause that first looks to see if m_GcCover is
set and if so skips instrumentation, as there are prejit and rejit cases where we
will retry instrumentation.
* Instruction cache flushes
In some cases when replacing the interrupt instruction with the original the
instruction cache was either not flushed or not flushed with sufficient length.
This possibly leads to an increased frequency of the above races.
No impact expected for non-gc stress scenarios, though some of the code changes
are in common code paths.
Addresses the spurious GC stress failures seen in #17027 and #17610.
|
|
* Fix x86 steady state tiered compilation performance
Also included - a few tiered compilation only test hooks + small logging fix for JitBench
Tiered compilation wasn't correctly implementing the MayHavePrecode and RequiresStableEntryPoint policy functions. On x64 this was a non-issue, but due to compact entrypoints on x86 it lead to methods allocating both FuncPtrStubs and Precodes. The FuncPtrStubs would never get backpatched which caused never ending invocations of the Prestub for some methods. Although such code still runs correctly, it is much slower than it needs to be. On MusicStore x86 I am seeing a 20% improvement in steady state RPS after this fix, bringing us inline with what I've seen on x64.
|
|
Fix trigger for tier 1 call counting delay
The trigger was taking into account all non-tier-1 JIT invocations to delay call counting, even for those methods that are not eligible for tiering. In the AllReady benchmark, some dynamic methods were being jitted frequently enough to not allow tier 1 call counting to begin. Fixed to count only eligible methods jitted at tier 0, such that methods not eligible for tiering don't interfere with the tiering heuristics.
|
|
* [armel tizen] Fixed dynamic code allocation for ARM
* Added comment
* Used ThumbCodeToDataPointer
* Changed to PCODEToPINSTR
|
|
I missed updating one of the callers of `ReadyToRunInfo::GetEntryPoint`
in #15801. Fix by making the last arg explicit so this kind of type
confusion error is less likely, and updating the missed call site.
Closes #16177.
|
|
Add flags to track the presence of ReadyToRun codegen in an assembly
and module.
Add jitting flags to indicate when a method is jitted because the
precompiled code was rejected, either by a profiler or by ReadyToRun
dependence tracking.
Together these can be used to distingish between
* methods jitted because their assemblies were not precompiled
* methods jitted because they were not precompiled in an otherwise
precompiled assembly
|
|
Enable tiered jitting for R2R methods
- Included R2R methods and generics over value types in CoreLib for tiered jitting. Tier 0 for R2R methods is the precompiled code if available, and tier 1 is selectively scheduled based on call counting.
- Added a delay before starting to count calls for tier 1 promotion. The delay is a short duration after frequent tier 0 jitting stops (current heuristic for identifying startup).
- Startup time and steady-state performance have improved on JitBench. There is a regression shortly following startup due to call counting and tier 1 jitting, for a short duration before steady-state performance stabilizes.
- Added two new config values, one for configuring the call count threshold for promoting to tier 1, and another for specifying the delay from the last tier 0 JIT invocation before starting to count calls
|
|
|
|
(#13933)
* Implement optimization case for CreateDictionaryLookupHelper
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
* Reenable mainv1/mainv2 tests
|
|
* Change jit notifications so that they pass the native code address. This fixes !bpmd so that it will set the correct breakpoint on tiered jitted methods.
* code review feedback
* don't handle OnCodeGenerated
|
|
* [RyuJIT/ARM32] Implement CreateDictionaryLookupHelper only via run-time helper
Implement CreateDictionaryLookupHelper only via run-time helper
* Add assertion for checking CORINFO_USEHELPER
|
|
* Support GDBJIT on NI/IL_STUBS
* Move tls_isSymReaderInProgress into gdbjit.cpp
|
|
Partially remove relocations from SECTION_Readonly
|
|
|
|
Resolve FEATURE_GDBJIT/FEATURE_INTERPRETER conflict
|