Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix watson bucketing/broken triage dumps
The DAC EnumMemoryRegions needs to include some missing code version
manager memory.
|
|
(#25333)
These types contain byrefs, and so when returned in registers we may need
to avoid GC stress at the return site.
Addresses part of #24263.
|
|
* Extract ReplaceInstrAfterCall.
* Avoid GCStress when return multireg with pointers.
Determinate when we need to protect the second register and do not cause GCStress in such cases.
* Add a repro test.
* Reenable MethodImplOptionsTests.
* Extract IsGcCoveregeInterruptInstruction.
That changes how we do checks for arm32 in `IsGcCoverageInterrupt`.
* Tolerate direct call to JIT_RareDisableHelper.
x86 ILStubClass:IL_STUB_PInvoke(byref,ref,int,byref):int generates it like:
Generating: N119 ( 4, 7) [000118] ------------ * RETURNTRAP int REG NA
IN0021: cmp dword ptr [0F9BF9F8H], 0
New Basic Block BB10 [0009] created.
IN0022: je L_M6496_BB10
Call: GCvars=00000001 {V01}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}
IN0023: call CORINFO_HELP_STOP_FOR_GC
* Support GC stress protect return 1/2/both Unix x64.
* Fix arm64.
Do not insert GC Stress instrucitons when we can't determinate the exact return kind.
* Fix review1.
* Fix review2.
* Change the test as Andy suggested.
* Fix some typos.
* Replace all SLOT with PBYTE.
* Disable assert that can fail because of multithreading.
|
|
* Move GetReturnKindFromMethodTable to method.hpp.
We would need this in other places in the next commits.
* Delete unnecessary checks from callhelpers.
* Do not check return types in CanDeduplicateCode.
GC info v.2 has this information and it is checked in another place.
* Change ComPlusMethodFrame to use the new function.
* Change gccover.cpp to use GetReturnKindFromMethodTable.
* Delete RETURNTYPE.
* Add check to ComPlusMethodFrame.
* Delete check from threadsuspend.
codeInfo->GetCodeManager()->GetReturnKind(gcInfoToken) must always return a valid kind nowdays (it could return an invalid lind only when GC Info v2 was not available).
* Rename functions/arguments.
* Add check for IsValidReturnKind.
* delete unused var.
|
|
Add some perf events/data for tiered compilation
New events:
- `Settings` - Sent when TC is enabled
- `Flags` - Currently indicates whether QuickJit and QuickJitForLoops are enabled
- `Pause` - Sent when TC is paused (due to a new method being called for the first time)
- `Resume` - Sent when TC resumes
- `NewMethodCount` - Number of methods called for the first time while tiering was paused
- `BackgroundJitStart` - Sent when starting to JIT methods in the background
- `PendingMethodCount` - Number of methods currently scheduled for background JIT
- `BackgroundJitStop` - Sent when background jitting stops
- `PendingMethodCount` - Same as above. When 0, background jitting has completed.
- `JittedMethodCount` - Number of methods jitted in the background since the previous BackgroundJitStart event on the same thread
Miscellaneous:
- Updated method JIT events to include the optimization tier
- Added a couple more cases where tiered compilation is disabled for methods that have JIT optimization disabled for some reason
- Renamed `Duration` field of the new version of the `ContentionEnd` to `DurationNs` to indicate the units of time
- Added `OptimizationTierOptimized` to `NativeCodeVersion::OptimizationTier` to distinguish it from `OptimizationTier1`. `OptimizationTierOptimized` is now used for methods that QuickJit is disabled for, and does not send the tier 1 flag.
- For info about the code being generated by the JIT, added info to `PrepareCodeConfig` and stored a pointer to it on the thread object for the current JIT invocation. Info is updated in `PrepareCodeConfig` and used for updating the tier on the code version and for sending the ETL event.
- If the JIT decides to use MinOpt when `MethodDesc::IsJitOptimizationDisabled()` is false, the info is not stored. The runtime method event will reflect the JIT's choice, the rundown event will not.
- Updated to show optimization tiers in SOS similarly to PerfView
|
|
|
|
Fixes #21009
|
|
* Basic infra for cuckoo filter of attributes
- Implement cuckoo filter lookup logic
- Implement new ready to run section
- Add dumper to R2RDump
- Parse section on load into data structure
- Implement function to query filter
- Add concept of enum of well known attributes
- So that attribute name hashes themselves may be cached
* Wrap all even vaguely perf critical uses of attribute by name parsing with use of R2R data
* Update emmintrin.h in the PAL header to contain the needed SSE2 intrinsics for the feature
- Disable the presence table for non Corelib cases. Current performance data does not warrant the size increase in other generated binaries
|
|
(#23599)
Disable tier 0 JIT (quick JIT) by default, rename config option
- Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit
- Disabled quick JIT by default, the current plan is to do that for preview 4
- Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important
- The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases
- This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced.
Fixes https://github.com/dotnet/coreclr/issues/22998
Fixes https://github.com/dotnet/coreclr/issues/19751
|
|
required (#22560)
* These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.
* Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
* Add INLINE_GETTHREAD for arm/arm64
* Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
* Updating R2RDump tool to handle pinvokes
|
|
(#22285)
* GCHeapHash
- Hashtable implementation for runtime use
- Implementation written in C++
- Data storage in managed heap memory
- Based on SHash design, but using managed memory
CrossLoaderAllocatorHash
- Hash for c++ Pointer to C++ pointer where the lifetimes are controlled by different loader allocators
- Support for add/remove/visit all entries of 1 key/visit all entries/ remove all entries of 1 key
- Supports holding data which is unmanaged, but data items themselves can be of any size (key/value are templated types)
* Swap MethodDescBackpatchInfo to use the CrossLoaderAllocatorHash
* The MethodDescBackpatchCrst needs to be around an allocation
- Adjust the Crst so that it can safely be used around code which allocates
- Required moving its use out from within the EESuspend logic used in rejit
|
|
|
|
This function was recently changed to just return the
MethodDesc::GetLoaderAllocator. This is a cleanup that removes the
function completely and replaces all of its usages.
|
|
Patch vtable slots and similar when tiering is enabled
For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
- An indirect call through the `MethodTable`'s backpatchable vtable slot
- A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
- For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled
Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
- A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
- The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
- The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
- Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
- After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
- Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
- A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
- The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
- A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`
Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`
Synchronization between entry point changes and backpatching slots
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples
Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retreival. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch and that in turn means that recording slots is necessary for versionability.
|
|
The DynamicMethodTable::AddMethodsToList was incorrectly allocating the
MethodDescChunk from the domain's LoaderAllocator instead of the context
specific one. Thus the allocated memory was leaking after a collectible
AssemblyLoadContext was collected.
There was also a problem with the DynamicMethodDesc::Destroy being
called twice for collectible classes - once by
RuntimeMethodHandle::Destroy() and once when the DomainFile destructor
was called. Due to the primary issue, this problem was not visible,
since the domain's LoaderAllocator is never unmapped. But it started to
cause AV after the primary issue was fixed.
|
|
* Enable TypeEquivalence feature for Windows platform
* Basic test - verified test exercises TypeEquivalence code paths
|
|
|
|
|
|
or native depending on the situation (i.e. managed call, reflection, P/Invoke) instead of only for managed. Fixes #20702.
Clean up duplicate #ifdefs.
Move #ifdef into method impl instead of outside the implementations.
Move IsRegPassedStruct out of UNIX_AMD64_ABI #ifdef.
Move check for enregistered struct out of UNIX_AMD64_ABI #ifdef
Add dummy implementation of IsRegPassedStruct and IsNativeStructPassedInRegisters when UNIX_AMD64_ABI isn't defined.
|
|
* Add PInvoke/ExactSpelling tests
Refactor tests to fit with the rest of the Interop tests.
Fix up test to cleanly run.
Change CMakeLists.txt to match the rest of the tests.
Include Interop.cmake in CMakeLists.txt
Remove Service.
* On x86 enable stdcall mangling irrespective of ExactSpelling and account for the charset suffix when ExactSpelling = false.
Change variable name.
Clean up the FindEntryPoint. The logic flow now matches CoreRT + CoreCLR specific features (ordinals and stdcall mangling).
PR Feedback.
Fix format specifier.
Add back probing null check.
Fix offset calculation for stdcall mangling.
Probe the stdcall-mangled versions of the original entry-point names when ExactSpelling isn't set.
Cleanup.
|
|
* Remove IsRemotingIntercepted methods that always return false.
* Remove GetOptionalMembersAllocationSize parameters that are always false.
* Remove references to context static.
Remove references in comments and methodnames.
* Remove RemotingVtsInfo.
|
|
Bring back functionality for loading IJW assemblies and calling managed->native. Also add workaround to test case for the C++ compiler inserting calls to mscoree.
|
|
(#19427)
- Sealed Runtime makes `is RuntimeType` and similar checks faster. These checks are fairly common in reflection.
- Delete support for introspection only loads from the runtime. We do not plan to use in .NET Core. The support for introspection loads inherited from RuntimeType and thus it is incompatible with sealed RuntimeType.
|
|
* Separate sections READONLY_VCHUNKS and READONLY_DICTIONARY
* Remove relocations for second-level indirection of Vtable in case FEATURE_NGEN_RELOCS_OPTIMIZATIONS is enabled.
Introduce FEATURE_NGEN_RELOCS_OPTIMIZATIONS, under which NGEN specific relocations optimizations are enabled
* Replace push/pop of R11 in stubs with
- str/ldr of R4 in space reserved in epilog for non-tail calls
- usage of R4 with hybrid-tail calls (same as for EmitShuffleThunk)
* Replace push/pop of R11 for function epilog with usage of LR as helper register right before its restore from stack
|
|
Eliminate `FEATURE_UNIX_AMD64_STRUCT_PASSING` and replace it with `UNIX_AMD64_ABI` when used alone. Both are currently defined; it is highly unlikely the latter will work alone; and it significantly clutters up the code, especially the JIT.
Also, fix the altjit support (now `UNIX_AMD64_ABI_ITF`) to *not* call `ClassifyEightBytes` if the struct is too large. Otherwise it asserts.
|
|
* Fix x86 steady state tiered compilation performance
Also included - a few tiered compilation only test hooks + small logging fix for JitBench
Tiered compilation wasn't correctly implementing the MayHavePrecode and RequiresStableEntryPoint policy functions. On x64 this was a non-issue, but due to compact entrypoints on x86 it lead to methods allocating both FuncPtrStubs and Precodes. The FuncPtrStubs would never get backpatched which caused never ending invocations of the Prestub for some methods. Although such code still runs correctly, it is much slower than it needs to be. On MusicStore x86 I am seeing a 20% improvement in steady state RPS after this fix, bringing us inline with what I've seen on x64.
|
|
When running ready to run code on ARM, the ExternalMethodFixupWorker
doesn't handle the entrypoints of FCalls correctly. It tries to handle
them as compact entrypoints, but those use a different machine code
instructions and it results in an assert in debug / checked build.
This change detects the runtime supplied calls before trying to check
for the compact entrypoint.
|
|
interfaces (#16690)
* Add checks and regression test
* Uncomment InvokeConsumer tests
* Revert JIT Interface Check
|
|
|
|
Partially remove relocations from SECTION_Readonly
|
|
|
|
This makes tiered compilation work properly with profiler ReJIT, and positions the runtime to integrate other versioning related features together in the future. See the newly added code-versioning design-doc in this commit for more information.
Breaking changes for profilers: See code-versioning-profiler-breaking-changes.md for more details.
|
|
* Support non-virtual calls on interface private members correctly
* Support protected methods
* Properly handle precode
* Throw (tentative) exception when seeing conflict overrides and add a test case
(This updates CoreCLR dev/defaultintf the same with the build we are showing at //build)
|
|
* allow non-zero RVA on abstract interface method in ilasm
* Revert "allow non-zero RVA on abstract interface method in ilasm"
This reverts commit eecb8024e58f14a20e5e49359f38019f5768ac41.
* add a test case and allow virtual non-abstract method in ilasm
* allow non-abstract methods in the loader
* fixup dispatch map
* support for simple default interface method scenario
* fix a bug with incorrect usage of MethodIterator skpping the first method. add a test case for overriding but it may not be what we want
* add another simple test case for base class
* allow private/internal methods in ilasm and add a explict impl test
* update reference to mscorlib in il test
* add shared generics and variance case
* allow interface dispatch to return instantiating stubs with the right PARAM_TYPE calling conv
* simple factoring and add a valuetype test case
* add a test case for generic virtual methods
* a bit more refactoring by moving the CALLCONV_PARAMTYPE logic inside getMethodSigInternal
* support explicit methodimpl and remove implicit methodimpl test case
* update test cases to give more clear output
* Fix a bug where GetMethodDescForSlot chokes on interface MethodDesc without precode. This is accdentally discovered by methodimpl test case when iterating methods on a default interface method that has already been JITted
* cleanup code after review and add a bit more comments
* update comments
* only use custom ILAsm for default interface methods tests - some tests are choking on CoreCLR ilasm for security related stuff
* address comments and allow instance methods, and add a constraint value type call test scenario
* disable the failing protected method scenario
|
|
|
|
accessed from jit code for Linux ARM (#11963)
|
|
|
|
|
|
Fixes #9321 and deletes CleanupToDoList.cs
Delete unmanaged security implementation
|
|
|
|
Implement these two methods as optional-expand jit intrinsics.
Uses `GT_ARR_BOUNDS_CHECK` for the bounds check so in some cases
downstream code is able to eliminate redundant checks. Fully general
support (on par with arrays in most cases) is still work in progress.
Update one bit of code in the optimizer that assumed it knew the
tree types that appeared in a `GT_ARR_BOUNDS_CHECK`.
Add benchmark tests for Span and ReadOnlySpan indexers.
Tests ability of jit to reason about indexer properties with respect
to loop bounds and related indexer uses. Some cases inspired by span
indexer usage in Kestrel.
Closes #10785.
Also addresses lack of indexer inlining noted in #10031. Span indexers
should now always be inlined, even when invoked from shared methods.
|
|
Some of mcc_i* tests caused segmentation faults on Unix. This commit make these tests exit by throwing a System.EntryPointNotFoundException exception instead of causing a segmentation fault.
Fix #9530
|
|
Tiered compilation is a new feature we are experimenting with that aims to improve startup times. Initially we jit methods non-optimized, then switch to an optimized version once the method has been called a number of times. More details about the current feature operation are in the comments of TieredCompilation.cpp.
This is only the first step in a longer process building the feature. The primary goal for now is to avoid regressing any runtime behavior in the shipping configuration in which the complus variable is OFF, while putting enough code in place that we can measure performance in the daily builds and make incremental progress visible to collaborators and reviewers. The design of the TieredCompilationManager is likely to change substantively, and the call counter may also change.
|
|
|
|
|
|
|
|
|
|
|
|
Implements two field span struct which is comprised of a byref field
that may be an interior pointer to a managed object, or a native
pointer indicating the start of the span, and a length field which
describes the span of access.
Since there is no MSIL operation which assign a byref field, the jit
gets involved and treats the constructor and getter of a special struct
called ByReference that contains an declared IntPtr. This special
struct is then used as a field in the span implementation and recognized
by the runtime as a field that may contain a GC pointer. In
implementation, the ctor of ByReference is converted into an assignment
value is returned by a reverse assignment.
Since there are some dependencies on CoreFX for the span implementation
local testing of the implementation has been done using the
BasicSpanTest.cs in the CoreCLR tests. Once this is checked in and is
propagated to CoreFX the apporopate code in the packages will be enabled
and then may be referenced in CoreCLR tests. At that time more span
tests will be added.
Additional comments and fixes based on code review added.
|
|
Use FEATURE_CER to scope CER code,
and disable CER feature in CoreCLR.
|