Age | Commit message (Collapse) | Author | Files | Lines |
|
(#23599)
Disable tier 0 JIT (quick JIT) by default, rename config option
- Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit
- Disabled quick JIT by default, the current plan is to do that for preview 4
- Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and that there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important
- The thought was that with quick JIT disabled, tiering overhead from call counting and background jitting with optimizations would be less, and perf at any point in time would be closer to 2.x releases
- This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced.
Fixes https://github.com/dotnet/coreclr/issues/22998
Fixes https://github.com/dotnet/coreclr/issues/19751
|
|
required (#22560)
* These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.
* Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
* Add INLINE_GETTHREAD for arm/arm64
* Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
* Updating R2RDump tool to handle pinvokes
|
|
(#22285)
* GCHeapHash
- Hashtable implementation for runtime use
- Implementation written in C++
- Data storage in managed heap memory
- Based on SHash design, but using managed memory
CrossLoaderAllocatorHash
- Hash from C++ pointer to C++ pointer where the lifetimes are controlled by different loader allocators
- Support for add/remove/visit all entries of 1 key/visit all entries/ remove all entries of 1 key
- Supports holding data which is unmanaged, but data items themselves can be of any size (key/value are templated types)
* Swap MethodDescBackpatchInfo to use the CrossLoaderAllocatorHash
* The MethodDescBackpatchCrst needs to be around an allocation
- Adjust the Crst so that it can safely be used around code which allocates
- Required moving its use out from within the EESuspend logic used in rejit
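The operations listed for CrossLoaderAllocatorHash above can be illustrated with a minimal sketch. This is not the runtime's implementation: it keeps its data in `std::unordered_map` and `std::vector` on the native heap rather than in managed heap memory, it ignores loader allocator lifetimes entirely, and the name `MultiKeyHash` and all of its method names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a key -> many-values hash; not the CoreCLR type.
template <typename TKey, typename TValue>
class MultiKeyHash {
public:
    // Add one value under a key.
    void Add(const TKey& key, const TValue& value) { map_[key].push_back(value); }

    // Remove one matching value under a key; returns true if something was removed.
    bool Remove(const TKey& key, const TValue& value) {
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        auto& values = it->second;
        auto pos = std::find(values.begin(), values.end(), value);
        if (pos == values.end()) return false;
        values.erase(pos);
        if (values.empty()) map_.erase(it);
        return true;
    }

    // Remove every value stored under a key.
    void RemoveAll(const TKey& key) { map_.erase(key); }

    // Visit all values of one key; the visitor returns false to stop early.
    template <typename TVisitor>
    void VisitValuesOfKey(const TKey& key, TVisitor visitor) const {
        auto it = map_.find(key);
        if (it == map_.end()) return;
        for (const TValue& v : it->second)
            if (!visitor(key, v)) return;
    }

    // Visit every (key, value) pair; the visitor returns false to stop early.
    template <typename TVisitor>
    void VisitAll(TVisitor visitor) const {
        for (const auto& entry : map_)
            for (const TValue& v : entry.second)
                if (!visitor(entry.first, v)) return;
    }

private:
    std::unordered_map<TKey, std::vector<TValue>> map_;
};
```

The key and value types are template parameters, mirroring the "key/value are templated types" note above; everything else is a simplification.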
|
|
|
|
This function was recently changed to just return the
MethodDesc::GetLoaderAllocator. This is a cleanup that removes the
function completely and replaces all of its usages.
|
|
Patch vtable slots and similar when tiering is enabled
For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
- An indirect call through the `MethodTable`'s backpatchable vtable slot
- A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
- For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled
Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
- A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
- The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
- The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
- Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
- After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
- Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
- A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
- The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
- A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`
Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`
Synchronization between entry point changes and backpatching slots:
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples
Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retrieval. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch and that in turn means that recording slots is necessary for versionability.
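The record-then-backpatch scheme described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the runtime's code: `BackpatchableMethod`, `RecordAndBackpatchSlot`, and `SetEntryPoint` are hypothetical names, entry points are plain integers, and the lock is per object here whereas the commit describes a global lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

using EntryPoint = std::uintptr_t;

// Hypothetical model of a method whose callers reach it through recorded slots.
class BackpatchableMethod {
public:
    explicit BackpatchableMethod(EntryPoint initial) : current_(initial) {}

    // Record a slot and immediately point it at the current entry point.
    void RecordAndBackpatchSlot(EntryPoint* slot) {
        std::lock_guard<std::mutex> hold(lock_);
        slots_.push_back(slot);
        *slot = current_;
    }

    // Change the entry point and rewrite every recorded slot under the same
    // lock, so all recorded slots always agree on one entry point.
    void SetEntryPoint(EntryPoint newEntryPoint) {
        std::lock_guard<std::mutex> hold(lock_);
        current_ = newEntryPoint;
        for (EntryPoint* slot : slots_) *slot = current_;
    }

private:
    std::mutex lock_;                 // per-object here for brevity
    EntryPoint current_;
    std::vector<EntryPoint*> slots_;  // every slot through which the method is called
};
```

Because both operations take the same lock, a slot recorded concurrently with an entry point change cannot be left pointing at stale code, which is the property the synchronization note above is after.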
|
|
The DynamicMethodTable::AddMethodsToList was incorrectly allocating the
MethodDescChunk from the domain's LoaderAllocator instead of the context
specific one. Thus the allocated memory was leaking after a collectible
AssemblyLoadContext was collected.
There was also a problem with the DynamicMethodDesc::Destroy being
called twice for collectible classes - once by
RuntimeMethodHandle::Destroy() and once when the DomainFile destructor
was called. Due to the primary issue, this problem was not visible,
since the domain's LoaderAllocator is never unmapped. But it started to
cause AV after the primary issue was fixed.
|
|
* Enable TypeEquivalence feature for Windows platform
* Basic test - verified test exercises TypeEquivalence code paths
|
|
|
|
|
|
or native depending on the situation (i.e. managed call, reflection, P/Invoke) instead of only for managed. Fixes #20702.
Clean up duplicate #ifdefs.
Move #ifdef into method impl instead of outside the implementations.
Move IsRegPassedStruct out of UNIX_AMD64_ABI #ifdef.
Move check for enregistered struct out of UNIX_AMD64_ABI #ifdef
Add dummy implementation of IsRegPassedStruct and IsNativeStructPassedInRegisters when UNIX_AMD64_ABI isn't defined.
|
|
* Add PInvoke/ExactSpelling tests
Refactor tests to fit with the rest of the Interop tests.
Fix up test to cleanly run.
Change CMakeLists.txt to match the rest of the tests.
Include Interop.cmake in CMakeLists.txt
Remove Service.
* On x86 enable stdcall mangling irrespective of ExactSpelling and account for the charset suffix when ExactSpelling = false.
Change variable name.
Clean up the FindEntryPoint. The logic flow now matches CoreRT + CoreCLR specific features (ordinals and stdcall mangling).
PR Feedback.
Fix format specifier.
Add back probing null check.
Fix offset calculation for stdcall mangling.
Probe the stdcall-mangled versions of the original entry-point names when ExactSpelling isn't set.
Cleanup.
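As an illustration of the name probing described above, the sketch below builds the list of entry-point names a loader could try. The function name `CandidateEntryPoints`, its parameters, and in particular the probing order are assumptions made for this example; only the stdcall decoration itself (leading underscore, then `@` and the argument byte count) and the `A`/`W` charset suffix are standard conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the entry-point names a loader might probe. The order used here is
// an assumption for illustration, not necessarily CoreCLR's actual order.
std::vector<std::string> CandidateEntryPoints(const std::string& name,
                                              bool exactSpelling,
                                              char charsetSuffix,  // 'A' or 'W'
                                              unsigned argBytes,
                                              bool x86Stdcall) {
    std::vector<std::string> names{name};

    // Without ExactSpelling, the charset-suffixed name is also a candidate.
    if (!exactSpelling) names.push_back(name + charsetSuffix);

    if (x86Stdcall) {
        // stdcall decoration: "_" + name + "@" + bytes of arguments.
        const std::size_t undecorated = names.size();
        for (std::size_t i = 0; i < undecorated; ++i)
            names.push_back("_" + names[i] + "@" + std::to_string(argBytes));
    }
    return names;
}
```

For a hypothetical `MessageBox` import with 16 bytes of arguments and ExactSpelling unset, this yields `MessageBox`, `MessageBoxW`, `_MessageBox@16`, and `_MessageBoxW@16`.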
|
|
* Remove IsRemotingIntercepted methods that always return false.
* Remove GetOptionalMembersAllocationSize parameters that are always false.
* Remove references to context static.
Remove references in comments and methodnames.
* Remove RemotingVtsInfo.
|
|
Bring back functionality for loading IJW assemblies and calling managed->native. Also add workaround to test case for the C++ compiler inserting calls to mscoree.
|
|
(#19427)
- Sealed RuntimeType makes `is RuntimeType` and similar checks faster. These checks are fairly common in reflection.
- Delete support for introspection only loads from the runtime. We do not plan to use it in .NET Core. The support for introspection loads inherited from RuntimeType and thus it is incompatible with sealed RuntimeType.
|
|
* Separate sections READONLY_VCHUNKS and READONLY_DICTIONARY
* Remove relocations for second-level indirection of Vtable in case FEATURE_NGEN_RELOCS_OPTIMIZATIONS is enabled.
Introduce FEATURE_NGEN_RELOCS_OPTIMIZATIONS, under which NGEN specific relocations optimizations are enabled
* Replace push/pop of R11 in stubs with
- str/ldr of R4 in space reserved in epilog for non-tail calls
- usage of R4 with hybrid-tail calls (same as for EmitShuffleThunk)
* Replace push/pop of R11 for function epilog with usage of LR as helper register right before its restore from stack
|
|
Eliminate `FEATURE_UNIX_AMD64_STRUCT_PASSING` and replace it with `UNIX_AMD64_ABI` when used alone. Both are currently defined; it is highly unlikely the latter will work alone; and it significantly clutters up the code, especially the JIT.
Also, fix the altjit support (now `UNIX_AMD64_ABI_ITF`) to *not* call `ClassifyEightBytes` if the struct is too large. Otherwise it asserts.
|
|
* Fix x86 steady state tiered compilation performance
Also included - a few tiered compilation only test hooks + small logging fix for JitBench
Tiered compilation wasn't correctly implementing the MayHavePrecode and RequiresStableEntryPoint policy functions. On x64 this was a non-issue, but due to compact entrypoints on x86 it led to methods allocating both FuncPtrStubs and Precodes. The FuncPtrStubs would never get backpatched, which caused never-ending invocations of the Prestub for some methods. Although such code still runs correctly, it is much slower than it needs to be. On MusicStore x86 I am seeing a 20% improvement in steady state RPS after this fix, bringing us in line with what I've seen on x64.
|
|
When running ready to run code on ARM, the ExternalMethodFixupWorker
doesn't handle the entrypoints of FCalls correctly. It tries to handle
them as compact entrypoints, but those use different machine code
instructions, which results in an assert in debug / checked builds.
This change detects the runtime supplied calls before trying to check
for the compact entrypoint.
|
|
interfaces (#16690)
* Add checks and regression test
* Uncomment InvokeConsumer tests
* Revert JIT Interface Check
|
|
|
|
Partially remove relocations from SECTION_Readonly
|
|
|
|
This makes tiered compilation work properly with profiler ReJIT, and positions the runtime to integrate other versioning related features together in the future. See the newly added code-versioning design-doc in this commit for more information.
Breaking changes for profilers: See code-versioning-profiler-breaking-changes.md for more details.
|
|
* Support non-virtual calls on interface private members correctly
* Support protected methods
* Properly handle precode
* Throw (tentative) exception when seeing conflict overrides and add a test case
(This brings CoreCLR dev/defaultintf in line with the build we are showing at //build)
|
|
* allow non-zero RVA on abstract interface method in ilasm
* Revert "allow non-zero RVA on abstract interface method in ilasm"
This reverts commit eecb8024e58f14a20e5e49359f38019f5768ac41.
* add a test case and allow virtual non-abstract method in ilasm
* allow non-abstract methods in the loader
* fixup dispatch map
* support for simple default interface method scenario
* fix a bug with incorrect usage of MethodIterator skipping the first method. add a test case for overriding but it may not be what we want
* add another simple test case for base class
* allow private/internal methods in ilasm and add an explicit impl test
* update reference to mscorlib in il test
* add shared generics and variance case
* allow interface dispatch to return instantiating stubs with the right PARAM_TYPE calling conv
* simple factoring and add a valuetype test case
* add a test case for generic virtual methods
* a bit more refactoring by moving the CALLCONV_PARAMTYPE logic inside getMethodSigInternal
* support explicit methodimpl and remove implicit methodimpl test case
* update test cases to give more clear output
* Fix a bug where GetMethodDescForSlot chokes on interface MethodDesc without precode. This was accidentally discovered by the methodimpl test case when iterating methods on a default interface method that has already been JITted
* cleanup code after review and add a bit more comments
* update comments
* only use custom ILAsm for default interface methods tests - some tests are choking on CoreCLR ilasm for security related stuff
* address comments and allow instance methods, and add a constraint value type call test scenario
* disable the failing protected method scenario
|
|
|
|
accessed from jit code for Linux ARM (#11963)
|
|
|
|
|
|
Fixes #9321 and deletes CleanupToDoList.cs
Delete unmanaged security implementation
|
|
|
|
Implement these two methods as optional-expand jit intrinsics.
Uses `GT_ARR_BOUNDS_CHECK` for the bounds check so in some cases
downstream code is able to eliminate redundant checks. Fully general
support (on par with arrays in most cases) is still work in progress.
Update one bit of code in the optimizer that assumed it knew the
tree types that appeared in a `GT_ARR_BOUNDS_CHECK`.
Add benchmark tests for Span and ReadOnlySpan indexers.
Tests ability of jit to reason about indexer properties with respect
to loop bounds and related indexer uses. Some cases inspired by span
indexer usage in Kestrel.
Closes #10785.
Also addresses lack of indexer inlining noted in #10031. Span indexers
should now always be inlined, even when invoked from shared methods.
|
|
Some of the mcc_i* tests caused segmentation faults on Unix. This commit makes these tests exit by throwing a System.EntryPointNotFoundException instead of causing a segmentation fault.
Fix #9530
|
|
Tiered compilation is a new feature we are experimenting with that aims to improve startup times. Initially we jit methods non-optimized, then switch to an optimized version once the method has been called a number of times. More details about the current feature operation are in the comments of TieredCompilation.cpp.
This is only the first step in a longer process building the feature. The primary goal for now is to avoid regressing any runtime behavior in the shipping configuration in which the complus variable is OFF, while putting enough code in place that we can measure performance in the daily builds and make incremental progress visible to collaborators and reviewers. The design of the TieredCompilationManager is likely to change substantively, and the call counter may also change.
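The idea of starting with non-optimized code and switching after a number of calls can be modeled roughly as follows. This is a toy sketch, not the design in TieredCompilation.cpp: `TieredMethod` is a hypothetical class, the threshold is a caller-supplied assumption, and the "optimization" happens synchronously on the calling thread for simplicity.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of one method under tiering.
class TieredMethod {
public:
    using Code = std::function<int(int)>;

    TieredMethod(Code quick, std::function<Code()> optimize, int threshold)
        : code_(quick), optimize_(optimize), threshold_(threshold) {}

    int Invoke(int arg) {
        // Count calls only while still running the non-optimized code.
        if (!promoted_ && ++callCount_ >= threshold_) {
            code_ = optimize_();  // done synchronously here for simplicity
            promoted_ = true;
        }
        return code_(arg);
    }

    bool IsPromoted() const { return promoted_; }

private:
    Code code_;                       // currently active code version
    std::function<Code()> optimize_;  // produces the optimized version on demand
    int threshold_;
    int callCount_ = 0;
    bool promoted_ = false;
};
```

Once a method is promoted the counter is no longer touched, so steady-state calls pay only for the indirection through the active code version.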
|
|
|
|
|
|
|
|
|
|
|
|
Implements a two-field span struct comprising a byref field
that may be an interior pointer to a managed object, or a native
pointer indicating the start of the span, and a length field which
describes the span of access.
Since there is no MSIL operation which assigns a byref field, the jit
gets involved and specially treats the constructor and getter of a
special struct called ByReference that contains a declared IntPtr. This
special struct is then used as a field in the span implementation and
recognized by the runtime as a field that may contain a GC pointer. In
implementation, the ctor of ByReference is converted into an assignment
and the getter's value is returned by a reverse assignment.
Since there are some dependencies on CoreFX for the span implementation,
local testing of the implementation has been done using the
BasicSpanTest.cs in the CoreCLR tests. Once this is checked in and is
propagated to CoreFX, the appropriate code in the packages will be enabled
and then may be referenced in CoreCLR tests. At that time more span
tests will be added.
Additional comments and fixes based on code review added.
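The two-field layout described above (a reference to the first element plus a length) can be sketched in C++ as below. This is only an analogy: `SimpleSpan` is a hypothetical type, a raw pointer stands in for the GC-tracked byref, and nothing here reflects how the jit treats ByReference.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical two-field span: a pointer to the first element plus a length.
template <typename T>
class SimpleSpan {
public:
    SimpleSpan(T* start, std::size_t length) : start_(start), length_(length) {}

    std::size_t Length() const { return length_; }

    // Bounds-checked indexer.
    T& operator[](std::size_t index) const {
        if (index >= length_) throw std::out_of_range("index outside span");
        return start_[index];
    }

    // A sub-span points into the interior of the same storage.
    SimpleSpan Slice(std::size_t offset, std::size_t count) const {
        if (offset > length_ || count > length_ - offset)
            throw std::out_of_range("slice outside span");
        return SimpleSpan(start_ + offset, count);
    }

private:
    T* start_;            // may point into the middle of a larger buffer
    std::size_t length_;  // number of accessible elements
};
```

Slicing never copies elements; it only produces a new (pointer, length) pair, which is why the start field must be allowed to be an interior pointer.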
|
|
Use FEATURE_CER to scope CER code,
and disable CER feature in CoreCLR.
|
|
The "JIT flags" currently passed between the EE and the JIT have traditionally
been bit flags in a 32-bit word. Recently, a second 32-bit word was added to
accommodate additional flags, but that set of flags is definitely "2nd class":
they are not universally passed, and require using a separate set of bit
definitions, and comparing those bits against the proper, 2nd word.
This change replaces all uses of bare DWORD or 'unsigned int' types
representing flags with CORJIT_FLAGS, which is now an opaque type. All
flag names were renamed from CORJIT_FLG_* to CORJIT_FLAG_* to ensure all
cases were changed to use the new names, which are also scoped within the
CORJIT_FLAGS type itself.
Another motivation to do this, besides cleaner code, is to allow enabling the
SSE/AVX flags for x86. For x86, we had fewer bits available in the "first
word", so would have to either put them in the "second word", which, as
stated, was very much 2nd class and not plumbed through many usages, or
we could move other bits to the "second word", with the same issues. Neither
was a good option.
RyuJIT compiles with both COR_JIT_EE_VERSION > 460 and <= 460. I introduced
a JitFlags adapter class in jitee.h to handle both JIT flag types. All JIT
code uses this JitFlags type, which operates identically to the new
CORJIT_FLAGS type.
In addition to introducing the new CORJIT_FLAGS type, the SSE/AVX flags are
enabled for x86.
The JIT-EE interface GUID is changed, as this is a breaking change.
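A rough sketch of the opaque-flags idea follows. It is not the real CORJIT_FLAGS definition: the class name `JitFlagSet`, the flag names, and their numeric values are all made up for illustration. The point it shows is that callers set and test named flags without knowing how many words back the set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opaque flag set; flag names and values are illustrative only.
class JitFlagSet {
public:
    enum Flag : unsigned {
        FLAG_DEBUG_CODE = 0,
        FLAG_SPEED_OPT  = 1,
        FLAG_USE_AVX    = 40,  // would not fit in a single 32-bit word
        FLAG_COUNT      = 64
    };

    void Set(Flag f)         { bits_ |= Mask(f); }
    void Clear(Flag f)       { bits_ &= ~Mask(f); }
    bool IsSet(Flag f) const { return (bits_ & Mask(f)) != 0; }
    bool IsEmpty() const     { return bits_ == 0; }

private:
    static std::uint64_t Mask(Flag f) {
        assert(f < FLAG_COUNT);
        return std::uint64_t{1} << f;
    }

    std::uint64_t bits_ = 0;  // callers never see the raw storage
};
```

Scoping the flag names inside the type, as the commit does with CORJIT_FLAG_*, means any leftover use of an old bare-DWORD flag fails to compile instead of silently testing the wrong word.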
|
|
NATIVE_DLL_SEARCH_DIRECTORIES has higher priority.
|
|
[NativeCallable] in all architectures.
|
|
Fixes https://github.com/dotnet/coreclr/issues/4422
The issue is that JIT didn't pass UMEntryThunk for the managed function
entry point which is invoked from the native function.
The fix is to enable NativeCallable attribute check to get the right entry point.
|
|
/tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/vm/method.cpp:5196:16: error: cannot initialize return object of type 'LPVOID' (aka 'void *') with an rvalue of type 'FARPROC' (aka 'long (*)()')
return GetProcAddress(hMod, (LPCSTR)(size_t)((UINT16)ordinal));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
|
|
|
|
This PR adds support for System V x86_64 ABI classification and calling
convention to the VM and the Jit, including, but not limited to Ubuntu
Linux and Mac OS X.
The general rules outlined in the System V x86_64 ABI (described at
http://www.x86-64.org/documentation/abi.pdf) are followed with a few
little exceptions, described below:
1. The hidden argument for by-value passed structs is always after
the "this" parameter (if there is one). This is a difference from
the System V ABI and affects only the internal jit calling conventions.
For PInvoke calls the hidden argument is always the first parameter since
there is no "this" parameter in this case.
2. Managed structs that have no fields are always passed by-value on
the stack.
3. The jit proactively generates frame register frames (with RBP as a
frame register) in order to aid the native OS tooling for stack unwinding
and the like.
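To make the classification step concrete, here is a heavily simplified sketch of System V style eightbyte classification. It is an assumption-laden illustration, not the VM's or the Jit's algorithm: the types and the function `ClassifyStruct` are hypothetical, only integer and float/double fields are modeled, and x87, vector, and unaligned-field cases are ignored.

```cpp
#include <cassert>
#include <vector>

enum class FieldKind { Integer, Float };  // Float covers float and double
enum class EightByteClass { NoClass, Integer, Sse };

struct Field {
    unsigned offset;  // byte offset within the struct
    unsigned size;    // size in bytes
    FieldKind kind;
};

struct Classification {
    bool passedInRegisters;
    EightByteClass eightBytes[2];
};

Classification ClassifyStruct(const std::vector<Field>& fields, unsigned structSize) {
    Classification result{false, {EightByteClass::NoClass, EightByteClass::NoClass}};

    // No fields (exception 2 above) or more than two eightbytes: use the stack.
    if (fields.empty() || structSize > 16) return result;

    for (const Field& f : fields) {
        assert(f.size > 0 && f.offset + f.size <= structSize);
        const EightByteClass cls =
            f.kind == FieldKind::Float ? EightByteClass::Sse : EightByteClass::Integer;
        for (unsigned i = f.offset / 8; i <= (f.offset + f.size - 1) / 8; ++i) {
            // Merge rule: INTEGER wins over SSE within the same eightbyte.
            if (result.eightBytes[i] != EightByteClass::Integer) result.eightBytes[i] = cls;
        }
    }
    result.passedInRegisters = true;
    return result;
}
```

Under this model a struct holding an `int` and a `float` occupies one INTEGER eightbyte, while a `double` followed by a 64-bit integer is split across one SSE and one INTEGER eightbyte.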
|
|
The Tracepoint Providers are built as a separate shared object called libcoreclrtraceptprovider and it is
available in the same directory as libcoreclr.so
By default, tracing is not enabled.
To enable tracing, the apps need to be run like:
LD_PRELOAD=libcoreclrtraceptprovider.so ./corerun HelloWorld.exe
For now, to change the Xplat event logging mechanism, add any events to:
<root>/src/vm/ClrEtwAll.man
Then regenerate the files by running:
<root>/src/inc/genXplatLtnng.pl
Conflicts:
Documentation/building/linux-instructions.md
|