Age | Commit message (Collapse) | Author | Files | Lines |
|
This function was recently changed to just return the
MethodDesc::GetLoaderAllocator. This is a cleanup that removes the
function completely and replaces all of its usages.
|
|
Patch vtable slots and similar when tiering is enabled
For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
- An indirect call through the `MethodTable`'s backpatchable vtable slot
- A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
- For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled
Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
- A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
- The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
- The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
- Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
- After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
- Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
- A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
- The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
- A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`
Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`
Synchronization between entry point changes and backpatching slots
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples
Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retreival. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch and that in turn means that recording slots is necessary for versionability.
|
|
The functions were incorrectly using 4 byte loads to probe for
the address validity. While the comment on JIT_MemCpy requires
4 byte aligned address, it doesn't match the way JIT uses it and
the Windows version of the function works with unaligned addresses
too.
This bug was discovered as a crash in an application where the
JIT_MemCpy was called with count=2 and an address that was two
bytes below the end of a memory page where the following page
was not mapped.
|
|
* Implementation of R2R vtable call thunks. These thunks will fetch the target code pointer from the vtable of the input thisPtr, and jump to that address.
This is especially helpful with generics, since we can avoid a generic dictionary lookup cost for a simple vtable call.
Overall, these thunks cause the CPU to have less branch mispredictions, and give a small performance boost to vtable calls.
These stubs are under VirtualCallStubManager so that the managed debugger can handle stepping through them.
|
|
|
|
* Remove AppDomain unload
This change removes all code in AppDomain that's related to AppDomain
unloading which is obsolete in CoreCLR. It also removes all calls to the
removed methods.
In few places, I have made the change simpler by taking into account the
fact that there is always just one AppDomain.
|
|
* Disable ASMCONSTANTS_C_ASSERT in cross-bitness scenario in src/vm/ceeload.cpp
* Adjust MAXFIELDMARSHALERSIZE for cross-bitness scenario in src/vm/arm/cgencpu.h
* Make ALLOC_ALIGN_CONSTANT host specific in src/inc/stdmacros.h
* Make PRECODE_ALIGNMENT host specific in src/vm/arm/cgencpu.h
* Disable unreachable code in src/vm/arm/stubs.cpp
* Adjust CorDBIPC_BUFFER_SIZE for cross-bitness scenario in src/debug/inc/dbgipcevents.h
* Disable warning C4359 in src/vm/arm/cgencpu.h
* Deal with warning C4267: 'initializing': conversion from 'size_t' to 'int' in src/vm/stublink.cpp
* Deal with warning C4267: 'initializing': conversion from 'size_t' to 'int' in src/vm/callingconvention.h
* Disable unreachable REGDISPLAY constructor in src/inc/regdisp.h
|
|
|
|
Enable assembly unloading
* Allow PInvoke methods on collectible assemblies
* Fix test unloadability
Several hundreds of tests were using Helper class that created
GCHandle, but never freed it. That prevented unloading of those
tests. The change modifies the Helper class to keep the handle
in a finalizable object.
Several GCHandle related tests were not freeing the GCHandle they
allocated, so this change adds freeing them to enable the unloading.
* Add missing error messages to the resources
* Fix shuffle thunk cache for unloadability
* Add GetLoaderAllocator to ICLRPrivBinder
|
|
* Separate sections READONLY_VCHUNKS and READONLY_DICTIONARY
* Remove relocations for second-level indirection of Vtable in case FEATURE_NGEN_RELOCS_OPTIMIZATIONS is enabled.
Introduce FEATURE_NGEN_RELOCS_OPTIMIZATIONS, under which NGEN specific relocations optimizations are enabled
* Replace push/pop of R11 in stubs with
- str/ldr of R4 in space reserved in epilog for non-tail calls
- usage of R4 with hybrid-tail calls (same as for EmitShuffleThunk)
* Replace push/pop of R11 for function epilog with usage of LR as helper register right before its restore from stack
|
|
BaseSize for System.String was not set correctly. It caused unnecessary extra 8 bytes to be allocated at the end of strings that had `Length % 4 < 2` on 64-bit platforms.
This change makes affected strings proportionally cheaper. For example, `new string('a', 1)` in a long-running loop is 7% faster.
|
|
There were two problems:
* The illegal instruction 0xde01 used for INTERRUPT_INSTR_CALL doesn't
generate SIGILL, but SIGTRAP, since this is the code used for
breakpoints.
* The USE_REDIRECT_FOR_GCSTRESS was defined even for FEATURE_PAL for
ARM, which is incorrect and resulted in explicit redirect frame not
being created in DoGcStress and thus the GC stack walk was skipping
managed frames that it should walk.
|
|
The JIT write barrier helpers have a custom calling convention that
avoids killing most registers. The JIT was not taking advantage of
this, and thus was killing unnecessary registers when a write barrier
was necessary. In particular, some integer callee-trash registers
are unaffected by the write barriers, and no floating-point register
is affected.
Also, I got rid of the `FEATURE_WRITE_BARRIER` define, which is always
set. I also put some code under `LEGACY_BACKEND` for easier cleanup
later. I removed some unused defines in target.h for some platforms.
|
|
There was one more change needed to make the unwinding work properly.
Pushes in some prologs were missing the unwinder annotation.
The fix is to use PROLOG_PUSH for them.
To make things in this file consistent, I've also replaced pops in
epilogs with EPILOG_POP macro and vpush / vpop with PROLOG_VPUSH /
PROLOG_VPOP, although it is not functionally necessary.
With these changes, all the EH related issues are gone.
|
|
* Fix DelayLoad_MethodCall unwinding
Unwinding through DelayLoad_MethodCall was broken due to the overwriting
of R7 which is used as a frame pointer. That caused some managed
exceptions to cause abort with unhandled PAL_SEHException.
This change fixes the problem by using a different register.
* Fix one more spot with the same issue
|
|
|
|
|
|
GetLargestOnDieCacheSize into GetCacheSizePerLogicalCpu (#15975)
* refactor: combine GetLargestOnDieCacheSize and GetLogicalCpuCount in GetCacheSizePerLogicalCpu
* Perform PhysicalMemoryLimit check also for workstation GC
|
|
Enable tiered jitting for R2R methods
- Included R2R methods and generics over value types in CoreLib for tiered jitting. Tier 0 for R2R methods is the precompiled code if available, and tier 1 is selectively scheduled based on call counting.
- Added a delay before starting to count calls for tier 1 promotion. The delay is a short duration after frequent tier 0 jitting stops (current heuristic for identifying startup).
- Startup time and steady-state performance have improved on JitBench. There is a regression shortly following startup due to call counting and tier 1 jitting, for a short duration before steady-state performance stabilizes.
- Added two new config values, one for configuring the call count threshold for promoting to tier 1, and another for specifying the delay from the last tier 0 JIT invocation before starting to count calls
|
|
Removed all usages of AppDomainLeaks configuration option and
CHECK_APP_DOMAIN_LEAKS feature
Fix #12094
|
|
Improve UMEntryThunkCode::Poison to produce diagnostic message
when collected delegate was called.
|
|
|
|
* Added TailCall ELT hook for arm32 Linux
* fixed review
|
|
(#13933)
* Implement optimization case for CreateDictionaryLookupHelper
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
* Reenable mainv1/mainv2 tests
|
|
Linux and Windows arm64 are using the regular C/C++ thread local statics. This change unifies the remaining Windows architectures to be on the same plan.
|
|
* refactored arm, arm64, amd64 and x86 to signal about icache flush and ee restarts
* refactored gc init stage to stomp write barrier (hence flush icache) only once
* review fixes, care taken of icache invalidation during StompResize
* fixed heap boundaries initialization bug introduced after refactoring gc.cpp
* stylistic review fixe
* global variable rename
* global variable rename once more
|
|
* [RyuJIT/ARM32] Implement CreateDictionaryLookupHelper only via run-time helper
Implement CreateDictionaryLookupHelper only via run-time helper
* Add assertion for checking CORINFO_USEHELPER
|
|
* [Linux/ARM] Fix managed breakpoints
This commit introduces the following changes in order to enable
ICorDebug-based debuggers to use breakpoints on ARM Linux:
* Use 0xde01 as breakpoint instruction on ARM Linux.
ARM reference recommends to use 0xdefe as a breakpoint instruction,
but Linux kernel generates SIGILL for this instruction.
The 0xde01 instruction causes the kernel to generate SIGTRAP.
* Fix SIGTRAP handling on ARM Linux.
Unlike x86, when SIGTRAP happens on ARM Linux, the PC points
at the break instruction. But the rest of the code expects
that it points to an instruction after the break,
so we adjust the PC at the start of HandleHardwareException().
* Enable ARM single stepping for PAL.
Handle single stepping for PAL path the same way as for non-PAL path.
Also enable ArmSingleStepper executable buffer by allocating it
from system global loader executable heap.
* Hande ARM single step only when debugger is attached, fix comments and code style
* Pass existing Thread object to HandleArmSingleStep
|
|
|
|
|
|
* Add FEATURE_LOADER_HEAP_GUARD feature
* Invoke memset only for reclaimed regions
* Enable FEATURE_LOADER_HEAP_GUARD by default
* Insert trap inside UMEntryThunk::Terminate
* Make all exectuable heaps not to zero-initialize itself
Use fZeroInit (instead of fMakeRelazed)
* Add comment
* Revert unnecessary changes
* Add and use 'Poison' method to insert a trap
* Do NOT invoke FlushInstructionCache
* Update comment
* Add comment on ARM Poisoning instruction
* Use X86_INSTR_INT3 instead of 0xCC
|
|
helpers (#12369)
* Remove direct usage of type handle in JIT_NewArr1, with except of retrieving template method table.
* Assert that array type descriptor is loaded when array object's method table is set.
* Pass template method tables instead of array type descriptors to array allocation helpers.
|
|
|
|
Fixes #9321 and deletes CleanupToDoList.cs
Delete unmanaged security implementation
|
|
|
|
single-proc scenarios (#10065)
* Remove usage of the generation table from the EE by introducing an
EE-owned GC alloc context used for allocations on single-proc machines.
* Move the GC alloc lock to the EE side of the interface
* Repair the Windows ARM build
* Move the decision to use per-thread alloc contexts to the EE
* Rename the lock used by StartNoGCRegion and EndNoGCRegion to be more indicative of what it is protecting
* Address code review feedback 2 (enumerate the global alloc context as a part of GCToEEInterface)
* Code review feedback (3)
* Address code review feedback (move some GC-internal globals to gcimpl.h and gc.cpp)
* g_global_alloc_lock is a dword, not a qword - fixes a deadlock
* Move GlobalAllocLock to gchelpers.cpp and switch to preemptive mode when spinning
* Repair the Windows x86 build
|
|
* [x86/linux] Add IsIPinVirtualStub() on x86/linux #9691
To pass Loader.classloader.methodoverriding.regressions.549411.exploit
test failure on x86/linux. This patch is from #5542.
|
|
|
|
This change removes NakedThrowHelper function for Unix since it was not used.
It also ifdefs out its upstream callers.
|
|
|
|
|
|
|
|
|
|
* add clang 3.7 support
* Removing
__FakePrologName="DelayLoad_Helper\suffix\()_FakeProlog"
based on https://github.com/dotnet/coreclr/issues/4332#issuecomment-271990909
|
|
* Appends THUMB bit for RUNTIME_FIXUP_HELPER address
* Revise GetEEFuncEntryPoint (for ARM) and use it to set thumb bit
* Uses GetEEFuncEntryPoint instead of GFN_TADDR
|
|
Move the GC behind an interface and use that interface in the VM
|
|
modifying the VM to utilize this interface.
Introduce an interface separating the GC and the rest of the VM
Remove static members of both IGCHeap and IGCHeapInternal and move the management of the singular GC heap to the VM.
Rename uses of IGCHeap in the VM to GCHeapHolder, as well as other misc. renames throughout the VM and GC.
Split each interface function into categories, document them, use consistent formatting across the interface
Undo some accidental find/replace collateral damage
Remove all ifdefs from the GC interface
Deduplicate function declarations between IGCHeap and IGCHeapInternal, expose AllocAlign8 through the interface and the reference to alloc_context to repair the ARM build
Paper cut: false -> nullptr
Repair the ARM and x86 builds
Rename GCHeapHolder -> GCHeapUtilities and address documentation feedback
Rebase against master
Rename gcholder.h/cpp -> gcheaputilities.h/cpp
Fix an uninitialized field on alloc_context causing test failures on clang
Rename the include guard for gcheaputilities.h
Un-breaks SOS by making the following changes:
1) Instructs the DAC to look for IGCHeap::gcHeapType by name,
instead of assuming that it exists near g_pGCHeap,
2) Eliminate all virtual calls on IGCHeap in the DAC, since we cannot
dispatch on an object in another process,
3) Because of 2, expose the number of generations past the GC interface
using a static variable on IGCHeap that the DAC can read directly.
repair the Windows build
|
|
Enable FunctionEnter/FunctionLeave callbacks on ARM
|
|
|
|
In ARM/Linux, UMThunkStub currently pushes r0-r3 and r12 without .pad,
which results in #6787.
This commit revises the prolog and epilog of UMThunkStub to fix 6787.
(In addition, personality routine is addes as ARM64/AMD64 already does.)
|