path: root/src/vm/eeconfig.cpp
Age | Commit message | Author | Files | Lines
2019-01-11 | Patch vtable slots and similar when tiering is enabled (#21292) | Koundinya Veluri | 1 | -1/+9
Patch vtable slots and similar when tiering is enabled

For a method eligible for code versioning and vtable slot backpatch:
- It does not have a precode (`HasPrecode()` returns false)
- It does not have a stable entry point (`HasStableEntryPoint()` returns false)
- A call to the method may be:
  - An indirect call through the `MethodTable`'s backpatchable vtable slot
  - A direct call to a backpatchable `FuncPtrStub`, perhaps through a `JumpStub`
  - For interface methods, an indirect call through the virtual stub dispatch (VSD) indirection cell to a backpatchable `DispatchStub` or a `ResolveStub` that refers to a backpatchable `ResolveCacheEntry`
- The purpose is that typical calls to the method have no additional overhead when code versioning is enabled

Recording and backpatching slots:
- In order for all vtable slots for the method to be backpatchable:
  - A vtable slot initially points to the `MethodDesc`'s temporary entry point, even when the method is inherited by a derived type (the slot's value is not copied from the parent)
  - The temporary entry point always points to the prestub and is never backpatched, in order to be able to discover new vtable slots through which the method may be called
  - The prestub, as part of `DoBackpatch()`, records any slots that are transitioned from the temporary entry point to the method's at-the-time current, non-prestub entry point
  - Any further changes to the method's entry point cause recorded slots to be backpatched in `BackpatchEntryPointSlots()`
- In order for the `FuncPtrStub` to be backpatchable:
  - After the `FuncPtrStub` is created and exposed, it is patched to point to the method's at-the-time current entry point if necessary
  - Any further changes to the method's entry point cause the `FuncPtrStub` to be backpatched in `BackpatchEntryPointSlots()`
- In order for VSD entities to be backpatchable:
  - A `DispatchStub`'s entry point target is aligned and recorded for backpatching in `BackpatchEntryPointSlots()`
    - The `DispatchStub` was modified on x86 and x64 such that the entry point target is aligned to a pointer to make it backpatchable
  - A `ResolveCacheEntry`'s entry point target is recorded for backpatching in `BackpatchEntryPointSlots()`

Slot lifetime and management of recorded slots:
- A slot is recorded in the `LoaderAllocator` in which the slot is allocated, see `RecordAndBackpatchEntryPointSlot()`
- An inherited slot that has a shorter lifetime than the `MethodDesc`, when recorded, needs to be accessible by the `MethodDesc` for backpatching, so the dependent `LoaderAllocator` with the slot to backpatch is also recorded in the `MethodDesc`'s `LoaderAllocator`, see `MethodDescBackpatchInfo::AddDependentLoaderAllocator_Locked()`
- At the end of a `LoaderAllocator`'s lifetime, the `LoaderAllocator` is unregistered from dependency `LoaderAllocators`, see `MethodDescBackpatchInfoTracker::ClearDependencyMethodDescEntryPointSlots()`
- When a `MethodDesc`'s entry point changes, backpatching also includes iterating over recorded dependent `LoaderAllocators` to backpatch the relevant slots recorded there, see `BackpatchEntryPointSlots()`

Synchronization between entry point changes and backpatching slots:
- A global lock is used to ensure that all recorded backpatchable slots corresponding to a `MethodDesc` point to the same entry point, see `DoBackpatch()` and `BackpatchEntryPointSlots()` for examples

Due to startup time perf issues:
- `IsEligibleForTieredCompilation()` is called more frequently with this change and in hotter paths. I chose to use a `MethodDesc` flag to store that information for fast retrieval. The flag is initialized by `DetermineAndSetIsEligibleForTieredCompilation()`.
- Initially, I experimented with allowing a method versionable with vtable slot backpatch to have a precode, and allocated a new precode that would also be the stable entry point when a direct call is necessary. That also allows recording a new slot to be optional - in the event of an OOM, the slot may just point to the stable entry point. There are a large number of such methods and the allocations were slowing down startup perf. So, I had to eliminate precodes for methods versionable with vtable slot backpatch, and that in turn means that recording slots is necessary for versionability.
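A minimal, self-contained sketch of the recording-and-backpatching idea described above: slots are recorded per method under a single global lock, and every later entry-point change rewrites all recorded slots. This is not the CoreCLR implementation; the class and helper names here are invented for illustration, and it ignores the per-`LoaderAllocator` lifetime tracking that `MethodDescBackpatchInfo` handles in the actual change.

```cpp
// Illustrative sketch of per-method entry point slot backpatching.
// Not CoreCLR code; names (SlotBackpatchTracker, etc.) are invented for the example.
#include <mutex>
#include <unordered_map>
#include <vector>

using EntryPoint = void (*)();

class SlotBackpatchTracker
{
public:
    // Called by the prestub path when a slot is observed still holding the
    // temporary entry point: record it, then patch it to the current code.
    void RecordAndBackpatchSlot(const void *method, EntryPoint *slot, EntryPoint current)
    {
        std::lock_guard<std::mutex> hold(m_lock);   // single global lock, as in DoBackpatch()
        m_slots[method].push_back(slot);
        *slot = current;
    }

    // Called whenever the method's active code version changes (e.g. a tier-1 jit
    // finished): rewrite every recorded slot so future calls reach the new code.
    void BackpatchEntryPointSlots(const void *method, EntryPoint newEntryPoint)
    {
        std::lock_guard<std::mutex> hold(m_lock);
        auto it = m_slots.find(method);
        if (it == m_slots.end())
            return;
        for (EntryPoint *slot : it->second)
            *slot = newEntryPoint;
    }

private:
    std::mutex m_lock;
    std::unordered_map<const void *, std::vector<EntryPoint *>> m_slots;
};
```

A real implementation additionally has to record slots in the `LoaderAllocator` that owns them, so that slots with shorter lifetimes than the `MethodDesc` can be dropped when their allocator goes away, which is the dependency tracking the commit message describes.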
2018-12-26 | desktop port (#21523) | Maoni Stephens | 1 | -2/+14
+ alloc lock split into SOH and LOH
+ provisional mode to fix too many gen2 GCs triggered in low-memory situations when the heap has heavy pinning fragmentation
+ better free list usage
+ premature OOM fixes
+ 3 new configs: GCHeapAffinitizeMask, GCHighMemPercent, GCLOHThreshold (will be documented)

YieldProcessor scaling factor is different on core due to the different implementation on core.
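For illustration, a rough standalone sketch of how a numeric knob like GCHighMemPercent could be consumed from the usual COMPlus_ environment variables. This is not the actual EEConfig/GCConfig plumbing; the helper name and the default value shown are assumptions, and the real runtime parses most COMPlus_ values as hexadecimal, whereas this sketch accepts hex or decimal.

```cpp
// Standalone illustration: reading a COMPlus_-style numeric GC knob.
// The default value used here is an assumption for the example, not the runtime's.
#include <cstdint>
#include <cstdlib>

static uint32_t ReadComPlusUInt(const char *name, uint32_t defaultValue)
{
    const char *raw = std::getenv(name);          // e.g. "COMPlus_GCHighMemPercent"
    if (raw == nullptr || *raw == '\0')
        return defaultValue;
    // Base 0 accepts "0x5A" or "90"; the real config reader is stricter.
    return static_cast<uint32_t>(std::strtoul(raw, nullptr, 0));
}

int main()
{
    uint32_t highMemPercent = ReadComPlusUInt("COMPlus_GCHighMemPercent", 90 /* assumed default */);
    (void)highMemPercent;
    return 0;
}
```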
2018-11-21 | Delete dead/unused code (#21138) | Jan Kotas | 1 | -22/+0
2018-11-09 | Delete dead/unreachable code related to remoting (#20880) | Jan Kotas | 1 | -9/+0
2018-10-04 | Remove AppDomain unload (#20250) | Jan Vorlicek | 1 | -2/+0
* Remove AppDomain unload

This change removes all code in AppDomain that's related to AppDomain unloading, which is obsolete in CoreCLR. It also removes all calls to the removed methods. In a few places, I have made the change simpler by taking into account the fact that there is always just one AppDomain.
2018-08-17 | Enable Tiered Compilation by default (#19525) | Koundinya Veluri | 1 | -3/+1
Enable Tiered Compilation by default

1) Changes the default state of the tiered compilation feature check to be ON BY DEFAULT
2) Removed comments in the source about this being a work in progress. Although it will surely continue to evolve and improve, remaining issues would be better tracked in our issue tracking system with the same default presumption as other runtime features - assume it works unless noted otherwise.
3) Adjusts a number of tests and automated scripts that made assumptions that the default setting of this feature is off.
4) Stop accepting the deprecated env var COMPLUS_EXPERIMENTAL_TieredCompilation. I'm not aware it has any remaining usage, but if so we're going to find out.
5) Adjust config names for JitBench
2018-07-23 | Remove hosthook api (#19079) | Aaron Robinson | 1 | -4/+1
* Remove CallNeedsHostHook() API
* Remove IsHostHookEnabled() API and with it related dead code
* Remove code enabling host hooks (i.e. COMPlus_GenerateStubForHost)
* Remove function declarations for creating host hooks
* Update comments
2018-07-16 | Apply tiering's call counting delay more broadly (#18610) | Koundinya Veluri | 1 | -0/+14
Apply tiering's call counting delay more broadly

Issues
- When some time passes between process startup and first significant use of the app, startup perf with tiering can be slower because the call counting delay is no longer in effect
- This is especially true when the process is affinitized to one cpu

Fixes
- Initiate and prolong the call counting delay upon tier 0 activity (jitting or r2r code lookup for a new method)
- Stop call counting for a called method when the delay is in effect
- Stop (and don't start) tier 1 jitting when the delay is in effect
- After the delay, resume call counting and tier 1 jitting
- If the process is affinitized to one cpu at process startup, multiply the delay by 10

No change in benchmarks.
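A hedged sketch of the delay policy described above: tier-0 activity keeps pushing a deadline out, call counting and tier-1 jitting are paused until the deadline passes, and the delay is scaled up when the process is limited to a single CPU. The class name and the base duration are illustrative assumptions, not the runtime's actual values.

```cpp
// Illustrative model of tiering's call counting delay (name and duration invented).
#include <chrono>

class CallCountingDelay
{
public:
    explicit CallCountingDelay(bool singleCpuAffinity)
        // Assumed base delay for the example; the commit multiplies the delay by 10
        // when the process is affinitized to one CPU at startup.
        : m_delay(std::chrono::milliseconds(100) * (singleCpuAffinity ? 10 : 1)),
          m_deadline(std::chrono::steady_clock::now() + m_delay)
    {
    }

    // Tier-0 activity (jitting or R2R code lookup for a new method) prolongs the delay.
    void OnTier0Activity()
    {
        m_deadline = std::chrono::steady_clock::now() + m_delay;
    }

    // While the delay is in effect, call counting and tier-1 jitting are skipped.
    bool DelayInEffect() const
    {
        return std::chrono::steady_clock::now() < m_deadline;
    }

private:
    std::chrono::steady_clock::duration m_delay;
    std::chrono::steady_clock::time_point m_deadline;
};
```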
2018-04-30 | Add runtimeconfig.json support for tiered compilation (#17840) | Noah Falk | 1 | -1/+1
2018-04-10 | Fix x86 steady state tiered compilation performance (#17476) | Noah Falk | 1 | -0/+5
* Fix x86 steady state tiered compilation performance

Also included - a few tiered compilation only test hooks + small logging fix for JitBench

Tiered compilation wasn't correctly implementing the MayHavePrecode and RequiresStableEntryPoint policy functions. On x64 this was a non-issue, but due to compact entrypoints on x86 it led to methods allocating both FuncPtrStubs and Precodes. The FuncPtrStubs would never get backpatched, which caused never-ending invocations of the Prestub for some methods. Although such code still runs correctly, it is much slower than it needs to be. On MusicStore x86 I am seeing a 20% improvement in steady state RPS after this fix, bringing us in line with what I've seen on x64.
2018-03-29 | Fix AssemblyLoadContext.Unloading and ProcessExit for Windows Docker containers (#17265) | Daniel Harvey | 1 | -2/+0
On Windows, we need to shut down the EE when receiving a CTRL_CLOSE_EVENT so we run ProcessExit handlers and other code that relies on ProcessExit working (like AssemblyLoadContext.Unloading). One way we receive this event is when there's a running process in a docker container that has the stop command run against it.
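A minimal Win32 sketch of the pattern described: install a console control handler so CTRL_CLOSE_EVENT, which Windows delivers in the container-stop scenario above, can run shutdown work before the process is torn down. `ShutdownRuntime()` is a placeholder, not the EE's actual shutdown entry point.

```cpp
// Windows-only sketch: reacting to CTRL_CLOSE_EVENT so ProcessExit-style
// handlers get a chance to run. ShutdownRuntime() is a placeholder.
#include <windows.h>

static void ShutdownRuntime()
{
    // Placeholder for work that must happen before the process disappears,
    // e.g. raising ProcessExit / AssemblyLoadContext.Unloading equivalents.
}

static BOOL WINAPI ConsoleCtrlHandler(DWORD ctrlType)
{
    if (ctrlType == CTRL_CLOSE_EVENT)
    {
        ShutdownRuntime();
        return TRUE;   // handled; Windows still terminates the process afterwards
    }
    return FALSE;      // let other handlers / default processing run
}

int main()
{
    SetConsoleCtrlHandler(ConsoleCtrlHandler, TRUE);
    // ... application work ...
    return 0;
}
```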
2018-03-28 | Removing 'EXPERIMENTAL' from tiered compilation env var (#17283) | Noah Falk | 1 | -1/+4
Things have progressed far enough that it's time to use a friendlier name. The feature still has performance aspects that need to be investigated and improved, but I don't want to scare people off simply because it isn't as fast as it could be. This also updates to use a newer CoreFX version for JitBench since that appeared to be broken, and updates some comments and usage of the tieredcompilation variable.
2018-01-25 | Enable tiered jitting for R2R methods (#15967) | Koundinya Veluri | 1 | -0/+10
Enable tiered jitting for R2R methods

- Included R2R methods and generics over value types in CoreLib for tiered jitting. Tier 0 for R2R methods is the precompiled code if available, and tier 1 is selectively scheduled based on call counting.
- Added a delay before starting to count calls for tier 1 promotion. The delay is a short duration after frequent tier 0 jitting stops (current heuristic for identifying startup).
- Startup time and steady-state performance have improved on JitBench. There is a regression shortly following startup due to call counting and tier 1 jitting, for a short duration before steady-state performance stabilizes.
- Added two new config values, one for configuring the call count threshold for promoting to tier 1, and another for specifying the delay from the last tier 0 JIT invocation before starting to count calls.
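A simplified sketch of the promotion decision this commit describes: once the startup delay has passed, each call increments a per-method counter, and crossing a configurable threshold schedules a tier-1 jit exactly once. The struct, function names, and the idea of passing the threshold as a plain parameter are illustrative assumptions.

```cpp
// Illustrative call-count promotion check (names invented for the example).
#include <atomic>
#include <cstdint>

struct MethodTieringState
{
    std::atomic<uint32_t> callCount{0};
    std::atomic<bool> tier1Requested{false};
};

// Returns true when this call should trigger scheduling a tier-1 (optimizing) jit.
static bool OnMethodCalled(MethodTieringState &state,
                           bool callCountingDelayInEffect,
                           uint32_t callCountThreshold /* configurable via a COMPlus_-style knob */)
{
    if (callCountingDelayInEffect)
        return false;                      // counting is paused shortly after startup activity

    uint32_t newCount = state.callCount.fetch_add(1, std::memory_order_relaxed) + 1;
    if (newCount < callCountThreshold)
        return false;

    // Only request promotion once per method.
    bool expected = false;
    return state.tier1Requested.compare_exchange_strong(expected, true);
}
```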
2018-01-23 | Delete dead code (#15990) | Jan Kotas | 1 | -11/+0
2017-12-28 | Recognize STA\MTA Attribute For Main Function (#15652) | Anirudh Agnihotry | 1 | -1/+0
* Apartment state set for main method
* g_fWeownprocess removed and CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_FinalizeOnShutdown) set
2017-12-20 | Revert "Respect STA/MTAThread attributes (#15512)" | Jan Kotas | 1 | -0/+1
This reverts commit 21cfdb6f5bb8c596aa55cc50892be0bfabee5de3.
2017-12-16 | Respect STA/MTAThread attributes (#15512) | Anirudh Agnihotry | 1 | -1/+0
* Apartment state set for main method
* Default apartment state is Unknown
* Removed extra validation of token
* Removed legacy apartment code
2017-11-27 | Improve Monitor scaling (#14216) | Koundinya Veluri | 1 | -0/+10
Improve Monitor scaling and reduce spinning

Part 1: Improve Monitor scaling

Fixes https://github.com/dotnet/coreclr/issues/13978
- Refactored AwareLock::m_MonitorHeld into a class LockState with operations to mutate the state
- Allowed the lock to be taken by a non-waiter when there is a waiter, to prevent creating lock convoys
- Added a bit to LockState to indicate that a waiter is signaled to wake, to avoid waking more than one waiter at a time. A waiter that wakes by observing the signal unsets this bit. See AwareLock::EnterEpilogHelper().
- Added a spinner count to LockState. Spinners now register and unregister themselves and lock releasers don't wake a waiter when there is a registered spinner (the spinner guarantees to take the lock if it's available when unregistering itself)
  - This was necessary mostly on Windows to reduce CPU usage to the expected level in contended cases with several threads. I believe it's the priority boost Windows gives to signaled threads, which seems to cause waiters to much more frequently succeed in acquiring the lock. This causes a CPU usage problem because once the woken waiter releases the lock, on the next lock attempt it will become a spinner. This keeps repeating, converting several waiters into spinners unnecessarily. Before registering spinners, I saw typically 4-6 spinners under contention (with delays inside and outside the lock) when I expected to have only 1-2 spinners at most.
  - It costs an interlocked operation before and after the spin loop, which doesn't seem to be too significant since spinning is a relatively slow path anyway, and the reduction in CPU usage in turn reduces contention on the lock and lets more useful work get done
- Updated waiters to spin a bit before going back to waiting; reasons are explained in AwareLock::EnterEpilogHelper()
- Removed AwareLock::Contention() and any references (this removes the 10 repeats of the entire spin loop in that function). With the lock convoy issue gone, this appears to no longer be necessary.

Perf
- On Windows, throughput has increased significantly starting at slightly lower than proc count threads. On Linux, latency and throughput have increased more significantly at similar proc counts.
- Most of the larger regressions are in the unlocked fast paths. The code there hasn't changed and is almost identical (minor layout differences); I'm just considering this noise until we figure out how to get consistently faster code generated.
- The smaller regressions are within noise range

Part 2: Reduce Monitor spinning

Fixes https://github.com/dotnet/coreclr/issues/13980
- Added new config value Monitor_SpinCount; Monitor spins for that many iterations, default is 30 (0x1e). This seems to give a somewhat decent balance between latency, fairness, and throughput. Lower spin counts improve latency and fairness significantly and regress throughput slightly, and higher spin counts improve throughput slightly and regress latency and fairness significantly.
- The other constants can still be used to disable spinning, but otherwise they are no longer used by Monitor
- Decreased the number of bits used for tracking spinner count to 3. This seems to be more than enough since only one thread can take a lock at a time, and prevents spikes of unnecessary CPU usage.

Tried some things that didn't pan out:
- Sleep(0) doesn't seem to add anything to the spin loop, so left it out. Instead of Sleep(0) it can just proceed to waiting. Waiting is more expensive than Sleep(0), but I didn't see that benefit in the tests. Omitting Sleep(0) also keeps the spin loop very short (a few microseconds max).
- Increasing the average YieldProcessor() duration per spin iteration improved throughput slightly but regressed latency and fairness very quickly. Given that fairness is generally worse with part 1 of this change above, it felt like a better compromise to take a small reduction in throughput for larger improvements in latency and fairness.
- Tried adding a very small % of lock releases that randomly wake a waiter despite there being spinners, to improve fairness. This improved fairness noticeably but not as much as decreasing the spin count slightly, and it was making latency and throughput worse more quickly. After reducing the % to a point where I was hardly seeing fairness improvements, there were still noticeable latency and throughput regressions.

Miscellaneous
- Moved YieldProcessorNormalized code into separate files so that they can be included earlier and where needed
- Added a max for "optimal max normalized yields per spin iteration" since it has a potential to be very large on machines where YieldProcessor may be implemented as no-op, in which case it's probably not worth spinning for the full duration
- Refactored duplicate code in portable versions of MonEnterWorker, MonEnter, and MonReliableEnter. MonTryEnter has a slightly different structure, did not refactor that.

Perf
- Throughput is a bit lower than before at lower thread counts and better at medium-high thread counts. It's a bit lower at lower thread counts for two reasons:
  - Shorter spin loop means the lock will be polled more frequently because the exponential backoff does not get as high, making it more likely for a spinner to steal the lock from another thread, causing the other thread to sometimes wait early
  - The duration of YieldProcessor() calls per spin iteration has decreased and a spinner or spinning waiter are more likely to take the lock; the rest is similar to above
- For the same reasons as above, latency is better than before. Fairness is better on Windows and worse on Linux compared to baseline due to the baseline having differences between these platforms. Latency also has differences between Windows/Linux in the baseline; I suspect those are due to differences in scheduling.
- Performance now scales appropriately on processors with different pause delays

Part 3: Add mitigation for waiter starvation

Normally, threads are allowed to preempt waiters to acquire the lock. There are cases where waiters can be easily starved as a result. For example, a thread that holds a lock for a significant amount of time (much longer than the time it takes to do a context switch), then releases and reacquires the lock in quick succession, and repeats. Though a waiter would be woken upon lock release, usually it will not have enough time to context-switch-in and take the lock, and can be starved for an unreasonably long duration.

In order to prevent such starvation and force a bit of fair forward progress, it is sometimes necessary to change the normal policy and disallow threads from preempting waiters. A new bit was added to LockState and ShouldNotPreemptWaiters() indicates the current state of the policy.
- When the first waiter begins waiting, it records the current time as a "waiter starvation start time". That is a point in time after which no forward progress has occurred for waiters. When a waiter acquires the lock, the time is updated to the current time.
- Before a spinner begins spinning, and when a waiter is signaled to wake, it checks whether the starvation duration has crossed a threshold (currently 100 ms) and if so, sets ShouldNotPreemptWaiters(). When unreasonable starvation is occurring, the lock will be released occasionally, and if that is caused by spinners, spinners will be starting to spin.
  - Before starting to spin, if ShouldNotPreemptWaiters() is set, the spinner will skip spinning and wait instead. Spinners that are already registered at the time ShouldNotPreemptWaiters() is set will stop spinning as necessary. Eventually, all spinners will drain and no new ones will be registered.
- After spinners have drained, only a waiter will be able to acquire the lock. When a waiter acquires the lock, or when the last waiter unregisters itself, ShouldNotPreemptWaiters() is cleared to restore the normal policy.
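A reduced sketch of the LockState idea from Parts 1-3 above: one word packs the held bit, a "waiter signaled to wake" bit, a small spinner count, and a should-not-preempt-waiters bit, and is only mutated with interlocked (atomic) operations. The bit layout, widths, and helper names here are illustrative assumptions, not the actual AwareLock layout.

```cpp
// Illustrative packing of a Monitor-style lock state word (layout invented for the example).
#include <atomic>
#include <cstdint>

class LockState
{
    // [0]      lock held
    // [1]      a waiter has been signaled to wake (avoid waking more than one)
    // [2..4]   spinner count (3 bits, echoing Part 2 of the change)
    // [5]      should-not-preempt-waiters (starvation mitigation from Part 3)
    // [6..31]  waiter count (not exercised in this sketch)
    static constexpr uint32_t IsLockedMask         = 1u << 0;
    static constexpr uint32_t WaiterSignaledMask   = 1u << 1;
    static constexpr uint32_t SpinnerCountShift    = 2;
    static constexpr uint32_t SpinnerCountMask     = 0x7u << SpinnerCountShift;
    static constexpr uint32_t NoPreemptWaitersMask = 1u << 5;

    std::atomic<uint32_t> m_state{0};

public:
    bool TryAcquire()
    {
        uint32_t s = m_state.load(std::memory_order_relaxed);
        while ((s & IsLockedMask) == 0)
        {
            if (m_state.compare_exchange_weak(s, s | IsLockedMask, std::memory_order_acquire))
                return true;
        }
        return false;
    }

    // A spinner registers itself so releasers know they need not wake a waiter;
    // registration fails if the 3-bit count is saturated or preemption is disallowed.
    bool TryRegisterSpinner()
    {
        uint32_t s = m_state.load(std::memory_order_relaxed);
        for (;;)
        {
            if ((s & NoPreemptWaitersMask) != 0 ||
                (s & SpinnerCountMask) == SpinnerCountMask)
            {
                return false;
            }
            if (m_state.compare_exchange_weak(s, s + (1u << SpinnerCountShift),
                                              std::memory_order_relaxed))
                return true;
        }
    }

    bool ShouldNotPreemptWaiters() const
    {
        return (m_state.load(std::memory_order_relaxed) & NoPreemptWaitersMask) != 0;
    }
};
```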
2017-10-16 | Delete dead code (#14521) | Jan Kotas | 1 | -1/+0
2017-08-22 | Introduce COMPlus_GDBJitEmitDebugFrame (#13515) | Jonghyun Park | 1 | -0/+7
* Introduce COMPlus_GDBJitEmitDebugFrame
* Use a proper #ifdef macro
2017-08-18 | Introduce COMPlus_GDBJitElfDump (#13448) | Jonghyun Park | 1 | -0/+12
* Add COMPlus_GDBJitElfDump
* Fix Release build error
* Add flags in EEConfig
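For illustration, a hedged sketch of what keeping such a flag in an EEConfig-style holder behind a feature #ifdef can look like. The class, field, and getter names are invented, and treating COMPlus_GDBJitElfDump as a simple on/off switch is an assumption of this sketch, not a statement about the flag's actual semantics.

```cpp
// Illustrative feature-gated flag in an EEConfig-style class
// (names are invented; boolean interpretation of the knob is an assumption).
#include <cstdlib>

class IllustrativeEEConfig
{
public:
    void Init()
    {
#ifdef FEATURE_GDBJIT
        const char *raw = std::getenv("COMPlus_GDBJitElfDump");
        m_gdbJitElfDump = (raw != nullptr && *raw != '\0' && *raw != '0');
#endif
    }

#ifdef FEATURE_GDBJIT
    bool ShouldDumpElf() const { return m_gdbJitElfDump; }

private:
    bool m_gdbJitElfDump = false;
#endif
};
```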
2017-08-07 | Cleanup code access security from the unmanaged runtime (#13241) | Jan Kotas | 1 | -4/+0
2017-06-01 | [Local GC] Obtaining configuration information (#11379) | Sean Gillespie | 1 | -8/+0
* [Local GC] Skeleton for GC configuration
* Initial tweaks after design feedback:
  1) Use string keys instead of enums. Upon receiving a string key, the EE looks at it and, if it's something that comes from startup flags, responds using the startup flag information. Otherwise, it forwards the string onto CLRConfig.
  2) Add a mechanism for getting string configuration values from the EE. This includes adding a RAII wrapper around strings so that they are freed correctly.
* Remove uses of g_pConfig from the GC and replace with GCConfig
* Use the GCConfig system for the GC log
* Fix poorly-named parameter
* Add documentation and caching of bool and int configs obtained from the EE
* Remove AppDomainLeaks as dead code
* Remove GC trace configs as dead code
* Repair unix build
* Fix an issue where we started the GC in the wrong latency mode
* Fix the unix build
* Pipe GCRetainVM configuration to the GC
* Dead code removal in the GC sample
* EEConfig -> GCConfig for heap verification constants in the GC
* Populate config information for bools and ints eagerly at startup
* Initialize g_theGCToCLR before initializing GCConfig
* Propagate HoardVM config to the GC
* Fix an incorrect comment
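A hedged sketch of the string-keyed lookup scheme described above: the EE answers keys it knows from startup flags directly and forwards everything else to its general config store (CLRConfig plays that role in the real runtime). The class, the key string, and the map-based backing store are illustrative assumptions, not the actual GC/EE interface.

```cpp
// Illustrative string-keyed config lookup with a startup-flag fast path
// (names and the "gcServer" key are assumed for the example).
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

class IllustrativeEEConfigBridge
{
public:
    IllustrativeEEConfigBridge(bool serverGCFromStartupFlags,
                               std::unordered_map<std::string, int64_t> generalConfig)
        : m_serverGC(serverGCFromStartupFlags), m_general(std::move(generalConfig))
    {
    }

    // The GC asks by string key; keys backed by startup flags are answered directly,
    // everything else falls through to the general config store.
    bool GetIntConfigValue(const std::string &key, int64_t *value) const
    {
        if (key == "gcServer")
        {
            *value = m_serverGC ? 1 : 0;
            return true;
        }
        auto it = m_general.find(key);
        if (it == m_general.end())
            return false;
        *value = it->second;
        return true;
    }

private:
    bool m_serverGC;
    std::unordered_map<std::string, int64_t> m_general;
};
```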
2017-05-17 | Finish deleting dead CAS code from CoreLib (#11436) | Jan Kotas | 1 | -31/+0
Fixes #9321 and deletes CleanupToDoList.cs

Delete unmanaged security implementation
2017-04-27 | Remove support for the x86 compat JIT from .NET Core. | Pat Gavlin | 1 | -31/+0
These changes remove support for the x86 compat JIT from the build, the runtime, and the various perf/test scripts. Fixes #10733, #10734.
2017-03-29 | Tiered Compilation step 1 | noahfalk | 1 | -0/+8
Tiered compilation is a new feature we are experimenting with that aims to improve startup times. Initially we jit methods non-optimized, then switch to an optimized version once the method has been called a number of times. More details about the current feature operation are in the comments of TieredCompilation.cpp. This is only the first step in a longer process building the feature. The primary goal for now is to avoid regressing any runtime behavior in the shipping configuration in which the complus variable is OFF, while putting enough code in place that we can measure performance in the daily builds and make incremental progress visible to collaborators and reviewers. The design of the TieredCompilationManager is likely to change substantively, and the call counter may also change.
2017-03-13 | Merge pull request #10153 from adityamandaleeka/remove_stress_thread | Aditya Mandaleeka | 1 | -4/+0
Remove STRESS_THREAD
2017-03-13 | Remove STRESS_THREAD. | Aditya Mandaleeka | 1 | -4/+0
2017-03-05 | Delete IsNonW8PFrameworkAPI checks (#9964) | Jan Kotas | 1 | -4/+0
Dead code in CoreCLR
2017-02-15 | Remove never defined FEATURE_WIN_DB_APPCOMPAT | danmosemsft | 1 | -28/+0
2017-02-14 | Remove never defined FEATURE_REMOTING | danmosemsft | 1 | -3/+0
2017-02-14 | Remove never defined FEATURE_INCLUDE_ALL_INTERFACES | danmosemsft | 1 | -11/+0
2017-02-12 | Remove never defined FEATURE_FUSION | danmosemsft | 1 | -34/+0
2017-02-10 | Revert "Remove always defined FEATURE_CORRUPTING_EXCEPTIONS" | danmosemsft | 1 | -0/+2
This reverts commit b0dab0d6de90a38dfbf0d6b2039a7b8f5269d802.
2017-02-10 | Remove always defined FEATURE_CORRUPTING_EXCEPTIONS | danmosemsft | 1 | -2/+0
2017-02-10 | Remove always defined FEATURE_CORECLR | danmosemsft | 1 | -376/+0
2016-07-05 | only use config on coreclr | Maoni Stephens | 1 | -0/+5
[tfs-changeset: 1616092]
2016-07-02 | Added 2 configs for Server GC | Maoni0 | 1 | -0/+5
- complus var GCNoAffinitize or project.json System.GC.NoAffinitize - specify 1/true to disable hard affinity of Server GC threads to CPUs
- complus var GCHeapCount or project.json System.GC.HeapCount - specify the # of Server GC threads/heaps; must be smaller than the # of logical CPUs the process is allowed to run on, i.e., if you don't specifically affinitize your process it means the # of total logical CPUs on the machine; otherwise this is the # of logical CPUs you affinitized your process to.
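A small sketch of the constraint described for GCHeapCount: the requested number of Server GC heaps/threads cannot exceed the number of logical CPUs the process may run on. The clamping shown here, and defaulting to one heap per allowed CPU when the knob is unset, are assumptions for illustration, not the runtime's exact policy.

```cpp
// Illustrative clamp of a configured Server GC heap count (policy assumed for the example).
#include <algorithm>
#include <cstdint>

static uint32_t EffectiveServerGCHeapCount(uint32_t configuredHeapCount,
                                           uint32_t allowedLogicalCpuCount)
{
    if (configuredHeapCount == 0)
        return allowedLogicalCpuCount;      // unset: one heap per allowed logical CPU (assumed)
    return std::min(configuredHeapCount, allowedLogicalCpuCount);
}
```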
2016-06-07 | Fix build issue http://buildstatus/Issues/Issues.aspx?iid=802303 | Gaurav Khanna | 1 | -0/+3
Ready2Run opt-out should be under #ifdef since closed Arm64 build does not compile with Ready2Run enabled. [tfs-changeset: 1611275]
2016-06-06 | Add config switch to selectively disable R2R images | John Chen (CLR) | 1 | -0/+21
New config switch ReadyToRunExcludeList can be used to specify a list of assembly simple names (separated by ';') that cannot use Ready to Run images.
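A sketch of the semicolon-separated exclude-list semantics described above: split the configured value on ';' and compare assembly simple names. The function names are invented, and the case-insensitive comparison is an assumption of this sketch rather than a documented behavior of the switch.

```cpp
// Illustrative parsing of a ';'-separated assembly exclude list
// (case-insensitive matching is an assumption for the example).
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

static std::vector<std::string> ParseExcludeList(const std::string &configValue)
{
    std::vector<std::string> names;
    std::string::size_type start = 0;
    while (start <= configValue.size())
    {
        std::string::size_type end = configValue.find(';', start);
        if (end == std::string::npos)
            end = configValue.size();
        if (end > start)
            names.push_back(configValue.substr(start, end - start));
        start = end + 1;
    }
    return names;
}

static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

static bool IsReadyToRunExcluded(const std::vector<std::string> &excludeList,
                                 const std::string &simpleName)
{
    const std::string needle = ToLower(simpleName);
    for (const std::string &entry : excludeList)
    {
        if (ToLower(entry) == needle)
            return true;
    }
    return false;
}
```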
2016-05-25 | Explicitly check CLRConfig value to determine whether concurrent GC was forced. | Aditya Mandaleeka | 1 | -7/+15
2016-03-25 | Add new configuration mechanism for CoreCLR. | Aditya Mandaleeka | 1 | -5/+16
2016-02-19 | This Change Adds initial Support for LongFiles in the VM, | Rama Krishnan Raghupathy | 1 | -26/+42
They are:
1. Wrappers for OS APIs which take or return paths
2. Fixing the usage of the following APIs:
   - GetEnvironmentVariableW
   - SearchPathW
   - GetShortPathNameW
   - GetLongPathNameW
   - GetModuleFileName
Work remaining: Remove fixed-size buffers in the VM
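A minimal Win32 sketch of the fixed-buffer removal this commit targets, using GetEnvironmentVariableW as the example: call once to learn the required length, size a heap-backed buffer, and call again, rather than assuming MAX_PATH. This illustrates the pattern only; it is not the wrapper added in the change, and the helper name is invented.

```cpp
// Windows-only sketch: reading an environment variable without a fixed-size buffer.
#include <windows.h>
#include <string>
#include <utility>

static bool GetEnvVar(const wchar_t *name, std::wstring &value)
{
    // With no buffer, the call returns the required size in characters
    // (including the terminating null), or 0 if the variable does not exist.
    DWORD required = GetEnvironmentVariableW(name, nullptr, 0);
    if (required == 0)
        return false;

    std::wstring buffer(required, L'\0');
    DWORD written = GetEnvironmentVariableW(name, &buffer[0], required);
    if (written == 0 || written >= required)
        return false;                     // variable vanished or grew between calls; caller may retry

    buffer.resize(written);               // drop the terminating null
    value = std::move(buffer);
    return true;
}
```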
2016-01-27 | Update license headers | dotnet-bot | 1 | -4/+3
2015-01-30 | Initial commit to populate CoreCLR repo | dotnet-bot | 1 | -0/+2186
[tfs-changeset: 1407945]