path: root/src/vm/threads.h
2019-04-09  Remove Unix CPU groups emulation (Jan Vorlicek, 1 file, -1/+2)
This change removes CPU groups emulation from Unix PAL and modifies the GC and thread pool code accordingly.
2019-04-03  Remove ADID and ADIndex from CoreCLR (#23588) (David Wrighton, 1 file, -327/+3)
- Remove concept of AppDomain from object API in VM
- Various infrastructure around entering/leaving appdomains is removed
- Add small implementation of GetAppDomain for use by DAC (to match existing behavior)
- Simplify finalizer thread operations
- Eliminate AppDomain::Terminate
- Remove use of ADID from stresslog
- Remove thread enter/leave tracking from AppDomain
- Remove unused asm constants across all architectures
- Re-order header inclusion to put gcenv.h before handletable
- Remove retail-only sync block code involving the appdomain index
2019-03-14  Fix conversion issues (Sinan Kaya, 1 file, -1/+1)
2019-03-05  Remove dead AppDomain unload code (#23026) (Steve MacLean, 1 file, -18/+4)
2019-03-05  Remove dead ContainToAppDomain (#23021) (Steve MacLean, 1 file, -2/+1)
* Remove dead ContainToAppDomain
* Respond to feedback
2019-03-01  Implement Serialization Guard (Morgan Brown, 1 file, -0/+6)
Add Serialization Guard API and consume it in CoreLib targets
2019-02-09  Move eventpipe buffer to TLS (#21817) (Sung Yoon Whang, 1 file, -29/+0)
* Start ripping out the eventpipe buffer to TLS
* Can now emit events from GC threads
* Various rounds of cleanup
* Tested on Linux; move things around a bit to build on Linux
* Change eventpipe buffer deallocation code
* Remove a while loop that no longer did anything
* Fix Windows and Unix builds
* Add a message to an assertion that seemed to be causing CIs to fail
* Handle non-2-byte-aligned string payloads inside payload buffers
* Fix off-by-one error in null index calculation
* Make Get/SetThreadEventBufferList a static member of ThreadEventBufferList and make only the methods public
* Address PR comments, fix a comment, and fix the last off-by-one error
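The entry above moves the eventpipe buffer into thread-local storage so that any thread, including GC threads, can emit events without a shared lookup. A minimal sketch of the idea, with all names assumed (these are not the actual EventPipe types):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of a per-thread event buffer kept in thread-local
// storage; all names here are illustrative, not the actual EventPipe types.
struct EventBufferSketch {
    std::vector<uint8_t> bytes;   // serialized event payloads
};

// Each thread reaches its own buffer directly through TLS, so event writers
// (including GC threads) need no shared lookup or lock on the hot path.
thread_local EventBufferSketch* t_pEventBuffer = nullptr;

EventBufferSketch* GetThreadEventBuffer() {
    if (t_pEventBuffer == nullptr)
        t_pEventBuffer = new EventBufferSketch();  // freed at thread exit in
                                                   // real code; leaked here
    return t_pEventBuffer;
}
```

Because the pointer lives in TLS, repeated calls from the same thread return the same buffer with no synchronization.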
2019-01-26  Cleanup stackoverflow handling leftovers (#22228) (Jan Kotas, 1 file, -21/+1)
2019-01-24  Remove obsolete thread abortion flags. (#22185) (Filip Navara, 1 file, -14/+0)
2019-01-24  Remove no-op holder stack validation. (#22182) (Filip Navara, 1 file, -6/+2)
2019-01-23  Remove all traces of FEATURE_STACK_PROBE. (#22149) (Filip Navara, 1 file, -79/+0)
2019-01-23  Remove unused thread abortion methods. (#22147) (Filip Navara, 1 file, -9/+0)
2019-01-04  Delete unused fFullReset argument (#21814) (Jan Kotas, 1 file, -4/+1)
2019-01-03  Cleanup current culture handling in the unmanaged runtime (#21706) (Jan Kotas, 1 file, -82/+6)
A large portion of the current culture handling in the unmanaged runtime, inherited from desktop, has been a no-op: the nativeInitCultureAccessors QCall that it depended on in desktop was (almost) never called in CoreCLR.
- Delete resetting of current culture on threadpool threads. It was needed in desktop because of a very tricky flow of current culture between appdomains. It is superseded by flowing the current culture via AsyncLocal in CoreCLR.
- Comment out the fetch of the managed current culture for unmanaged resource lookup. It has a number of problems that are not easy to fix. We are not localizing the unmanaged runtime currently anyway, so it is OK to comment it out.
- Fix the rest to call CultureInfo directly without going through Thread.CurrentThread.
2018-12-10  Delete vm/context.* (#21459) (Jan Kotas, 1 file, -85/+10)
* Delete vm/context.* (leftover from remoting)
2018-12-03  Remove IsNeutralDomain() (#21318) (Steve MacLean, 1 file, -18/+0)
* Remove IsNeutralDomain()
* PR feedback
2018-11-30  Remove dead code (Steve MacLean, 1 file, -24/+0)
2018-11-21  Delete dead/unused code (#21138) (Jan Kotas, 1 file, -1/+1)
2018-11-16  Fix unloadability races (#21024) (Jan Vorlicek, 1 file, -12/+76)
* Fix LoaderAllocator::AllocateHandle
When another thread won the race to grow the handle table, the code was not refreshing the slotsUsed local to the new up-to-date value. This was leading to overwriting / reusing a live handle. This change fixes it.
* Embed ThreadLocalBlock in Thread
Instead of allocating the ThreadLocalBlock dynamically, embed it in the Thread. That solves a race between thread destruction and LoaderAllocator destruction: the ThreadLocalBlock could have been deleted during Thread shutdown while the LoaderAllocator's destruction was still working with it.
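The slotsUsed race described above can be sketched with a compare-exchange loop; the names and structure here are hypothetical, not CoreCLR's. The essence of the fix is that a thread which loses the race must retry with the refreshed shared value, never a stale local copy:

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch of the race described above; names are illustrative,
// not CoreCLR's. After losing a compare-exchange race for the next slot, a
// thread must retry with the refreshed counter value rather than a stale
// local copy, or two threads can claim (and later overwrite) the same
// live-handle slot.
struct HandleTableSketch {
    std::atomic<std::size_t> slotsUsed{0};

    // Claim the next free slot index.
    std::size_t ClaimSlot() {
        std::size_t local = slotsUsed.load();
        for (;;) {
            // On failure, compare_exchange_weak refreshes 'local' to the
            // current shared value, which is exactly the fix: never keep
            // using the stale count after losing the race.
            if (slotsUsed.compare_exchange_weak(local, local + 1))
                return local;   // we own index 'local'
        }
    }
};
```

`std::atomic::compare_exchange_weak` writes the observed value back into its first argument on failure, so the loop retries with fresh data by construction.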
2018-11-15  Delete HAS_FLS_SUPPORT and related code (#21035) (Jan Kotas, 1 file, -112/+1)
2018-11-13  Change GetAppDomain to return AppDomain from the global static (#20910) (Jan Vorlicek, 1 file, -1/+1)
* Change GetAppDomain to return the AppDomain from a global static
The current implementation of GetAppDomain takes it from the TLS of the current thread. But we only have one AppDomain in the system, so we can change it to return just that one. I have left ThreadLocalInfo.m_pAppDomain and its setter in place, because SOS uses them to access the AppDomain and SOS needs to be runtime version agnostic. This performs better on Unix, where accessing TLS is not trivial.
* Move the AppDomain instance pointer to its own static
To enable access to the one and only AppDomain without unnecessary indirections, I have moved the pointer out of the SystemDomain class.
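A minimal sketch of the accessor shape described above, with purely illustrative types (not the runtime's): with exactly one AppDomain in the process, the getter can return a process-wide static instead of reading thread-local state, which matters on platforms where TLS access is expensive.

```cpp
// Hypothetical sketch; types and names are illustrative, not CoreCLR's.
struct AppDomainSketch {
    int id;
};

// The one and only AppDomain lives in its own static, outside any
// SystemDomain-like class, so no indirection is needed to reach it.
static AppDomainSketch g_theOnlyAppDomain{1};
static AppDomainSketch* g_pAppDomain = &g_theOnlyAppDomain;

inline AppDomainSketch* GetAppDomainSketch() {
    return g_pAppDomain;   // no per-thread TLS lookup needed
}
```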
2018-11-06  Completed the lock reversal work (Andrew Au, 1 file, -2/+0)
2018-11-06  Firing the GC events within the thread suspension (Andrew Au, 1 file, -0/+2)
2018-10-31  Clean up string literal implicit const casting and some two-phase lookup nits on Windows (#20730) (Jeremy Koritzinsky, 1 file, -1/+1)
* Remove implicit c-string const casting and clean up some C++ standards conformance bugs.
* Fix const string conversion in FCSigCheck.
2018-10-26  Remove some dead code related to cross-appdomain exceptions. (#20634) (Austin Wise, 1 file, -25/+0)
2018-10-16  Remove per-AppDomain TLB (#20423) (Jan Vorlicek, 1 file, -35/+0)
Since there is only one AppDomain, there is no need for a per-AppDomain TLB table for each Thread. This change removes that table and thus gets rid of the extra indirection needed to access the TLB.
2018-10-04  Enable thread statics for collectible classes (#19944) (Jan Vorlicek, 1 file, -0/+4)
* Enable thread statics for collectible classes
This change removes checks that were preventing usage of thread statics in collectible classes and implements the necessary support. For collectible classes, the handles that hold the arrays of thread statics are allocated from the LoaderAllocator instead of using a global strong handle as in the non-collectible case. The change very much mimics what is done for regular statics.
This change also adds the ability to reuse freed handles in the LoaderAllocator handle table. Freed handle indices are stored in a stack, and when a new handle allocation is requested, indices from this stack are used first.
Due to the code path from which FreeTLM (which in turn frees the handles) is called, I had to modify the critical section flags and refactor the handle allocation so that the managed array representing the handle table is allocated outside the critical section. While touching the code, I also moved the handling of handles that are not stored in the LoaderAllocator handle tables out of the critical section, since there was no point in keeping it inside.
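The freed-handle reuse described above amounts to a free-index stack in front of a growing table. A hypothetical sketch (not CoreCLR code):

```cpp
#include <cstddef>
#include <stack>

// Hypothetical sketch (not CoreCLR code) of the freed-handle reuse described
// above: freed indices are pushed on a stack, and allocation pops from that
// stack before handing out a never-used index.
class HandleIndexAllocator {
    std::stack<std::size_t> freed_;   // indices returned by Free()
    std::size_t next_ = 0;            // next never-used index
public:
    std::size_t Allocate() {
        if (!freed_.empty()) {
            std::size_t idx = freed_.top();   // reuse a freed slot first
            freed_.pop();
            return idx;
        }
        return next_++;                       // otherwise grow the table
    }
    void Free(std::size_t idx) { freed_.push(idx); }
};
```

Reusing indices keeps the backing array dense, so unloading many collectible classes does not leave the handle table permanently grown.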
2018-10-04  Remove AppDomain unload (#20250) (Jan Vorlicek, 1 file, -44/+1)
* Remove AppDomain unload
This change removes all code in AppDomain that's related to AppDomain unloading, which is obsolete in CoreCLR. It also removes all calls to the removed methods. In a few places, I have made the change simpler by taking into account the fact that there is always just one AppDomain.
2018-07-10  GS cookie check fix for debugger stackwalks port (Juan Sebastian Hoyos Ayala, 1 file, -0/+8)
This bug fix is a port of the equivalent fix in the desktop framework. The debugger tried performing a stackwalk in the epilog because the JIT incorrectly reported epilog information. This caused an invalid GS cookie to be checked and crashed the debugger. A flag was added to allow debugger stackwalks to skip the cookie check.
2018-06-07  Use atomic ops in CommitGCStressInstructionUpdate() (Steve MacLean, 1 file, -5/+15)
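One way such a commit can be made race-free is an atomic exchange that claims the pending update before writing; this is only a guess at the mechanism, with all names and layout assumed, not taken from the runtime:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch of committing a pending instruction update with an
// atomic exchange; names and layout are assumptions, not the runtime's. The
// pending address is claimed atomically, so exactly one thread writes the
// original byte back even if several race to commit.
std::atomic<uint8_t*> g_pendingUpdateAddr{nullptr};

bool TryCommitPendingUpdate(uint8_t originalByte) {
    uint8_t* addr = g_pendingUpdateAddr.exchange(nullptr);
    if (addr == nullptr)
        return false;          // nothing pending, or another thread won
    *addr = originalByte;      // restore the original instruction byte
    return true;
}
```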
2018-05-31  Fix HasPendingGCStressInstructionUpdate CONSISTENCY_CHECK (Steve MacLean, 1 file, -2/+3)
2018-05-30  Merge pull request #18173 from sdmaclea/PR-FIX-GCSTRESS-ASSERTION (Bruce Forstall, 1 file, -4/+4)
Fix GCStress assertion
2018-05-30  Improve the labeling of .NET Threads. (#18193) (Vance Morrison, 1 file, -1/+1)
There was already some support for labeling threads using the Windows SetThreadDescription API, but it was missing some important cases (like labeling the ThreadPool and GC server and background threads). Fix this. Also make the naming consistent (they all start with .NET). These names show up in PerfView traces and can be used by debuggers or other profilers as well.
2018-05-29  Fix GCStress assertion (Steve MacLean, 1 file, -4/+4)
2018-04-19  GCStress: try to reduce races and tolerate races better (#17330) (Andy Ayers, 1 file, -4/+4)
This change addresses races that cause spurious failures when running GC stress on multithreaded applications.
* Instruction update race
Threads that hit a gc cover interrupt where gc is not safe can race to overwrite the interrupt instruction and change it back to the original instruction. This can cause confusion when handling stress exceptions, as the exception code raised by the kernel may be determined by disassembling the instruction that caused the fault, and this instruction may change between the time the fault is raised and the instruction is disassembled. When this happens the kernel may report an ACCESS_VIOLATION where there was actually an attempt to execute a privileged instruction.
x86 already had a tolerance mechanism here: when gc stress was active and the exception status was ACCESS_VIOLATION, the faulting instruction would be retried to see if it faults the same way again. This change extends that tolerance to x64 and enables it regardless of the gc mode. We use the exception information to further screen, as these spurious AVs look like reads from address 0xFF..FF.
* Instrumentation vs execution race
The second race happens when one thread is jitting a method and another is about to call the method. The first thread finishes jitting and publishes the method code, then starts instrumenting the method for gc coverage. While this instrumentation is ongoing, the second thread calls the method and hits a gc interrupt instruction. The code that recognizes the fault as a gc coverage interrupt gets confused because the instrumentation is not yet complete (in particular, the m_GcCover member of the MethodDesc is not yet set), so the second thread triggers an assert.
The fix is to instrument for GcCoverage before publishing the code. Since multiple threads can be jitting a method concurrently, the instrument and publish steps are done under a lock to ensure that the instrumentation and code are consistent (come from the same thread). With this lock in place, the secondary locking done in SetupGcCoverage is no longer needed and has been removed; only one thread can be instrumenting a given jitted method for GcCoverage. However, we retain a bailout clause that first checks whether m_GcCover is set and, if so, skips instrumentation, as there are prejit and rejit cases where instrumentation is retried.
* Instruction cache flushes
In some cases when replacing the interrupt instruction with the original, the instruction cache was either not flushed or not flushed with sufficient length, possibly increasing the frequency of the above races.
No impact is expected for non-gc-stress scenarios, though some of the code changes are in common code paths. Addresses the spurious GC stress failures seen in #17027 and #17610.
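The instrument-before-publish ordering described above can be sketched as follows; the names are illustrative stand-ins, not the runtime's types:

```cpp
#include <mutex>

// Hypothetical sketch of the ordering fix described above (names are
// illustrative): instrumentation happens *before* the code is published, and
// both steps run under one lock, so a caller can never observe published
// code whose GC-coverage state is still incomplete.
struct MethodSketch {
    bool instrumented = false;   // stands in for m_GcCover being set
    bool published = false;      // stands in for the code being callable
};

std::mutex g_publishLock;

void InstrumentAndPublish(MethodSketch& m) {
    std::lock_guard<std::mutex> hold(g_publishLock);
    if (!m.instrumented)         // bailout: skip if already instrumented
        m.instrumented = true;   // (real code rewrites instructions here)
    m.published = true;          // publish only after instrumentation
}
```

Because both flags flip under the same lock and in this order, no thread can see `published` true while `instrumented` is still false.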
2018-04-17  Unix/x64 ABI cleanup (Carol Eidt, 1 file, -1/+1)
Eliminate `FEATURE_UNIX_AMD64_STRUCT_PASSING` and replace it with `UNIX_AMD64_ABI` when used alone. Both are currently defined; it is highly unlikely the latter will work alone; and it significantly clutters up the code, especially the JIT. Also, fix the altjit support (now `UNIX_AMD64_ABI_ITF`) to *not* call `ClassifyEightBytes` if the struct is too large. Otherwise it asserts.
2018-02-26  Fixed mixed mode attach/JIT debugging. (#16552) (Mike McLaughlin, 1 file, -15/+0)
Fixed mixed mode attach/JIT debugging. The mixed mode debugging attach uses a TLS slot to communicate between the debugger break-in thread and the right side. Unfortunately, __thread static variables cannot be used on the debugger break-in thread because it does not have storage allocated for them. The fix is to switch the storage for the debugger word to a classic TlsAlloc-allocated slot, which works fine on the debugger break-in thread.
There was also a problem (also present in 2.0) where WINNT_OFFSETOF__TEB__ThreadLocalStoragePointer was using the define for 64/32-bit and always ended up with the 32-bit Windows value. This caused the right side's GetEEThreadValue and GetEETlsDataBlock unmanaged thread functions to always fail.
2018-02-14  Implement WaitHandle.SignalAndWait on Unix (#16383) (Koundinya Veluri, 1 file, -4/+0)
Part of fix for https://github.com/dotnet/coreclr/issues/10441
2018-01-29  Add ActivityId Support to EventPipe (#16055) (Brian Robbins, 1 file, -0/+18)
2017-12-20  Fix build error when using VS2015. (#15598) (Brian Robbins, 1 file, -1/+1)
2017-11-18  Delete unused Thread::YieldTask (#15091) (Jan Kotas, 1 file, -12/+2)
2017-10-31  Clean up YieldProcessorNormalized (#14739) (Koundinya Veluri, 1 file, -72/+0)
* Move YieldProcessorNormalized into separate files
* Clean up YieldProcessorNormalized
2017-10-11  Delete !FEATURE_IMPLICIT_TLS (#14398) (Jan Kotas, 1 file, -32/+0)
Linux and Windows arm64 are using the regular C/C++ thread local statics. This change unifies the remaining Windows architectures to be on the same plan.
2017-09-19  Move initialization of YieldProcessorNormalized to the finalizer thread (#14058) (Koundinya Veluri, 1 file, -65/+72)
Fixes https://github.com/dotnet/coreclr/issues/13984
- Also moved the relevant functions out of the Thread class, as requested in the issue.
- For some reason, after moving the functions out of the Thread class, YieldProcessorNormalized was no longer getting inlined. It is important for it to be inlined so that the memory loads are hoisted out of outer loops. To remove the dependency on the compiler to do this (even with forceinline it is not always possible to hoist, for instance in InterlockedCompareExchange loops), changed the signatures to do what is intended.
2017-09-08  Change lock used for initializing YieldProcessorNormalized from Crst to CrstStatic (#13857) (Koundinya Veluri, 1 file, -0/+1)
Fixes https://github.com/dotnet/coreclr/issues/13779
2017-09-01  Add normalized equivalent of YieldProcessor, retune some spin loops (#13670) (Koundinya Veluri, 1 file, -0/+64)
* Add normalized equivalent of YieldProcessor, retune some spin loops
Part of fix for https://github.com/dotnet/coreclr/issues/13388
Normalized equivalent of YieldProcessor:
- The delay incurred by YieldProcessor is measured once, lazily, at run-time.
- Added YieldProcessorNormalized, which yields for a specific duration (approximately equal to what was measured for one YieldProcessor on a Skylake processor, about 125 cycles). The measurement calculates how many YieldProcessor calls are necessary to get a delay close to the desired duration.
- Changed Thread.SpinWait to use YieldProcessorNormalized.
Thread.SpinWait divide-count-by-7 experiment:
- At this point I experimented with changing Thread.SpinWait to divide the requested number of iterations by 7, to see how it fares on perf. On my Sandy Bridge processor, 7 * YieldProcessor == YieldProcessorNormalized. See numbers in the PR.
- Not too many regressions, and the overall perf is somewhat as expected: not much change on the Sandy Bridge processor, significant improvement on the Skylake processor.
- I'm discounting the SemaphoreSlim throughput score because it seems to be heavily dependent on Monitor. It would be more interesting to revisit SemaphoreSlim after retuning Monitor's spin heuristics.
- ReaderWriterLockSlim seems to perform worse on Skylake; the current spin heuristics do not translate well.
Spin tuning (at this point I abandoned the experiment above and retuned spins that use Thread.SpinWait):
- YieldProcessor stage:
  - Many places currently do very long spins on YieldProcessor per iteration of the spin loop. In the last YieldProcessor iteration, this amounts to about 70 K cycles on Sandy Bridge and 512 K cycles on Skylake.
  - Long spins on YieldProcessor don't let other work run efficiently. Especially when many scheduled threads all issue a long YieldProcessor, a significant portion of the processor can go unused for a long time.
  - Long spins on YieldProcessor sometimes help reduce contention in high-contention cases, effectively taking some threads into a long delay. Sleep(1) works much better but has a much higher delay, so it's not always appropriate. In other cases, I found it's better to do more iterations with a shorter YieldProcessor. It would be even better to reduce the contention in the app or to have a proper wait in the sync object, where appropriate.
  - Updated the YieldProcessor measurement to calculate the number of YieldProcessorNormalized calls that amount to about 900 cycles (tuned based on perf), and modified SpinWait's YieldProcessor stage to cap the number of iterations passed to Thread.SpinWait. Effectively, the first few iterations have a longer delay than before on Sandy Bridge and a shorter delay than before on Skylake, and the later iterations have a much shorter delay than before on both.
- Yield/Sleep(0) stage:
  - When there are no threads to switch to, Yield and Sleep(0) become no-ops, turning the spin loop into a busy-spin that may quickly reach the max spin count and cause the thread to enter a wait state, or may just busy-spin for longer than desired before a Sleep(1). Completing the spin loop too early can cause excessive context switching if a wait follows, and entering the Sleep(1) stage too early can cause excessive delays.
  - If multiple threads are doing Yield and Sleep(0) (typically from the same spin loop due to contention), they may switch between one another, delaying work that can make progress.
  - I found that it works well to interleave a Yield/Sleep(0) with YieldProcessor, which enforces a minimum delay for this stage. Modified SpinWait to do this until it reaches the Sleep(1) threshold.
- Sleep(1) stage:
  - I didn't see any benefit in the tests to interleaving Sleep(1) calls with some Yield/Sleep(0) calls; perf actually seemed a bit worse. If the Sleep(1) stage is reached, there is probably a lot of contention, and the Sleep(1) stage helps remove some threads from the equation for a while. Adding some Yield/Sleep(0) in between seems to add back some of that contention.
  - Modified SpinWait to use a Sleep(1) threshold, after which point it only does Sleep(1) on each spin iteration.
  - For the Sleep(1) threshold, I couldn't find one constant that works well in all cases. Spin loops that are followed by a proper wait (such as a wait on an event that is signaled when the resource becomes available) benefit from not doing Sleep(1) at all and spinning in the other stages for longer. Infinite spin loops usually seemed to benefit from a lower Sleep(1) threshold to reduce contention, but the threshold also depends on other factors, like how much work is done in each spin iteration, how efficient waiting is, and whether waiting has any negative side-effects.
  - Added an internal overload of SpinWait.SpinOnce that takes the Sleep(1) threshold as a parameter.
Per-type changes:
- SpinWait: tweaked the spin strategy as mentioned above.
- ManualResetEventSlim: changed to use SpinWait, retuned the default number of iterations (total delay is still significantly less than before). Retained the previous behavior of doing Sleep(1) if a higher spin count is requested.
- Task: was using the same heuristics as ManualResetEventSlim; copied the changes here as well.
- SemaphoreSlim: changed to use SpinWait, retuned similarly to ManualResetEventSlim but with double the number of iterations, because the wait path is a lot more expensive.
- SpinLock: was using very long YieldProcessor spins. Changed to use SpinWait, removed the processor count multiplier, simplified.
- ReaderWriterLockSlim: this one is complicated, as there are many issues. The current spin heuristics performed better even after normalizing Thread.SpinWait without changing the SpinWait iterations (the delay is longer than before), so I left it as is.
Results:
- The perf (see numbers in the PR) seems to be much better than both the baseline and the Thread.SpinWait divide-by-7 experiment.
- On Sandy Bridge, I didn't see many significant regressions. ReaderWriterLockSlim is a bit worse in some cases and a bit better in other similar cases, but at least the really low scores in the baseline got much better and not the other way around.
- On Skylake, some significant regressions are in SemaphoreSlim throughput (which I'm discounting, as mentioned in the experiment above) and CountdownEvent add/signal throughput. The latter can probably be improved later.
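The normalization idea described above (measure the cost of one yield once, then compute how many yields approximate a fixed target delay) can be sketched roughly as follows. The constants and names are assumptions for illustration, not the runtime's implementation:

```cpp
#include <chrono>

// Rough sketch of the normalization idea described above; constants and
// names are assumptions, not the runtime's. Measure once how long a single
// pause/yield takes on this processor, then compute how many are needed to
// approximate a fixed target delay, so spin loops behave similarly across
// processors with very different pause latencies.
inline void CpuPause() {
#if defined(__x86_64__)
    __builtin_ia32_pause();   // x86 'pause'; no-op elsewhere in this sketch
#endif
}

unsigned MeasureYieldsPerTarget(unsigned targetNs) {
    using namespace std::chrono;
    const unsigned samples = 1000;
    auto start = steady_clock::now();
    for (unsigned i = 0; i < samples; ++i)
        CpuPause();
    double perYieldNs =
        duration_cast<nanoseconds>(steady_clock::now() - start).count() /
        static_cast<double>(samples);
    if (perYieldNs < 1.0)
        perYieldNs = 1.0;                    // guard against a zero reading
    unsigned n = static_cast<unsigned>(targetNs / perYieldNs);
    return n == 0 ? 1 : n;                   // always yield at least once
}
```

A spin loop would then call the pause instruction `MeasureYieldsPerTarget(target)` times per iteration, giving a roughly constant delay on both a processor with a ~10-cycle pause and one with a ~140-cycle pause.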
2017-08-14  Added SetThreadDescription to set the unmanaged thread name (#12593) (Alois-xx, 1 file, -1/+1)
* Added SetThreadDescription to set the unmanaged thread name as well when a managed thread name is set. This will show up in future debuggers that know how to read that information, or in ETW traces in the Thread Name column.
* Use printf instead of wprintf, which exists on all platforms.
* Removed printf. Ensure that GetProcAddress is only called once when the method is not present. The potential perf hit should be negligible, since setting a thread name can only happen once per managed thread.
* Moved the SetThreadName code to winfix.cpp as proposed. Finalizer and threadpool threads get their names, and GCToEEInterface::CreateBackgroundThread is also named, but regular GC threads have no name, because including utilcode.h broke the build.
* Fix for data race in g_pfnSetThreadDescription.
* Fix string literals on Unix builds.
* Fixed nits; settled on the thread name ".NET Core ThreadPool".
2017-08-02  PAL enable Thread affinity support (#12936) (Steve MacLean, 1 file, -2/+0)
2017-08-01  Fix build errors when TRACK_SYNC is defined (#13122) (Jonghyun Park, 1 file, -0/+2)
* Fix build errors when TRACK_SYNC is defined
* Remove unnecessary default constructor
2017-07-04  Hide methods in IdDispenser (instead of using DacNotImpl) (#12624) (Jonghyun Park, 1 file, -14/+6)