path: root/src/vm/jithelpers.cpp
Age | Commit message | Author | Files | Lines
2019-01-20 | Cleanup array related FCalls (#22097) | Jan Kotas | 1 | -12/+0
* Cleanup Array FCalls
* Disable outdated CoreFX tests https://github.com/dotnet/corefx/pull/34700
2018-11-21 | Delete dead/unused code (#21138) | Jan Kotas | 1 | -11/+0
2018-11-09 | Delete dead/unreachable code related to remoting (#20880) | Jan Kotas | 1 | -28/+22
2018-11-05 | Updating the importer to throw a NotImplementedException if it finds a mustExpand intrinsic that it can't expand (#20792) | Tanner Gooding | 1 | -0/+18
* Updating the importer to throw a NotImplementedException if it finds a mustExpand hwintrinsic that it can't expand
* Updating the JITEEVersionIdentifier
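The policy described above is simple to sketch. The following is a hedged illustration (all names here are hypothetical stand-ins, not the actual importer code): when expansion of a mustExpand intrinsic fails, the importer emits a call to a throw helper rather than silently falling back to an ordinary call.

```cpp
#include <cstdio>

// Hypothetical stand-ins for importer machinery; sketch only.
struct Node {};
static Node* tryExpandIntrinsic(Node*) { return nullptr; }  // pretend expansion failed
static Node* makeThrowHelperCall(const char* which) {
    std::printf("emit call to throw helper: %s\n", which);
    static Node n;
    return &n;
}

// Sketch of the policy: a mustExpand intrinsic that cannot be expanded
// becomes a call to a NotImplementedException throw helper instead of a
// silent fallback to a regular call.
static Node* importIntrinsic(Node* call, bool mustExpand) {
    if (Node* expanded = tryExpandIntrinsic(call))
        return expanded;                // normal case: expanded inline
    if (mustExpand)
        return makeThrowHelperCall("NotImplementedException");
    return call;                        // otherwise keep the plain call
}
```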
2018-10-04 | Enable thread statics for collectible classes (#19944) | Jan Vorlicek | 1 | -24/+45
* Enable thread statics for collectible classes

This change removes checks that were preventing usage of thread statics in collectible classes and implements all the necessary changes. The handles that hold arrays with thread statics are allocated from the LoaderAllocator for collectible classes, instead of using a global strong handle as in the case of non-collectible classes. The change closely mimics what is done for regular statics.

This change also adds the ability to reuse freed handles in the LoaderAllocator handle table. Freed handle indices are stored in a stack, and when a new handle allocation is requested, indices from this stack are used first. Due to the code path from which FreeTLM (which in turn frees the handles) is called, I had to modify the critical section flags and also refactor the handle allocation so that the managed array representing the handle table is allocated outside the critical section. While touching the code, I also moved the code dealing with handles that are not stored in the LoaderAllocator handle tables out of the critical section, since there is no point in keeping it inside.
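The freed-handle recycling described above can be sketched as follows (a minimal illustration assuming a simple vector-backed table, not the actual LoaderAllocator handle table): freed slot indices go onto a stack and are popped before the table is grown.

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of index recycling: Free() pushes the slot index onto a
// stack, and Allocate() pops from that stack before growing the table.
class HandleTableSketch {
    std::vector<void*>  slots;        // handle storage
    std::vector<size_t> freeIndices;  // recycled slot indices
public:
    size_t Allocate(void* value) {
        if (!freeIndices.empty()) {   // reuse a freed index first
            size_t i = freeIndices.back();
            freeIndices.pop_back();
            slots[i] = value;
            return i;
        }
        slots.push_back(value);       // otherwise grow the table
        return slots.size() - 1;
    }
    void Free(size_t i) {
        slots[i] = nullptr;
        freeIndices.push_back(i);     // make the index reusable
    }
};
```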
2018-08-23 | Enable unloading of AssemblyLoadContext (#18476) | Jan Vorlicek | 1 | -2/+4
Enable assembly unloading

* Allow PInvoke methods on collectible assemblies
* Fix test unloadability. Several hundred tests were using a Helper class that created a GCHandle but never freed it, which prevented unloading of those tests. The change modifies the Helper class to keep the handle in a finalizable object. Several GCHandle-related tests were also not freeing the GCHandle they allocated, so this change adds freeing them to enable unloading.
* Add missing error messages to the resources
* Fix shuffle thunk cache for unloadability
* Add GetLoaderAllocator to ICLRPrivBinder
2018-05-27 | Typo (#18141) | John Doe | 1 | -1/+1
* Ajusted -> Adjusted
* alot -> a lot
* Ambigous -> Ambiguous
* amoun -> amount
* amoung -> among
* Amperstand -> Ampersand
* Anbody -> Anybody
* anddoens't -> and doesn't
* anme -> name
* annoations -> annotations
* annother -> another
* anothr -> another
* ansynchronous -> asynchronous
* anticpation -> anticipation
* anway -> anyway
* aother -> another
* Apparant -> Apparent
* appartment -> apartment
* appdmomain -> appdomain
* Appdomian -> Appdomain
* appdomin -> appdomain
* approproiate -> appropriate
* approprate -> appropriate
* approp -> appropriate
* appened -> appended
* appropiately -> appropriately
* appropraitely -> appropriately
* Apperantly -> Apparently
* approp. -> appropriate
* Approriate -> Appropriate
2018-03-24 | Delete unused files from src/inc (#17186) | Jan Kotas | 1 | -1/+0
2018-01-30 | CORINFO_HELP_THROW_TYPE_NOT_SUPPORTED | Steve MacLean | 1 | -0/+18
2018-01-23 | Merge pull request #15949 from mikedn/shift-inconsistency | Bruce Forstall | 1 | -2/+2
Fix 64 bit shift inconsistencies (on 32 bit targets)
2018-01-21 | Remove AppDomainLeaks configuration option (#15956) | antofik | 1 | -49/+0
Removed all usages of the AppDomainLeaks configuration option and the CHECK_APP_DOMAIN_LEAKS feature.

Fix #12094
2018-01-20 | Fix 64 bit shift inconsistencies (on 32 bit targets) | Mike Danes | 1 | -2/+2
Recent shift changes made the JIT_LLsh helper mask the shift count to 6 bits. The other 2 helpers (JIT_LRsh and JIT_LRsz) were not changed, so now we get inconsistencies such as `(x >> 64) != (x << 64)`. The ECMA spec says that "the return value is unspecified if shiftAmount is greater than or equal to the width of value", so the JIT has no obligation to implement a particular behavior. But it seems preferable to have all shift instructions behave similarly; it avoids complications and reduces risk.

This also changes `ValueNumStore::EvalOpIntegral` to mask the shift count for 64-bit shifts so it matches `gtFoldExprConst`. Otherwise the produced value depends on the C/C++ compiler's behavior.
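The consistent behavior is easy to illustrate (simplified signatures below; the real helpers are JIT_LLsh, JIT_LRsh, and JIT_LRsz): once all three helpers mask the count to 6 bits, an overshift such as a shift by 64 degenerates to a shift by 0 on every path.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of consistent 64-bit shift helpers for a 32-bit target.
// Masking the count to 6 bits makes all three helpers agree on overshift.
static uint64_t LLsh(uint64_t value, uint32_t count) { return value << (count & 0x3F); }
static int64_t  LRsh(int64_t  value, uint32_t count) { return value >> (count & 0x3F); }
static uint64_t LRsz(uint64_t value, uint32_t count) { return value >> (count & 0x3F); }

int main() {
    uint64_t x = 0x123456789ABCDEF0ULL;
    // With consistent masking, a shift by 64 is a shift by 0 everywhere,
    // so (x << 64) and (x >> 64) both yield x.
    assert(LLsh(x, 64) == x);
    assert(LRsz(x, 64) == x);
    assert(LRsh((int64_t)x, 64) == (int64_t)x);
    return 0;
}
```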
2018-01-13 | Revert "Remove relocations for MethodTable::m_pParentMethodTable for Linux ARM" | Jan Kotas | 1 | -4/+4
This reverts commit cf1fb9e17fc8b6ee849edab5a696d0ec5c6eadd2.
2018-01-05 | JIT: fix decompose long left shift for overshift cases (#15704) | Andy Ayers | 1 | -1/+1
Need to reduce the shift amount modulo 64 to match the helper and `gtFoldExpr` behavior. Since the reduced amount must be less than 64, we can remove the handling for that case. Also updates the arm LLSH helper. Re-enables the test case disabled by #15567 and also enables it for arm/arm64. Closes #15566.
2017-11-27 | Improve Monitor scaling (#14216) | Koundinya Veluri | 1 | -115/+5
Improve Monitor scaling and reduce spinning

Part 1: Improve Monitor scaling

Fixes https://github.com/dotnet/coreclr/issues/13978
- Refactored AwareLock::m_MonitorHeld into a class LockState with operations to mutate the state
- Allowed the lock to be taken by a non-waiter when there is a waiter, to prevent creating lock convoys
- Added a bit to LockState to indicate that a waiter is signaled to wake, to avoid waking more than one waiter at a time. A waiter that wakes by observing the signal unsets this bit. See AwareLock::EnterEpilogHelper().
- Added a spinner count to LockState. Spinners now register and unregister themselves, and lock releasers don't wake a waiter when there is a registered spinner (the spinner guarantees to take the lock if it's available when unregistering itself).
  - This was necessary mostly on Windows to reduce CPU usage to the expected level in contended cases with several threads. I believe it's the priority boost Windows gives to signaled threads, which seems to cause waiters to much more frequently succeed in acquiring the lock. This causes a CPU usage problem because once the woken waiter releases the lock, on the next lock attempt it will become a spinner. This keeps repeating, converting several waiters into spinners unnecessarily. Before registering spinners, I saw typically 4-6 spinners under contention (with delays inside and outside the lock) when I expected to have only 1-2 spinners at most.
  - It costs an interlocked operation before and after the spin loop; this doesn't seem too significant since spinning is a relatively slow path anyway, and the reduction in CPU usage in turn reduces contention on the lock and lets more useful work get done
- Updated waiters to spin a bit before going back to waiting; reasons are explained in AwareLock::EnterEpilogHelper()
- Removed AwareLock::Contention() and any references (this removes the 10 repeats of the entire spin loop in that function). With the lock convoy issue gone, this appears to no longer be necessary.

Perf:
- On Windows, throughput has increased significantly starting at slightly lower than proc count threads. On Linux, latency and throughput have increased more significantly at similar proc counts.
- Most of the larger regressions are in the unlocked fast paths. The code there hasn't changed and is almost identical (minor layout differences); I'm just considering this noise until we figure out how to get consistently faster code generated.
- The smaller regressions are within noise range

Part 2: Reduce Monitor spinning

Fixes https://github.com/dotnet/coreclr/issues/13980
- Added new config value Monitor_SpinCount; Monitor spins for that many iterations, default is 30 (0x1e). This seems to give a somewhat decent balance between latency, fairness, and throughput. Lower spin counts improve latency and fairness significantly and regress throughput slightly; higher spin counts improve throughput slightly and regress latency and fairness significantly.
- The other constants can still be used to disable spinning, but otherwise they are no longer used by Monitor
- Decreased the number of bits used for tracking spinner count to 3. This seems to be more than enough since only one thread can take a lock at a time, and prevents spikes of unnecessary CPU usage.

Tried some things that didn't pan out:
- Sleep(0) doesn't seem to add anything to the spin loop, so left it out. Instead of Sleep(0) it can just proceed to waiting. Waiting is more expensive than Sleep(0), but I didn't see that benefit in the tests. Omitting Sleep(0) also keeps the spin loop very short (a few microseconds max).
- Increasing the average YieldProcessor() duration per spin iteration improved throughput slightly but regressed latency and fairness very quickly. Given that fairness is generally worse with part 1 of this change above, it felt like a better compromise to take a small reduction in throughput for larger improvements in latency and fairness.
- Tried making a very small % of lock releases randomly wake a waiter despite there being spinners, to improve fairness. This improved fairness noticeably, but not as much as decreasing the spin count slightly, and it was making latency and throughput worse more quickly. After reducing the % to a point where I was hardly seeing fairness improvements, there were still noticeable latency and throughput regressions.

Miscellaneous:
- Moved YieldProcessorNormalized code into separate files so that they can be included earlier and where needed
- Added a max for "optimal max normalized yields per spin iteration", since it has the potential to be very large on machines where YieldProcessor may be implemented as a no-op, in which case it's probably not worth spinning for the full duration
- Refactored duplicate code in portable versions of MonEnterWorker, MonEnter, and MonReliableEnter. MonTryEnter has a slightly different structure; did not refactor that.

Perf:
- Throughput is a bit lower than before at lower thread counts and better at medium-high thread counts. It's a bit lower at lower thread counts for two reasons:
  - A shorter spin loop means the lock will be polled more frequently because the exponential backoff does not get as high, making it more likely for a spinner to steal the lock from another thread, causing the other thread to sometimes wait early
  - The duration of YieldProcessor() calls per spin iteration has decreased and a spinner or spinning waiter is more likely to take the lock; the rest is similar to above
- For the same reasons as above, latency is better than before. Fairness is better on Windows and worse on Linux compared to baseline, due to the baseline having differences between these platforms. Latency also has differences between Windows/Linux in the baseline; I suspect those are due to differences in scheduling.
- Performance now scales appropriately on processors with different pause delays

Part 3: Add mitigation for waiter starvation

Normally, threads are allowed to preempt waiters to acquire the lock. There are cases where waiters can be easily starved as a result. For example, a thread that holds a lock for a significant amount of time (much longer than the time it takes to do a context switch), then releases and reacquires the lock in quick succession, and repeats. Though a waiter would be woken upon lock release, usually it will not have enough time to context-switch in and take the lock, and can be starved for an unreasonably long duration.

In order to prevent such starvation and force a bit of fair forward progress, it is sometimes necessary to change the normal policy and disallow threads from preempting waiters. A new bit was added to LockState, and ShouldNotPreemptWaiters() indicates the current state of the policy.
- When the first waiter begins waiting, it records the current time as a "waiter starvation start time". That is a point in time after which no forward progress has occurred for waiters. When a waiter acquires the lock, the time is updated to the current time.
- Before a spinner begins spinning, and when a waiter is signaled to wake, it checks whether the starvation duration has crossed a threshold (currently 100 ms) and, if so, sets ShouldNotPreemptWaiters(). When unreasonable starvation is occurring, the lock will be released occasionally, and if spinners are causing it, they will be starting to spin and will observe the new policy.
- Before starting to spin, if ShouldNotPreemptWaiters() is set, the spinner will skip spinning and wait instead. Spinners that are already registered at the time ShouldNotPreemptWaiters() is set will stop spinning as necessary. Eventually, all spinners will drain and no new ones will be registered.
- After spinners have drained, only a waiter will be able to acquire the lock. When a waiter acquires the lock, or when the last waiter unregisters itself, ShouldNotPreemptWaiters() is cleared to restore the normal policy.
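The spin-then-wait shape described in part 2 can be sketched roughly as follows (a minimal sketch only; the real AwareLock also tracks spinner and waiter counts and the ShouldNotPreemptWaiters() bit in LockState, and parks waiters on an event rather than sleeping):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Minimal sketch of the spin-then-wait policy: spin a bounded number of
// iterations (Monitor_SpinCount defaults to 30 per the change above),
// then fall back to waiting. The sleep below stands in for a real
// event/futex wait.
constexpr int kMonitorSpinCount = 30;

class SketchLock {
    std::atomic<bool> held{false};
public:
    bool TryEnter() { return !held.exchange(true, std::memory_order_acquire); }
    void Enter() {
        for (int i = 0; i < kMonitorSpinCount; ++i) {
            if (TryEnter()) return;      // acquired while spinning
            std::this_thread::yield();   // stand-in for YieldProcessor() backoff
        }
        while (!TryEnter())              // stand-in for parking as a waiter
            std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
    void Exit() { held.store(false, std::memory_order_release); }
};
```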
2017-10-25 | Enable Crc32, Popcnt, Lzcnt intrinsics | Fei Peng | 1 | -0/+18
2017-09-26 | Remove Monitor asm helpers (#14146) | Koundinya Veluri | 1 | -303/+16
- Removed asm helpers on Windows and used portable C++ helpers instead
- Rearranged fast path code to improve it a bit and match the asm more closely

Perf:
- The asm helpers are a bit faster. The code generated for the portable helpers is almost the same now; the remaining differences are:
  - There were some layout issues where hot paths were in the wrong place and return paths were not cloned. Instrumenting some of the tests below with PGO on x64 resolved all of the layout issues. I couldn't get PGO instrumentation to work on x86, but I imagine it would be the same there.
  - Register usage:
    - x64: All of the Enter functions are using one or two (TryEnter is using two) callee-saved registers for no apparent reason, forcing them to be saved and restored. r10 and r11 seem to be available but are not being used.
    - x86: Similarly to x64, the compiled functions are pushing and popping 2-3 additional registers in the hottest fast paths.
  - I believe this is the main remaining gap, and PGO is not helping with it
- On Linux, perf is >= before for the most part
- Perf tests used for the above are updated in PR https://github.com/dotnet/coreclr/pull/13670
2017-08-08 | Merge pull request #12168 from gbalykov/remove-relocations-readonly | Bruce Forstall | 1 | -4/+4
Partially remove relocations from SECTION_Readonly
2017-08-07 | Cleanup code access security from the unmanaged runtime (#13241) | Jan Kotas | 1 | -2/+0
2017-07-10 | Remove relocations for MethodTable::m_pParentMethodTable for Linux ARM | Gleb Balykov | 1 | -4/+4
2017-06-29 | Fix GC contract violation (#12542) | Jan Kotas | 1 | -0/+1
2017-06-27 | Fix JIT_NewArr1 8-byte alignment for ELEMENT_TYPE_R8 on x86. | Ruben Ayrapetyan | 1 | -1/+1
2017-06-27 | Implement JIT_NewArr1_R2R as R2R wrapper for JIT_NewArr1 to support both MethodTable-based and TypeDesc-based helpers (#12475) | Ruben Ayrapetyan | 1 | -0/+15
Related issue: #12463
2017-06-26 | Replace array type handle with method table in arguments of array allocation helpers (#12369) | Ruben Ayrapetyan | 1 | -26/+26
* Remove direct usage of type handle in JIT_NewArr1, except for retrieving the template method table
* Assert that the array type descriptor is loaded when the array object's method table is set
* Pass template method tables instead of array type descriptors to array allocation helpers
2017-05-17 | Finish deleting dead CAS code from CoreLib (#11436) | Jan Kotas | 1 | -284/+2
Fixes #9321 and deletes CleanupToDoList.cs

Delete unmanaged security implementation
2017-03-16 | [Local GC] Break EE dependency on GC's generation table and alloc lock in single-proc scenarios (#10065) | Sean Gillespie | 1 | -4/+4
* Remove usage of the generation table from the EE by introducing an EE-owned GC alloc context used for allocations on single-proc machines
* Move the GC alloc lock to the EE side of the interface
* Repair the Windows ARM build
* Move the decision to use per-thread alloc contexts to the EE
* Rename the lock used by StartNoGCRegion and EndNoGCRegion to be more indicative of what it is protecting
* Address code review feedback 2 (enumerate the global alloc context as a part of GCToEEInterface)
* Code review feedback (3)
* Address code review feedback (move some GC-internal globals to gcimpl.h and gc.cpp)
* g_global_alloc_lock is a dword, not a qword - fixes a deadlock
* Move GlobalAllocLock to gchelpers.cpp and switch to preemptive mode when spinning
* Repair the Windows x86 build
2017-02-28 | Use BIT64 instead of _WIN64 for LONG helpers (#9845) | Jonghyun Park | 1 | -3/+2
2017-02-14 | Remove never defined FEATURE_REMOTING | danmosemsft | 1 | -114/+0
2017-02-14 | Remove never defined FEATURE_MIXEDMODE | danmosemsft | 1 | -35/+0
2017-02-12 | Remove never defined FEATURE_COMPRESSEDSTACK | danmosemsft | 1 | -3/+0
2017-02-12 | Remove never defined FEATURE_CER and header | danmosemsft | 1 | -1/+0
2017-02-12 | Remove always defined FEATURE_EXCEPTIONDISPATCHINFO | danmosemsft | 1 | -2/+0
2017-02-11 | Revert "Remove always defined FEATURE_CORESYSTEM" | danmosemsft | 1 | -1/+1
This reverts commit 751771a8976f909af772e35c167bd7e3ffbe44c8.
2017-02-10 | Revert "Remove always defined FEATURE_CORRUPTING_EXCEPTIONS" | danmosemsft | 1 | -0/+2
This reverts commit b0dab0d6de90a38dfbf0d6b2039a7b8f5269d802.
2017-02-10 | Remove always defined FEATURE_CORRUPTING_EXCEPTIONS | danmosemsft | 1 | -2/+0
2017-02-10 | Remove always defined FEATURE_CORECLR | danmosemsft | 1 | -11/+0
2017-02-10 | Remove always defined FEATURE_CORESYSTEM | danmosemsft | 1 | -1/+1
2016-12-05 | [x86/Linux] Use Portable LMul JIT Helper (#8449) | Jonghyun Park | 1 | -2/+2
2016-12-04 | Use Portable Floating-point Arithmetic Helpers (#8447) | Jonghyun Park | 1 | -2/+2
This commit enables portable floating-point arithmetic helpers for the x86/Linux build.
2016-09-12 | Merge pull request #7133 from mikedn/x86-cast-long-float | Pat Gavlin | 1 | -1/+1
Implement long to float cast for x86
2016-09-12 | Implement long to float cast for x86 | Mike Danes | 1 | -1/+1
Convert long to float/double casts to helper calls. Despite the call, this version is 2x faster than JIT32's FILD implementation, which suffers a significant store-forwarding stall penalty.
2016-09-08 | Merge pull request #6764 from swgillespie/gc-interface-3 | Sean Gillespie | 1 | -10/+10
Move the GC behind an interface and use that interface in the VM
2016-09-08 | Introduce an interface separating the GC and the VM, modifying the VM to utilize this interface | Sean Gillespie | 1 | -10/+10
- Introduce an interface separating the GC and the rest of the VM
- Remove static members of both IGCHeap and IGCHeapInternal and move the management of the singular GC heap to the VM
- Rename uses of IGCHeap in the VM to GCHeapHolder, as well as other misc. renames throughout the VM and GC
- Split each interface function into categories, document them, use consistent formatting across the interface
- Undo some accidental find/replace collateral damage
- Remove all ifdefs from the GC interface
- Deduplicate function declarations between IGCHeap and IGCHeapInternal; expose AllocAlign8 through the interface and the reference to alloc_context to repair the ARM build
- Paper cut: false -> nullptr
- Repair the ARM and x86 builds
- Rename GCHeapHolder -> GCHeapUtilities and address documentation feedback
- Rebase against master
- Rename gcholder.h/cpp -> gcheaputilities.h/cpp
- Fix an uninitialized field on alloc_context causing test failures on clang
- Rename the include guard for gcheaputilities.h
- Un-break SOS by making the following changes: 1) instruct the DAC to look for IGCHeap::gcHeapType by name, instead of assuming that it exists near g_pGCHeap; 2) eliminate all virtual calls on IGCHeap in the DAC, since we cannot dispatch on an object in another process; 3) because of 2, expose the number of generations past the GC interface using a static variable on IGCHeap that the DAC can read directly
- Repair the Windows build
2016-08-26 | modify gtFoldExprConst for casting to integer from floating point overflow | Hyeongseok Oh | 1 | -23/+0
2016-08-23 | fix for infinite and NaN double | Hyeongseok Oh | 1 | -6/+21
2016-08-19 | fix for correct conversion from negative double to unsigned long in ARM (modify helper) | Hyeongseok Oh | 1 | -9/+8
2016-08-16 | fix for correct conversion from negative double to unsigned long in ARM | Hyeongseok Oh | 1 | -0/+9
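A portable helper of the kind being fixed here might look like the sketch below. The clamping policy for NaN, negative, and overflowing inputs is an assumption for illustration, not necessarily the CLR helper's actual semantics; the point is that the result should not depend on what the target CPU's conversion instruction happens to do.

```cpp
#include <cmath>
#include <cstdint>

// Hedged sketch of a portable double -> unsigned 64-bit conversion helper.
// The saturating behavior below is an assumed policy for illustration.
static uint64_t Dbl2ULngSketch(double val) {
    if (std::isnan(val) || val <= -1.0)
        return 0;                          // NaN and negatives clamp to 0
    if (val >= 18446744073709551616.0)     // 2^64
        return UINT64_MAX;                 // too large: clamp to max
    return (uint64_t)val;                  // in range: truncate toward zero
}
```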
2016-06-21 | Generic dictionary minor performance improvement and simplification for R2R (#5690) | Fadi Hanna | 1 | -26/+79
A set of refactoring changes to slightly improve the performance of generic dictionary access on R2R images and simplify the codebase:
1) Removing dependency on CEEInfo::ComputeRuntimeLookupForSharedGenericTokenStatic (and deleting the API)
2) Stop parsing the generic type/method signatures when generating the R2R dictionary lookup assembly stub
3) Avoid re-encoding the generic type/method signatures in a new in-memory blob using a SigBuilder (signatures are used directly from the R2R image)
4) Moved the parsing/loading of type/method signatures to Dictionary::PopulateEntry()
5) Dictionary index and slot number are now encoded in the generated assembly stub instead of the signature (the stub takes a pointer to a data blob, which contains the signature, the dictionary index and slot, and the module pointer)
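Item 5 implies a small data blob handed to the stub. A hypothetical layout (field names, types, and order are assumptions for illustration, not taken from the source) could look like:

```cpp
#include <cstdint>

// Hypothetical layout of the data blob described in item 5 above: the
// generated assembly stub receives a pointer to this blob instead of
// having the dictionary index/slot encoded in the signature itself.
struct R2RDictionaryLookupData {
    void*          pModule;          // module pointer
    uint32_t       dictionaryIndex;  // which generic dictionary
    uint32_t       slotNumber;       // which slot within it
    const uint8_t* pSignature;       // type/method signature from the R2R image
};
```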
2016-05-19 | Fix x86 only ICastable feature bug. | Yi Zhang | 1 | -5/+4
When we call ICastable.IsInstanceOfInterface, we treat RuntimeTypeHandle as an OBJECTREF, which is incorrect per the x86 calling convention, since RuntimeTypeHandle is a struct that contains a RuntimeType ref field and needs to be passed on the stack. Our simple VM call helper CALL_MANAGED_METHOD doesn't handle this correctly (it does the simple thing and always assumes all the arguments are passed in registers first).

I'm fixing this by using a static method that takes RuntimeType instead of RuntimeTypeHandle, then converting it to RuntimeTypeHandle. Also switched to using PREPARE_NONVIRTUAL_CALLSITE(METHODID) per Jan's suggestion. The ICastable test is now enabled on x86 for both JITs.
2016-05-09 | JIT-EE interface changes to support CoreRT | Jan Kotas | 1 | -0/+19
- Add flags and constants for reverse PInvoke transitions (https://github.com/dotnet/corert/issues/611)
- Add new multi-dim array constructor that does not use varargs

[tfs-changeset: 1603336]