Age | Commit message (Collapse) | Author | Files | Lines |
|
GC heap globals like ephemeral_heap_segment and finalize_queue are
null/invalid for a server GC. Add a check to skip the workstation GC
memory enumeration if server. The server memory enumeration already
skips if workstation GC.
|
|
Issue #21485: fix EnumProcessModules hPseudoCurrentProcess bug.
Added handle reference.
Issue #21484: createdump segfaults with ASP.NET app
The problem was that the ClrDataModule Request faulted on a dynamic module
while getting the file layout flag.
Fixed the Request code to not get the file layout, and the crash dump
code to skip any dynamic modules.
|
|
* Delete vm/context.*
Leftover from remoting
|
|
* Remove IsNeutralDomain()
* PR feedback
|
|
|
|
|
|
* Fix LoaderAllocator::AllocateHandle
When another thread wins the race in growing the handle table, the code
was not refreshing the slotsUsed local to the new up to date value. This
was leading to overwriting / reusing a live handle.
This change fixes it.
* Embed ThreadLocalBlock in Thread
Instead of allocating ThreadLocalBlock dynamically, embed it in the
Thread. That solves a race between thread destruction and
LoaderAllocator destruction. The ThreadLocalBlock could have been
deleted during Thread shutdown while the LoaderAllocator's destruction
would be working with it.
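The AllocateHandle fix above can be illustrated with a minimal sketch. `HandleTable`, `onUnlocked` and `Get` are hypothetical names and this is not the runtime's actual code; it only shows why the local copy of the used-slot count must be re-read after the growth step, since another thread may have grown the table and consumed slots while the lock was dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical handle table, for illustration only.
class HandleTable {
public:
    explicit HandleTable(std::size_t capacity) : slots_(capacity, 0) {}

    // Test hook: runs once while the lock is dropped, simulating another thread.
    std::function<void()> onUnlocked;

    // Stores value in a fresh slot and returns the slot index.
    std::size_t Allocate(int value) {
        std::unique_lock<std::mutex> lock(mutex_);
        std::size_t slotsUsed = used_;
        while (slotsUsed >= slots_.size()) {
            std::size_t wanted = slots_.size() * 2;
            lock.unlock();                        // the new table is allocated outside the lock
            std::vector<int> bigger(wanted, 0);
            if (onUnlocked) {
                auto hook = std::move(onUnlocked);
                onUnlocked = nullptr;
                hook();
            }
            lock.lock();
            if (slots_.size() < wanted) {         // nobody else grew the table: install ours
                std::copy(slots_.begin(), slots_.end(), bigger.begin());
                slots_.swap(bigger);
            }
            // Won or lost, the shared count may have moved: refresh the local.
            // Omitting this line is the kind of bug described above.
            slotsUsed = used_;
        }
        slots_[slotsUsed] = value;
        used_ = slotsUsed + 1;
        return slotsUsed;
    }

    int Get(std::size_t index) const { return slots_[index]; }

private:
    std::mutex mutex_;
    std::vector<int> slots_;
    std::size_t used_ = 0;
};
```

Without the refresh, the outer caller in a lost race would keep its stale index and overwrite the slot that the winning thread had just handed out.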
|
|
|
|
|
|
|
|
* Remove old reference to Rotor in documentation.
All remaining references relate to rotor's role in CoreCLR history.
* Remove rotor comment from enummem.cpp.
I can find no evidence that the presence of g_pStressLog is conditional
on FEATURE_PAL being defined.
* Remove old todo, DbgDllMain looks for thread detach.
* Update nativepipeline.h comment reference to rotor.
All unix-like systems except android have FEATURE_DBGIPC_TRANSPORT_DI
defined, hence "most unix-like platforms".
* Update some comments to not refer to Rotor.
* Remove some more references to Rotor from comments.
* Remove old comment.
Though maybe this macro should be removed and everywhere use the & operator.
It appears there are only two places that use this macro.
|
|
* Remove IsRemotingIntercepted methods that always return false.
* Remove GetOptionalMembersAllocationSize parameters that are always false.
* Remove references to context static.
Remove references in comments and methodnames.
* Remove RemotingVtsInfo.
|
|
* Remove context statics stuff part 1
This change removes all context statics stuff from the runtime since
context statics are not supported and this code was obsolete.
* Remove context statics stuff from the debugger code
|
|
* Remove AppDomain unload
This change removes all code in AppDomain that's related to AppDomain
unloading which is obsolete in CoreCLR. It also removes all calls to the
removed methods.
In a few places, I have made the change simpler by taking into account the
fact that there is always just one AppDomain.
|
|
EnumSvrGlobalMemoryRegions (#20233)
|
|
Return E_FAIL instead of S_FALSE from ClrDataFrame::GetLocalSig().
See also issue https://github.com/dotnet/diagnostics/issues/61
|
|
(#19864)
|
|
* Add support for collectible types to SOS
Collectible types indirectly reference managed LoaderAllocator via
pointer to native AssemblyLoaderAllocator stored in their MethodTable.
GC uses this relation when scanning object graph to determine which
objects are rooted and which ones are not.
The gcroot command in SOS doesn't understand this relation and so it
is unable to find all roots for LoaderAllocator.
This change fixes it.
* PR feedback
Make the failure to get the collectible info non-fatal to make it
compatible with older runtimes.
|
|
There were a couple of places where the DAC (IsValidObject, GetAppDomainForObject)
assumed that a NULL target/debuggee address would throw an exception that would
be caught by try/catch. Any other invalid address is handled with a software
exception thrown by the read memory functions. In general it is a better overall
design not to have any of the DBI/DAC, etc. code depend on hardware exceptions
being caught. On Linux the C++ runtime sometimes can't handle it. There is a
slight risk that there are other places in the DAC that make the NULL address
assumption, but testing so far has not found any.
Added PAL_SetInitializeDLLFlags as a fallback to allow the PAL_InitializeDLL flags
to be set for a PAL instance for the DAC where we could still register h/w signals
but not the altstack switching to reduce this risk. The flags can't be build-time
conditional because we only build one coreclrpal.a library that all the modules
use. Having a PAL_InitializeFlags function doesn't really help either because of
the PAL_RegisterModule call to PAL_InitializeDLL and the LoadLibrary dance/protocol
that uses it to call the loading module's DLLMain.
Add PAL_SetInitializeFlags; remove flags from PAL_INITIALIZE and PAL_INITIALIZE_DLL
default. Add PAL_InitializeFlags() to allow the default to be overridden.
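The design point about NULL addresses can be sketched as follows. `FakeTarget` and `IsValidObjectSketch` are hypothetical stand-ins, not the DAC's real types or functions; the sketch only shows an explicit NULL check combined with a software exception from the read path, so nothing depends on catching a hardware fault.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

using TargetAddr = std::uint64_t;

// Hypothetical stand-in for debuggee memory: address -> 32-bit value.
class FakeTarget {
public:
    void Write(TargetAddr addr, std::uint32_t value) { memory_[addr] = value; }

    // Throws a software exception for any address that is not mapped.
    std::uint32_t Read(TargetAddr addr) const {
        auto it = memory_.find(addr);
        if (it == memory_.end())
            throw std::runtime_error("unreadable target address");
        return it->second;
    }

private:
    std::map<TargetAddr, std::uint32_t> memory_;
};

// Returns true if a nonzero value can be read at addr.
bool IsValidObjectSketch(const FakeTarget& target, TargetAddr addr) {
    if (addr == 0)
        return false;  // explicit NULL check instead of relying on a fault
    try {
        return target.Read(addr) != 0;
    } catch (const std::runtime_error&) {
        return false;  // any other invalid address surfaces as a software exception
    }
}
```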
|
|
|
|
Fixes build breaks with latest Visual Studio Preview.
Fixes #18338
|
|
This bug fix is a port from the equivalent fix in framework. The
debugger tried performing a stackwalk in the epilog due to the JIT
incorrectly reporting epilogue information. This caused an invalid
GS cookie to be checked and caused the debugger to crash. A flag was
added to allow debug stackwalks to skip the cookie check.
|
|
* Separate sections READONLY_VCHUNKS and READONLY_DICTIONARY
* Remove relocations for second-level indirection of Vtable in case FEATURE_NGEN_RELOCS_OPTIMIZATIONS is enabled.
Introduce FEATURE_NGEN_RELOCS_OPTIMIZATIONS, under which NGEN specific relocations optimizations are enabled
* Replace push/pop of R11 in stubs with
- str/ldr of R4 in space reserved in epilog for non-tail calls
- usage of R4 with hybrid-tail calls (same as for EmitShuffleThunk)
* Replace push/pop of R11 for function epilog with usage of LR as helper register right before its restore from stack
|
|
When the debugger is querying the active rejit IL for an IL method that has not been rejitted it incorrectly creates a VMPTR_ILCodeVersionNode for a code version that shouldn't have one.
|
|
For compat with profilers that used our APIs in unexpected ways we can allow
the ILCodeVersion to fallback to the default IL code when no IL was explicitly
given.
|
|
* addres -> address
* depedant -> dependent
* gaurantee -> guarantee
* gaurantees -> guarantees
* lable -> label
* lazieness -> laziness
* lcoation -> location
* enquing -> enqueuing
* enregsitered -> enregistered
* ensurin -> ensuring
|
|
* Cleanup all disabled warnings that do not trigger
* Fix warning about line continuation in single line comment
* Eliminate all unreferenced local variables and reenable warning
|
|
* Ajusted -> Adjusted
* alot -> a lot
* Ambigous -> Ambiguous
* amoun -> amount
* amoung -> among
* Amperstand -> Ampersand
* Anbody -> Anybody
* anddoens't -> and doesn't
* anme -> name
* annoations -> annotations
* annother -> another
* anothr -> another
* ansynchronous -> asynchronous
* anticpation -> anticipation
* anway -> anyway
* aother -> another
* Apparant -> Apparent
* appartment -> apartment
* appdmomain -> appdomain
* Appdomian -> Appdomain
* appdomin -> appdomain
* approproiate -> appropriate
* approprate -> appropriate
* approp -> appropriate
* appened -> appended
* appropiately -> appropriately
* appropraitely -> appropriately
* Apperantly -> Apparently
* approp. -> appropriate
* Approriate -> Appropriate
|
|
* acquringing -> acquiring
* Activ -> Active
* activley -> actively
* acutal -> actual
* bIncomingIPAdddefed -> bIncomingIPAddRefed
* adddr -> addr
* readding -> reading
* Addfunction -> AddFunction
* additionnal -> additional
* Additonal -> Additional
* Additonally -> Additionally
* Addresss -> Address
* addtion -> addition
* aded -> added
* aditional -> additional
* adjustements -> adjustments
* Adress -> Address
* afer -> after
* aformentioned -> aforementioned
* afte -> after
* agains -> against
* agaisnt -> against
* aggresively -> aggressively
* aggreates -> aggregates
* aggregious -> egregious
* aginst -> against
* agregates -> aggregates
* Agressive -> Aggressive
* ahve -> have
* ajdust -> adjust
* ajust -> adjust
* alement -> element
* algoritm -> algorithm
* alighnment -> alignment
* alignmant -> alignment
* constraits -> constraints
* Allcator -> Allocator
* alllocate -> allocate
* alloacted -> allocated
* allocatate -> allocate
* allocatoror -> allocator
* alloctaed -> allocated
* alloction -> allocation
* alloted -> allotted
* allt he -> all the
* alltogether -> altogether
* alocate -> allocate
* alocated -> allocated
* Alocates -> Allocates
* alogrithm -> algorithm
* aloocate -> allocate
* alot -> a lot
* alwasy -> always
* alwyas -> always
* alwys -> always
|
|
BaseSize for System.String was not set correctly. This caused an unnecessary extra 8 bytes to be allocated at the end of strings that had `Length % 4 < 2` on 64-bit platforms.
This change makes affected strings proportionally cheaper. For example, `new string('a', 1)` in a long-running loop is 7% faster.
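The `Length % 4 < 2` pattern can be reproduced with a small worked calculation. The numbers here are assumptions, not taken from the change itself: a 64-bit layout of 8-byte object header, 8-byte MethodTable pointer, 4-byte length and a 2-byte terminator (22 bytes), 8-byte alignment, and a hypothetical old base size of 26 picked only because it yields the described symptom.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of 8.
constexpr std::size_t AlignUp8(std::size_t n) {
    return (n + 7) & ~static_cast<std::size_t>(7);
}

// Assumed layout: 8 (header) + 8 (MethodTable*) + 4 (length) + 2 (terminator).
constexpr std::size_t kAssumedCorrectBase = 22;
// Hypothetical oversized base, 4 bytes too large; illustrative only.
constexpr std::size_t kAssumedOldBase = 26;

// Allocation size of a string with `length` UTF-16 characters.
constexpr std::size_t StringAllocSize(std::size_t base, std::size_t length) {
    return AlignUp8(base + 2 * length);
}
```

Under these assumptions a one-character string takes 24 bytes instead of 32, and the 8-byte difference appears exactly when `length % 4` is 0 or 1.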
|
|
Eliminate `FEATURE_UNIX_AMD64_STRUCT_PASSING` and replace it with `UNIX_AMD64_ABI` when used alone. Both are currently defined; it is highly unlikely the latter will work alone; and it significantly clutters up the code, especially the JIT.
Also, fix the altjit support (now `UNIX_AMD64_ABI_ITF`) to *not* call `ClassifyEightBytes` if the struct is too large. Otherwise it asserts.
|
|
and profiling. (#16141)
This reverts commit e9985126acb0f1efd7c780faac4e66bc798b73c0.
|
|
(#16790)" (#16917)
This reverts commit 47bef69b68a35eafa069d08187727684a5f47901.
|
|
This reverts commit 383736b96b643ba46ad290fc86601fc2d62a9436.
|
|
* Return DPTR from PEDecoder::FindFirstSection()
Change type of the function's return value
to PTR_IMAGE_SECTION_HEADER instead of (IMAGE_SECTION_HEADER *)
* Fix handling of incorrect assemblies on Unix
This fixes the regression that was introduced by #10772 and is
caused by a missing check for validity of loaded assembly file.
Related issue: #15544
|
|
|
|
|
|
debugging and profiling. (#15878)"
This reverts commit 5bcfde404803f85451cf0ee9fd6406734cb878ff.
|
|
and profiling. (#15878)
To disable the named pipes and semaphores created on Linux, execute "export COMPlus_EnableDiagnostics=0" before starting the .NET Core program.
On Windows execute "set COMPlus_EnableDiagnostics=0" and on Linux execute "export COMPlus_EnableDiagnostics=0".
Removed the "Telesto" registry entry (old unnecessary Silverlight code) and Watson (always true) checks.
For issues #11769 and #8844.
|
|
This reverts commit cf1fb9e17fc8b6ee849edab5a696d0ec5c6eadd2.
|
|
|
|
- Reserve space for jump stubs for precodes and other code fragments at the end of each code heap segment. This is trying
to ensure that eventual allocation of jump stubs for precodes and other code fragments succeeds. Accounting is done
conservatively - reserves more than strictly required. It wastes a bit of address space, but no actual memory. Also,
this reserve is not used to allocate jump stubs for JITed code since the JITing can recover from failure to allocate
the jump stub now. Fixes #14996.
- Improve algorithm to reuse HostCodeHeap segments: Maintain estimated size of the largest free block in HostCodeHeap.
This estimate is updated when allocation request fails, and also when memory is returned to the HostCodeHeap. Fixes #14995.
- Retry JITing on failure to allocate jump stub. Failure to allocate jump during JITing is not fatal anymore. There is
extra memory reserved for jump stubs on retry to ensure that the retry succeeds allocating the jump stubs that it needs
with high probability.
- Respect CodeHeapRequestInfo::getRequestSize for HostCodeHeap. CodeHeapRequestInfo::getRequestSize is used to
throttle code heap segment size for large workloads. Not respecting it in HostCodeHeap led to too many
undersized code heap segments in large workloads.
- Switch HostCodeHeap nibble map to be allocated on regular heap as part. It simplified the math required to estimate
the nibble map size, and allocating on regular heap is overall goodness since it does not need to be executable.
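The "estimated size of the largest free block" idea from the second item above can be sketched like this. `HostHeapSketch` is a hypothetical first-fit allocator over block sizes, not the real HostCodeHeap; it only shows an estimate that is tightened when a request fails and raised when memory is returned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap that tracks only the sizes of its free blocks.
class HostHeapSketch {
public:
    explicit HostHeapSketch(std::size_t size)
        : freeBlocks_(1, size), maxFreeEstimate_(size) {}

    // First-fit allocation; returns false if no free block is large enough.
    bool TryAllocate(std::size_t size) {
        if (size > maxFreeEstimate_)
            return false;                          // cheap rejection without a scan
        std::size_t largestSeen = 0;
        for (std::size_t& block : freeBlocks_) {
            if (block >= size) {
                block -= size;
                return true;
            }
            largestSeen = std::max(largestSeen, block);
        }
        maxFreeEstimate_ = largestSeen;            // request failed: tighten the estimate
        return false;
    }

    // Returns a block of the given size to the heap.
    void Free(std::size_t size) {
        freeBlocks_.push_back(size);
        maxFreeEstimate_ = std::max(maxFreeEstimate_, size);
    }

    std::size_t MaxFreeEstimate() const { return maxFreeEstimate_; }

private:
    std::vector<std::size_t> freeBlocks_;
    std::size_t maxFreeEstimate_;  // upper bound; may be stale after successful allocations
};
```

The estimate is deliberately allowed to overstate the truth after a successful allocation; it only needs to be cheap to maintain and good enough to skip heaps that cannot satisfy a request.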
|
|
Improve Monitor scaling and reduce spinning
Part 1: Improve Monitor scaling
Fixes https://github.com/dotnet/coreclr/issues/13978
- Refactored AwareLock::m_MonitorHeld into a class LockState with operations to mutate the state
- Allowed the lock to be taken by a non-waiter when there is a waiter to prevent creating lock convoys
- Added a bit to LockState to indicate that a waiter is signaled to wake, to avoid waking more than one waiter at a time. A waiter that wakes by observing the signal unsets this bit. See AwareLock::EnterEpilogHelper().
- Added a spinner count to LockState. Spinners now register and unregister themselves and lock releasers don't wake a waiter when there is a registered spinner (the spinner guarantees to take the lock if it's available when unregistering itself)
- This was necessary mostly on Windows to reduce CPU usage to the expected level in contended cases with several threads. I believe it's the priority boost Windows gives to signaled threads, which seems to cause waiters to much more frequently succeed in acquiring the lock. This causes a CPU usage problem because once the woken waiter releases the lock, on the next lock attempt it will become a spinner. This keeps repeating, converting several waiters into spinners unnecessarily. Before registering spinners, I saw typically 4-6 spinners under contention (with delays inside and outside the lock) when I expected to have only 1-2 spinners at most.
- It costs an interlocked operation before and after the spin loop, but that doesn't seem to be too significant since spinning is a relatively slow path anyway, and the reduction in CPU usage in turn reduces contention on the lock and lets more useful work get done
- Updated waiters to spin a bit before going back to waiting, reasons are explained in AwareLock::EnterEpilogHelper()
- Removed AwareLock::Contention() and any references (this removes the 10 repeats of the entire spin loop in that function). With the lock convoy issue gone, this appears to no longer be necessary.
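The spinner registration and single-waiter signaling described in the list above can be sketched with one atomic word. `LockStateSketch` and its bit layout are hypothetical and much simpler than the real LockState; the sketch only shows that a releaser skips waking a waiter while a spinner is registered or a waiter is already signaled, and that a spinner takes the lock, if available, in the same atomic step that unregisters it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical bit layout, chosen for illustration only.
constexpr std::uint32_t kLocked         = 1u << 0;
constexpr std::uint32_t kWaiterSignaled = 1u << 1;
constexpr std::uint32_t kSpinnerOne     = 1u << 2;
constexpr std::uint32_t kSpinnerMask    = 7u << 2;  // small spinner count
constexpr std::uint32_t kWaiterOne      = 1u << 5;  // waiter count in the high bits

class LockStateSketch {
public:
    bool TryLock() {
        std::uint32_t s = state_.load();
        while ((s & kLocked) == 0) {
            if (state_.compare_exchange_weak(s, s | kLocked))
                return true;
        }
        return false;
    }

    // Returns false if the spinner count is already saturated.
    bool RegisterSpinner() {
        std::uint32_t s = state_.load();
        while ((s & kSpinnerMask) != kSpinnerMask) {
            if (state_.compare_exchange_weak(s, s + kSpinnerOne))
                return true;
        }
        return false;
    }

    // Precondition: the caller is a registered spinner.
    // Unregisters it and takes the lock in the same step if it is free.
    bool UnregisterSpinnerAndTryLock() {
        std::uint32_t s = state_.load();
        for (;;) {
            bool take = (s & kLocked) == 0;
            std::uint32_t n = (s - kSpinnerOne) | (take ? kLocked : 0u);
            if (state_.compare_exchange_weak(s, n))
                return take;
        }
    }

    void RegisterWaiter() { state_.fetch_add(kWaiterOne); }

    // Releases the lock. Returns true if the caller should wake one waiter.
    bool Unlock() {
        std::uint32_t s = state_.load();
        for (;;) {
            bool signal = s >= kWaiterOne && (s & kSpinnerMask) == 0 &&
                          (s & kWaiterSignaled) == 0;
            std::uint32_t n = (s & ~kLocked) | (signal ? kWaiterSignaled : 0u);
            if (state_.compare_exchange_weak(s, n))
                return signal;
        }
    }

private:
    std::atomic<std::uint32_t> state_{0};
};
```

A woken waiter would clear `kWaiterSignaled` when it observes the signal; that step and the actual blocking primitive are omitted here.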
Perf
- On Windows, throughput has increased significantly starting at slightly lower than proc count threads. On Linux, latency and throughput have increased more significantly at similar proc counts.
- Most of the larger regressions are in the unlocked fast paths. The code there hasn't changed and is almost identical (minor layout differences), I'm just considering this noise until we figure out how to get consistently faster code generated.
- The smaller regressions are within noise range
Part 2: Reduce Monitor spinning
Fixes https://github.com/dotnet/coreclr/issues/13980
- Added new config value Monitor_SpinCount and Monitor spins for that many iterations, default is 30 (0x1e). This seems to give a somewhat decent balance between latency, fairness, and throughput. Lower spin counts improve latency and fairness significantly and regress throughput slightly, and higher spin counts improve throughput slightly and regress latency and fairness significantly.
- The other constants can still be used to disable spinning but otherwise they are no longer used by Monitor
- Decreased the number of bits used for tracking spinner count to 3. This seems to be more than enough since only one thread can take a lock at a time, and prevents spikes of unnecessary CPU usage.
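A bounded spin loop of the kind described above might look like the following sketch. `SpinForLock` and the `maxYields` cap are hypothetical; only the default of 30 iterations and the use of exponential backoff come from the text. The callables stand in for a lock attempt and for YieldProcessor().

```cpp
#include <cassert>

// Spins for at most spinCount iterations, doubling the number of yields
// per iteration up to maxYields. Returns true if the lock was acquired;
// on false the caller would proceed to waiting.
template <class TryAcquire, class Yield>
bool SpinForLock(TryAcquire tryAcquire, Yield yield,
                 unsigned spinCount = 30, unsigned maxYields = 64) {
    unsigned yieldsThisIteration = 1;
    for (unsigned i = 0; i < spinCount; ++i) {
        if (tryAcquire())
            return true;
        for (unsigned y = 0; y < yieldsThisIteration; ++y)
            yield();
        if (yieldsThisIteration < maxYields)
            yieldsThisIteration *= 2;   // exponential backoff
    }
    return false;
}
```

A lower `spinCount` keeps the backoff from growing as far, so the lock is polled more often, which is the trade-off between latency, fairness and throughput discussed above.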
Tried some things that didn't pan out:
- Sleep(0) doesn't seem to add anything to the spin loop, so left it out. Instead of Sleep(0) it can just proceed to waiting. Waiting is more expensive than Sleep(0), but I didn't see that benefit in the tests. Omitting Sleep(0) also keeps the spin loop very short (a few microseconds max).
- Increasing the average YieldProcessor() duration per spin iteration improved throughput slightly but regressed latency and fairness very quickly. Given that fairness is generally worse with part 1 of this change above, it felt like a better compromise to take a small reduction in throughput for larger improvements in latency and fairness.
- Tried having a very small % of lock releases randomly wake a waiter despite there being spinners, to improve fairness. This improved fairness noticeably but not as much as decreasing the spin count slightly, and it was making latency and throughput worse more quickly. After reducing the % to a point where I was hardly seeing fairness improvements, there were still noticeable latency and throughput regressions.
Miscellaneous
- Moved YieldProcessorNormalized code into separate files so that they can be included earlier and where needed
- Added a max for "optimal max normalized yields per spin iteration" since it has a potential to be very large on machines where YieldProcessor may be implemented as no-op, in which case it's probably not worth spinning for the full duration
- Refactored duplicate code in portable versions of MonEnterWorker, MonEnter, and MonReliableEnter. MonTryEnter has a slightly different structure, did not refactor that.
Perf
- Throughput is a bit lower than before at lower thread counts and better at medium-high thread counts. It's a bit lower at lower thread counts because of two reasons:
- Shorter spin loop means the lock will be polled more frequently because the exponential backoff does not get as high, making it more likely for a spinner to steal the lock from another thread, causing the other thread to sometimes wait early
- The duration of YieldProcessor() calls per spin iteration has decreased and a spinner or spinning waiter are more likely to take the lock, the rest is similar to above
- For the same reasons as above, latency is better than before. Fairness is better on Windows and worse on Linux compared to baseline due to the baseline having differences between these platforms. Latency also has differences between Windows/Linux in the baseline, I suspect those are due to differences in scheduling.
- Performance now scales appropriately on processors with different pause delays
Part 3: Add mitigation for waiter starvation
Normally, threads are allowed to preempt waiters to acquire the lock. There are cases where waiters can be easily starved as a result. For example, a thread that holds a lock for a significant amount of time (much longer than the time it takes to do a context switch), then releases and reacquires the lock in quick succession, and repeats. Though a waiter would be woken upon lock release, usually it will not have enough time to context-switch-in and take the lock, and can be starved for an unreasonably long duration.
In order to prevent such starvation and force a bit of fair forward progress, it is sometimes necessary to change the normal policy and disallow threads from preempting waiters. A new bit was added to LockState and ShouldNotPreemptWaiters() indicates the current state of the policy.
- When the first waiter begins waiting, it records the current time as a "waiter starvation start time". That is a point in time after which no forward progress has occurred for waiters. When a waiter acquires the lock, the time is updated to the current time.
- Before a spinner begins spinning, and when a waiter is signaled to wake, it checks whether the starvation duration has crossed a threshold (currently 100 ms) and if so, sets ShouldNotPreemptWaiters()
When unreasonable starvation is occurring, the lock will be released occasionally and if caused by spinners, spinners will be starting to spin.
- Before starting to spin, if ShouldNotPreemptWaiters() is set, the spinner will skip spinning and wait instead. Spinners that are already registered at the time ShouldNotPreemptWaiters() is set will stop spinning as necessary. Eventually, all spinners will drain and no new ones will be registered.
- After spinners have drained, only a waiter will be able to acquire the lock. When a waiter acquires the lock, or when the last waiter unregisters itself, ShouldNotPreemptWaiters() is cleared to restore the normal policy.
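The starvation policy above can be sketched as a small state machine. `StarvationPolicySketch` and its method names are hypothetical, time is passed in as plain milliseconds to keep the example deterministic, and treating "crossed a threshold" as `>=` is an assumption; only the 100 ms threshold and the set/clear conditions come from the text.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kStarvationThresholdMs = 100;

// Hypothetical, single-threaded model of the policy bit.
class StarvationPolicySketch {
public:
    // The first waiter records the "waiter starvation start time".
    void OnFirstWaiterBeginsWaiting(std::uint64_t nowMs) { startMs_ = nowMs; }

    // A waiter made progress: restart the clock and restore the normal policy.
    void OnWaiterAcquiredLock(std::uint64_t nowMs) {
        startMs_ = nowMs;
        shouldNotPreemptWaiters_ = false;
    }

    void OnLastWaiterUnregistered() { shouldNotPreemptWaiters_ = false; }

    // Run before a spinner starts spinning and when a waiter is signaled.
    void CheckStarvation(std::uint64_t nowMs) {
        if (nowMs - startMs_ >= kStarvationThresholdMs)
            shouldNotPreemptWaiters_ = true;
    }

    // Returns false if the would-be spinner should skip spinning and wait.
    bool SpinnerMaySpin(std::uint64_t nowMs) {
        CheckStarvation(nowMs);
        return !shouldNotPreemptWaiters_;
    }

    bool ShouldNotPreemptWaiters() const { return shouldNotPreemptWaiters_; }

private:
    std::uint64_t startMs_ = 0;
    bool shouldNotPreemptWaiters_ = false;
};
```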
|
|
AppDomain methods and properties that are not supported on CoreCLR were removed
#15001
|
|
Adds the appropriate handling of the default ILCodeVersion in DacDbiInterfaceImpl::GetILCodeVersionNode
|
|
Linux and Windows arm64 are using the regular C/C++ thread local statics. This change unifies the remaining Windows architectures to be on the same plan.
|
|
* Change jit notifications so that they pass the native code address. This fixes !bpmd so that it will set the correct breakpoint on tiered jitted methods.
* code review feedback
* don't handle OnCodeGenerated
|
|
(#13805)
* Make dumpmd work with tiered jitting. Now displays previous code addresses
* add tier info and nativecodeversionnode ptr to dumpmd output
* fix warnings on non-rejit platforms
|
|
The createdump utility now enumerates all the native stack frames (with
some help from the managed stack walker) for all the threads adding all
the ELF unwind info needed.
On a different machine and without any of the native modules loaded when
the crashdump was generated all the thread stacks can still be unwound
with lldb/gdb.
Change the PAL_VirtualUnwindOutOfProc read memory adapter in DAC
to add the memory to instances manager.
Some misc. cleanup.
|
|
Implement out of context stack unwinder
Decode the eh frame info found in the in-memory module image
and pass it back to the remote libunwind8 to do the unwind.
Added remote-unwind.cpp for all the out of context unwind code.
Added an all managed threads option, -all, to "clrstack" (sos ClrStack).
The IDebugDataTarget4 feature needs to be enabled for OS X.
Add libunwind license notice to third party notices file.
|