path: root/src/gc
Age | Commit message | Author | Files | Lines
2020-04-16 | Fix PIE options (#26323) [tags: submit/tizen/20200415.223728, accepted/tizen/unified/20200416.080052] | Jan Vorlicek | 2 | -2/+0
* Fix PIE options

  We were missing passing the -pie linker option. That means that while we were compiling our code as position independent, the executables (not shared libraries) were not marked as position independent and ASLR was not applied to them; they were always loaded at fixed addresses. This change adds the missing -pie option and also replaces all the individual settings of -fPIE / -fPIC on the targets we build with a centralized setting of the CMAKE_POSITION_INDEPENDENT_CODE variable, which causes cmake to add the appropriate compiler options everywhere.

* Fix native parts of coreclr tests build

  The native parts of the tests are not built using the root CMakeLists.txt, so I am moving the enabling of position independent code to configurecompiler.cmake.

Change-Id: Ieafff8984ec23e5fdb00fb0c2fb017e53afbce88
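The effect of the fix is easy to observe with a tiny probe program; a sketch, assuming a Linux toolchain (compile once with `-fPIE -pie` and once with `-no-pie`, then compare the printed address across runs):

```cpp
#include <cstdio>

static void probe() {}

int main()
{
    // With -fPIE -pie plus ASLR, this address changes on every run;
    // in a non-PIE executable the code is loaded at a fixed address.
    printf("code address: %p\n", (void*)&probe);
    return 0;
}
```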
2020-02-13 | Fix GC heap corruption on ARM (#27985) | Anton Lapounov | 1 | -1/+1
Port of dotnet/runtime#1389.
2020-01-14 | Port to 3.1: Fix getting affinity set on MUSL on Jetson TX2 (#27957) | Jan Vorlicek | 1 | -2/+2
Ports https://github.com/dotnet/runtime/pull/206 to release/3.1.

The code in PAL_GetCurrentThreadAffinitySet relied on the fact that the number of processors reported as configured in the system is always larger than the maximum CPU index. However, it turns out that this is not true on some devices / distros. The Jetson TX2 reports CPUs 0, 3, 4 and 5 in the affinity mask; CPUs 1 and 2 are never reported. GLIBC reports 6 as the number of configured CPUs, but MUSL reports just 4. PAL_GetCurrentThreadAffinitySet was using the number of CPUs reported as configured as the upper bound for scanning the affinity set, so on the Jetson TX2 the affinity mask returned had just two bits set even though there were 4 CPUs. That triggered an assert in GCToOSInterface::Initialize.

This change fixes that by reading the maximum CPU index from /proc/cpuinfo. It falls back to using the number of processors configured when /proc/cpuinfo is not available (on macOS, FreeBSD, ...).

Fixes https://github.com/dotnet/runtime/issues/170
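A minimal sketch of the described fallback, assuming one `processor : N` row per online CPU in /proc/cpuinfo; names are illustrative, not the actual PAL code:

```cpp
#include <cstdio>
#include <unistd.h>

// Find the highest CPU index listed in /proc/cpuinfo, falling back to the
// configured-processor count when the file is missing.
static int GetMaxCpuIndex()
{
    int maxIndex = -1;
    FILE* f = fopen("/proc/cpuinfo", "r");
    if (f != nullptr)
    {
        char line[256];
        int index;
        while (fgets(line, sizeof(line), f) != nullptr)
        {
            // Each online CPU has a "processor : N" row.
            if (sscanf(line, "processor : %d", &index) == 1 && index > maxIndex)
                maxIndex = index;
        }
        fclose(f);
    }
    if (maxIndex < 0)
    {
        // /proc/cpuinfo unavailable (macOS, FreeBSD, ...): fall back to the
        // configured processor count.
        maxIndex = (int)sysconf(_SC_NPROCESSORS_CONF) - 1;
    }
    return maxIndex;
}
```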
2019-10-14 | Fix available memory extraction on Linux (#26764) (#26938) | agoretsky | 3 | -15/+278
* Fix available memory extraction on Linux

  GlobalMemoryStatusEx in the PAL returns the number of free physical pages in the ullAvailPhys member. But there are additional pages, allocated as buffers and caches, that get released when there is memory pressure, so they are effectively available too. This change extracts the available memory on Linux from the MemAvailable row of /proc/meminfo, which the kernel reports as the most precise amount of available memory.
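A minimal sketch of reading that row, assuming the usual `MemAvailable:  <n> kB` format; not the PAL implementation:

```cpp
#include <cstdio>

// Returns the available memory in bytes, or 0 if the MemAvailable row is
// absent (very old kernels do not report it).
static unsigned long long GetAvailableMemoryBytes()
{
    unsigned long long availableKb = 0;
    FILE* f = fopen("/proc/meminfo", "r");
    if (f != nullptr)
    {
        char line[256];
        while (fgets(line, sizeof(line), f) != nullptr)
        {
            if (sscanf(line, "MemAvailable: %llu", &availableKb) == 1)
                break;
        }
        fclose(f);
    }
    return availableKb * 1024;  // /proc/meminfo reports kB
}
```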
2019-10-03 | oom (#26457) (#26983) | Maoni Stephens | 2 | -1/+31
+ When a hard limit is specified, we should only retry when we didn't fail due to a commit failure. If commit failed, it means we simply didn't have as much memory as the hard limit specified; we should throw OOM in this case.
+ Added some diag info around OOM history to help with future diagnostics.

(cherry picked from commit 7dca41fd36721068e610c537654765e8e42275d7)
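A hedged sketch of that retry policy; the enum and function are invented for illustration, not the actual gc.cpp code:

```cpp
enum class AllocFailure { ReservationFailed, CommitFailed };

static bool ShouldRetryAllocation(bool hardLimitSet, AllocFailure failure)
{
    // Under a hard limit, a commit failure means the limit itself is too
    // small; retrying cannot produce more memory, so throw OOM instead.
    if (hardLimitSet && failure == AllocFailure::CommitFailed)
        return false;
    return true;
}
```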
2019-08-16 | Fix a potential division by 0 in post GC counter computation (#26085) (#26089) | Sung Yoon Whang | 1 | -1/+4
* Fix a potential division by 0 in post GC counter computation
* Remove useless code
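The commit text does not show the exact expression; a generic guard of the kind such fixes apply looks like this (illustrative names):

```cpp
#include <cstdint>

// A time delta measured between two GCs can be zero when they complete
// within the clock's resolution, so guard before dividing.
static uint64_t PercentTimeInGC(uint64_t gcTicks, uint64_t elapsedTicks)
{
    if (elapsedTicks == 0)
    {
        return 0;  // report 0% rather than faulting on divide-by-zero
    }
    return (gcTicks * 100) / elapsedTicks;
}
```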
2019-07-12 | Fixes when accessing fgn_maxgen_percent (#25650) | Andy Hanson | 1 | -6/+15
* Fixes when accessing fgn_maxgen_percent

  PR #25350 changed `fgn_maxgen_percent` to be a per-heap property when `MULTIPLE_HEAPS` is set. A few uses need to be updated:

  * In `full_gc_wait`, we must re-read `fgn_maxgen_percent` before the second test of `maxgen_percent == 0` (otherwise the second test is statically unreachable) -- see the sketch below.
  * In RegisterForFullGCNotification, we must set `fgn_maxgen_percent` when `MULTIPLE_HEAPS` is not set.
  * In CancelFullGCNotification, we must set `fgn_maxgen_percent` for each heap separately when `MULTIPLE_HEAPS` is set.

  Fix dotnet/corefx#39374

* Avoid duplicate code when getting fgn_maxgen_percent twice in full_gc_wait
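A sketch of the re-read pattern from the first bullet, with a hypothetical accessor standing in for the per-heap lookup:

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-heap fgn_maxgen_percent lookup.
static uint32_t get_fgn_maxgen_percent(int /*heapNumber*/) { return 10; }

static int FullGcWaitSketch(int heapNumber)
{
    uint32_t maxgen_percent = get_fgn_maxgen_percent(heapNumber);
    if (maxgen_percent == 0)
        return 0;  // notification not registered / already cancelled
    // ... wait on the full GC notification event ...
    // Re-read before testing again: another thread may have cancelled the
    // notification meanwhile; a cached value would make this test dead code.
    maxgen_percent = get_fgn_maxgen_percent(heapNumber);
    if (maxgen_percent == 0)
        return 0;
    return 1;  // proceed: notification still active
}
```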
2019-07-08 | Return HardLimitBytes from GCMemoryInfo.TotalAvailableMemoryBytes (#25437) | Andy Hanson | 3 | -21/+25
* Add property HardLimitBytes to GCMemoryInfo

  This adds a new property HardLimitBytes. Unlike TotalAvailableMemoryBytes, this will reflect an explicitly set COMPLUS_GCHeapHardLimit. It will also reflect the fraction of a container's size that we use, where TotalAvailableMemoryBytes is the total container size. Normally, though, it is equal to TotalAvailableMemoryBytes.

  Fix #38821

* Remove HardLimitBytes; have TotalAvailableMemoryBytes take on its behavior
* Fix typos
* Separate total_physical_mem and heap_hard_limit so we can compute highMemoryLoadThresholdBytes and memoryLoadBytes
* Do more work in gc.cpp instead of Gc.cs
* Consistently end names in "Bytes"
2019-07-05 | many core (#25350) | Maoni Stephens | 7 | -152/+970
2019-06-25 | Brick table (#25349) | Peter Sollich | 2 | -18/+17
Fix brick table logic to fix a perf issue in several ASP.NET tests; remove #ifdef FFIND_OBJECT.

What I observed was that some GCs spent a lot of time in find_first_object called from find_object, which is called during stack scanning to find the containing object for interior pointers. A substantial fraction of generation 0 was being scanned, indicating that the brick table logic didn't work properly in these cases.

The root cause was the fact that the brick table entries were not being set in adjust_limit_clr if the allocation was satisfied from the free list in gen0 instead of from newly allocated space. This is the case if there are pinned objects in gen0 as well. The main fix is in adjust_limit_clr: if the allocation is satisfied from the free list, seg is nullptr; the change is to set the bricks in this case as well, if we are allocating in gen0 and the allocated piece is above a reasonable size threshold.

The bricks are not always set during allocation - instead, when we detect an interior pointer during GC, we make the allocator set the bricks during the next GC cycles by setting gen0_must_clear_bricks. I changed the way this is handled for server GC (multiple heaps). We used to multiply the decay time by the number of heaps (gc_heap::n_heaps), but only applied it to the single heap where an interior pointer was found. Instead, I think it's better to set gen0_must_clear_bricks for all heaps, but leave the decay time unchanged compared to workstation GC.

Maoni suggested removing the #ifdef FFIND_OBJECT - interior pointers are not going away, so the #ifdefs are unnecessary clutter.

Addressed code review feedback:
- add parentheses as per GC coding conventions
- use max instead of if-statement
- merge body of for-loop over all heaps into existing for-loop
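For context, a brick table divides the heap into small fixed-size "bricks" and records where an object starts within each brick, so that find_object can resolve an interior pointer without scanning a whole region. A much-simplified sketch of the idea, with illustrative sizes and names (not the coreclr implementation):

```cpp
#include <cstdint>
#include <cstddef>

static const size_t BRICK_SIZE = 4096;

struct BrickTableSketch
{
    int16_t* entries;        // one entry per brick; 0 means "no object recorded"
    uint8_t* lowestAddress;  // start of the covered address range

    size_t BrickOf(uint8_t* addr) const
    {
        return (size_t)(addr - lowestAddress) / BRICK_SIZE;
    }

    // Record where an object starts so interior-pointer lookups in this
    // brick need not scan from the beginning of the segment. After the fix
    // this also happens for gen0 free-list allocations above a threshold.
    void SetBrick(uint8_t* objStart)
    {
        size_t offsetInBrick = (size_t)(objStart - lowestAddress) % BRICK_SIZE;
        entries[BrickOf(objStart)] = (int16_t)(offsetInBrick + 1); // +1: 0 is "empty"
    }
};
```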
2019-06-21 | Don't require seg size to be power of 2 for large pages (#25216) | Maoni Stephens | 1 | -5/+17
Large pages will have segments aligned to 16 MB (the default minimum segment size for hardlimit).
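A small sketch of that alignment rule (illustrative helper, not the actual gc.cpp code):

```cpp
#include <cstddef>

// Round a requested segment size up to the 16 MB boundary mentioned above.
// 16 MB is a power of two, so the usual mask trick applies even though the
// requested size itself no longer has to be a power of two.
static const size_t SEG_ALIGNMENT = 16 * 1024 * 1024;

static size_t AlignSegmentSize(size_t requested)
{
    return (requested + SEG_ALIGNMENT - 1) & ~(SEG_ALIGNMENT - 1);
}
```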
2019-06-20 | Ensure process-wide fence when updating GC write barrier on ARM64 (#25130) | Vladimir Sadov | 1 | -1/+1
* Ensure process-wide fences when updating GC write barrier on ARM64
2019-06-15 | Do not export GC entrypoints outside standalone build (#25184) | Michal Strehovský | 1 | -0/+4
It doesn't seem like something we would want to export outside the standalone build.
2019-06-12 | Expose readonly heap segments to DAC (#25113) | Michal Strehovský | 1 | -0/+1
This was in CoreRT's copy of gcinterface.dac.h, but got lost in dotnet/corert#7517.
2019-06-11 | CoreRT change | Suchiman | 1 | -7/+3
2019-06-11 | Ensure gen0_max_size is initially >= gen0_min_size | Suchiman | 1 | -0/+2
Otherwise, gen0_min_size is eventually capped by gen0_max_size, which makes it impossible to raise the gen0 size above the default max sizes for gen0. This is required for some scenarios (CppCodeGen, WASM) in CoreRT.
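The fix amounts to ordering the two bounds before use; a minimal sketch, assuming the two sizes are plain size_t settings:

```cpp
#include <algorithm>
#include <cstddef>

// Order the bounds so a gen0_min_size raised above the default max is
// honored instead of being capped back down.
static void AdjustGen0Sizes(size_t& gen0_min_size, size_t& gen0_max_size)
{
    gen0_max_size = std::max(gen0_max_size, gen0_min_size);
}
```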
2019-06-11 | Multiple CoreRT changes | Suchiman | 1 | -14/+7
2019-06-11 | Fix casts | Suchiman | 2 | -5/+5
2019-06-11 | Fix Redhawk defines | Suchiman | 2 | -8/+4
2019-06-11 | UNREFERENCED_PARAMETER | Suchiman | 2 | -6/+10
2019-06-11 | Port typo fixes from CoreRT | Suchiman | 6 | -49/+49
2019-06-06 | Use CMake's C# support to build DacTableGen instead of manually invoking csc.exe ourselves (#24342) | Jeremy Koritzinsky | 1 | -2/+2

* Use CMake's C# support to build DacTableGen instead of manually invoking csc.exe ourselves.
* Fix x86 failures.
* Disable DAC generation when building with NMake Makefiles and issue an error, since the CMake C# support is VS-only. We don't actually support building with NMake (only configure), so this is ok.
* Clean up rest of the macro=1's
* PR feedback.
* Fix Visual Studio generator matching.
* Explicitly specify anycpu32bitpreferred for DacTableGen so the ARM64 build doesn't accidentally make it 64-bit.
* Fix bad merge.
2019-06-06 | Clear syncblock early when `VERIFY_HEAP && DEBUG` to prevent verification asserts (#24992) | Vladimir Sadov | 1 | -0/+11

Fixes: #24879
2019-05-28 | Using AllocateUninitializedArray in array pool (#24504) | Vladimir Sadov | 1 | -14/+28
* Just use `new T[]` when elements are not pointer-free
* Reduce zeroing out when not necessary
* Use AllocateUninitializedArray in ArrayPool
2019-05-28 | Fix initial thread affinity on Linux (#24801) | Jan Vorlicek | 1 | -1/+1
* Fix initial thread affinity on Linux

  On Linux, a new thread inherits the affinity mask of the thread that created it. This is a problem for background GC threads that are created by one of the server GC threads, which are affinitized to a single core. This change resets each new thread's affinity to match the current process affinity.

  In addition to that, I've also fixed the extraction of the CPU count that was using PID 0. While the doc says that 0 represents the current process, it in fact means the current thread. And as a small bonus, I've added caching of the value returned by PAL_GetLogicalCpuCountFromOS, since it cannot change during runtime.
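A minimal sketch of the described reset on Linux (illustrative, not the PAL code), assuming the process mask is captured on the main thread before any GC thread is affinitized:

```cpp
#include <sched.h>

// sched_getaffinity(0, ...) reports the *calling thread's* mask (PID 0
// means current thread, not process, as the commit message notes), so the
// mask must be captured before any thread narrows its affinity.
static cpu_set_t g_processAffinity;

void CaptureProcessAffinityAtStartup()
{
    sched_getaffinity(0, sizeof(g_processAffinity), &g_processAffinity);
}

void ResetCurrentThreadAffinity()
{
    // Called at the start of each new thread so it does not inherit the
    // single-CPU mask of the affinitized server GC thread that created it.
    sched_setaffinity(0, sizeof(g_processAffinity), &g_processAffinity);
}
```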
2019-05-24 | Add more runtime GC counters (#24561) | Sung Yoon Whang | 3 | -118/+31
* Add Series/CounterType to CounterPayload and IncrementingCounterPayload
* Merge with master
* Add generation sizes counter
* Some cleanup
* Add allocation rate counter
* Fix build
* Add Allocation Rate runtime counter
* Fix a potential div by zero exception
* Add back in code commented out
* Add LOH size counter
* Fix linux build
* GetTotalAllocated -> GetTotalAllocation
* PR feedback
* More cleanup + renaming per PR feedback
* Undo comments
* More PR feedback
* Use existing GC.GetTotalAllocatedBytes API instead
* Remove duplicate GetTotalAllocation
* More PR feedback
* Fix x86 build
* Match type between C++/C#
* Remove unused variables
2019-05-21 | Fix GCToOSInterface::SetCurrentThreadIdealAffinity on Unix (#24706) | Jan Vorlicek | 1 | -1/+2
The code was using GCToOSInterface::SetThreadAffinity, which effectively pinned the current thread to a specific processor. On Windows, it calls SetThreadIdealProcessor, which is basically just a scheduler hint, but the thread can still run on other processors. Since there is no way to set ideal affinity on Unix, the fix is to do nothing in GCToOSInterface::SetCurrentThreadIdealAffinity.
2019-05-17 | Merge pull request #24520 from am11/freebsd/set-affinity | Jan Vorlicek | 3 | -4/+21
Fix CPUSET_T definition for FreeBSD
2019-05-15 | Remove concept of AppDomains from the GC (#24536) | David Wrighton | 20 | -1180/+27
* Remove concept of AppDomains from the GC
  - Leave constructs allowing for multiple handle tables, as scenarios for that have been proposed
  - Remove FEATURE_APPDOMAIN_RESOURCE_MONITORING
2019-05-14 | Fix issues reported by PREfast static analysis tool (#24577) | Jan Kotas | 6 | -17/+9
2019-05-13 | Implement GC.GetTotalAllocatedBytes (#23852) | Ludovic Henry | 4 | -3/+45
* Keep what's allocated so far on each heap
* Implement GC.GetTotalAllocatedBytes

  It is based on https://github.com/dotnet/corefx/issues/34631 and https://github.com/dotnet/corefx/issues/30644.

* Fix races related to dead_threads_non_alloc_bytes
* Separate per-heap SOH and LOH counters; different locks imply that we need different counters
* Allow/ignore torn 64-bit reads on 32-bit in imprecise mode
* PR feedback
* Simplify the test a little to avoid OOM on ARM
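A sketch of the precise vs. imprecise read distinction on 32-bit platforms (illustrative structure; the runtime's actual counters differ):

```cpp
#include <cstdint>
#include <mutex>

// On 32-bit hardware a plain 64-bit load can tear into two 32-bit loads.
// Precise mode pays for synchronization; imprecise mode tolerates an
// occasional torn value, as the commit describes.
struct AllocCounterSketch
{
    uint64_t   value = 0;   // updated by allocating threads under allocLock
    std::mutex allocLock;

    uint64_t Read(bool precise)
    {
        if (precise)
        {
            // Consistent value: no writer can interleave with this read.
            std::lock_guard<std::mutex> hold(allocLock);
            return value;
        }
        // Imprecise mode: unsynchronized read, may tear on 32-bit.
        return value;
    }
};
```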
2019-05-11 | Fix CPUSET_T definition for FreeBSD | Adeel | 3 | -4/+21
2019-05-10 | Move EventProvider native layout to be driven by CMake configure (#24478) | Jeremy Koritzinsky | 1 | -0/+1
* Generate eventpipe implementation as part of CMake configure
* Generate ETW provider as part of CMake configure
* First pass porting over LTTng provider to CMake
* Fix up CMake LTTng provider generation
* Move LTTng provider into CMake tree
* Move dummy event provider to CMake
* Move genEventing into the CMake tree
* Remove extraneous logging and unused python locator
* Clean up build.sh
* Clean up genEventingTests.py
* Add dependencies to enable more incremental builds (providers not fully incremental)
* Convert to custom command and targets instead of at configure time
* Get each eventing target to incrementally build
* Fix incremental builds
* Add missing dependencies on eventing headers
* PR feedback: mark all generated files as generated
* Clean up eventprovider test CMakeLists
2019-05-08 | Merge pull request #24366 from sandreenko/fixLogPrinting | Sergey Andreenko | 1 | -1/+1
Fix some small issues with stress logging.
2019-05-02 | System.GC.AllocateUninitializedArray (#24096) | Vladimir Sadov | 3 | -73/+155
* Do not expand to allocation_quantum in SOH when GC_ALLOC_ZEROING_OPTIONAL
* Short-circuit short arrays to use `new T[size]`
* Clean syncblock of large-aligned objects on ARM32
* Specialize single-dimensional path AllocateSzArray
* Unit tests
* Some PR feedback. Made AllocateUninitializedArray not be trimmed away.
* PR feedback on gchelpers:
  - replaced use of multiple bool parameters with flags enum
  - merged some methods with nearly identical implementation
  - switched callers to use AllocateSzArray vs. AllocateArrayEx where appropriate
* PR feedback. Removed x86-specific array/string allocation helpers.
2019-05-02 | Fix some small issues with stress logging | Sergey Andreenko | 1 | -1/+1
2019-05-01 | When large pages are enabled, only reserve/commit 1x seg size for LOH (#24320) | Andy Hanson | 2 | -6/+4
When large pages are enabled, we must commit everything we reserve. Previously we reserved 2x the segment size for LOH, which is a problem under that constraint. Thanks to https://github.com/dotnet/coreclr/pull/24081 this does not cause a performance regression with large pages; but without large pages we were seeing regressions when the loh_seg_size was reduced. So this change only takes effect when large pages are enabled.
2019-04-26 | Typos (#24280) | John Doe | 1 | -1/+1
* thier -> their
* exeption -> exception
* Estbalisher -> Establisher
* neeed -> need
* neeed -> need
* neeeded -> needed
* neeeded -> needed
* facilitiate -> facilitate
* extremly -> extremely
* extry -> extra
2019-04-26 | Improve LOH heap balancing (#24081) | Andy Hanson | 2 | -73/+135
* Improve LOH heap balancing

  Previously in `balance_heaps_loh`, we would default to `org_hp` being `acontext->get_alloc_heap()`. Since `alloc_large_object` is an instance method, that ultimately came from the heap instance this was called on. In `GCHeap::Alloc` that came from `acontext->get_alloc_heap()` (this is a different acontext). That variable is set when we allocate a small object. So the heap we were allocating large objects on was affected by the heap we were allocating small objects on. This isn't necessary, as the small object heap and large object heaps have separate areas. In scenarios with limited memory, we can unnecessarily run out of memory by refusing to move away from that heap. However, we do want to ensure that the large object heap accessed is not on a different NUMA node than the small object heap.

  I experimented with adding a `get_loh_alloc_heap()` to acontext similar to the SOH alloc heap, but performance tests showed that it was usually better to just start from the home heap. The chosen policy was (see the sketch below):

  * Start searching from the home heap -- this is the one corresponding to our processor.
  * Have a low (but non-zero) preference for that heap (dd_min_size(dd) / 2), as long as we stay within the same NUMA node.
  * Have a higher cost of switching to a different NUMA node. However, this is still much less than before; it was dd_min_size(dd) * 4, now dd_min_size(dd) * 3 / 2.

  This showed big performance improvements (over 30% less time) in a scenario with lots of LOH allocation where there were fewer allocating threads than GC heaps. The changes were more pronounced the more we allocated large objects vs small objects. There was usually a slight improvement (1-2%) when there were 48 constantly allocating threads and 48 heaps. The one place we did see a slight regression was in an 800MB container with 4 allocating threads on a 48 processor machine; however, similar tests with less memory or more threads were prone to running out of memory or running very slowly on the master branch, so we've improved stability. Previously the GC could get lucky by having the SOH choice happen to be a good choice for LOH, but we shouldn't be relying on it, as it failed in some container scenarios.

  One more change is in joined_generation_to_condemn: if there is a memory limit and we are about to OOM, we should always do a compacting GC. This helps avoid the OOM and feeds into the next change.

  This PR also adds a *second* balance_heaps_loh function for when there is a memory limit and we previously failed to allocate into the chosen heap. `balance_heaps_loh` works based on allocation budgets, whereas `balance_heaps_loh_hard_limit_retry` works on the actual space available at the end of the segment. Thanks to the change to joined_generation_to_condemn the heaps should be compact, so we are not looking at free space here.

* Fix uninitialized variable
* In a container, use space available instead of budget
* Fix duplicate semicolon
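A much-simplified sketch of the chosen policy, with invented names; the real balance_heaps_loh tracks per-heap budgets and more state:

```cpp
#include <cstdint>

struct HeapInfo
{
    int64_t allocated;  // LOH bytes charged against this heap's budget
    int     numaNode;
};

// Pick the heap with the lowest effective load: a small bias keeps us on
// the home heap, and crossing NUMA nodes carries a higher (but reduced)
// penalty.
int BalanceHeapsLohSketch(const HeapInfo* heaps, int nHeaps, int homeHeap,
                          int64_t ddMinSize)
{
    int best = homeHeap;
    // Low but non-zero preference for the home heap: dd_min_size / 2.
    int64_t bestScore = heaps[homeHeap].allocated - ddMinSize / 2;
    for (int i = 0; i < nHeaps; i++)
    {
        int64_t score = heaps[i].allocated;
        if (heaps[i].numaNode != heaps[homeHeap].numaNode)
        {
            // Cross-node cost: dd_min_size * 3 / 2 (was dd_min_size * 4).
            score += ddMinSize * 3 / 2;
        }
        if (score < bestScore)
        {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```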
2019-04-26 | Fix creation of the NUMA node to heap number map | Jan Vorlicek | 1 | -9/+17
The current implementation assumes that the NUMA nodes of CPUs used for GC threads form a zero-based contiguous range. However, that doesn't have to be true when the user selects only a subset of the available CPUs for the GC heap threads using COMPlus_GCHeapAffinitizeMask or COMPlus_GCHeapAffinitizeRanges. The selected CPUs may belong to a subset of NUMA nodes that doesn't necessarily start at node 0 or form a contiguous range. This change fixes the algorithm that initializes the numa_node_to_heap_map lookup array so that it works correctly even in such cases.
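A minimal sketch of building such a map without the contiguity assumption (illustrative types; the runtime uses a fixed-size lookup array):

```cpp
#include <map>
#include <vector>

// Build a NUMA-node -> first-heap map without assuming the nodes in use
// form a zero-based contiguous range.
std::map<int, int> BuildNumaNodeToHeapMap(const std::vector<int>& heapToNode)
{
    std::map<int, int> nodeToFirstHeap;
    for (int heap = 0; heap < (int)heapToNode.size(); heap++)
    {
        int node = heapToNode[heap];
        // Works for sparse node sets like {2, 5}: only nodes actually used
        // by the affinitized GC threads get an entry.
        if (nodeToFirstHeap.count(node) == 0)
        {
            nodeToFirstHeap[node] = heap;
        }
    }
    return nodeToFirstHeap;
}
```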
2019-04-25 | Merge pull request #24242 from janvorli/fix-numa-node-for-disabled-numa | Jan Vorlicek | 1 | -13/+13
Fix NUMA node for heap when NUMA is not available
2019-04-25 | Fix NUMA node for heap when NUMA is not available | Jan Vorlicek | 1 | -13/+13
The recent refactoring of GCToOSInterface::GetProcessorForHeap accidentally changed the NUMA node returned when NUMA is disabled (either via COMPlus_GCNumaAware or due to the fact that there is just a single NUMA node on the system) and CPU groups are disabled. Before that refactoring, the code was incorrectly returning 0 as the NUMA node when CPU groups were disabled, no matter whether NUMA was enabled or disabled. The refactoring fixed that by returning the current CPU group number when NUMA was enabled, but it still returned an incorrect value, this time GroupProcNo::NoGroup, as the NUMA node number when NUMA was disabled. This change fixes it by returning the current group number in this case as well.
2019-04-24 | Switch to workstation GC in case of constrained CPU resources (#24194) | Ludovic Henry | 1 | -5/+1
* Switch to workstation GC in case of constrained CPU resources

  Right now, if the user sets the configuration so that the server GC is used, the server GC will be loaded even in conditions where we know the workstation GC would fare better. An example of such conditions is a constrained environment with only 1 or fewer CPUs or with very low memory. This can be harmful if users deploy the same projects on different kinds of platforms: deploying to a 20+ core server and to Azure Functions will require largely different configurations for the runtime.

  There are already multiple ways for the user to specify whether to use the server GC:
  - setting `COMPlus_gcServer` as an environment variable
  - setting `gcServer` in the configuration file
  - setting `System.GC.Server` passed to `coreclr_initialize`

  Fix https://github.com/dotnet/coreclr/issues/23949

* Address review

* Address review

  Remove GCToOSInterface::GetCurrentProcessCpuLimit in favor of GCToOSInterface::GetCurrentProcessCpuCount, because the CPU limit is already taken into account in the CPU count.

* Address review

  Do the work in src/vm/ceemain.cpp, otherwise there will be a disparity between what the VM and the GC are running. Before, only the GC would be aware of the switch from server to workstation GC, but not the VM.
2019-04-24 | Add Medium GC Profiling Mode & ICorProfilerInfo::GetObjectReferences (#24156) | Mukul Sabharwal | 7 | -7/+29
2019-04-23 | Delete unnecessary static and update GCSample to VS2019 (#24204) | Jan Kotas | 3 | -6/+4
2019-04-19 | Large Pages on Linux & macOS (#24098) | Mukul Sabharwal | 3 | -0/+12
2019-04-17 | Put back the CPU limiting in GC | Jan Vorlicek | 1 | -8/+14
The CPU limiting was accidentally removed during the refactoring of the CPU groups support in the GC. This change puts it back.
2019-04-16 | Use delete [] on array types (#24027) | Omair Majid | 1 | -1/+1
Calling delete on types allocated with new[] leads to undefined behaviour.
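A minimal illustration of the rule being enforced:

```cpp
// new[] allocations must be released with delete[]; plain delete on them
// is undefined behavior.
void Example()
{
    int* values = new int[16];
    // ... use values ...
    delete[] values;   // correct: matches new[]
    // `delete values;` here would be undefined behavior
}
```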
2019-04-15 | Delete unused YieldProcessorScalingFactor from GC (#23994) | Jan Kotas | 3 | -15/+0
2019-04-15 | Merge pull request #23981 from VSadov/arm32fix22422 | Vladimir Sadov | 1 | -36/+8
Adjust plug_size_to_fit to consider large alignment on ARM32