author     Andy Hanson <anhans@microsoft.com>   2019-04-26 19:33:26 -0700
committer  GitHub <noreply@github.com>          2019-04-26 19:33:26 -0700
commit     141926d90c54bb358cfe8d9eb641c88e94639a8c (patch)
tree       3f52ea1959640fc74e427dfedad385dcf9e2fddb /src/gc/gcpriv.h
parent     4452efd309d40d4bc7fc1fa48bf1b6e615ee6755 (diff)
Improve LOH heap balancing (#24081)
* Improve LOH heap balancing
Previously in `balance_heaps_loh`, we would default to `org_hp` being
`acontext->get_alloc_heap()`.
Since `alloc_large_object` is an instance method, that ultimately came
from the heap instance this was called on. In `GCHeap::Alloc` that came
from `acontext->get_alloc_heap()` (this is a different acontext). That
variable is set when we allocate a small object. So the heap we were
allocating large objects on was affected by the heap we were allocating
small objects on. This isn't necessary, as the small object heap and
large object heaps occupy separate areas. In scenarios with limited
memory, we could unnecessarily run out of memory by refusing to move
away from that heap. However, we do still want to ensure that the large
object heap accessed is not on a different NUMA node than the small
object heap.
I experimented with adding a `get_loh_alloc_heap()` to acontext similar
to the SOH alloc heap, but performance tests showed that it was usually
better to just start from the home heap. The chosen policy was:
* Start searching from the home heap -- this is the one corresponding to
our processor.
* Have a low (but non-zero) preference for that heap (dd_min_size(dd) /
2), as long as we stay within the same NUMA node.
* Have a higher cost for switching to a different NUMA node. However,
this cost is still much lower than before: it was dd_min_size(dd) * 4,
and is now dd_min_size(dd) * 3 / 2.
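The policy above can be sketched as a simple scoring pass over the heaps. This is a minimal illustrative model, not the actual coreclr code: the names `HeapInfo` and `pick_loh_heap` are hypothetical, `min_size` stands in for `dd_min_size(dd)`, and the real `balance_heaps_loh` works on `gc_heap` instances and dynamic data rather than a plain vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-heap state for the sketch.
struct HeapInfo
{
    ptrdiff_t budget;    // remaining LOH allocation budget for this heap
    int numa_node;       // NUMA node of the processor this heap belongs to
};

// Returns the index of the heap to allocate on. 'home' is the heap
// corresponding to the allocating thread's processor.
size_t pick_loh_heap(const std::vector<HeapInfo>& heaps, size_t home,
                     ptrdiff_t min_size)
{
    size_t best = home;
    // Low (but non-zero) preference for the home heap: another heap must
    // beat it by min_size / 2 to be chosen.
    ptrdiff_t best_score = heaps[home].budget + min_size / 2;
    for (size_t i = 0; i < heaps.size(); ++i)
    {
        ptrdiff_t score = heaps[i].budget;
        if (heaps[i].numa_node != heaps[home].numa_node)
        {
            // Higher cost for leaving the home NUMA node
            // (was min_size * 4, now min_size * 3 / 2).
            score -= min_size * 3 / 2;
        }
        if (score > best_score)
        {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

For example, with `min_size = 100`, a same-node heap with budget 120 does not beat a home heap with budget 100 (it would need to exceed 150), while a cross-node heap needs a budget above 250 to be chosen.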
This showed big performance improvements (over 30% less time) in a
scenario with lots of LOH allocation where there were fewer allocating
threads than GC heaps. The changes were more pronounced the more we
allocated large objects vs small objects. There was usually slight
improvement (1-2%) when there were 48 constantly allocating threads and
48 heaps. The one place we did see a slight regression was in an 800MB
container and 4 allocating threads on a 48 processor machine; however,
similar tests with less memory or more threads were prone to running out
of memory or running very slow on the master branch, so we've improved
stability. Previously the GC could get lucky when the SOH choice
happened to also be a good choice for LOH, but we shouldn't rely on
that, as it failed in some container scenarios.
One more change is in joined_generation_to_condemn: If there is a memory
limit and we are about to OOM, we should always do a compacting GC. This
helps avoid the OOM and feeds into the next change.
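The joined_generation_to_condemn change amounts to one extra rule in the decision logic. The sketch below is illustrative only: `GcDecision` and `decide_compaction` are hypothetical names, standing in for the flags the real function manipulates.

```cpp
// Hypothetical, simplified model of the rule described above: under a hard
// memory limit, when we are about to OOM, force a full compacting GC so
// the subsequent allocation retry sees compacted heaps.
struct GcDecision
{
    bool compacting;             // sweep vs. compact
    int  condemned_generation;   // generation chosen to condemn
};

GcDecision decide_compaction(bool heap_hard_limit, bool about_to_oom,
                             GcDecision current)
{
    if (heap_hard_limit && about_to_oom)
    {
        // Sweeping would leave fragmentation behind, which could turn an
        // avoidable near-OOM into a real OOM; compacting maximizes the
        // contiguous space available for the retry.
        current.compacting = true;
        current.condemned_generation = 2;  // full GC
    }
    return current;
}
```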
This PR also adds a *second* balance_heaps_loh function for when there
is a memory limit and we previously failed to allocate into the chosen
heap. `balance_heaps_loh` works based on allocation budgets, whereas
`balance_heaps_loh_hard_limit_retry` works on the actual space available
at the end of the segment. Thanks to the change to
joined_generation_to_condemn, the heaps should already be compacted at
that point, so we do not need to look at free space here.
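The retry path can be sketched as a search over end-of-segment space rather than budgets. Again this is a hedged model: `retry_pick_heap`, `NO_HEAP`, and the plain vector of sizes are illustrative stand-ins for the real `balance_heaps_loh_hard_limit_retry`, which walks `gc_heap` segments and returns nullptr on failure.

```cpp
#include <cstddef>
#include <vector>

// Sentinel mirroring the nullptr return of the real function.
constexpr size_t NO_HEAP = static_cast<size_t>(-1);

// end_of_seg_space[i] models the actual space left at the end of heap i's
// (compacted) LOH segment. Pick the heap with the most such space that can
// still fit the allocation; report NO_HEAP if none can.
size_t retry_pick_heap(const std::vector<size_t>& end_of_seg_space, size_t size)
{
    size_t best = NO_HEAP;
    size_t best_space = 0;
    for (size_t i = 0; i < end_of_seg_space.size(); ++i)
    {
        if (end_of_seg_space[i] >= size && end_of_seg_space[i] > best_space)
        {
            best_space = end_of_seg_space[i];
            best = i;
        }
    }
    return best;
}
```

The design point is that after the forced compacting GC, end-of-segment space is an accurate measure of what each heap can really hold, whereas budgets only reflect pacing policy.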
* Fix uninitialized variable
* In a container, use space available instead of budget
* Fix duplicate semicolon
Diffstat (limited to 'src/gc/gcpriv.h')
-rw-r--r--  src/gc/gcpriv.h | 12
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/gc/gcpriv.h b/src/gc/gcpriv.h
index b13cd24fa4..34c820aa22 100644
--- a/src/gc/gcpriv.h
+++ b/src/gc/gcpriv.h
@@ -1222,9 +1222,15 @@ public:
                                  alloc_context* acontext);
 
 #ifdef MULTIPLE_HEAPS
-    static void balance_heaps (alloc_context* acontext);
+    static
+    void balance_heaps (alloc_context* acontext);
+    PER_HEAP
+    ptrdiff_t get_balance_heaps_loh_effective_budget ();
     static
     gc_heap* balance_heaps_loh (alloc_context* acontext, size_t size);
+    // Unlike balance_heaps_loh, this may return nullptr if we failed to change heaps.
+    static
+    gc_heap* balance_heaps_loh_hard_limit_retry (alloc_context* acontext, size_t size);
     static
     void gc_thread_stub (void* arg);
 #endif //MULTIPLE_HEAPS
@@ -1232,6 +1238,8 @@ public:
     // For LOH allocations we only update the alloc_bytes_loh in allocation
     // context - we don't actually use the ptr/limit from it so I am
     // making this explicit by not passing in the alloc_context.
+    // Note: This is an instance method, but the heap instance is only used for
+    // lowest_address and highest_address, which are currently the same accross all heaps.
     PER_HEAP
     CObjectHeader* allocate_large_object (size_t size, int64_t& alloc_bytes);
 
@@ -1446,7 +1454,7 @@ protected:
     PER_HEAP
     allocation_state try_allocate_more_space (alloc_context* acontext, size_t jsize,
                                               int alloc_generation_number);
-    PER_HEAP
+    PER_HEAP_ISOLATED
     BOOL allocate_more_space (alloc_context* acontext, size_t jsize,
                               int alloc_generation_number);