path: root/src/gc
author    Andy Hanson <anhans@microsoft.com>  2019-04-26 19:33:26 -0700
committer GitHub <noreply@github.com>  2019-04-26 19:33:26 -0700
commit    141926d90c54bb358cfe8d9eb641c88e94639a8c
tree      3f52ea1959640fc74e427dfedad385dcf9e2fddb /src/gc
parent    4452efd309d40d4bc7fc1fa48bf1b6e615ee6755
Improve LOH heap balancing (#24081)
* Improve LOH heap balancing

Previously in `balance_heaps_loh`, we would default to `org_hp` being `acontext->get_alloc_heap()`. Since `alloc_large_object` is an instance method, that ultimately came from the heap instance this was called on. In `GCHeap::Alloc` that came from `acontext->get_alloc_heap()` (this is a different acontext). That variable is set when we allocate a small object, so the heap we were allocating large objects on was affected by the heap we were allocating small objects on. This isn't necessary, as the small object heaps and large object heaps have separate areas. In scenarios with limited memory, we can unnecessarily run out of memory by refusing to move away from that heap. However, we do want to ensure that the large object heap accessed is not on a different NUMA node than the small object heap.

I experimented with adding a `get_loh_alloc_heap()` to acontext similar to the SOH alloc heap, but performance tests showed that it was usually better to just start from the home heap. The chosen policy was:

* Start searching from the home heap -- this is the one corresponding to our processor.
* Have a low (but non-zero) preference for that heap (dd_min_size(dd) / 2), as long as we stay within the same NUMA node.
* Have a higher cost for switching to a different NUMA node. However, this is still much less than before; it was dd_min_size(dd) * 4 and is now dd_min_size(dd) * 3 / 2.

This showed big performance improvements (over 30% less time) in a scenario with lots of LOH allocation where there were fewer allocating threads than GC heaps. The improvements were more pronounced the more large objects we allocated relative to small objects. There was usually a slight improvement (1-2%) when there were 48 constantly allocating threads and 48 heaps. The one place we did see a slight regression was in an 800MB container with 4 allocating threads on a 48 processor machine; however, similar tests with less memory or more threads were prone to running out of memory or running very slowly on the master branch, so we've improved stability. Previously the GC could get lucky by having the SOH choice happen to be a good choice for LOH, but we shouldn't rely on that, as it failed in some container scenarios.

One more change is in `joined_generation_to_condemn`: if there is a memory limit and we are about to OOM, we should always do a compacting GC. This helps avoid the OOM and feeds into the next change.

This PR also adds a *second* balance_heaps_loh function for when there is a memory limit and we previously failed to allocate into the chosen heap. `balance_heaps_loh` works based on allocation budgets, whereas `balance_heaps_loh_hard_limit_retry` works on the actual space available at the end of the segment. Thanks to the change to `joined_generation_to_condemn`, the heaps should already be compacted, so the retry does not need to look at free list space.

* Fix uninitialized variable
* In a container, use space available instead of budget
* Fix duplicate semicolon
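Reader's aid: below is a minimal, self-contained C++ sketch of the two selection policies described in the message above. The `Heap` struct and the helpers `pick_loh_heap_by_budget` and `pick_loh_heap_by_seg_space` are illustrative stand-ins invented for this sketch, not the real `gc_heap` API; the actual implementations are `balance_heaps_loh` and `balance_heaps_loh_hard_limit_retry` in the diff that follows.

```cpp
// Illustrative sketch only -- simplified stand-ins for gc_heap state.
#include <cstddef>
#include <vector>

struct Heap
{
    int       number;
    int       numa_node;
    ptrdiff_t loh_budget;        // stand-in for get_balance_heaps_loh_effective_budget()
    size_t    end_of_seg_space;  // stand-in for heap_segment_reserved - heap_segment_allocated
};

// Normal path: pick the heap with the largest remaining LOH budget.
// Pass 1 looks only at the home NUMA node, with a small preference for the
// home heap (dd_min_size / 2); pass 2 runs only if nothing local won, with a
// higher bar (dd_min_size * 3 / 2) so crossing NUMA nodes stays rare.
Heap* pick_loh_heap_by_budget (std::vector<Heap>& heaps, Heap& home, ptrdiff_t dd_min_size)
{
    Heap*     max_hp   = &home;
    ptrdiff_t max_size = home.loh_budget + dd_min_size / 2;

    for (Heap& hp : heaps)
        if (hp.numa_node == home.numa_node && hp.loh_budget > max_size)
        {
            max_hp   = &hp;
            max_size = hp.loh_budget;
        }

    if (max_hp == &home)
    {
        max_size = home.loh_budget + dd_min_size * 3 / 2;
        for (Heap& hp : heaps)
            if (hp.numa_node != home.numa_node && hp.loh_budget > max_size)
            {
                max_hp   = &hp;
                max_size = hp.loh_budget;
            }
    }
    return max_hp;
}

// Hard-limit retry path: budgets no longer matter because the previous
// attempt already failed, so compare the real space left at the end of each
// heap's (single) LOH segment. Local NUMA node first, remote nodes only as a
// fallback; return nullptr if no heap can fit the allocation.
Heap* pick_loh_heap_by_seg_space (std::vector<Heap>& heaps, Heap& home, size_t alloc_size)
{
    Heap*  max_hp  = nullptr;
    size_t max_fit = alloc_size;  // must have at least this much, otherwise give up

    for (bool local_pass : { true, false })
    {
        for (Heap& hp : heaps)
        {
            if ((hp.numa_node == home.numa_node) != local_pass)
                continue;
            if (hp.end_of_seg_space >= max_fit)
            {
                max_fit = hp.end_of_seg_space;
                max_hp  = &hp;
            }
        }
        if (max_hp != nullptr)
            break;  // found space on this node group; don't go remote
    }
    return max_hp;
}
```

The first helper mirrors the budget comparison with its two deltas; the second mirrors the hard-limit retry, which compares end-of-segment space and is the only one allowed to report failure by returning nullptr.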
Diffstat (limited to 'src/gc')
-rw-r--r--  src/gc/gc.cpp   | 196
-rw-r--r--  src/gc/gcpriv.h |  12
2 files changed, 135 insertions(+), 73 deletions(-)
diff --git a/src/gc/gc.cpp b/src/gc/gc.cpp
index b3bb610be4..1b57fa029e 100644
--- a/src/gc/gc.cpp
+++ b/src/gc/gc.cpp
@@ -13562,79 +13562,123 @@ try_again:
acontext->alloc_count++;
}
-gc_heap* gc_heap::balance_heaps_loh (alloc_context* acontext, size_t alloc_size)
+ptrdiff_t gc_heap::get_balance_heaps_loh_effective_budget ()
{
- gc_heap* org_hp = acontext->get_alloc_heap()->pGenGCHeap;
- dprintf (3, ("[h%d] LA: %Id", org_hp->heap_number, alloc_size));
-
- //if (size > 128*1024)
- if (1)
+ if (heap_hard_limit)
{
- dynamic_data* dd = org_hp->dynamic_data_of (max_generation + 1);
+ const ptrdiff_t free_list_space = generation_free_list_space (generation_of (max_generation + 1));
+ heap_segment* seg = generation_start_segment (generation_of (max_generation + 1));
+ assert (heap_segment_next (seg) == nullptr);
+ const ptrdiff_t allocated = heap_segment_allocated (seg) - seg->mem;
+ // We could calculate the actual end_of_seg_space by taking reserved - allocated,
+ // but all heaps have the same reserved memory and this value is only used for comparison.
+ return free_list_space - allocated;
+ }
+ else
+ {
+ return dd_new_allocation (dynamic_data_of (max_generation + 1));
+ }
+}
- ptrdiff_t org_size = dd_new_allocation (dd);
- gc_heap* max_hp;
- ptrdiff_t max_size;
- size_t delta = dd_min_size (dd) * 4;
+gc_heap* gc_heap::balance_heaps_loh (alloc_context* acontext, size_t alloc_size)
+{
+ const int home_hp_num = heap_select::select_heap(acontext, 0);
+ dprintf (3, ("[h%d] LA: %Id", home_heap, alloc_size));
+ gc_heap* home_hp = GCHeap::GetHeap(home_hp_num)->pGenGCHeap;
+ dynamic_data* dd = home_hp->dynamic_data_of (max_generation + 1);
+ const ptrdiff_t home_hp_size = home_hp->get_balance_heaps_loh_effective_budget ();
- int start, end, finish;
- heap_select::get_heap_range_for_heap(org_hp->heap_number, &start, &end);
- finish = start + n_heaps;
+ size_t delta = dd_min_size (dd) / 2;
+ int start, end;
+ heap_select::get_heap_range_for_heap(home_hp_num, &start, &end);
+ const int finish = start + n_heaps;
try_again:
- {
- max_hp = org_hp;
- max_size = org_size + delta;
- dprintf (3, ("orig hp: %d, max size: %d",
- org_hp->heap_number,
- max_size));
+ gc_heap* max_hp = home_hp;
+ ptrdiff_t max_size = home_hp_size + delta;
+
+ dprintf (3, ("home hp: %d, max size: %d",
+ home_hp_num,
+ max_size));
- for (int i = start; i < end; i++)
- {
- gc_heap* hp = GCHeap::GetHeap(i%n_heaps)->pGenGCHeap;
- dd = hp->dynamic_data_of (max_generation + 1);
- ptrdiff_t size = dd_new_allocation (dd);
- dprintf (3, ("hp: %d, size: %d",
- hp->heap_number,
- size));
- if (size > max_size)
- {
- max_hp = hp;
- max_size = size;
- dprintf (3, ("max hp: %d, max size: %d",
- max_hp->heap_number,
- max_size));
- }
- }
- }
+ for (int i = start; i < end; i++)
+ {
+ gc_heap* hp = GCHeap::GetHeap(i%n_heaps)->pGenGCHeap;
+ const ptrdiff_t size = hp->get_balance_heaps_loh_effective_budget ();
- if ((max_hp == org_hp) && (end < finish))
+ dprintf (3, ("hp: %d, size: %d", hp->heap_number, size));
+ if (size > max_size)
{
- start = end; end = finish;
- delta = dd_min_size(dd) * 4; // Need to tuning delta
- goto try_again;
+ max_hp = hp;
+ max_size = size;
+ dprintf (3, ("max hp: %d, max size: %d",
+ max_hp->heap_number,
+ max_size));
}
+ }
- if (max_hp != org_hp)
+ if ((max_hp == home_hp) && (end < finish))
+ {
+ start = end; end = finish;
+ delta = dd_min_size (dd) * 3 / 2; // Make it harder to balance to remote nodes on NUMA.
+ goto try_again;
+ }
+
+ if (max_hp != home_hp)
+ {
+ dprintf (3, ("loh: %d(%Id)->%d(%Id)",
+ home_hp->heap_number, dd_new_allocation (home_hp->dynamic_data_of (max_generation + 1)),
+ max_hp->heap_number, dd_new_allocation (max_hp->dynamic_data_of (max_generation + 1))));
+ }
+
+ return max_hp;
+}
+
+gc_heap* gc_heap::balance_heaps_loh_hard_limit_retry (alloc_context* acontext, size_t alloc_size)
+{
+ assert (heap_hard_limit);
+ const int home_heap = heap_select::select_heap(acontext, 0);
+ dprintf (3, ("[h%d] balance_heaps_loh_hard_limit_retry alloc_size: %d", home_heap, alloc_size));
+ int start, end;
+ heap_select::get_heap_range_for_heap (home_heap, &start, &end);
+ const int finish = start + n_heaps;
+
+ gc_heap* max_hp = nullptr;
+ size_t max_end_of_seg_space = alloc_size; // Must be more than this much, or return NULL
+
+try_again:
+ {
+ for (int i = start; i < end; i++)
{
- dprintf (3, ("loh: %d(%Id)->%d(%Id)",
- org_hp->heap_number, dd_new_allocation (org_hp->dynamic_data_of (max_generation + 1)),
- max_hp->heap_number, dd_new_allocation (max_hp->dynamic_data_of (max_generation + 1))));
+ gc_heap* hp = GCHeap::GetHeap (i%n_heaps)->pGenGCHeap;
+ heap_segment* seg = generation_start_segment (hp->generation_of (max_generation + 1));
+ // With a hard limit, there is only one segment.
+ assert (heap_segment_next (seg) == nullptr);
+ const size_t end_of_seg_space = heap_segment_reserved (seg) - heap_segment_allocated (seg);
+ if (end_of_seg_space >= max_end_of_seg_space)
+ {
+ dprintf (3, ("Switching heaps in hard_limit_retry! To: [h%d], New end_of_seg_space: %d", hp->heap_number, end_of_seg_space));
+ max_end_of_seg_space = end_of_seg_space;
+ max_hp = hp;
+ }
}
-
- return max_hp;
}
- else
+
+ // Only switch to a remote NUMA node if we didn't find space on this one.
+ if ((max_hp == nullptr) && (end < finish))
{
- return org_hp;
+ start = end; end = finish;
+ goto try_again;
}
+
+ return max_hp;
}
#endif //MULTIPLE_HEAPS
BOOL gc_heap::allocate_more_space(alloc_context* acontext, size_t size,
int alloc_generation_number)
{
- allocation_state status;
+ allocation_state status = a_state_start;
do
{
#ifdef MULTIPLE_HEAPS
@@ -13645,7 +13689,20 @@ BOOL gc_heap::allocate_more_space(alloc_context* acontext, size_t size,
}
else
{
- gc_heap* alloc_heap = balance_heaps_loh (acontext, size);
+ gc_heap* alloc_heap;
+ if (heap_hard_limit && (status == a_state_retry_allocate))
+ {
+ alloc_heap = balance_heaps_loh_hard_limit_retry (acontext, size);
+ if (alloc_heap == nullptr)
+ {
+ return false;
+ }
+ }
+ else
+ {
+ alloc_heap = balance_heaps_loh (acontext, size);
+ }
+
status = alloc_heap->try_allocate_more_space (acontext, size, alloc_generation_number);
if (status == a_state_retry_allocate)
{
@@ -14678,10 +14735,15 @@ int gc_heap::joined_generation_to_condemn (BOOL should_evaluate_elevation,
dprintf (GTC_LOG, ("committed %Id is %d%% of limit %Id",
current_total_committed, (int)((float)current_total_committed * 100.0 / (float)heap_hard_limit),
heap_hard_limit));
- if ((current_total_committed * 10) >= (heap_hard_limit * 9))
- {
- bool full_compact_gc_p = false;
+ bool full_compact_gc_p = false;
+
+ if (joined_last_gc_before_oom)
+ {
+ full_compact_gc_p = true;
+ }
+ else if ((current_total_committed * 10) >= (heap_hard_limit * 9))
+ {
size_t loh_frag = get_total_gen_fragmentation (max_generation + 1);
// If the LOH frag is >= 1/8 it's worth compacting it
@@ -14698,14 +14760,14 @@ int gc_heap::joined_generation_to_condemn (BOOL should_evaluate_elevation,
full_compact_gc_p = ((est_loh_reclaim * 8) >= heap_hard_limit);
dprintf (GTC_LOG, ("loh est reclaim: %Id, 1/8 of limit %Id", est_loh_reclaim, (heap_hard_limit / 8)));
}
+ }
- if (full_compact_gc_p)
- {
- n = max_generation;
- *blocking_collection_p = TRUE;
- settings.loh_compaction = TRUE;
- dprintf (GTC_LOG, ("compacting LOH due to hard limit"));
- }
+ if (full_compact_gc_p)
+ {
+ n = max_generation;
+ *blocking_collection_p = TRUE;
+ settings.loh_compaction = TRUE;
+ dprintf (GTC_LOG, ("compacting LOH due to hard limit"));
}
}
@@ -31087,12 +31149,7 @@ CObjectHeader* gc_heap::allocate_large_object (size_t jsize, int64_t& alloc_byte
{
//create a new alloc context because gen3context is shared.
alloc_context acontext;
- acontext.alloc_ptr = 0;
- acontext.alloc_limit = 0;
- acontext.alloc_bytes = 0;
-#ifdef MULTIPLE_HEAPS
- acontext.set_alloc_heap(vm_heap);
-#endif //MULTIPLE_HEAPS
+ acontext.init();
#if BIT64
size_t maxObjectSize = (INT64_MAX - 7 - Align(min_obj_size));
@@ -35047,9 +35104,6 @@ GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_
AssignHeap (acontext);
assert (acontext->get_alloc_heap());
}
-#endif //MULTIPLE_HEAPS
-
-#ifdef MULTIPLE_HEAPS
gc_heap* hp = acontext->get_alloc_heap()->pGenGCHeap;
#else
gc_heap* hp = pGenGCHeap;
diff --git a/src/gc/gcpriv.h b/src/gc/gcpriv.h
index b13cd24fa4..34c820aa22 100644
--- a/src/gc/gcpriv.h
+++ b/src/gc/gcpriv.h
@@ -1222,9 +1222,15 @@ public:
alloc_context* acontext);
#ifdef MULTIPLE_HEAPS
- static void balance_heaps (alloc_context* acontext);
+ static
+ void balance_heaps (alloc_context* acontext);
+ PER_HEAP
+ ptrdiff_t get_balance_heaps_loh_effective_budget ();
static
gc_heap* balance_heaps_loh (alloc_context* acontext, size_t size);
+ // Unlike balance_heaps_loh, this may return nullptr if we failed to change heaps.
+ static
+ gc_heap* balance_heaps_loh_hard_limit_retry (alloc_context* acontext, size_t size);
static
void gc_thread_stub (void* arg);
#endif //MULTIPLE_HEAPS
@@ -1232,6 +1238,8 @@ public:
// For LOH allocations we only update the alloc_bytes_loh in allocation
// context - we don't actually use the ptr/limit from it so I am
// making this explicit by not passing in the alloc_context.
+ // Note: This is an instance method, but the heap instance is only used for
+ // lowest_address and highest_address, which are currently the same across all heaps.
PER_HEAP
CObjectHeader* allocate_large_object (size_t size, int64_t& alloc_bytes);
@@ -1446,7 +1454,7 @@ protected:
PER_HEAP
allocation_state try_allocate_more_space (alloc_context* acontext, size_t jsize,
int alloc_generation_number);
- PER_HEAP
+ PER_HEAP_ISOLATED
BOOL allocate_more_space (alloc_context* acontext, size_t jsize,
int alloc_generation_number);