Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
Fix brick table logic to fix perf issue in several ASP.NET tests, remove #ifdef FFIND_OBJECT.
What I observed was that some GCs spent a lot of time in find_first_object called from find_object, which is called during stack scanning to find the containing object for interior pointers. A substantial fraction of generation 0 was being scanned, indicating that the brick table logic didn't work properly in these cases.
The root cause was the fact that the brick table entries were not being set in adjust_limit_clr if the allocation was satisfied from the free list in gen0 instead of newly allocated space. This is the case if there are pinned objects in gen0 as well.
The main fix is in adjust_limit_clr - if the allocation is satisfied from the freelist, seg is nullptr, the change is to set the bricks in this case as well if we are allocating in gen0 and the allocated piece is above a reasonable size threshold.
The bricks are not set always set during allocation - instead, when we detect an interior pointer during GC, we make the allocator set the bricks during the next GC cycles by setting gen0_must_clear_bricks. I changed the way this is handled for server GC (multiple heaps). We used to multiply the decay time by the number of heaps (gc_heap::n_heaps), but only applied it to the single heap where an interior pointer was found. Instead, I think it's better to instead set gen0_must_clear_bricks for all heaps, but leave the decay time unchanged compared to workstation GC.
Maoni suggested to remove the #ifdef FFIND_OBJECT - interior pointers are not going away, so the #ifdefs are unnecessary clutter.
Addressed code review feedback:
- add parentheses as per GC coding conventions
- use max instead of if-statement
- merge body of for-loop over all into existing for-loop
|
|
|
|
* Remove concept of AppDomains from the GC
- Leave constructs allowing for multiple handle tables, as scenarios for that have been proposed
- Remove FEATURE_APPDOMAIN_RESOURCE_MONITORING
|
|
* keep what's allocated so far on each heap
* Implement GC.GetTotalAllocatedBytes
It is based on https://github.com/dotnet/corefx/issues/34631 and https://github.com/dotnet/corefx/issues/30644
* Fixing races related to dead_threads_non_alloc_bytes
* separated per-heap SOH and LOH counters. Different locks imply that we need different counters.
* allow/ignore torn 64bit reads on 32bit in imprecise mode.
* PR feedback
* simplified the test a little to avoid OOM on ARM
|
|
* Do not expand to allocation_quantum in SOH when GC_ALLOC_ZEROING_OPTIONAL
* short-circuit short arrays to use `new T[size]`
* Clean syncblock of large-aligned objects on ARM32
* specialize single-dimensional path AllocateSzArray
* Unit tests
* Some PR feedback. Made AllocateUninitializedArray not be trimmed away.
* PR feedback on gchelpers
- replaced use of multiple bool parameters with flags enum
- merged some methods with nearly identical implementation
- switched callers to use AllocateSzArray vs. AllocateArrayEx where appropriate.
* PR feedback. Removed X86 specific array/string allocation helpers.
|
|
When large pages are enabled, we must commit everything we reserve.
Previously we reserved 2x the segment size for LOH. This is a problem
with large pages where we must commit everything we reserve.
Thanks to https://github.com/dotnet/coreclr/pull/24081 this does not
cause performance regression with large pages; but without large pages
we were seeing regressions when the loh_seg_size was reduced. So this
change will only take effect when large pages are enabled.
|
|
* Improve LOH heap balancing
Previously in `balance_heaps_loh`, we would default to `org_hp` being
`acontext->get_alloc_heap()`.
Since `alloc_large_object` is an instance method, that ultimately came
from the heap instance this was called on. In `GCHeap::Alloc` that came
from `acontext->get_alloc_heap()` (this is a different acontext). That
variable is set when we allocate a small object. So the heap we were
allocating large objects on was affected by the heap we were allocating
small objects on. This isn't necessary as small object heap and large
object heaps have separate areas. In scenarios with limited memory, we
can unnecessarily run out of memory by refusing to move away from that
hea. However, we do want to ensure that the large object heap accessed
is not on a different numa node than the small object heap.
I experimented with adding a `get_loh_alloc_heap()` to acontext similar
to the SOH alloc heap, but performance tests showed that it was usually
better to just start from the home heap. The chosen policy was:
* Start searching from the home heap -- this is the one corresponding to
our processor.
* Have a low (but non-zero) preference for that heap (dd_min_size(dd) /
2), as long as we stay within the same numa node.
* Have a higher cost of switching to a different numa node. However,
this is still much less than before; it was dd_min_size(dd) * 4, now
dd_min_size(dd) * 3 / 2.
This showed big performance improvements (over 30% less time) in a
scenario with lots of LOH allocation where there were fewer allocating
threads than GC heaps. The changes were more pronounced the more we
allocated large objects vs small objects. There was usually slight
improvement (1-2%) when there were 48 constantly allocating threads and
48 heaps. The one place we did see a slight regression was in an 800MB
container and 4 allocating threads on a 48 processor machine; however,
similar tests with less memory or more threads were prone to running out
of memory or running very slow on the master branch, so we've improved
stability. Previously the gc could get lucky by having the SOH choice
happen to be a good choice for LOH, but we shouldn't be relying on it as
it failed in some container scenarios.
One more change is in joined_generation_to_condemn: If there is a memory
limit and we are about to OOM, we should always do a compacting GC. This
helps avoid the OOM and feeds into the next change.
This PR also adds a *second* balance_heaps_loh function for when there
is a memory limit and we previously failed to allocate into the chosen
heap. `balance_heaps_loh` works based on allocation budgets, whereas
`balance_heaps_loh_hard_limit_retry` works on the actual space available
at the end of the segment. Thanks to the change to
joined_generation_to_condemn the heaps should be compact, so not looking
at free space here.
* Fix uninitialized variable
* In a container, use space available instead of budget
* Fix duplicate semicolon
|
|
|
|
|
|
Don't compact LOH just because a heap_hard_limit exists
|
|
This is based on a perf test with 100% survival in a container, before
and after #22180. GC pause times were greater after that commit.
Debugging showed that the reason was that after, we were always doing
compacting GC, and objects were staying in generation 1 and not making it
to generation 2. The reason was that in the "after" build,
`should_compact_loh()` was always returning true if heap_hard_limit was
set; currently if we do an LOH compaction, we compact all other
generations too. As the comment indicates, we should decide that
automatically, not just set it to true all the time.
|
|
* Replace __sync_swap with __atomic_exchange_n
__sync_swap() is a clang specific function.
* Remove multiline comment
* Add paranthesis around sum
src/md/hotdata/../inc/streamutil.h:73:34: warning: suggest parentheses around ‘+’ in operand of ‘&’ [-Wparentheses]
UINT32 aligned = *totalBytes + 3 & ~3;
* Define __int64
* Define windows types for tests
* Remove undefined has_builtin defines and define alloca and inline for GNUC
* Remove __clang__ where possible
* Add implicit casting to help compiler find WCHAR* variant
src/binder/assembly.cpp:294:73: error: no matching function for call to ‘SString::SString(SString)’
return (pAsmName == nullptr ? nullptr : pAsmName->GetSimpleName());
^
In file included from src/inc/sstring.h:1082:0,
from src/inc/ex.h:19,
from src/inc/stgpool.h:28,
from src/inc/../md/inc/metamodel.h:18,
from src/inc/../md/inc/metamodelro.h:19,
from src/inc/metadata.h:17,
from src/binder/../vm/util.hpp:19,
from src/binder/../vm/common.h:110,
from src/binder/assembly.cpp:14:
src/inc/sstring.inl:73:8: note: candidate: SString::SString(void*, COUNT_T)
inline SString::SString(void *buffer, COUNT_T size)
^
src/inc/sstring.inl:73:8: note: candidate expects 2 arguments, 1 provided
src/inc/sstring.inl:436:8: note: candidate: SString::SString(SString::tagLiteral, const WCHAR*, COUNT_T)
inline SString::SString(tagLiteral dummytag, const WCHAR *literal, COUNT_T count)
^
src/inc/sstring.inl:436:8: note: candidate expects 3 arguments, 1 provided
src/inc/sstring.inl:418:8: note: candidate: SString::SString(SString::tagLiteral, const WCHAR*)
inline SString::SString(tagLiteral dummytag, const WCHAR *literal)
^
src/inc/sstring.inl:418:8: note: candidate expects 2 arguments, 1 provided
src/inc/sstring.inl:401:8: note: candidate: SString::SString(SString::tagUTF8Literal, const UTF8*)
inline SString::SString(tagUTF8Literal dummytag, const UTF8 *literal)
^
src/inc/sstring.inl:401:8: note: candidate expects 2 arguments, 1 provided
src/inc/sstring.inl:382:8: note: candidate: SString::SString(SString::tagLiteral, const CHAR*)
inline SString::SString(tagLiteral dummytag, const ASCII *literal)
* Reorder DLLEXPORT and STDAPI
GNUC wants extern "C" <attribute> format.
* Abstract __FUNCSIG__
* Abstract __debugbreak()
* Move common compiler options out of clang and add Wno-unused-value
* Add paranthesis around || and &&
src/gc/gc.cpp:9084:38: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
(!chosen_power2) && (i < free_space_count));
* Set Wno-delete-non-virtual-dtor for CXX files only
* Don't warn on unterminated endif labels
* Suppress unused functions
* Use 0x syntax rather than h syntax on GNU asm files
* Correct constructor call directly
src/ToolBox/superpmi/superpmi-shared/logging.cpp:301:27: required from here
src/inc/clr_std/string:58:9: error: cannot call constructor ‘std::basic_string<char>::basic_string’ directly
this->basic_string::basic_string(_Ptr, c_len(_Ptr));
* Suppress NULL used in arithmetic warnings
|
|
GCHeapHardLimit - specifies a hard limit for the GC heap
GCHeapHardLimitPercent - specifies a percentage of the physical memory this process is allowed to use
If both are specified, GCHeapHardLimit is checked first and only when it's not specified
would we check GCHeapHardLimitPercent.
If neither is specified but the process is running inside a container with a memory
limit specified, we will take this as the hard limit:
max (20mb, 75% of the memory limit on the container)
If one of the HardLimit configs is specified, and the process is running inside a container
with a memory limit, the GC heap usage will not exceed the HardLimit but the total memory
is still the memory limit on the container so when we calculate the memory load it's based
off the container memory limit.
An example,
process is running inside a container with 200mb limit
user also specified GCHeapHardLimit as 100mb.
if 50mb out of the 100mb is used for GC, and 100mb is used for other things, the memory load
is (50 + 100)/200 = 75%.
Some notes on these configs -
+ The limit is the commit size.
+ This is only supported on 64-bit.
+ For Server GC the minimum *reserved* segment size is 16mb per heap, this is to avoid the
scenario where the hard limit is small but the process can use many procs and we end up
with tiny segments which doesn't make sense. We then keep track of the committed on the segments
so the total does not exceed the hard limit.
|
|
+alloc lock split into SOH and LOH
+provisional mode to fix too many gen2 GCs triggered in low mem situation when the heap has heavy pinning fragmentation
+better free list usage
+premature OOM fixes
+3 new configs: GCHeapAffinitizeMask, GCHighMemPercent, GCLOHThreshold (will be documented)
YieldProcessor scaling factor is different on core due to the different implementation on core.
|
|
|
|
|
|
stubs from local gc (#19500)
* Switch NumaNodeInfo and CPUGroupInfo to the interface
* Remove AppDomain/SystemDomain stubs
* remove EEConfig methods
* Port numa code to the coreclr side
* add numa back to PAL and standalone builds
* enable numa for PAL/Standalone builds, and fix BOOL warnings
* remove unused defines, and fix linux build errors
* building on windows
* about to delete numa work from unix and want a backup
* add stubs for unix numa/cpugroup
* Code review feedback
* Code review feedback
|
|
|
|
Fixes #17716
|
|
Add an api so the VM can tell the GC how long to spin for to normalize across processor families.
|
|
is to be used by BCL for deciding when to trim memory usage in pooling code
|
|
calls in to the runtime (#16765)
* [Local GC] Refactor calls involving thread modes, suspension, and alloc contexts to always operate on current thread
* BOOL -> bool for enable/disable preemptive routines, also remove an unused call to GetThread
* avoid one indirection by having GCToEEInterface::EnablePreemptiveGC return a bool indicating whether the mode was changed
* Callback on IGCHeap for the runtime to notify the GC when it is trapping threads
* use g_fSuspensionPending instead of GCToEEInterface::CatchAtSafePoint for allow_fgc
* Remove CatchAtSafePoint
* Remove GetThread from WaitLongerNoInstru
* code review feedback
* code review feedback
* change BOOL to bool
|
|
Starting the work on latency levels
Current tuning is latency_level_balanced
Added the latency_level_memory_footprint level for optimizing for memory footprint
|
|
* Initial cut, ignoring thread affinity
* Integrate thread affinity
* Affinity for standalone Windows
* Add 'specialness' and the thread name as arguments to CreateThread
* First crack at unified implementation
* Set priority for server GC threads
* Remove unused parameter
* Address code review feedback and remove some dead code that broke the clang build
* Use char* on the interface instead of wchar_t (doesn't play well cross-platform)
* Rename IsGCSpecialThread -> CurrentThreadWasCreatedByGC
* Code review feedback and fix up the build
* rename CurrentThreadWasCreatedByGC -> WasCurrentThreadCreatedByGC
* Fix a contract violation when converting string encodings
* Thread::CreateUtilityThread returns a thread that is not suspended - restarting a non-suspended thread is incorrect
* CreateUnsuspendableThread -> CreateNonSuspendableThread
|
|
* [Local GC] Build the GC using system headers
* Disable features to get the GC to build
* Get rid of the separate 'GC PAL' build
* Remove unused stuff
* Don't build gcenv.os.cpp when linking in a standalone gc
* Stub out CPUGroupInfo and NumaNodeInfo
* Stub out IsGCThread and friends
* Build the GC as a shared library :tada:
* Build, link, and run! :tada:
* Fix standalone GC build break
* Fixes where the GC's MethodTable and VM's MethodTable classes disagree
* Integrate a standalone GC into the CoreCLR build system (so it gets copied to the output folder). Re-enable some ifdef-ed out includes that are required for a non-standalone build of the GC.
* Bring changes to Unix and fix the Unix build. Implement some compiler intrinsic wrappers and alignment functions expected by the GC.
* Fix the Windows build
* 1. Code review feedback: use standard types for BitScanForward and
BitScanForward64
2. Delete FEATURE_COM stuff from the build system, not needed for this
PR
3. Fix the Unix build
* Fix the Windows x86 build - the _BitScanForward64 intrinsic is not available when targeting 32-bit platforms
* Remove a number of things from gcenv.base.h that are not used
* Remove a linker workaround now that we are not linking a standalone GC into the runtime
* Remove dead code, make the lack of GC_PROFILING for standalone gc not break profiling on the non-standalone build
* Code review feedback - use add_library_clr and other cmake-related fixes
* Fix include indentation
* Remove some extraneous parameters to cmake functions (cmake is remarkably lenient...)
|
|
table indices (#13076)
seg size fixes
|
|
* Cleanup GC *_STAT bitrot
* Fixup per review
|
|
* [Arm64/Unix] Support 64K pages
* GC move GCToOSInterface::Initialize() into InitializeGarbageCollector()
|
|
* [Local GC] Skeleton for GC configuration
* Initial tweaks after design feedback:
1) Use string keys instead of enums. Upon receiving a string key,
the EE looks at it and, if it's something that comes from startup
flags, responds using the startup flag information. Otherwise, it
forwards the string onto CLRConfig.
2) Add a mechanism for getting string configuration values from the
EE. This includes adding a RAII wrapper around strings so that they
are freed correctly.
* Remove uses of g_pConfig from the GC and replace with GCConfig
* Use the GCConfig system for the GC log
* Fix poorly-named parameter
* Add documentation and caching of bool and int configs obtained from the EE
* Remove AppDomainLeaks as dead code
* Remove GC trace configs as dead code
* Repair unix build
* Fix an issue where we started the GC in the wrong latency mode
* Fix the unix build
* Pipe GCRetainVM configuration to the GC
* Dead code removal in the GC sample
* EEConfig -> GCConfig for heap verification constants in the GC
* Populate config information for bools and ints eagerly at startup
* Initialize g_theGCToCLR before initializing GCConfig
* Propegate HoardVM config to the GC
* Fix an incorrect comment
|
|
NoGC region. (#11923)
This is just so that I can test with larger SOH seg size to make sure we are not getting AVs.
More perf changes will happen with the NoGC region changes.
|
|
(#11762)
|
|
* [Local GC] Move operations on CLREventStatic to the EE and add their functionality to the interface
* Fix a missed case
* Split GetWaitForGCEvent into two smaller interface methods to avoid exposing the event itself on the interface
* Initial implementation for Unix
* Complete unix implementation
* Make it work on Windows
* Remove redudant methods from GCToEEInterface
* Fix the Linux build
* First part of code review feedback: make GCEvent dispatch statically (Windows)
* Second part of code review feedback: make GCEvent dispatch statically (Unix)
* Standardize implementation across Windows/Unix (apparently MSVC is more lenient about constructor names than clang)
* Address code review feedback: Add Create*Event methods back onto GCEvent and remove them from GCToOSInterface
* Address code review feedback: remove a dead define
* Remove a bad comment, remove an unnecessary friend class, fix some formatting issues
* Fix an issue when initializing a GCEvent on Linux (should not be allocating an Impl in the constructor)
* Fix the same issue on Windows (less bad, just leaks memory instead of asserting)
|
|
|
|
structure did not agree on the location of the heap field (#10576)
|
|
interface (#10463)
* [Local GC] BOOL -> bool on IGCHeap
* [Local GC] size_t -> void* on IGCHeap
* [Local GC] Silence warnings by being explicit about BOOl -> bool conversions
* Address code review feedback: FinalizeAppDomain BOOL -> bool
* Fix warnings
* Address code review feedback:
1) Fix a missed default parameter (FALSE) on a parameter of type bool,
2) Fix invocations of the diagnostic callbacks to use boolean literals
instead of TRUE and FALSE,
3) Fix various invocations of GC interface methods in the VM to use
boolean literals instead of TRUE and FALSE
* Address code review feedback: fix inconsistency
|
|
single-proc scenarios (#10065)
* Remove usage of the generation table from the EE by introducing an
EE-owned GC alloc context used for allocations on single-proc machines.
* Move the GC alloc lock to the EE side of the interface
* Repair the Windows ARM build
* Move the decision to use per-thread alloc contexts to the EE
* Rename the lock used by StartNoGCRegion and EndNoGCRegion to be more indicative of what it is protecting
* Address code review feedback 2 (enumerate the global alloc context as a part of GCToEEInterface)
* Code review feedback (3)
* Address code review feedback (move some GC-internal globals to gcimpl.h and gc.cpp)
* g_global_alloc_lock is a dword, not a qword - fixes a deadlock
* Move GlobalAllocLock to gchelpers.cpp and switch to preemptive mode when spinning
* Repair the Windows x86 build
|
|
1) https://github.com/dotnet/coreclr/issues/6809
2) when we do a minimal GC, we need to maintain the states correctly (sync up tables and clear bricks for the portion that we don't need in a normal GC but do need in minimal GC)
|
|
Implement second-level card tables for non-Windows platforms
|
|
|
|
DAC (#9255)
* [Local GC] Move workstation GC DAC globals to a struct shared between
the GC and the DAC
* (Some) code review feedback and bug fixes for issues found while debugging on OSX
* Address some code review feedback:
1. Make g_gcDacGlobals a pointer and dacvar on the VM side, so
that publishing the GC dac vars is done atomically (through
a pointer assignment). This fixes a race that Noah noticed.
2. Remove the requirement for the GC's generation class struct
to be known at compile-time, by using a dacvar as the size
of the generation class at run-time (for pointer arithmetic)
3. Move all DAC-interesting fields to be at the start of GC
internal classes, so that the DAC does not need to know the
size or exact layout of the class past the fields it cares
about.
* Split the definition of the size of several arrays across the SOS/DAC and GC/DAC interfaces, and add static asserts that they are the same
* Repair the Windows Release build
* Implement the GC DAC scheme for Server GC and eliminate the duplicate GC dac vars
* Some work
* Decouple use of the GC generation table from a write barrier by having the EE store a copy of the global during initialization
* Actually make it work with server GC
* Checkpoint
* Checkpoint where everything works
* Code cleanup
* Fix debugger test failures
* Additional code cleanup
* Address code review feedback by adding a static assert and standardizing the way that we iterate over the generation table
* Repair the Windows x86 build
* Revert "Decouple use of the GC generation table from a write barrier by having the EE store a copy of the global during initialization"
This reverts commit 573f61a16b4fa8c2fc4c568c0b968a921230f31c.
* Revert "Repair the Windows x86 build"
This reverts commit 188c22d87e1d65abf00ab8fa28f46ad607a9028f.
* Partial revert, move `generation_table` back the global namespace for a single-proc allocation helper
* Fix a debugger test failure
* Repair crash dump scenarios
|
|
|
|
GCToEEInterface::StompWriteBarrier (#8605)
* Move Software Write Watch's write barrier updates to use the new
GCToEEInterface::StompWriteBarrier to stomp the EE's write barrier.
* Address code review feedback, move SetCardsAfterBulkCopy to EE side of the interface
|
|
* Decouple write barrier operations between the GC and EE
* Address code review feedback
* Address code review feedback
* Repair the standalone GC build
|
|
|
|
GC's own logging), gcee.cpp and objecthandle.cpp.
The rule is GC will call the diagnostics functions at various points to communicate
info and these diagnostics functions might call back into the GC if they require
initimate knowledge of the GC in order to get certain info. This way gc.cpp does
not need to know about details of the diagnostics components, eg, the profiling context;
and the diagnostics components do not need to know details about the GC. So got rid of
ProfilingScanContext in gcinterface.h and passed scanning functions to GC and handle table.
Got rid of where it goes through things per heap as this is not something that diagnostics
should care about, including going through stack roots per heap (which is only done in
GC for scalability purposes but profiling is doing this on a single thread). This also makes it faster.
Got rid of the checks for gc_low/high in ProfScanRootsHelper as this is knowledge
profiling shouldn't have. And it was also incorrectly not including all interior
pointer stack roots.
|
|
* Re-introduce changes lost in a merge conflict
* Enable GCToOSInterface to be defined behind the GC interface
when building the GC in standalone mode. Provide a skeleton Windows
implementation and the framework for a Unix implementation.
* Address code review feedback
|
|
from within the GC (#7501)
|
|
|
|
modifying the VM to utilize this interface.
Introduce an interface separating the GC and the rest of the VM
Remove static members of both IGCHeap and IGCHeapInternal and move the management of the singular GC heap to the VM.
Rename uses of IGCHeap in the VM to GCHeapHolder, as well as other misc. renames throughout the VM and GC.
Split each interface function into categories, document them, use consistent formatting across the interface
Undo some accidental find/replace collateral damage
Remove all ifdefs from the GC interface
Deduplicate function declarations between IGCHeap and IGCHeapInternal, expose AllocAlign8 through the interface and the reference to alloc_context to repair the ARM build
Paper cut: false -> nullptr
Repair the ARM and x86 builds
Rename GCHeapHolder -> GCHeapUtilities and address documentation feedback
Rebase against master
Rename gcholder.h/cpp -> gcheaputilities.h/cpp
Fix an uninitialized field on alloc_context causing test failures on clang
Rename the include guard for gcheaputilities.h
Un-breaks SOS by making the following changes:
1) Instructs the DAC to look for IGCHeap::gcHeapType by name,
instead of assuming that it exists near g_pGCHeap,
2) Eliminate all virtual calls on IGCHeap in the DAC, since we cannot
dispatch on an object in another process,
3) Because of 2, expose the number of generations past the GC interface
using a static variable on IGCHeap that the DAC can read directly.
repair the Windows build
|