Age | Commit message (Collapse) | Author | Files | Lines |
|
* Fix PAL_GetLogicalProcessorCacheSizeFromOS on mac
In a previous PR
(https://github.com/dotnet/coreclr/commit/ed52a006c01a582d4d34add40c318d6f324b99ba#diff-8447e54277bb962d167a77bb260760d7R1879),
GetCacheSizePerLogicalCpu was changed to no longer rely on cpuid on
amd64 systems; instead it uses GetLogicalProcessorCacheSizeFromOS().
Unfortunately that function consisted of a number of `#if` blocks, none of
which was active on macOS, so it just returned 0. This caused us to
default to a gen0size of only 0.25MB, causing many GCs.
Fixed by adding a new case that uses `sysctlbyname`.
Fix #24658
* Fixes from code review
* Check for function sysctlbyname instead of header
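For illustration, a `sysctlbyname`-based query might look like the sketch below. It is not the PAL code: the helper names `QueryCacheSizeByName` and `QueryLargestCacheSize` are hypothetical, and the choice of the `hw.l1dcachesize`, `hw.l2cachesize` and `hw.l3cachesize` keys is an assumption about which values are worth asking for. On non-Apple systems the sketch simply reports 0, which is the "unknown" result the commit message describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: returns the size in bytes reported for one sysctl
// name, or 0 when the value is unavailable (always 0 on non-Apple systems).
int64_t QueryCacheSizeByName(const char* name)
{
    assert(name != nullptr);
#if defined(__APPLE__)
    int64_t value = 0;
    size_t length = sizeof(value);
    if (sysctlbyname(name, &value, &length, nullptr, 0) == 0 && value > 0)
    {
        return value;
    }
#else
    (void)name; // no sysctlbyname here; report "unknown"
#endif
    return 0;
}

// Hypothetical helper: returns the largest cache size found, or 0 if
// nothing could be queried.
int64_t QueryLargestCacheSize()
{
    const char* names[] = { "hw.l3cachesize", "hw.l2cachesize", "hw.l1dcachesize" };
    int64_t largest = 0;
    for (const char* name : names)
    {
        int64_t size = QueryCacheSizeByName(name);
        if (size > largest)
        {
            largest = size;
        }
    }
    return largest;
}
```

A caller would treat a result of 0 as "size unknown" and fall back to a default, rather than sizing anything from it.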
|
|
* Convert C++ standard settings and warning options from CMAKE_<LANG>_FLAGS to modern CMake idioms.
* More $<COMPILE_LANGUAGE> generator expressions instead of CMAKE_CXX_FLAGS.
* Use $<COMPILE_LANGUAGE:CXX> for all -fpermissive usage
* Fix generator expression that generates multiple flags
* Fix invalid use of CMAKE_CXX_FLAGS instead of CMAKE_C_FLAGS.
* Treat AppleClang as though it is Clang (match pre-3.0 behavior).
* Update our build system to understand that AppleClang is distinct from Clang and remove CMP0025 policy setting.
* PR Feedback.
|
|
* Improve fatal err msg
* Match SO format
* a
* Remove dead define
* More adjustments to ex msg
* And PAL
* Remove special case for SOE
* Remove excess Debug.Assert newline
* Remove excess newline
* typo
* New format
* Remove DebugProvider redundancy
* Adjustments
* Remove preceding newline
* Make other SOE and OOM consistent
* Tidy up assertion msg
* Fix missing newline after inner exception divider
* CR when no inner exception
* ToString never CR terminated
* disable corefx tests temporarily
|
|
test case run at least 4 to 5 times faster than before. (#24844)
Fallback to old transport ReadMemory if /proc/<pid>/mem can't be opened. This happens
on attach because of permissions/access, but works fine on the launch (the most
important case).
|
|
* default value is 1, and when set to 0 will disable loading LTTng.
Co-Authored-By: Jan Kotas <jkotas@microsoft.com>
|
|
* Fix initial thread affinity on Linux
On Linux, a new thread inherits the affinity mask of the thread
that created it. This is a problem for background GC
threads that are created by one of the server GC threads, which are
affinitized to a single core.
This change resets each new thread's affinity to match the
current process affinity.
In addition to that, I've also fixed the extraction of the CPU count
that was using PID 0. While the doc says that 0 represents the current process,
it in fact means the current thread.
And as a small bonus, I've added caching of the value returned by
PAL_GetLogicalCpuCountFromOS, since it cannot change during runtime.
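A minimal Linux-only sketch of the idea follows. The names `g_processAffinity`, `CaptureProcessAffinity` and `ResetCurrentThreadAffinity` are hypothetical and the PAL may structure this differently. It captures the mask once using `getpid()`, since passing 0 would query the calling thread, and a newly started thread then applies that mask to itself.

```cpp
#include <cassert>
#include <sched.h>
#include <unistd.h>

// Hypothetical global: the affinity mask captured for the process at startup.
static cpu_set_t g_processAffinity;

// Capture the mask of the main thread. Passing getpid() targets the main
// thread of the process; passing 0 would target whichever thread is calling.
bool CaptureProcessAffinity()
{
    return sched_getaffinity(getpid(), sizeof(g_processAffinity), &g_processAffinity) == 0;
}

// Intended to run at the start of every new thread so that it does not keep
// the (possibly single-core) mask inherited from its creator.
bool ResetCurrentThreadAffinity()
{
    assert(CPU_COUNT(&g_processAffinity) > 0);
    // Here pid 0 is what we want: it applies the mask to the calling thread.
    return sched_setaffinity(0, sizeof(g_processAffinity), &g_processAffinity) == 0;
}
```

`CaptureProcessAffinity` would be called once during initialization, before any thread narrows its own mask, and `ResetCurrentThreadAffinity` from each thread's entry point.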
|
|
While looking at the strace of `corerun` built with logging and
debugging information, I was amazed at the number of gettid() calls it
was making. While system calls are cheap, they're still not free, so
cache this number in the thread local storage area. This adds a branch, but
it's just a comparison with 0, which is far cheaper than the system call.
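The caching pattern described here can be sketched as below. This is a Linux-only illustration, and `CachedGetTid` is a hypothetical name rather than the function touched by the commit. It relies on thread IDs never being 0, so 0 can serve as the "not yet cached" marker.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Returns the kernel thread ID of the calling thread, issuing the system
// call only on the first use in each thread.
pid_t CachedGetTid()
{
    // One slot per thread; zero-initialized, and 0 is never a valid tid.
    static thread_local pid_t cachedTid = 0;
    if (cachedTid == 0)
    {
        cachedTid = static_cast<pid_t>(syscall(SYS_gettid));
        assert(cachedTid > 0);
    }
    return cachedTid;
}
```

One caveat of this sketch: a child created with `fork` would inherit the parent's cached value, so real code would need to reset the slot in the child.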
|
|
Fixes #21009
|
|
|
|
Fix CPUSET_T definition for FreeBSD
|
|
unicodedata.cpp based on UnicodeData.txt v11.0.
|
|
|
|
|
|
* Generate eventpipe implementation as part of CMake configure.
* Generate Etw provider as part of CMake configure.
* First pass porting over lttng provider to cmake.
* Fix up CMake Lttng provider generation.
* Move Lttng provider into CMake tree.
* Move dummy event provider to CMake
* Move genEventing into the CMake tree.
* Remove extraneous logging and unused python locator.
* Clean up build.sh
* Clean up genEventingTests.py
* Add dependencies to enable more incremental builds (providers not fully incremental).
* Convert to custom command and targets instead of at configure time.
* Get each eventing target to incrementally build.
* Fix incremental builds
* Add missing dependencies on eventing headers.
* PR Feedback. Mark all generated files as generated
* Clean up eventprovider test CMakeLists
|
|
|
|
Add the DiagnosticProtocolHelper class to deserialize and dispatch
the new GenerateCoreDump command.
Refactor the PAL code that launches createdump on an unhandled exception so it can be
used by a new PAL_GenerateCoreDump method that doesn't depend on
the complus dump environment variables.
Changed the "full" createdump not to include the uncommitted pages and
removed the "add module metadata" workaround for the SOS clrstack !UNKNOWN
problem now that it is fixed in SOS (crashinfo.cpp).
|
|
* Fixing up time.cpp in the PAL
* Fixing GetTickCount64 in the PAL to continue using CLOCK_MONOTONIC_COARSE where available
* Reverting QueryPerformanceFrequency in the PAL to return tccSecondsToNanoSeconds for CLOCK_MONOTONIC
* Removing two unused variables from GetTickCount64 in the PAL
* Updating the PAL to error if neither mach_absolute_time nor clock_gettime(CLOCK_MONOTONIC) is supported.
* Fixing the PAL configure.cmake to link rt where applicable
|
|
returned by gettimeofday
This allows IBC profile data to record a meaningful time of when the training scenario was run.
Made EPOCH_DIFF a defined constant
Change calcTime to be an unsigned 64-bit integer
Change constants to units of 100NS instead of NS to avoid division and integer overflows.
Use the defined constants SECS_TO_100NS and USECS_TO_100NS when performing time calculations
Don't add a space after the Assembly arg when argc is zero
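The switch to 100NS units can be illustrated with a small conversion routine. The sketch assumes the target is a Windows-style timestamp counting 100 ns ticks since 1601, which the message does not state. The value given to `EPOCH_DIFF` and the function name `TimevalTo100ns` are illustrative, while `SECS_TO_100NS` and `USECS_TO_100NS` take their names from the message.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/time.h>

// Assumption: seconds between 1601-01-01 and 1970-01-01 (the Unix epoch).
const uint64_t EPOCH_DIFF = 11644473600ULL;
const uint64_t SECS_TO_100NS = 10000000ULL; // 10^7 ticks of 100 ns per second
const uint64_t USECS_TO_100NS = 10ULL;      // 10 ticks of 100 ns per microsecond

// Converts a gettimeofday-style value to 100 ns ticks. Working in 100 ns
// units needs only multiplications and stays well inside 64 bits.
uint64_t TimevalTo100ns(const timeval& tv)
{
    assert(tv.tv_sec >= 0 && tv.tv_usec >= 0);
    uint64_t calcTime = (static_cast<uint64_t>(tv.tv_sec) + EPOCH_DIFF) * SECS_TO_100NS;
    calcTime += static_cast<uint64_t>(tv.tv_usec) * USECS_TO_100NS;
    return calcTime;
}
```

Had the constants been expressed in nanoseconds, the same computation would need a final division by 100 and would come much closer to the 64-bit limit.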
|
|
|
|
|
|
The PAL_SetCurrentThreadAffinity was incorrectly adding the specified processor
to the current thread affinity set instead of setting the affinity to only
the processor specified.
It was causing a significant performance hit in aspnet benchmarks on machines with
many cores.
This change crept in when I was refactoring the related code while removing
CPU groups emulation.
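The difference between adding to the affinity set and replacing it comes down to which set the processor is added to. In the Linux-only sketch below, with the hypothetical names `MakeSingleCpuSet` and `SetCurrentThreadAffinityTo`, the set is cleared first, so the thread ends up bound to exactly one processor.

```cpp
#include <cassert>
#include <sched.h>

// Builds a set containing only the requested processor. Starting from
// CPU_ZERO instead of the thread's current mask is what makes this "set"
// rather than "add".
cpu_set_t MakeSingleCpuSet(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return set;
}

// Binds the calling thread (pid 0) to exactly one processor.
bool SetCurrentThreadAffinityTo(int cpu)
{
    cpu_set_t set = MakeSingleCpuSet(cpu);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

The buggy variant would instead read the current mask with `sched_getaffinity`, add the processor to it and write it back, which leaves the thread free to run on every processor it could already use.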
|
|
|
|
The list size was set to g_SystemInfo.dwNumberOfProcessors, which is the
number of processors the current process is allowed to run on, not
the total number of processors in the system. Fixed to use
PAL_GetTotalCpuCount.
Also revert a change to the mbind node mask length computation I've
incorrectly made in my last commit and make it clear that the value is
a number of used bits in the node mask, which is the highest numa node
plus 1. And finally, re-reading the mbind doc, I've found that the
maxnode parameter is in fact "number of nodes" in the mask, so fixing
that too.
|
|
* Fix build on OSX and Linux machines without NUMA installed - there were
a couple of places where I was missing ifdefs
* Fix bug in nodeMaskLength computation
* Remove testing change in eeconfig.cpp that has leaked into the PR
* Fix GCToOSInterface::GetTotalProcessorCount for embedded GC to return
all processors on the system, not just the ones enabled for the current
process.
|
|
This change removes CPU groups emulation from Unix PAL and modifies the
GC and thread pool code accordingly.
|
|
In case you would have UINT32_MAX - 1 CPUs, you would round up to return UINT32_MAX CPUs.
|
|
* Round up the value of the CPU limit
In the case where `--cpus` is set to a value very close to the smaller
integer (ex: 1.499999999), it would previously be rounded down. This
would mean that the runtime would only try to take advantage of 1 CPU in
this example, leading to underutilization.
By rounding it up, we increase the pressure on the OS thread scheduler,
but even in the worst case scenario (`--cpus=1.000000001` previously
being rounded to 1, now rounded to 2), we do not observe any
overutilization of the CPU leading to performance degradation.
* Teach the ThreadPool of CPU limits
By making sure we take the CPU limits into account when computing the
CPU busy time, we ensure the various heuristics of the
threadpool do not compete with each other: one trying to allocate more
threads to increase the CPU busy time, and the other trying to
allocate fewer threads because adding more doesn't improve the
throughput.
Let's take the example of a system with 20 cores, and a docker container
with `--cpus=2`. It would mean the total CPU usage of the machine is
2000%, while the CPU limit is 200%. Because the OS scheduler would never
allocate more than 200% of its total CPU budget to the docker container,
the CPU busy time would never get over 200%. From `PAL_GetCpuBusyTime`,
this would indicate that the threadpool threads are mostly doing non-CPU
bound work, meaning we could launch more threads.
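Both points of this commit can be shown with a little arithmetic. The sketch below is an assumption about the shape of the computation, not the runtime's code: `CpuLimitRoundedUp` takes a quota and period in microseconds, the way cgroup CPU limits are commonly expressed, and `BusyPercentOfLimit` rescales a machine-wide busy percentage to the share of the limit actually in use. Both names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rounds the fractional CPU limit (quota / period) up to a whole number of
// CPUs, clamped so that the result always fits in 32 bits.
uint32_t CpuLimitRoundedUp(uint64_t quotaUs, uint64_t periodUs)
{
    assert(periodUs > 0);
    uint64_t cpus = quotaUs / periodUs + (quotaUs % periodUs != 0 ? 1 : 0);
    return static_cast<uint32_t>(std::min<uint64_t>(cpus, UINT32_MAX));
}

// Rescales "percent of the whole machine" to "percent of the CPU limit".
// With 20 cores and a limit of 2, a container that saturates its limit shows
// up as 10% of the machine, which is 100% of what it is allowed to use.
double BusyPercentOfLimit(double machineBusyPercent, uint32_t totalCpus, uint32_t limitCpus)
{
    assert(totalCpus > 0 && limitCpus > 0);
    double scaled = machineBusyPercent * totalCpus / limitCpus;
    return std::min(scaled, 100.0);
}
```

With a quota of 150000 and a period of 100000 (a limit of 1.5 CPUs) the first function yields 2, and for the 20-core example `BusyPercentOfLimit(10.0, 20, 2)` yields 100.0, so a heuristic reading that value would no longer conclude that the threads are idle.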
|
|
* Fix invalid use of stack memory
|
|
This focuses on better supporting `--cpuset-cpus`, which limits the number of processors we have access to on the CPU; it also specifies which specific processors we have access to, but that’s irrelevant here.
The work has been done here for all runtime components except `Environment.ProcessorCount`. The work consists of fixing `PAL_GetLogicalCpuCountFromOS` to use `sched_getaffinity`.
Fixes https://github.com/dotnet/coreclr/issues/22302
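As a rough Linux-only illustration of counting processors through `sched_getaffinity`, consider the sketch below. `GetLogicalCpuCountFromAffinity` is a hypothetical name, and the fallback to `sysconf` is an assumption about what to do when the call fails, not something the message specifies.

```cpp
#include <cassert>
#include <sched.h>
#include <unistd.h>

// Counts the processors the calling thread may run on, which reflects a
// restriction such as --cpuset-cpus. Falls back to the number of online
// processors if the affinity mask cannot be read.
int GetLogicalCpuCountFromAffinity()
{
    int count = 0;
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) == 0) // 0 = calling thread
    {
        count = CPU_COUNT(&set);
    }
    if (count <= 0)
    {
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        count = online > 0 ? static_cast<int>(online) : 1;
    }
    assert(count >= 1);
    return count;
}
```

In a container started with `--cpuset-cpus=0-1` on a larger machine, a count based on the affinity mask would be expected to report 2, while a plain query of the online processors would report them all.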
|
|
Integer Conversion issues
|
|
also fixing the LocalGC standalone case on Linux
|
|
|
|
Fix conversion, unknown pragmas and Visibility Issues for GCC
|
|
(#23106)
This is the first commit to enable a "diagnostic port" using IPC (Named Pipe on Windows and Unix Domain Socket on other platforms). This change currently allows EventPipe to be enabled/disabled without the use of a file drop.
- Split the DiagnosticsIpc into (DiagnosticsIpc/IpcStream)
- DiagnosticsIpc (IPC listener) is meant to be used by the Diagnostic server.
- IpcStream (IPC channel) is meant to be used to communicate with the connected client.
- Change the FastSerializer dependency from `CFileStream` to `StreamWriter`
This abstraction is meant to decouple the writing of objects in order to extend its usability.
The main objective is to reuse FastSerializer to stream data through the open IPC channel.
- Moved the EventPipeSessionProvider* classes to their own file.
- Added a more streamlined parsing achievable by defining a simpler binary protocol (by noahfalk).
1. Only one allocation is needed for the EventPipeProviderConfiguration array, no allocations or copies are needed for strings because we can refer to them directly out of the incoming command buffer
2. No change to the EventPipe API for enable is required. EventPipeProviderConfiguration retains its current behavior of not deleting the string pointers it holds.
3. No leaks happen because the command buffer owns the string memory and ensures that it stays alive for the duration of the Enable() call.
|
|
|
|
The function was incorrectly assuming that shifting the 64 bit
constant 1 by 64 bits to the left yields 0.
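In C++, shifting a 64-bit value by 64 or more bits is undefined behavior, so the result cannot be relied on to be 0; on x86-64, for instance, the hardware shift uses only the low 6 bits of the count. The message does not name the function, so the sketch below uses a hypothetical `LowBitsMask` helper to show the usual way of handling the full-width case explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Returns a mask with the lowest `bits` bits set. The 64-bit case is handled
// separately because (1 << 64) is undefined for a 64-bit operand.
uint64_t LowBitsMask(unsigned bits)
{
    assert(bits <= 64);
    if (bits >= 64)
    {
        return UINT64_MAX;
    }
    return (UINT64_C(1) << bits) - 1;
}
```

The naive form `(UINT64_C(1) << bits) - 1` happens to work for every count below 64, which is why this kind of bug tends to show up only on machines that actually use all 64 bits of the mask.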
|
|
The function was returning mask not expected by runtime for coreclr
built with NUMA enabled on machines with multiple NUMA nodes.
The mask was 0 in case the current process was affinitized to CPUs
from multiple NUMA nodes. It was following MSDN doc, however the
doc turned out to be ambiguous. Moreover, the runtime depended
on the fact that on Windows, a process is never run on multiple
NUMA nodes unless it explicitly calls APIs to set the ideal processor
for threads. But on Unix, there is no such case and by default, a
process is affinitized to all existing processors over all NUMA
nodes.
And there was one more issue. The GetProcessAffinityMask was returning
a mask within a single CPU group, which is a transformed view of
NUMA node on Windows. So the CPU indices in the mask didn't necessarily
correspond to the native Unix CPU indices. But, the SetThreadAffinityMask
was using the native Unix CPU indices.
To fix the problem, the GetProcessAffinityMask is changed so that
it always returns a mask corresponding to the native Unix CPU indices
(reporting up to 64 processors as the mask is 64 bits wide). Thus it
corresponds to what the SetThreadAffinityMask expects. And it also exactly
matches the behavior when NUMA support is not compiled in.
Moreover, the COMPlus_GCHeapAffinitizeMask bits now correspond to
the native Unix CPU indices.
The GetProcessAffinityMask is used by GC and thread pool only when
NUMA is not enabled using the COMPlus_GCCpuGroup env variable.
|
|
Fix no-return false positives in static analyzer build
|
|
GCC doesn't like attributes before the extern keyword.
|
|
There were about 800 false positive issues in the clang static analyzer
build caused by the fact that various forms of asserts were not recognized
by the analyzer as not returning.
This change adds __attribute__((analyzer_noreturn)) (wrapped in a macro) to
those assertion functions.
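A minimal sketch of such a macro follows. The names `ANALYZER_NORETURN`, `ReportAssertFailure`, `CHECK_OR_REPORT` and `g_failures` are hypothetical and are not the ones used in the PAL headers. The attribute is placed after the declarator on a separate declaration, and it expands to nothing on compilers that do not report support for it.

```cpp
#include <cassert>
#include <cstdio>

// Compilers without __has_feature get a stub that reports "not supported".
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(attribute_analyzer_noreturn)
#define ANALYZER_NORETURN __attribute__((analyzer_noreturn))
#else
#define ANALYZER_NORETURN
#endif

static int g_failures = 0; // hypothetical counter, only for demonstration

// The handler may return at run time (for example after logging), but the
// static analyzer treats a call to it as the end of the path.
void ReportAssertFailure(const char* expr, const char* file, int line) ANALYZER_NORETURN;

void ReportAssertFailure(const char* expr, const char* file, int line)
{
    std::fprintf(stderr, "Assert failed: %s (%s:%d)\n", expr, file, line);
    ++g_failures;
}

#define CHECK_OR_REPORT(e) \
    do { if (!(e)) ReportAssertFailure(#e, __FILE__, __LINE__); } while (0)
```

With this in place, code such as `CHECK_OR_REPORT(p != nullptr); p->Use();` no longer produces a null-dereference report, because the analyzer stops following the path on which the check failed.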
|
|
Improve gcc configuration
|
|
|
|
|
|
* Use `find_path` instead of `check_include_files` for lttng.
* `locate_gcc_exec gcc` to `locate_gcc_exec link` for `gcc_link`
* Remove unused `DCMAKE_OBJCOPY`
* Fix all warnings in gen-buildsys-gcc.sh reported by shellchecker.
|
|
Setting the init_count to 0 in the PALCommonCleanup was causing
intermittent crashes in the GC stress C tests on Linux with
SIGILL. The reason is that the signal handlers do nothing and call a
previous handler in case the PAL is not initialized, which is indicated
by the init_count being zero.
To fix that issue, I have removed the init_count zeroing at process
exit, since the PAL and runtime are still capable of handling the
signals.
|
|
|
|
* Fix visibility and signed comparison issues for GCC
* Fix hidden _CLRDataCreateInstance warned by macOS
* Fix indentation in vswprintf/test1
* Change void* to PVOID in implementation files
|
|
|
|
|
|
* Declare throw only when compiling for c++
Prevent the definition from getting defined multiple times and
map it to throw() only when compiling c++ code.
* Suppress warnings for tests
Suppress:
-Wno-write-strings
-Wno-sign-compare
-Wno-narrowing
-fpermissive
-Wno-int-to-pointer-cast
to allow tests to compile
* Add gcc option to build.sh script
Following the clangx.y model, add -gccx.y command line
arguments, with gcc5 and gcc7 being the currently supported
options.
* Allow environment variable to be used for TOOLCHAIN
Remove CLANG specific compiler options as well.
* Hide non-GNU compiler options
* Do not include local directory if cross compiling
[ 0%] Building CXX object src/pal/src/eventprovider/tracepointprovider/CMakeFiles/coreclrtraceptprovider.dir/__/lttng/traceptprovdotnetruntime.cpp.o
cc1plus: error: include location "/usr/local/include" is unsafe for cross-compilation [-Werror=poison-system-directories]
* Suppress unknown pragma warnings
src/pal/src/exception/seh-unwind.cpp:37:0:
warning: ignoring #pragma clang diagnostic [-Wunknown-pragmas]
#pragma clang diagnostic pop
Removing these causes compilation errors on clang7 and arm as follows:
In file included from /bin/obj/Linux.arm.Debug/src/pal/src/libunwind/include/libunwind.h:9:
/src/pal/src/libunwind/include/libunwind-arm.h:247:9: error: empty struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
typedef struct unw_tdep_save_loc
^
/src/pal/src/libunwind/include/libunwind-arm.h:288:9: error: empty struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
typedef struct
* plt not useful for GNU and ARM64/ARM
src/pal/src/arch/arm64/callsignalhandlerwrapper.S: Assembler messages:
src/pal/src/arch/arm64/callsignalhandlerwrapper.S:31: Error: unexpected characters following instruction at operand 1 -- `bl signal_handler_worker@plt'
src/pal/src/arch/arm64/callsignalhandlerwrapper.S:32: Error: unexpected characters following instruction at operand 1 -- `bl signal_handler_worker@plt'
* Remove double const from argv in PAL_Initialize
Seeing compilation error with GNU for C source files as follows:
if (PAL_Initialize(argc, argv) != 0)
^
src/pal/tests/palsuite/common/palsuite.h:21:0,
from src/pal/tests/palsuite/c_runtime/asinhf/test1/test1.c:18:
src/pal/inc/pal.h:374:1: note: expected ‘const char * const*’ but argument is of type ‘char **’
* Suppress format warnings using GNU for libunwind
warning: format ‘%li’ expects argument of type ‘long int’, but argument 3 has type ‘int’ [-Wformat=]
Debug (4, " aligned frame, offset %li\n", f->cfa_reg_offset);
* Fix -fpermissive warnings for GNU
* Suppress unused variable warning in libunwind
src/pal/src/libunwind/include/libunwind-aarch64.h:201:5: warning: right-hand operand of comma expression has no effect [-Wunused-value]
#define unw_tdep_getcontext(uc) (({ \
~~~~~~~~~
unw_tdep_context_t *unw_ctx = (uc); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
register uint64_t *unw_base asm ("x0") = (uint64_t*) unw_ctx->uc_mcontext.regs; \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__asm__ __volatile__ ( \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x0, x1, [%[base], #0]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x2, x3, [%[base], #16]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x4, x5, [%[base], #32]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x6, x7, [%[base], #48]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x8, x9, [%[base], #64]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x10, x11, [%[base], #80]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x12, x13, [%[base], #96]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x14, x13, [%[base], #112]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x16, x17, [%[base], #128]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x18, x19, [%[base], #144]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x20, x21, [%[base], #160]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x22, x23, [%[base], #176]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x24, x25, [%[base], #192]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x26, x27, [%[base], #208]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"stp x28, x29, [%[base], #224]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"str x30, [%[base], #240]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"mov x1, sp\n" \
~~~~~~~~~~~~~~~~
"stp x1, x30, [%[base], #248]\n" \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: [base] "+r" (unw_base) : : "x1", "memory"); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}), 0)
* Fix ‘memset’ used with length equal to number of elements warning
Fix warnings similar to these by including the element size in the total size
calculation.
src/pal/tests/palsuite/miscellaneous/SetEnvironmentVariableW/test1/test.cpp: In function ‘int main(int, char**)’:
src/pal/tests/palsuite/miscellaneous/SetEnvironmentVariableW/test1/test.cpp:89:31: warning: ‘memset’ used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
memset(NewValue,0,BUF_SIZE);
* Define CLR specific compiler option CLR_CMAKE_COMPILER
By the time toolchain.cmake is called, the compiler detection from
cmake is not active. We need an intermediate definition to pass
to compiler detection.
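Returning to the `-Wmemset-elt-size` item in the list above: the warning fires when the length passed to `memset` is the element count of an array whose elements are wider than one byte, so only part of the buffer is cleared. One way to avoid the mistake is sketched below; the `ZeroArray` helper is hypothetical and is not what the tests in the repository use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Clears a whole array by passing memset a byte count, that is the number
// of elements multiplied by the element size.
template <typename T, std::size_t N>
void ZeroArray(T (&array)[N])
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe for trivially copyable element types");
    std::memset(array, 0, N * sizeof(T));
}
```

For a `wchar_t NewValue[BUF_SIZE]` buffer, `ZeroArray(NewValue)` clears `BUF_SIZE * sizeof(wchar_t)` bytes, whereas `memset(NewValue, 0, BUF_SIZE)` clears only the first `BUF_SIZE` bytes.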
|