* delete isProfLeaveCB from arm signature
The previous implementation was done many years ago, and I do not know why it was done that way.
* extract GetSavedSet
* add isNoGCHelper
* delete isNoGC arg
* move declarations closer to their uses
* delete isGc from genEmitCall
* delete unused method declaration.
* add emitNoGChelper that accepts CORINFO_METHOD_HANDLE
* fix missed switch cases
* add function headers
* Fix feedback
* Fix feedback2
* Adding containment support to one-operand scalar HWIntrinsics (x86)
* Adding containment support to two-operand imm HWIntrinsics (x86)
* Adding containment support to three-operand imm HWIntrinsics (x86)
* Updating hwintrinsiccodegenxarch to properly mask Sse41.Insert for TYP_FLOAT
* Updating the Sse41.Insert tests for TYP_FLOAT
* Adding containment support for Sse2.CompareLessThan and BlendVariable (Sse41/Avx/Avx2)
* Fixing `genHWIntrinsic_R_RM_I` to call `emitIns_SIMD_R_R_I`, rather than `emitIns_R_R_I`
* Updating emitOutputSV to not modify the code for IF_RWR_RRD_SRD_CNS
* Cleaning up some of the emitxarch code.
* Moving roundps and roundpd into the IsDstSrcImm check
comments on related code.
intrinsics.
Remove JIT LEGACY_BACKEND code
All code related to the LEGACY_BACKEND JIT is removed. This includes all code related to x87 floating-point code generation. Almost 50,000 lines of code have been removed.
Remove legacyjit/legacynonjit directories
Remove reg pairs
Remove tiny instruction descriptors
Remove compCanUseSSE2 (it's always true)
Remove unused FEATURE_FP_REGALLOC
And add a new range-check IR for x86 imm-intrinsics.
overloads.
LoadAlignedVector128 as contained.
* jit sources: Each local pointer variable must be declared on its own line.
Implement https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#101-pointer-declarations
* add constGenTreePtr
* delete GenTreePtr
* delete constGenTreePtr
* fix arm
1. Fix `LEGACY_BACKEND`
2. `#if FEATURE_HW_INTRINSICS` => `#ifdef FEATURE_HW_INTRINSICS`
[tfs-changeset: 1686599]
LoadScalar intrinsics
intrinsics
Enable CORINFO_INTRINSIC Round, Ceiling, and Floor to generate ROUNDSS and ROUNDSD
Removed unused idClsCookie from struct instrDescDebugInfo
Cleaned up several ifdefs
x86, except double->long/ulong conversion on x86
It's always defined, is always expected to be defined, and the build
doesn't work without it.
Also remove unused `SECURITY_CHECK` and `VERIFY_IMPORTER` defines.
This change implements support for Vector<long>, handling
SIMDIntrinsicInit, which takes a LONG, and decomposition of
SIMDIntrinsicGetItem, which produces a LONG.
It also enables SIMD, including AVX, by default for RyuJIT/x86.
There are two kinds of transition penalties:
1. Transition from 256-bit AVX code to 128-bit legacy SSE code.
2. Transition from 128-bit legacy SSE code to either 128-bit or
256-bit AVX code. This only happens if there was a preceding
AVX256->legacy SSE transition penalty.
The primary goal is to remove the #1 AVX to SSE transition penalty.
Added two emitter flags: contains256bitAVXInstruction indicates whether
the JIT method contains 256-bit AVX code; containsAVXInstruction
indicates whether the method contains 128-bit or 256-bit AVX code.
Issue VZEROUPPER in the prolog if the method contains 128-bit or 256-bit
AVX code, to avoid the legacy SSE to AVX transition penalty; this can
happen in the reverse P/Invoke situation. Issue VZEROUPPER in the epilog
if the method contains 256-bit AVX code, to avoid the AVX to legacy
SSE transition penalty.
To limit the code-size impact, we only issue VZEROUPPER before a
PInvoke call to a user-defined function if the JIT method contains
256-bit AVX code, assuming the user-defined function contains legacy
SSE code. There is no need to issue VZEROUPPER after the PInvoke call,
because the #2 SSE to AVX transition penalty won't happen: the #1 AVX
to SSE transition has been taken care of before the PInvoke call.
We measured a ~1% to 3% performance gain on TechEmpower plaintext and
verified that the VTune AVX/SSE events OTHER_ASSISTS.AVX_TO_SSE and
OTHER_ASSISTS.SSE_TO_AVX have been reduced to 0.
Fix #7240
move setContainsAVX flags to lower, refactor to a smaller method
refactor, fix typo in comments
fix format error
The encoder was using size_t, a 32-bit type on x86, to accumulate the opcode
and prefix bits to emit. AVX support adds up to 3 bytes of prefixes, which
is more than the 32-bit type can hold. So, change all code-byte-related types
from size_t to a new code_t, defined as "unsigned __int64" on RyuJIT x86
(there is precedent for this type on the ARM architectures).
Fixes #8331
Enable use of SSE3_4 instruction set for SIMD codegen.
This change is the result of running clang-tidy and clang-format on jit
sources.
This change starts the process of updating the jit code to make it ready
for being formatted by clang-format. Changes mostly include reflowing
comments that go past our column limit and moving comments around ifdefs
so clang-format does not modify the indentation. Additionally, some
header files are manually reformatted for pointer alignment and marked
as clang-format off so that we do not lose the current formatting.
Added method IsMultiRegPassedType and updated IsMultiRegReturnType
Switched these methods to using getArgTypeForStruct and getReturnTypeForStruct
Removed IsRegisterPassable and used IsMultiRegReturned instead.
Converted lvIsMultiregStruct to use getArgTypeForStruct
Renamed varDsc->lvIsMultiregStruct() to compiler->lvaIsMultiregStruct(varDsc)
Skip calling getPrimitiveTypeForStruct when we have a struct larger than 8 bytes
Refactored ReturnTypeDesc::InitializeReturnType
Fixed missing SPK_ByReference case in InitializeReturnType
Fixes for RyuJIT x86 TYP_LONG return types and additional ARM64 work for full multireg support
Guarded the ARM64 uses of MAX_RET_MULTIREG_BYTES with FEATURE_MULTIREG_RET
Fixes for multireg returns in Arm64 Codegen
Added dumping of lvIsMultiRegArg and lvIsMultiRegRet in the assembly output
Added check and set of compFloatingPointUsed to InitializeStructReturnType
Fixes to handle JIT helper calls that say they return a TYP_STRUCT with no class handle available
Placed all of the second GC return reg under MULTIREG_HAS_SECOND_GC_RET ifdefs
Added the Arm64 VM changes from Rahul's PR 5175
Update getArgTypeForStruct for x86/arm32 so that it returns TYP_STRUCT for all pass by value cases
Fixes for the passing of 3-, 5-, 6-, or 7-byte-sized structs
Fix issue on ARM64 where we would back fill into x7 after passing a 16-byte struct on the stack
Implemented register shuffling for multi reg Call returns on Arm64
Fixed regression on Arm32 for struct args that are not multi regs
Updated Tests.Lst with 23 additional passing tests
Changes from codereview feedback
This assert hit in the encoder when we were trying to generate an
INS_shl_N with a constant of 1, instead of using the special xarch
INS_shl_1 encoding, which saves a byte. It turns out that amd64 does,
in fact, generate the suboptimal encoding currently.
The bad code occurs in the RMW case of genCodeForShift(). It turns out
that function is unnecessarily complex, unique (it doesn't use the
common RMW code paths), and has a number of other latent bugs.
To fix this, I split genCodeForShift() by leaving the non-RMW case
there, and adding a genCodeForShiftRMW() function just for the RMW case.
I rewrote the RMW case to use the existing emitInsRMW functions.
Other related cleanups along the way:
1. I changed emitHandleMemOp to consistently set the idInsFmt field,
and changed all callers to stop pre-setting or post-setting this field.
This makes the API much easier to understand. I added a big header
comment for the function. Now, it takes a "basic" insFmt (using ARD,
AWR, or ARW forms), which might be munged to a MRD/MWR/MRW form
if necessary.
2. I changed some places to always use the most derived GenTree type
for all uses. For instance, if the code has
"GenTreeIndir* mem = node->AsIndir()", then always use "mem" from then
on, and don't use "node". I changed some functions to take more derived
GenTree node types.
3. I rewrote the emitInsRMW() functions to be much simpler, and rewrote
their header comments.
4. I added GenTree predicates OperIsShift(), OperIsRotate(), and
OperIsShiftOrRotate().
5. I added function genMapShiftInsToShiftByConstantIns() to encapsulate
mapping from INS_shl to INS_shl_N or INS_shl_1 based on a constant.
This code was in 3 different places already.
6. The change in assertionprop.cpp is simply to make JitDumps readable.
In addition to fixing the bug for RyuJIT/x86, there are a small number
of x64 diffs where we now generate smaller encodings for shift by 1.