path: root/src/jit/gtlist.h
2017-02-25  Refactor isContained().  Pat Gavlin  (1 file, -5/+5)
This changes isContained to terminate early if it is not possible for the node in question to ever be contained. Many nodes are statically non-containable: any node that does not produce a value may not be contained, along with a small number of value nodes. To avoid the need for a `switch` to identify the latter, their node kinds have been marked with a new flag, `GTK_NOCONTAIN`. The checks for identifying non-containable nodes are encapsulated within a new function, `canBeContained`.
2017-01-20  Comments  Mike Danes  (1 file, -0/+7)
2017-01-17  Add GT_TEST_EQ and GT_TEST_NE  Mike Danes  (1 file, -0/+4)
2017-01-10  Updates based on PR review.  Carol Eidt  (1 file, -1/+1)
2017-01-10  Fix handling of PutArgStk  Carol Eidt  (1 file, -2/+2)
GT_PUTARG_STK doesn't produce a value, so it should have the GTK_NOVALUE flag set. Although the dstCount was being set to zero by the parent call, localDefUse was also being set, causing a register to be allocated. Fixing this produces a number of improvements due to reuse of constant registers that were otherwise unnecessarily "overwritten" by the localDefUse. Also on x86, GT_LONG shouldn't be used to pass a long, since GT_LONG should always be a value-producing node. Instead, use the existing GT_FIELD_LIST approach.
2016-11-18  Reinstate the struct optimization changes:  Carol Eidt  (1 file, -0/+2)
Remove many of the restrictions on structs that were added to preserve behavior of the old IR form. Change the init block representation to not require changing the parent when a copy block is changed to an init.
In addition, fix the bug that was causing the corefx ValueTuple tests to fail. This was a bug in assertion prop where it was not invalidating the assertions about the parent struct when a field is modified. Add a repro test case for that bug.
Incorporate the fix to #7954, which was encountered after the earlier version of these changes. This was a case where we reused a struct temp, and it eventually wound up getting assigned the value it originally contained, so we had V09 = V09. This hit the assert in liveness, which wasn't expecting to find a struct assignment with the same lclVar on the lhs and rhs. This assert should have been eliminated with the IR change to struct assignments, but when this situation arises we may run into issues if we call a helper that doesn't expect the lhs and rhs to be the same. So, change morph to eliminate such assignments.
2016-11-03  Revert "Enable optimization of structs"  Jan Kotas  (1 file, -2/+0)
2016-10-20  Enable optimization of structs  Carol Eidt  (1 file, -0/+2)
Remove many of the restrictions on structs that were added to preserve behavior of the old IR form. Change the init block representation to not require changing the parent when a copy block is changed to an init.
2016-09-20  Clean up GenTree node size dumping code. (#7278)  Peter Kukol  (1 file, -0/+2)
Clean up GenTreeXxxx struct size dump; fix 32-bit build when MEASURE_NODE_SIZE is enabled.
2016-09-20  Support GT_OBJ for x86  Carol Eidt  (1 file, -1/+2)
Add support for GT_OBJ for x86, and allow a GT_OBJ to be transformed into a list of fields (in morph) if it is a promoted struct. Add a new list type for this (GT_FIELD_LIST) with the type and offset, and use it for the multireg arg passing as well, for consistency. Also refactor fgMorphArgs so that there is a positive check for reMorphing, rather than relying on gtCallLateArgs, which can be null if there are no register args. In codegenxarch, modify the struct passing (genPutStructArgStk) to work for both the x64/ux and x86 cases, including the option of pushing fields onto the stack. Eliminate the redundant INS_movs_ptr, and replace it with the pre-existing INS_movsp.
2016-09-16  Add optimization for shift by CNS_INT  Michelle McDaniel  (1 file, -0/+10)
This change adds support for shifting by a GT_CNS_INT without going through a helper. If the shiftOp is a GT_CNS_INT, we perform several transformations based on the shift amount:
- If the shift amount is 0, the shift is a nop, so we just put together the hi and lo ops as a GT_LONG.
- If the shift amount is < 32, we generate a shl/shld pattern, a shr/shrd pattern, or a sar/shrd pattern, depending on the oper. The first operand of the shrd/shld is a GT_LONG, which we crack in codegen, using it essentially as two int operands rather than creating a tri-op GenTree node (essentially so that we can have 3 operands instead of the normal two).
- If the shift amount is exactly 32, the handling differs between shifting left and shifting right. For GT_LSH, we move the loOp into the hiResult and set the loResult to 0. For GT_RSZ, we move the hiOp into the loResult and set the hiResult to 0. For GT_RSH, we move the hiOp into the loResult and set the hiResult to a 31-bit signed shift of the hiOp, to sign extend.
- If the shift amount is larger than 32 but less than 64: for GT_LSH, the hiResult is a shift of the loOp by (shift amount - 32) (the move from lo into hi provides the 32-bit shift), and we set the loResult to 0. For GT_RSH and GT_RSZ, the loResult is a right shift (signed for GT_RSH) of the hiOp by (shift amount - 32); the hiResult is 0 for GT_RSZ, and a 31-bit signed shift of the hiOp for GT_RSH.
- If the shift amount is >= 64, we set both hiResult and loResult to 0 for GT_LSH and GT_RSZ, and do a sign-extending shift to set both hiResult and loResult to the sign of the original hiOp for GT_RSH.
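The left-shift cases above can be checked with a small arithmetic sketch. This is not JIT code; it simply computes the hi/lo halves a 32-bit target would produce for a constant left shift of a long, assuming the operand has already been split into loOp and hiOp as the decomposition describes.

```cpp
#include <cstdint>

// Illustrative only: compute the halves of (hiOp:loOp) << count using
// nothing wider than 32-bit operations, following the GT_LSH cases above.
static void ShiftLeftLong(uint32_t loOp, uint32_t hiOp, unsigned count,
                          uint32_t* loResult, uint32_t* hiResult) {
    if (count == 0) {
        // Nop: recombine hi and lo unchanged (the GT_LONG case).
        *loResult = loOp;
        *hiResult = hiOp;
    } else if (count < 32) {
        // shl/shld pattern: hi also receives the bits shifted out of lo.
        *hiResult = (hiOp << count) | (loOp >> (32 - count));
        *loResult = loOp << count;
    } else if (count == 32) {
        // lo moves into hi; lo becomes 0.
        *hiResult = loOp;
        *loResult = 0;
    } else if (count < 64) {
        // hi is lo shifted by (count - 32); lo becomes 0.
        *hiResult = loOp << (count - 32);
        *loResult = 0;
    } else {
        // count >= 64: everything is shifted out.
        *hiResult = 0;
        *loResult = 0;
    }
}
```

Note that each branch keeps every individual shift count in 0..31, which is also what makes the pattern legal at the instruction level (x86 masks 32-bit shift counts to 5 bits).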
2016-09-15  Option for reporting GenTree operator bashing stats (#7152)  Peter Kukol  (1 file, -162/+158)
* Add option (off by default) to report GenTree operator bashing stats.
2016-09-14  Address PR feedback.  Pat Gavlin  (1 file, -0/+3)
2016-09-14  Introduce GT_JCC.  Pat Gavlin  (1 file, -0/+1)
This node represents a jump that is conditional upon the value stored in the target's condition code register. It is only valid in the backend. No formal modeling of the CCR is performed, so its use must be constrained such that instructions that def the CCR are not inserted between the JCC node and the node that is expected to def the CCR. This is currently only used when lowering compares of long-typed values for x86.
2016-09-07  Enable long multiply  Michelle McDaniel  (1 file, -4/+14)
Most long multiplies are converted in morph to helper calls. However, GT_MULs of the form long = (long)int * (long)int are passed through to be handled in codegen. In decompose, we convert these multiplies to GT_MUL_HIs so they are handled in a similar manner as GT_MULHI. Since mul and imul take two ints and return a long in edx:eax, we can handle them similarly to GT_CALLs, saving the results to lclVars if they aren't already saved. In codegen, we generate a mul (or imul for signed multiply), which produces the output in edx:eax, and we then store from those registers when we handle the store to the lclVar.
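The edx:eax convention mentioned above can be sketched as plain arithmetic: a 32x32-to-64-bit multiply needs only one mul/imul because the x86 one-operand form leaves the low half of the product in eax and the high half in edx. The function below is illustrative only (the name and signature are invented); it models the split into the two halves that are then stored when the lclVar store is handled.

```cpp
#include <cstdint>

// Illustrative only: model x86's one-operand imul, which leaves the full
// 64-bit product of two 32-bit operands in the edx:eax register pair.
static void MulLongFromInts(int32_t a, int32_t b, uint32_t* eax, uint32_t* edx) {
    int64_t product = (int64_t)a * (int64_t)b;   // a single imul on x86
    *eax = (uint32_t)product;                    // low 32 bits (eax)
    *edx = (uint32_t)((uint64_t)product >> 32);  // high 32 bits (edx)
}
```

This is why only the (long)int * (long)int shape avoids the helper call: a full 64x64 multiply cannot be expressed as one such instruction.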
2016-09-01  More PR Feedback  Carol Eidt  (1 file, -2/+2)
2016-09-01  1st Class Struct Block Assignments  Carol Eidt  (1 file, -10/+11)
Change block ops to assignments, with block nodes (GT_BLK, GT_OBJ and GT_DYN_BLK) as the lhs, and with GT_STORE_* in the backend. For this initial change, existing behavior is preserved as much as possible, with a few differences in lea generation for SIMD types. This causes pessimization in some areas; those, as well as the additional opportunities that can be enabled after this change, have all been marked TODO-1stClassStructs.
2016-08-31  Fix #6893. (#6932)  Pat Gavlin  (1 file, -1/+1)
Aggregate arguments to calls (i.e. arguments that are made up of multiple, independent values, such as those to be passed in multiple registers) are represented by GT_LIST nodes that are not distinguishable from normal GT_LIST nodes without the extra context of the call. GT_LIST nodes, however, are disallowed in LIR ranges. The intersection of these two design choices causes problems for code that operates on LIR but wants to observe aggregate arguments as a whole (rather than piecewise). In the case of #6893, this caused lowering to attempt to insert a PUTARG_STK node after the GT_LIST node that represented the aggregate argument being forced to the stack, which caused an assertion in the LIR utilities to fail (the GT_LIST node was passed as the insertion point to `LIR::Range::InsertAfter`, which requires that the insertion point be part of the given range). Conceptually, the fix is simple: the nodes that represent aggregate arguments must be allowed to be part of LIR ranges. The implementation, however, is complicated somewhat by the need to distinguish the nodes used for aggregate arguments from normal GT_LIST nodes. These changes implement a short-term fix that distinguishes these nodes by adding a node-specific flag, `GTF_LIST_AGGREGATE`, to indicate that a GT_LIST node is in fact an aggregate argument. In the long term, a more palatable implementation would be the introduction of a new node and operator type to represent aggregate arguments, but this is likely to require broader changes to the compiler.
2016-08-19  Implement the proposed design for RyuJIT's LIR. (#6689)  Pat Gavlin  (1 file, -80/+82)
These changes implement the design for RyuJIT's LIR described in https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/removing-embedded-statements.md. The following passes required changes:
- Rationalize, which has been almost completely rewritten
- Long decomposition
- Target-independent lowering
- Target-dependent lowering
- LSRA
- Liveness
- Flowgraph optimization
- Codegen
For the most part, these changes are confined to the backend. Common code that needed to be updated included liveness, flowgraph optimization, and a few miscellaneous utilities. The utilities used to analyze and manipulate LIR live (almost) entirely in src/jit/lir.{cpp,h}. The core concepts that are unique to LIR are LIR::Use and LIR::Range. The former is a tuple that captures an SDSU def (i.e. an LIR node) along with its corresponding use->def edge and user. The latter serves to abstract a self-contained sequence of LIR nodes that makes up e.g. the contents of a basic block. Testing indicates that neither JIT throughput nor code quality is significantly impacted by these changes.
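The range concept above can be sketched abstractly. The class below is not the real lir.h API; it is a simplified stand-in showing the key invariant that utilities such as InsertAfter assume: the insertion point must already be part of the range (the precondition whose violation produced the assertion failure in the #6893 fix above).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int oper; }; // stand-in for an LIR node

// Simplified stand-in for a range: a self-contained sequence of nodes
// (e.g. the contents of a basic block). The real implementation links
// nodes intrusively rather than storing pointers in a vector.
class Range {
    std::vector<Node*> m_nodes;
public:
    void Append(Node* node) { m_nodes.push_back(node); }

    bool Contains(Node* node) const {
        for (Node* n : m_nodes)
            if (n == node) return true;
        return false;
    }

    // Insert `node` immediately after `insertionPoint`, which must
    // already belong to this range.
    void InsertAfter(Node* insertionPoint, Node* node) {
        assert(Contains(insertionPoint));
        for (size_t i = 0; i < m_nodes.size(); ++i) {
            if (m_nodes[i] == insertionPoint) {
                m_nodes.insert(m_nodes.begin() + i + 1, node);
                return;
            }
        }
    }

    Node*  At(size_t i) const { return m_nodes[i]; }
    size_t Size() const       { return m_nodes.size(); }
};
```

A node that is not part of any range (as GT_LIST nodes were, before the #6893 fix) cannot legally be used as an insertion point.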
2016-08-08  Work towards object stack allocation: moved allocation part of newobj lowering into a separate phase  Egor Chesakov  (1 file, -0/+2)
1. Introduced a `GT_ALLOCOBJ` node to mark places where object allocation happens.
2. In `importer.cpp`, changed lowering of the allocation part of the newobj instruction from an allocation helper call to a `GT_ALLOCOBJ` node creation.
3. Created a new phase `ObjectAllocator` (`PHASE_ALLOCATE_OBJECTS`) and put it right after dominator computation (we will need the information for escape analysis).
4. The current implementation of ObjectAllocator walks over all basic blocks having the flag `BBF_HAS_NEWOBJ` set and replaces `GT_ALLOCOBJ` with an allocation helper call.
2016-07-29  Massage code for clang-format  Michelle McDaniel  (1 file, -0/+2)
This change starts the process of updating the jit code to make it ready for being formatted by clang-format. Changes mostly include reflowing comments that go past our column limit and moving comments around ifdefs so clang-format does not modify the indentation. Additionally, some header files are manually reformatted for pointer alignment and marked as clang-format off so that we do not lose the current formatting.
2016-07-06  Introduce GT_ADD_LO and GT_SUB_LO  Mike Danes  (1 file, -0/+2)
2016-05-11  Small tweaks to RyuJIT/x86 LONG handling  Bruce Forstall  (1 file, -6/+6)
1. Display GT_LONG trees in JitDump as gt_long, not just "long". This helps distinguish them from the pervasive "long" type, making them easier to see and search for.
2. Change the "hi" operator GT_* node definition to not include GTK_EXOP.
3. Dump the lva table after lvaPromoteLongVars().
2016-03-25  1stClassStructs: Replace GT_LDOBJ with GT_OBJ  Carol Eidt  (1 file, -1/+1)
In preparation for using block nodes in assignments, change GT_LDOBJ to GT_OBJ. Also, eliminate gtFldTreeList, which was only being used in a transitory fashion for x87 codegen - instead create the nodes on the fly as needed for stack fp codegen. Additional minor cleanup.
2016-02-26  Fixes GT_MOD codegen and three other issues for ARM64  Brian Sullivan  (1 file, -2/+2)
Fix Issue 3263 - Incorrect codegen for rem.un (unsigned remainder)
Fix Issue 3317 - Improve the codegen for divide so that we omit checks when we have a constant divisor
Fix Issue 3326 - assert EA_4BYTE or EA_8BYTE
Fix an assert when using alloca with an allocation between 65 and 4095 bytes
Also fixed the names for GT_UDIV and GT_UMOD to contain an "un-" prefix.
Added 10 additional passing tests with tag REL_PASS
2016-01-27  Update license headers  dotnet-bot  (1 file, -4/+3)
2015-12-11  Port of all JIT changes for .NET Framework 4.6.1  Brian Sullivan  (1 file, -124/+130)
http://blogs.msdn.com/b/dotnet/archive/2015/11/30/net-framework-4-6-1-is-now-available.aspx
.NET Framework list of changes in 4.6.1: https://github.com/Microsoft/dotnet/blob/master/releases/net461/dotnet461-changes.md
Additional changes include:
- Working ARM64 JIT compiler
- Additional JIT optimizations:
  - Tail call recursion optimization
  - Array length tracking optimization
  - CSE for widening casts
  - Smaller encoding for RIP-relative and absolute addresses in addressing modes
  - Tracked local variable limit increased to 512
  - Improved handling of the System.GetType() intrinsic
  - Improved handling of Math intrinsics
- Work for the x86 RyuJIT compiler
[tfs-changeset: 1557101]
2015-10-21  Generate efficient code for rotation patterns.  Eugene Rozenfeld  (1 file, -0/+2)
This change adds code to recognize rotation idioms and generate efficient instructions for them. Two new operators are added: GT_ROL and GT_ROR.
The patterns recognized:
(x << c1) | (x >>> c2) => x rol c1
(x >>> c1) | (x << c2) => x ror c2
  where c1 and c2 are constants and c1 + c2 == bitsize(x)
(x << y) | (x >>> (N - y)) => x rol y
(x >>> y) | (x << (N - y)) => x ror y
  where N == bitsize(x)
(x << y & M1) | (x >>> (N - y) & M2) => x rol y
(x >>> y & M1) | (x << (N - y) & M2) => x ror y
  where N == bitsize(x), M1 & (N - 1) == N - 1, and M2 & (N - 1) == N - 1
For a simple benchmark with 4 rotation patterns in a tight loop, time goes from 7.324 to 2.600 (a 2.8x speedup).
Rotations found and optimized in mscorlib:
System.Security.Cryptography.SHA256Managed::RotateRight
System.Security.Cryptography.SHA384Managed::RotateRight
System.Security.Cryptography.SHA512Managed::RotateRight
System.Security.Cryptography.RIPEMD160Managed::MDTransform (320 instances!)
System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Rol1
System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Rol5
System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Rol30
System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Drain (9 instances of Sha1ForNonSecretPurposes::Rol* inlined)
Closes #1619.
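The first two pattern groups above can be written out directly in C++ (here for N == 32; on an unsigned type, `>>>` is just `>>`). These functions are illustrative, not compiler code; a backend that recognizes the idiom can emit a single rol/ror for each.

```cpp
#include <cstdint>

// Constant-count form: c1 + c2 == 32, so this is exactly a rotate left by 7.
static uint32_t RotateLeftIdiom(uint32_t x) {
    return (x << 7) | (x >> 25);
}

// Variable-count form: (x << y) | (x >>> (N - y)) with N == 32.
// y must be in 1..31 here; the masked variants in the pattern list are
// what make y == 0 safe (the masks keep every shift count in range).
static uint32_t RotateLeftVar(uint32_t x, unsigned y) {
    return (x << y) | (x >> (32 - y));
}
```

The or-of-opposite-shifts shape is exactly what the RotateRight/Rol helpers listed above contain, which is why the peephole fires so often in mscorlib.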
2015-01-30  Initial commit to populate CoreCLR repo  dotnet-bot  (1 file, -0/+239)
[tfs-changeset: 1407945]