author	Andy Ayers <andya@microsoft.com>	2017-07-31 13:01:01 -0700
committer	GitHub <noreply@github.com>	2017-07-31 13:01:01 -0700
commit	9c844a9f2a1a40fca1766ccc3a3c5e146743cb13 (patch)
tree	d57dfb129ba010627de610f5bdf5cc549f356760 /src
parent	6b38dca32dee8321dafab8be92366d17da2b8bec (diff)
JIT: Fix value type box optimization (#13016)
Boxing a value type produces a non-null result. If the result of the box is only used to feed a compare against null, the jit tries to optimize the box away entirely, since the result of the comparison is known. Such idiomatic expressions arise fairly often in generics instantiated over value types.

In the current implementation the box expands into two parts. The first is an upstream statement that allocates a boxed object and assigns a reference to the boxed object to a local var known as the "box temp". The second is an expression tree whose value is the box temp and which also contains an encapsulated copy from the value being boxed to the payload section of the boxed object. The box node also contains a pointer back to the first statement (more on this later). In the examples discussed here this second tree is a child of a compare node whose other child is a null pointer. When the optimization fires, the upstream allocation statement is located via the pointer in the box node and removed, and the entire compare is replaced with a constant 0 or 1 as appropriate.

Unfortunately the encapsulated copy in the box subtree may include side effects that should be preserved, and so this transformation is unsafe. Note that the copy subtree as a whole will always contain side effects, since the copy stores values into the heap, and that copy now will not happen. But the side effects that happen when producing the value to box must remain. In the initial example from #12949 the side effects in question were introduced by the jit's optimizer to capture a CSE definition. #13016 gives several other examples where the side effects are present in the initial user code. For instance the value being boxed might come from an array, in which case the encapsulated copy in the box expression tree would contain the array null check and bounds check. So removing the entire tree can alter behavior.
This fix attempts to carefully preserve the important side effects by reworking how a box is imported. The copy is now moved out from under the box into a second upstream statement. The box itself is then just a trivial, side-effect-free reference to the box temp. To ensure proper ordering of side effects, the jit spills the evaluation stack before appending the copy statement.

When the optimization fires, the jit removes the upstream heap allocation as before, as well as the now-trivial compare tree. It then analyzes the source side of the upstream copy. If it is side effect free, the copy is removed entirely. If not, the jit modifies the copy into a minimal load of the boxed value, and this load should reproduce the necessary side effects. The optimization is only performed when the tree shape of the copy matches expected patterns. There are some expected cases where the tree won't match, for instance if the optimization is invoked while the jit is inlining. Because this optimization runs at several points, the jit can catch these cases once inlining completes. There is one case that is not handled that could be: if the assignment part of the copy is itself a subtree of a comma. This doesn't happen often.

The optimization is now also extended to handle the case where the comparison operation is `cgt.un`. This doesn't catch any new cases but causes the optimization to happen earlier, typically during importation, which should reduce jit time slightly.

Generally the split of the box into two upstream statements reduces code size, especially when the box expression is incorporated into a larger tree -- for example a call. However, in some cases where the value being boxed comes from an array, preserving the array bounds check now causes loop cloning to kick in and increase code size. Hence the overall size impact on the jit-diff set is essentially zero.
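As a rough sketch (not actual jit dumps; node names are schematic), the import of a box that feeds a null compare changes shape as follows:

```
Before this change:
  stmt1:   boxTemp = ALLOCOBJ(...)
  cmpTree: EQ( BOX( COMMA( copy value into boxTemp payload, boxTemp ) ), null )

After this change:
  stmt1:   boxTemp = ALLOCOBJ(...)
  stmt2:   copy value into boxTemp payload
  cmpTree: EQ( BOX( boxTemp ), null )
```

With the copy hoisted into stmt2, folding cmpTree no longer discards the side effects of producing the boxed value; the jit can remove stmt1, trim or keep stmt2 based on side-effect analysis, and replace cmpTree with a constant.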
Added a number of new test cases showing the variety of situations that must be handled and the need to spill before appending the copy statement. Fixes #12949.
Diffstat (limited to 'src')
-rw-r--r--	src/jit/gentree.cpp	167
-rw-r--r--	src/jit/gentree.h	13
-rw-r--r--	src/jit/importer.cpp	15
3 files changed, 176 insertions(+), 19 deletions(-)
diff --git a/src/jit/gentree.cpp b/src/jit/gentree.cpp
index 28849da2db..ed0f2279e1 100644
--- a/src/jit/gentree.cpp
+++ b/src/jit/gentree.cpp
@@ -7454,7 +7454,8 @@ GenTreePtr Compiler::gtCloneExpr(
case GT_BOX:
copy = new (this, GT_BOX)
- GenTreeBox(tree->TypeGet(), tree->gtOp.gtOp1, tree->gtBox.gtAsgStmtWhenInlinedBoxValue);
+ GenTreeBox(tree->TypeGet(), tree->gtOp.gtOp1, tree->gtBox.gtAsgStmtWhenInlinedBoxValue,
+ tree->gtBox.gtCopyStmtWhenInlinedBoxValue);
break;
case GT_INTRINSIC:
@@ -12032,32 +12033,168 @@ GenTreePtr Compiler::gtFoldExprSpecial(GenTreePtr tree)
switch (oper)
{
-
case GT_EQ:
case GT_NE:
+ case GT_GT:
// Optimize boxed value classes; these are always false. This IL is
// generated when a generic value is tested against null:
// <T> ... foo(T x) { ... if ((object)x == null) ...
if (val == 0 && op->IsBoxedValue())
{
- // Change the assignment node so we don't generate any code for it.
+ // The tree under the box must be side effect free
+ // since we drop it if we optimize the compare.
+ assert(!gtTreeHasSideEffects(op->gtBox.gtOp.gtOp1, GTF_SIDE_EFFECT));
+ // grab related parts for the optimization
GenTreePtr asgStmt = op->gtBox.gtAsgStmtWhenInlinedBoxValue;
assert(asgStmt->gtOper == GT_STMT);
- GenTreePtr asg = asgStmt->gtStmt.gtStmtExpr;
- assert(asg->gtOper == GT_ASG);
+ GenTreePtr copyStmt = op->gtBox.gtCopyStmtWhenInlinedBoxValue;
+ assert(copyStmt->gtOper == GT_STMT);
#ifdef DEBUG
if (verbose)
{
- printf("Bashing ");
- printTreeID(asg);
- printf(" to NOP as part of dead box operation\n");
+ printf("\nAttempting to optimize BOX(valueType) %s null\n", GenTree::OpName(oper));
gtDispTree(tree);
+ printf("\nWith assign\n");
+ gtDispTree(asgStmt);
+ printf("\nAnd copy\n");
+ gtDispTree(copyStmt);
}
#endif
+
+ // We don't expect GT_GT with signed compares, and we
+ // can't predict the result if we do see it, since the
+ // boxed object addr could have its high bit set.
+ if ((oper == GT_GT) && !tree->IsUnsigned())
+ {
+ JITDUMP(" bailing; unexpected signed compare via GT_GT\n");
+ goto FAIL;
+ }
+
+ // If we don't recognize the form of the assign, bail.
+ GenTreePtr asg = asgStmt->gtStmt.gtStmtExpr;
+ if (asg->gtOper != GT_ASG)
+ {
+ JITDUMP(" bailing; unexpected assignment op %s\n", GenTree::OpName(asg->gtOper));
+ goto FAIL;
+ }
+
+ // If we don't recognize the form of the copy, bail.
+ GenTree* copy = copyStmt->gtStmt.gtStmtExpr;
+ if (copy->gtOper != GT_ASG)
+ {
+ // GT_RET_EXPR is a tolerable temporary failure.
+ // The jit will revisit this optimization after
+ // inlining is done.
+ if (copy->gtOper == GT_RET_EXPR)
+ {
+ JITDUMP(" bailing; must wait for replacement of copy %s\n", GenTree::OpName(copy->gtOper));
+ }
+ else
+ {
+ // Anything else is a missed case we should
+ // figure out how to handle. One known case
+ // is GT_COMMAs enclosing the GT_ASG we are
+ // looking for.
+ JITDUMP(" bailing; unexpected copy op %s\n", GenTree::OpName(copy->gtOper));
+ }
+ goto FAIL;
+ }
+
+ // If the copy is a struct copy, make sure we know how to isolate
+ // any source side effects.
+ GenTreePtr copySrc = copy->gtOp.gtOp2;
+
+ // If the copy source is from a pending inline, wait for it to resolve.
+ if (copySrc->gtOper == GT_RET_EXPR)
+ {
+ JITDUMP(" bailing; must wait for replacement of copy source %s\n",
+ GenTree::OpName(copySrc->gtOper));
+ goto FAIL;
+ }
+
+ bool hasSrcSideEffect = false;
+ bool isStructCopy = false;
+
+ if (gtTreeHasSideEffects(copySrc, GTF_SIDE_EFFECT))
+ {
+ hasSrcSideEffect = true;
+
+ if (copySrc->gtType == TYP_STRUCT)
+ {
+ isStructCopy = true;
+
+ if ((copySrc->gtOper != GT_OBJ) && (copySrc->gtOper != GT_IND) && (copySrc->gtOper != GT_FIELD))
+ {
+ // We don't know how to handle other cases, yet.
+ JITDUMP(" bailing; unexpected copy source struct op with side effect %s\n",
+ GenTree::OpName(copySrc->gtOper));
+ goto FAIL;
+ }
+ }
+ }
+
+ // Proceed with the optimization
+ //
+ // Change the assignment expression to a NOP.
+ JITDUMP("\nBashing NEWOBJ [%06u] to NOP\n", dspTreeID(asg));
asg->gtBashToNOP();
- op = gtNewIconNode(oper == GT_NE);
+ // Change the copy expression so it preserves key
+ // source side effects.
+ JITDUMP("\nBashing COPY [%06u]", dspTreeID(copy));
+
+ if (!hasSrcSideEffect)
+ {
+ // If there were no copy source side effects just bash
+ // the copy to a NOP.
+ copy->gtBashToNOP();
+ JITDUMP(" to NOP\n");
+ }
+ else if (!isStructCopy)
+ {
+ // For scalar types, go ahead and produce the
+ // value as the copy is fairly cheap and likely
+ // the optimizer can trim things down to just the
+ // minimal side effect parts.
+ copyStmt->gtStmt.gtStmtExpr = copySrc;
+ JITDUMP(" to scalar read via [%06u]\n", dspTreeID(copySrc));
+ }
+ else
+ {
+ // For struct types read the first byte of the
+ // source struct; there's no need to read the
+ // entire thing, and no place to put it.
+ assert(copySrc->gtOper == GT_OBJ || copySrc->gtOper == GT_IND || copySrc->gtOper == GT_FIELD);
+ copySrc->ChangeOper(GT_IND);
+ copySrc->gtType = TYP_BYTE;
+ copyStmt->gtStmt.gtStmtExpr = copySrc;
+ JITDUMP(" to read first byte of struct via modified [%06u]\n", dspTreeID(copySrc));
+ }
+
+ // Set up the result of the compare.
+ int compareResult = 0;
+ if (oper == GT_GT)
+ {
+ // GT_GT(null, box) == false
+ // GT_GT(box, null) == true
+ compareResult = (op1 == op);
+ }
+ else if (oper == GT_EQ)
+ {
+ // GT_EQ(box, null) == false
+ // GT_EQ(null, box) == false
+ compareResult = 0;
+ }
+ else
+ {
+ assert(oper == GT_NE);
+ // GT_NE(box, null) == true
+ // GT_NE(null, box) == true
+ compareResult = 1;
+ }
+ op = gtNewIconNode(compareResult);
+
if (fgGlobalMorph)
{
if (!fgIsInlining())
@@ -12070,9 +12207,15 @@ GenTreePtr Compiler::gtFoldExprSpecial(GenTreePtr tree)
op->gtNext = tree->gtNext;
op->gtPrev = tree->gtPrev;
}
- fgSetStmtSeq(asgStmt);
+
+ if (fgStmtListThreaded)
+ {
+ fgSetStmtSeq(asgStmt);
+ fgSetStmtSeq(copyStmt);
+ }
return op;
}
+
break;
case GT_ADD:
@@ -12240,7 +12383,9 @@ GenTreePtr Compiler::gtFoldExprSpecial(GenTreePtr tree)
break;
}
- /* The node is not foldable */
+/* The node is not foldable */
+
+FAIL:
return tree;
diff --git a/src/jit/gentree.h b/src/jit/gentree.h
index 33d6b203fb..365dca6b7b 100644
--- a/src/jit/gentree.h
+++ b/src/jit/gentree.h
@@ -2965,9 +2965,16 @@ struct GenTreeBox : public GenTreeUnOp
// This is the statement that contains the assignment tree when the node is an inlined GT_BOX on a value
// type
GenTreePtr gtAsgStmtWhenInlinedBoxValue;
-
- GenTreeBox(var_types type, GenTreePtr boxOp, GenTreePtr asgStmtWhenInlinedBoxValue)
- : GenTreeUnOp(GT_BOX, type, boxOp), gtAsgStmtWhenInlinedBoxValue(asgStmtWhenInlinedBoxValue)
+ // And this is the statement that copies from the value being boxed to the box payload
+ GenTreePtr gtCopyStmtWhenInlinedBoxValue;
+
+ GenTreeBox(var_types type,
+ GenTreePtr boxOp,
+ GenTreePtr asgStmtWhenInlinedBoxValue,
+ GenTreePtr copyStmtWhenInlinedBoxValue)
+ : GenTreeUnOp(GT_BOX, type, boxOp)
+ , gtAsgStmtWhenInlinedBoxValue(asgStmtWhenInlinedBoxValue)
+ , gtCopyStmtWhenInlinedBoxValue(copyStmtWhenInlinedBoxValue)
{
}
#if DEBUGGABLE_GENTREE
diff --git a/src/jit/importer.cpp b/src/jit/importer.cpp
index 47f5106355..c2c491a35e 100644
--- a/src/jit/importer.cpp
+++ b/src/jit/importer.cpp
@@ -5260,7 +5260,7 @@ void Compiler::impImportAndPushBox(CORINFO_RESOLVED_TOKEN* pResolvedToken)
gtNewArgList(op2));
}
- /* Remember that this basic block contains 'new' of an array */
+ /* Remember that this basic block contains 'new' of an object */
compCurBB->bbFlags |= BBF_HAS_NEWOBJ;
GenTreePtr asg = gtNewTempAssign(impBoxTemp, op1);
@@ -5302,11 +5302,16 @@ void Compiler::impImportAndPushBox(CORINFO_RESOLVED_TOKEN* pResolvedToken)
op1 = gtNewAssignNode(gtNewOperNode(GT_IND, lclTyp, op1), exprToBox);
}
- op2 = gtNewLclvNode(impBoxTemp, TYP_REF);
- op1 = gtNewOperNode(GT_COMMA, TYP_REF, op1, op2);
+ // Spill eval stack to flush out any pending side effects.
+ impSpillSideEffects(true, (unsigned)CHECK_SPILL_ALL DEBUGARG("impImportAndPushBox"));
- // Record that this is a "box" node.
- op1 = new (this, GT_BOX) GenTreeBox(TYP_REF, op1, asgStmt);
+ // Set up this copy as a second assignment.
+ GenTreePtr copyStmt = impAppendTree(op1, (unsigned)CHECK_SPILL_NONE, impCurStmtOffs);
+
+ op1 = gtNewLclvNode(impBoxTemp, TYP_REF);
+
+ // Record that this is a "box" node and keep track of the matching parts.
+ op1 = new (this, GT_BOX) GenTreeBox(TYP_REF, op1, asgStmt, copyStmt);
// If it is a value class, mark the "box" node. We can use this information
// to optimise several cases: