-rw-r--r--  Documentation/design-docs/GuardedDevirtualization.md     533
-rw-r--r--  Documentation/design-docs/ThreeClassesDevirt.JPG          bin 0 -> 111024 bytes
-rw-r--r--  Documentation/design-docs/ThreeClassesDevirtDetail.JPG    bin 0 -> 140846 bytes
-rw-r--r--  Documentation/design-docs/ThreeClassesDevirtFull.JPG      bin 0 -> 113061 bytes
-rw-r--r--  Documentation/design-docs/ThreeClassesInterface.JPG       bin 0 -> 91430 bytes
-rw-r--r--  Documentation/design-docs/TwoClassesBaseline.JPG          bin 0 -> 80619 bytes
-rw-r--r--  Documentation/design-docs/TwoClassesDevirt.JPG            bin 0 -> 84273 bytes
-rw-r--r--  Documentation/design-docs/TwoClassesInterface.JPG         bin 0 -> 56361 bytes
-rw-r--r--  src/jit/compiler.cpp                                      6
-rw-r--r--  src/jit/compiler.h                                        48
-rw-r--r--  src/jit/flowgraph.cpp                                     706
-rw-r--r--  src/jit/gentree.cpp                                       20
-rw-r--r--  src/jit/gentree.h                                         34
-rw-r--r--  src/jit/importer.cpp                                      420
-rw-r--r--  src/jit/inline.cpp                                        12
-rw-r--r--  src/jit/inline.h                                          34
-rw-r--r--  src/jit/jitconfigvalues.h                                 9
-rw-r--r--  src/jit/lclvars.cpp                                       5
-rw-r--r--  src/jit/morph.cpp                                         64
19 files changed, 1642 insertions, 249 deletions
diff --git a/Documentation/design-docs/GuardedDevirtualization.md b/Documentation/design-docs/GuardedDevirtualization.md
new file mode 100644
index 0000000000..ba1f7bf920
--- /dev/null
+++ b/Documentation/design-docs/GuardedDevirtualization.md
@@ -0,0 +1,533 @@
+# Guarded Devirtualization
+
+## Overview
+
+Guarded devirtualization is a proposed new optimization for the JIT in .NET Core
+3.0. This document describes the motivation, initial design sketch, and highlights
+various issues needing further investigation.
+
+## Motivation
+
+The .NET Core JIT is able to do a limited amount of devirtualization for virtual
+and interface calls. This ability was added in .NET Core 2.0. To devirtualize
+the JIT must be able to demonstrate one of two things: either that it knows the
+type of some reference exactly (say because it has seen a `newobj`) or that the
+declared type of the reference is a `final` class (aka `sealed`). For virtual
+calls the JIT can also devirtualize if it can prove the method is marked as `final`.
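+
+For illustration, here is a small example (hypothetical code, not from any real
+test) showing the two patterns the current devirtualization can handle:
+```C#
+class Base
+{
+    public virtual int F() { return 1; }
+}
+
+sealed class Leaf : Base
+{
+    public override int F() { return 2; }
+}
+
+static int Examples(Leaf leaf)
+{
+    Base b = new Base();   // exact type known from the newobj: devirtualizes to Base.F
+    int x = b.F();
+
+    int y = leaf.F();      // declared type Leaf is sealed: devirtualizes to Leaf.F
+    return x + y;
+}
+```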
+
+However, most of the time the JIT is unable to determine exactness or finalness
+and so devirtualization fails. Statistics show that currently only around 15% of
+virtual call sites can be devirtualized. Results are even more pessimistic for
+interface calls, where success rates are around 5%.
+
+There are a variety of reasons for this. The JIT analysis is somewhat weak.
+Historically all the JIT cared about was whether some location held **a** reference
+type, not a specific reference type. So the current type propagation has been
+retrofitted and there are places where types just get lost. The JIT analysis
+happens quite early (during importation) and there is only minimal ability to do
+data flow analysis at this stage. So for current devirtualization the source of
+the type information and the consumption must be fairly close in the code. A
+more detailed accounting of some of the shortcomings can be found in
+[CoreCLR#9908](https://github.com/dotnet/coreclr/issues/9908).
+
+Resolution of these issues will improve the ability of the JIT to devirtualize,
+but even the best analysis possible will still miss out on many cases. Some call
+sites are truly polymorphic. Some others are truly monomorphic but proving this
+would require sophisticated interprocedural analyses that are not practical in
+the JIT or in a system as dynamic as the CLR. And some sites are monomorphic in
+practice but potentially polymorphic.
+
+As an alternative, when devirtualization fails, the JIT can perform *guarded
+devirtualization*. Here the JIT creates an `if-then-else` block set in place of
+a virtual or interface call and inserts a runtime type test (or similar) into
+the `if` -- the "guard". If the guard test succeeds the JIT knows the type of
+the reference, so the `then` block can directly invoke the method corresponding
+to that type. If the test fails then the `else` block is executed and this
+contains the original virtual or interface call.
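+
+At the source level the transformation is conceptually the following (just a
+sketch with hypothetical `Base`/`Derived` classes; C# itself cannot express the
+non-virtual call, which is why the JIT performs this expansion on its IR):
+```C#
+static int CallF(Base obj)   // obj's static type is Base; the JIT predicts Derived
+{
+    int r;
+    if (obj.GetType() == typeof(Derived))   // guard: runtime type (method table) test
+    {
+        r = obj.F();   // here the JIT knows obj is exactly Derived: direct, inlineable call
+    }
+    else
+    {
+        r = obj.F();   // fallback: the original virtual (or interface) call
+    }
+    return r;
+}
+```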
+
+The upshot is that the JIT conditionally gains the benefit of devirtualization at
+the expense of increased code size, longer JIT times, and slightly longer code
+paths around the call. So long as the JIT's guess at the type is somewhat
+reasonable, this optimization can improve performance.
+
+## Opportunity
+
+One might imagine that the JIT's guess about the type of the reference has to be
+pretty good for devirtualization to pay off. Somewhat surprisingly, at least based
+on our initial results, that is not the case.
+
+### Virtual Calls: The Two-Class Case
+
+Given these class declarations:
+```C#
+class B
+{
+ public virtual int F() { return 33; }
+}
+
+class D : B
+{
+ public override int F() { return 44; }
+}
+```
+Suppose we have an array `B[]` that is randomly filled with instances of `B` and
+`D` and each element is class `B` with probability `p`. We time how long
+it takes to invoke `F` on each member of the array (note the JIT will not ever
+be able to devirtualize these calls), and plot the times as a function of `p`.
+The result is something like the following:
+
+![two classes baseline perf](TwoClassesBaseline.JPG)
+
+Modern hardware includes an indirect branch target predictor and we can see it
+in action here. When the array element type is predictable (`p` very close to
+zero or very close to 1) performance is better. When the element type is
+unpredictable (`p` near 0.5) performance is quite a bit worse.
+
+From this we can see that a correctly predicted virtual call requires about
+19 time units and worst case incorrect prediction around 55 time units. There is
+some timing overhead here too so the real costs are a bit lower.
+
+Now imagine we update the JIT to do guarded devirtualization and check if the
+element is indeed type `B`. If so the JIT can call `B.F` directly and in our
+prototype the JIT will also inline the call. So we would expect that if the
+element types are mostly `B`s (that is if `p` is near 1.0) we'd see very good
+performance, and if the element type is mostly `D` (that is `p` near 0.0)
+performance should perhaps be slightly worse than the un-optimized case, as there
+is now extra code to run a check before the call.
+
+![two classes devirt perf](TwoClassesDevirt.JPG)
+
+However, as you can see, the performance of the devirtualized case (blue line) is as
+good or better than the un-optimized case for all values of `p`. This is perhaps
+unexpected and deserves some explanation.
+
+Recall that modern hardware also includes a branch predictor. For small or large
+values of `p` this predictor will correctly guess whether the test added by the
+JIT will resolve to the `then` or `else` case. For small values of `p` the JIT
+guess will be wrong and control will flow to the `else` block. But unlike the
+original example, the indirect call here will only see instances of type `D` and
+so the indirect branch predictor will work extremely well. So the overhead for
+the small `p` case is similar to the well-predicted indirect case without guarded
+devirtualization. As `p` increases the branch predictor starts to mispredict and
+that costs some cycles. But when it mispredicts control reaches the `then` block
+which executes the inlined call. So the cost of misprediction is offset by the
+faster execution and the cost stays relatively flat.
+
+As `p` passes 0.5 the branch predictor flips its prediction to prefer the `then`
+case. As before mispredicts are costly and send us down the `else` path but there
+we still execute a correctly predicted indirect call.
+
+And as `p` approaches 1.0 the cost falls as the branch predictor is almost always
+correct and so the cost is simply that of the inlined call.
+
+So oddly enough the guarded devirtualization case shown here does not require any
+sort of perf tradeoff. The JIT is better off guessing the more likely case but
+even guessing the less likely case can pay off and doesn't hurt performance.
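+
+A crude expected-cost model makes the shape of this curve plausible. This is
+purely illustrative: the 19-unit figure is read off the baseline chart above and
+the 36-unit penalty is derived from the 55 - 19 gap there, while the cost of the
+guard plus inlined call is simply an assumption:
+```C#
+// p              : probability the element is the predicted class B
+// inlinedCost    : assumed cost of the guard plus the inlined B.F (not measured above)
+// indirectCost   : well-predicted indirect call, roughly 19 "time units" per the chart
+// mispredictCost : assumed extra penalty on a guard mispredict (roughly 55 - 19)
+static double GuardedCost(double p, double inlinedCost = 5.0,
+                          double indirectCost = 19.0, double mispredictCost = 36.0)
+{
+    // Crude model: the branch predictor tracks the majority case, so it
+    // mispredicts roughly min(p, 1 - p) of the time.
+    double mispredictRate = System.Math.Min(p, 1 - p);
+    return p * inlinedCost            // then path: guard + inlined call
+         + (1 - p) * indirectCost     // else path: well-predicted indirect call on D
+         + mispredictRate * mispredictCost;
+}
+```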
+
+One might suspect at this point that the two class case is a special case and that
+the results do not hold up in more complex cases. More on that shortly.
+
+Before moving on, we should point out that virtual calls in the current
+CLR are a bit more expensive than in C++, because the CLR uses a two-level method
+table. That is, the indirect call sequence is something like:
+```asm
+000095 mov rax, qword ptr [rcx] ; fetch method table
+000098 mov rax, qword ptr [rax+72] ; fetch proper chunk
+00009C call qword ptr [rax+32]B:F():int:this ; call indirect
+```
+This is a chain of 3 dependent loads and so best-case will require at least 3x
+the best cache latency (plus any indirect prediction overhead).
+
+So the virtual call costs for the CLR are high. The chunked method table design
+was adopted to save space (chunks can be shared by different classes) at the
+expense of some performance. And this apparently makes guarded devirtualization
+pay off over a wider range of class distributions than one might expect.
+
+And for completeness, the full guarded `if-then-else` sequence measured above is:
+```asm
+00007A mov rcx, gword ptr [rsi+8*rcx+16] ; fetch array element
+00007F mov rax, 0x7FFC9CFB4A90 ; B's method table
+000089 cmp qword ptr [rcx], rax ; method table test
+00008C jne SHORT G_M30756_IG06 ; jump if class is not B
+
+00008E mov eax, 33 ; inlined B.F
+000093 jmp SHORT G_M30756_IG07
+
+G_M30756_IG06:
+
+000095 mov rax, qword ptr [rcx] ; fetch method table
+000098 mov rax, qword ptr [rax+72] ; fetch proper chunk
+00009C call qword ptr [rax+32]B:F():int:this ; call indirect
+
+G_M30756_IG07:
+```
+Note there is a redundant load of the method table (hidden in the `cmp`) that
+could be eliminated with a bit more work on the prototype. So guarded
+devirtualization perf could potentially be even better than is shown above,
+especially for smaller values of `p`.
+
+### Virtual Calls: The Three-Class Case
+
+Now to return to the question we asked above: is there something about the two
+class case that made guarded devirtualization especially attractive? Read on.
+
+Suppose we introduce a third class into the mix and repeat the above measurement.
+There are now two probabilities in play: `p`, the probability that the element
+has class `B`, and `p1`, the probability that the element has class `D`, and
+there is a third class `E`. To avoid introducing a 3D plot we'll first simply
+average the results for the various values of `p1` and plot performance as a
+function of `p`:
+
+![three classes devirt perf](ThreeClassesDevirt.JPG)
+
+The right-hand side (`p` near 1.0) looks a lot like the previous chart. This is
+not surprising as there are relatively few instances of that third class. But the
+middle and left hand side differ and are more costly.
+
+For the un-optimized case (orange) the difference is directly attributable to
+the performance of the indirect branch predictor. Even when `p` is small there
+are still two viable branch targets (on average) and some degree of indirect
+misprediction.
+
+For the optimized case we now see that guarded devirtualization performs worse
+than no optimization if the JIT's guess is completely wrong. The penalty is not
+that bad because the JIT-introduced branch is predictable. But even at very
+modest values of `p` guarded devirtualization starts to win out.
+
+Because we've averaged over `p1` you might suspect that we're hiding something.
+The following chart shows the min and max values as well as the average, and also
+shows the two-class result (dashed lines).
+
+![three classes devirt perf ranges](ThreeClassesDevirtFull.JPG)
+
+You can see the minimum values are very similar to the two class case; these
+are cases where `p1` is close to 0 or close to 1. And that makes sense: if only
+two of the three classes actually appear, we'd expect results similar to the case
+where there can only be two classes.
+
+And as noted above, if `p` is high enough then the curves also converge to the
+two class case, as the relative mixture of `D` and `E` doesn't matter: the
+predominance of `B` wins out.
+
+For low values of `p` the actual class at the call site is some mixture of `D`
+and `E`. Here's some detail (the x axis now shows `p1` and `p` as upper and
+lower values respectively).
+
+![three classes devirt perf detail](ThreeClassesDevirtDetail.JPG)
+
+The worst case for both is when the mixture of `D` and `E` is an unpredictable
+50-50 and there are no `B`s. Once we mix in just 10% of `B` then
+guarded devirt performs better no matter what distribution we have for the other
+two classes. Worst case overhead -- where the JIT guesses a class that never
+appears, and the other classes are evenly distributed -- is around 20%.
+
+So it seems reasonable to say that so long as the JIT can make a credible guess
+about the possible class -- say a guess that is right at least 10% of the time
+-- then there is quite likely a performance benefit to guarded
+devirtualization for virtual calls.
+
+We'll need to verify this with more scenarios, but these initial results are
+certainly encouraging.
+
+### Virtual Calls: Testing for Multiple Cases
+
+One might deduce from the above that if there are two likely candidates the JIT
+should test for each. This is certainly a possibility and in C++ compilers that
+do indirect call profiling there are cases where multiple tests are considered
+a good idea. But there's also additional code size and another branch.
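+
+In source terms the expansion would look roughly like this (again only a
+conceptual sketch with hypothetical class names):
+```C#
+static int CallF(Base obj)
+{
+    if (obj.GetType() == typeof(Derived1))        // most likely class
+    {
+        return obj.F();   // direct, inlineable call to Derived1.F
+    }
+    else if (obj.GetType() == typeof(Derived2))   // next most likely class
+    {
+        return obj.F();   // direct, inlineable call to Derived2.F
+    }
+    return obj.F();       // fallback: the original virtual call
+}
+```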
+
+This is something we'll look into further.
+
+### Interface Calls: The Two Class Case
+
+Interface calls on the CLR are implemented via [Virtual Stub Dispatch](
+https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md
+) (aka VSD). Calls are made through an indirection cell that initially points
+at a lookup stub. On the first call, the interface target is identified from the
+object's method table and the lookup stub is replaced with a dispatch stub that
+checks for that specific method table in a manner quite similar to guarded
+devirtualization.
+
+If the method table check fails a counter is incremented, and once the counter
+reaches a threshold the dispatch stub is replaced with a resolve stub that looks
+up the right target in a process-wide hash table.
+
+For interface call sites that are monomorphic, the VSD mechanism (via the dispatch
+stub) executes the following code sequence (here for x64)
+```asm
+; JIT-produced code
+;
+; set up R11 with interface target info
+mov R11, ... ; additional VSD info for call
+mov RCX, ... ; dispatch target object
+cmp [rcx], rcx ; null check (unnecessary)
+call [addr] ; call indirect through indir cell
+
+; dispatch stub
+cmp [RCX], targetMT ; check for right method table
+jne DISPATCH-FAIL ; bail to resolve stub if check fails (uses R11 info)
+jmp targetCode ; else "tail call" the right method
+```
+
+At first glance it might appear that adding guarded devirtualization on top of
+VSD may not provide much benefit for monomorphic sites. However the guarded
+devirtualization test doesn't use an indirection cell and doesn't require R11
+setup, may be able to optimize away the null check, and opens the door for
+inlining. So it should be slightly cheaper on average and significantly cheaper
+in some cases.
+
+(Note [CoreCLR#14222](https://github.com/dotnet/coreclr/issues/14222) indicates
+we should be able to optimize away the null check in any case).
+
+If the guarded test fails we've filtered out one method table, so the dispatch cell
+now works well even if a call site alternates between two classes. So we'd expect
+the combination of guarded devirtualization and VSD to perform well on the two
+class test and only show limitations when faced with mixtures of three or more
+classes.
+
+If the guard test always fails we have the up-front cost for the vtable fetch
+(which should amortize pretty well with the subsequent fetch in the stub) plus
+the predicted-not-taken branch. So we'd expect the cost for the two-class cases
+where the JIT's prediction is always wrong to be a bit higher.
+
+The graph below shows the measured results. To make sure we're not overly impacted
+by residual VSD state we use a fresh call site for each value of `p`. The solid
+orange line is the current cost. The dashed orange line is the corresponding cost
+for a virtual call with the same value of `p`. The solid blue line is the cost with
+an up-front guarded test. As noted there is some slowdown when the JIT always
+guesses the wrong class, but the break-even point (not shown) is at a relatively
+small probability of a correct guess.
+
+![two classes interface devirt](TwoClassesInterface.JPG)
+
+### Interface Calls: The Three Class Case
+
+As with virtual calls you may strongly suspect the two class case for interface
+calls is special. And you'd be right.
+
+If we mix a third class in as we did above, we see similar changes in the
+performance mix for interface calls, as seen below. But also as with virtual calls
+the JIT's guess doesn't have to be all that good to see payoffs. At around 10%
+correct, guessing wins on average, and around 30% correct guessing is always a
+perf win.
+
+![three classes interface devirt](ThreeClassesInterface.JPG)
+
+### Delegate Speculation
+
+While we have been discussing this topic in the context of virtual calls, the
+method is general and can be applied to indirect calls as well. Here the guard
+test may just test for a particular function rather than a type.
+
+`Delegate.Invoke` is a special method that eventually turns into an indirect
+call. The JIT could speculate about the possible target of this call. Choosing
+a good target here would require some kind of indirect call profiling.
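+
+As a sketch of what such a guard might look like at the source level (purely
+illustrative; `Compute` here is just an assumed likely target):
+```C#
+static readonly Func<int, int> s_likelyTarget = Compute;   // assumed likely target
+
+static int Compute(int x) => x * 2;
+
+static int InvokeWithGuess(Func<int, int> f, int x)
+{
+    // Guard: is this the delegate we predicted? (delegate equality compares
+    // the target object and the method)
+    if (f == s_likelyTarget)
+    {
+        return Compute(x);   // direct call, candidate for inlining
+    }
+    return f(x);             // fallback: the ordinary (indirect) Delegate.Invoke
+}
+```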
+
+### Calli Speculation
+
+Indirect calls also arise via the `calli` opcode. As with delegates, choosing a
+target here likely requires specialized profiling.
+
+### Costs
+
+Given the optimistic take on performance, it is important to remember that
+there are also some costs involved in guarded devirtualization: increased code
+size and increased JIT time. There may also be some second-order effects on
+the local code generation as we've introduced control flow into the method where
+it didn't exist previously.
+
+A naive implementation that aggressively performs guarded devirtualization
+increases code size overall by about 5% as measured by PMI. JIT time increase
+was not measured but should be in that same ballpark. Some assemblies see code
+size increasing by as much as 12%.
+
+However, guarded devirtualization only kicks in for about 15% of the methods.
+So the average relative size increase in a method with virtual calls is probably
+more like 33%.
+
+There may be some inefficiencies in the current prototype that can be fixed to
+reduce the code size impact. Aside from the extra method table fetch noted above,
+the duplicated calls have the same sets of arguments, so we might be able to
+amortize argument evaluation costs better. And there are some complexities around
+handling return values (especially for implicit by-reference structures) that
+likewise might be able to be tightened up.
+
+Nevertheless, blindly optimizing all virtual calls with guarded devirtualization
+is not likely the right approach. Something more selective is almost certainly
+needed.
+
+However we have done code-expanding optimizations somewhat blindly before, and
+we could contain the size growth risk by restricting this optimization to Tier1.
+Also PMI can overstate size impact seen in real scenarios as it may over-count
+the impact of changes in methods that are always inlined. So we should look at
+size increases from some actual scenarios.
+
+And perhaps I'll look at the size impact of loop cloning as a precedent.
+
+## Implementation Considerations
+
+To get the data above and a better feel for the challenges involved we have
+implemented a prototype. It is currently located on this branch:
+[GuardedDevirtFoundations](https://github.com/AndyAyersMS/coreclr/tree/GuardedDevirtFoundations).
+
+The prototype can introduce guarded devirtualization for some virtual and
+interface calls. It supports inlining of the directly invoked method. It uses
+the JIT's "best known type" as the class to predict. It also anticipates being
+able to query the runtime for implementing classes of an interface.
+
+### Phase Ordering
+
+For the most part, devirtualization is done very early on in the JIT, during
+importation. This allows devirtualized calls to subsequently be inlined, and for
+devirtualization of call sites in inlinees to take advantage of type information
+propagating down into the inlinee from inlined arguments.
+
+We want those same properties to hold for guarded devirtualization candidates.
+So conceptually the transformation should happen in the same place. However it is
+not possible to introduce new control flow in the importer (ignoring for the moment
+the possibility of using question ops). So the actual transformation must be
+deferred until sometime after the importer runs and before the inliner runs.
+
+This deferral is a bit problematic as some key bits of importer state are needed
+to query the runtime about the properties of a call target. So if we defer the
+transformation we need to somehow capture the data needed for these queries and
+make it available later. The current prototype uses (abuses?) the inline
+candidate information for this. As part of this we require that all speculative
+devirtualization sites be treated as inline candidates, at least initially.
+This has the side effect of hoisting the call to be a top level (statement)
+expression and introducing a return value placeholder.
+
+We currently already have a similar transformation in the JIT, the "fat calli"
+transformation needed on CoreRT. This transformation runs at the right time --
+after the importer and before the inliner -- and introduces the right kind of
+`if-then-else` control flow structure. So the thought is to generalize this to
+handle guarded devirtualization as well.
+
+### Recognition
+
+In the prototype, candidates are recognized during the initial importer driven
+call to `impDevirtualizeCall`. If the only reason devirtualization fails is lack
+of exactness, then the call is marked as a guarded devirtualization candidate.
+
+### Devirtualization
+
+To produce the direct call the prototype updates the `this` passed in the `then`
+version of the call so it has the exact predicted type. It then re-invokes
+`impDevirtualizeCall` which should now succeed as the type is now exactly
+known. The benefit of reuse here is that certain special cases of devirtualization
+are now more likely to be handled.
+
+### Inline Candidacy
+
+The prototype currently sets up all virtual and interface calls as potential
+inline candidates. One open question is whether it is worth doing guarded
+devirtualization simply to introduce a direct call. As an alternative we could
+insist that the directly called method also be something that is potentially
+inlineable. One can argue that call overhead matters much more for small methods
+that are also likely good inline candidates.
+
+The inline candidate info is based on the apparent method invoked at the virtual
+site. This is the base method, the one that introduces the virtual slot. So if we
+speculatively check for some class and that class overrides the method, we need to somehow
+update the inline info. How to best do this is unclear.
+
+### Return Values
+
+Because the candidate calls are handled as inline candidates, the JIT hoists the
+call to a top level expression (which is good) during importation and introduces
+a return value placeholder into the place the call occupied in its original tree.
+(Oddly we introduce return value placeholders for some calls that don't return
+a value -- we should fix this). The placeholder points back at the call.
+
+When we split the call into two calls we can't keep this structure intact as there
+needs to be a 1-1 relationship between call and placeholder. So the prototype
+needs to save the return value in a new local and then update the placeholder to
+refer to that local. This can be tricky because in some cases we haven't yet settled
+on what the actual type of the return value is.
+
+The handling of return values in the early stages of the JIT (arguably, in the entire
+JIT) is quite messy. The ABI details bleed through quite early and do so somewhat
+unevenly. This mostly impacts methods that return structures as different ABIs have
+quite different conventions, and the IR is transformed to reflect those conventions
+at different times for un-inlined calls, inlineable calls that end up not getting
+inlined, and for calls that get inlined. In particular, structures that are small
+enough to be returned by value (in a register or set of registers) need careful
+handling.
+
+The prototype skips over such by-value-returning struct methods today. Some of
+the logic found in `fgUpdateInlineReturnExpressionPlaceHolder` needs to be pulled
+in to properly type the call return value so we can properly type the temp. Or
+perhaps we could leverage some of the importer-time transformations that are done for
+the fat calli cases.
+
+For larger structs we should arrange so that the call(s) write their return values
+directly into the new temp, instead of copying the value from wherever they
+return it into a temp, to avoid one level of struct copy. Doing so may require
+upstream zero init of the return value struct and this should only happen in one
+place.
+
+## Open Issues
+
+Here are some of the issues that need to be looked into more carefully.
+
+### Policy
+
+- what is the best mechanism for guessing which class to test for?
+ - instrument Tier0 code?
+ - look at types of arguments?
+ - ask runtime for set of known classes?
+ - harvest info from runtime caches (VSD)?
+ - add instrumenting Tier1 to collect data and Tier2 to optimize?
+- is there some efficient way to test for class ranges? Currently the JIT is
+doing an exact type test. But we really care more about what method is going to
+be invoked. So if there is a range of types `D1...DN` that all will invoke some
+particular method can we test for them all somehow?
+- or should we test the method after the method lookup? (This is possibly a worse
+tradeoff because of the chunked method table arrangement, and is also tricky as a
+method can have multiple addresses over time.) Since many types can share a chunk this
+might allow devirtualization over a wider set of classes (good) but we'd lose
+knowledge of exact types (bad). Not clear how these tradeoffs play out.
+- interaction of guarded devirt with VSD? For interface calls we are sort of
+inlining the first level of the VSD into the JITted code.
+- revocation or reworking of the guard if the JIT's prediction turns out to be bad?
+- improve regular devirtualization to reduce need for guarded
+devirtualization.
+- should we enable this for preJITted code? In preJITted code the target method
+table is not a JIT-time constant and must be looked up.
+- in the prototype, guarded devirtualization and late devirtualization sometimes
+conflict. Say we fail to devirtualize a site, and so expand via guarded devirtualization
+guessing some class X. The residual virtual call then may be optimizable via late
+devirtualization, and this may discover the actual class. In that case the guarded
+devirtualization is not needed. But currently it can't be undone.
+- we probably don't want to bother with guarded devirtualization if we can't also
+inline. But it takes us several evaluation steps to determine if a call can
+be inlined, some of these happening *after* we've done the guarded expansion.
+Again this expansion can't be undone.
+- so perhaps we need to build an undo capability for the cases where guarded
+devirtualization doesn't lead to inlining and/or where late devirtualization also
+applies.
+
+### Implementation
+
+- avoid re-fetching method table for latent virtual call (should reduce code
+size and improve overall perf win)
+- look at how effectively we are sharing argument setup (might reduce code size
+and JIT time impact) -- perhaps implement head merging?
+- handle return values in full generality
+- il offsets
+- flag residual calls as not needing null checks
+- properly establish inline candidacy
+- decide if the refactoring of `InlineCandidateInfo` is the right way to pass
+information from importer to the indirect transform phase
+
+### Futures
+
+- can we cover multiple calls with one test? This can happen already if the
+subsequent call is introduced via inlining of the directly called method, as we
+know the exact type along that path. But for back to back calls to virtual
+methods off of the same object it would be nice to do just one test.
+- should we test for multiple types? Once we've peeled off the "most likely" case
+if the conditional probability of the next most likely case is high it is probably
+worth testing for it too. I believe the C++ compiler will test up to 3 candidates
+this way... but that's a lot of code expansion.
\ No newline at end of file
diff --git a/Documentation/design-docs/ThreeClassesDevirt.JPG b/Documentation/design-docs/ThreeClassesDevirt.JPG
new file mode 100644
index 0000000000..1ae19baab4
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesDevirt.JPG
Binary files differ
diff --git a/Documentation/design-docs/ThreeClassesDevirtDetail.JPG b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG
new file mode 100644
index 0000000000..0edba7479e
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG
Binary files differ
diff --git a/Documentation/design-docs/ThreeClassesDevirtFull.JPG b/Documentation/design-docs/ThreeClassesDevirtFull.JPG
new file mode 100644
index 0000000000..2c09920a7c
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesDevirtFull.JPG
Binary files differ
diff --git a/Documentation/design-docs/ThreeClassesInterface.JPG b/Documentation/design-docs/ThreeClassesInterface.JPG
new file mode 100644
index 0000000000..cbd3551f74
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesInterface.JPG
Binary files differ
diff --git a/Documentation/design-docs/TwoClassesBaseline.JPG b/Documentation/design-docs/TwoClassesBaseline.JPG
new file mode 100644
index 0000000000..3a8b4b21e8
--- /dev/null
+++ b/Documentation/design-docs/TwoClassesBaseline.JPG
Binary files differ
diff --git a/Documentation/design-docs/TwoClassesDevirt.JPG b/Documentation/design-docs/TwoClassesDevirt.JPG
new file mode 100644
index 0000000000..7c48264eef
--- /dev/null
+++ b/Documentation/design-docs/TwoClassesDevirt.JPG
Binary files differ
diff --git a/Documentation/design-docs/TwoClassesInterface.JPG b/Documentation/design-docs/TwoClassesInterface.JPG
new file mode 100644
index 0000000000..69063dc28a
--- /dev/null
+++ b/Documentation/design-docs/TwoClassesInterface.JPG
Binary files differ
diff --git a/src/jit/compiler.cpp b/src/jit/compiler.cpp
index 86a85cec52..2851f3d854 100644
--- a/src/jit/compiler.cpp
+++ b/src/jit/compiler.cpp
@@ -4561,10 +4561,8 @@ void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, JitFlags
fgRemovePreds();
}
- if (IsTargetAbi(CORINFO_CORERT_ABI) && doesMethodHaveFatPointer())
- {
- fgTransformFatCalli();
- }
+ // Transform indirect calls that require control flow expansion.
+ fgTransformIndirectCalls();
EndPhase(PHASE_IMPORTATION);
diff --git a/src/jit/compiler.h b/src/jit/compiler.h
index a1309621b4..a3f46bda86 100644
--- a/src/jit/compiler.h
+++ b/src/jit/compiler.h
@@ -3298,6 +3298,13 @@ public:
return IsTargetAbi(CORINFO_CORERT_ABI) ? TYP_I_IMPL : TYP_REF;
}
+ void impDevirtualizeCall(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE* method,
+ unsigned* methodFlags,
+ CORINFO_CONTEXT_HANDLE* contextHandle,
+ CORINFO_CONTEXT_HANDLE* exactContextHandle,
+ bool isLateDevirtualization);
+
//=========================================================================
// PROTECTED
//=========================================================================
@@ -3355,12 +3362,6 @@ protected:
CORINFO_CALL_INFO* callInfo,
IL_OFFSET rawILOffset);
- void impDevirtualizeCall(GenTreeCall* call,
- CORINFO_METHOD_HANDLE* method,
- unsigned* methodFlags,
- CORINFO_CONTEXT_HANDLE* contextHandle,
- CORINFO_CONTEXT_HANDLE* exactContextHandle);
-
CORINFO_CLASS_HANDLE impGetSpecialIntrinsicExactReturnType(CORINFO_METHOD_HANDLE specialIntrinsicHandle);
bool impMethodInfo_hasRetBuffArg(CORINFO_METHOD_INFO* methInfo);
@@ -3865,7 +3866,7 @@ private:
bool forceInline,
InlineResult* inlineResult);
- void impCheckCanInline(GenTree* call,
+ void impCheckCanInline(GenTreeCall* call,
CORINFO_METHOD_HANDLE fncHandle,
unsigned methAttr,
CORINFO_CONTEXT_HANDLE exactContextHnd,
@@ -3894,6 +3895,11 @@ private:
bool exactContextNeedsRuntimeLookup,
CORINFO_CALL_INFO* callInfo);
+ void impMarkInlineCandidateHelper(GenTreeCall* call,
+ CORINFO_CONTEXT_HANDLE exactContextHnd,
+ bool exactContextNeedsRuntimeLookup,
+ CORINFO_CALL_INFO* callInfo);
+
bool impTailCallRetTypeCompatible(var_types callerRetType,
CORINFO_CLASS_HANDLE callerRetTypeClass,
var_types calleeRetType,
@@ -4133,7 +4139,7 @@ public:
void fgImport();
- void fgTransformFatCalli();
+ void fgTransformIndirectCalls();
void fgInline();
@@ -5354,8 +5360,8 @@ private:
#ifdef DEBUG
static fgWalkPreFn fgDebugCheckInlineCandidates;
- void CheckNoFatPointerCandidatesLeft();
- static fgWalkPreFn fgDebugCheckFatPointerCandidates;
+ void CheckNoTransformableIndirectCallsRemain();
+ static fgWalkPreFn fgDebugCheckForTransformableIndirectCalls;
#endif
void fgPromoteStructs();
@@ -6138,6 +6144,7 @@ public:
#define OMF_HAS_NULLCHECK 0x00000010 // Method contains null check.
#define OMF_HAS_FATPOINTER 0x00000020 // Method contains call, that needs fat pointer transformation.
#define OMF_HAS_OBJSTACKALLOC 0x00000040 // Method contains an object allocated on the stack.
+#define OMF_HAS_GUARDEDDEVIRT 0x00000080 // Method contains guarded devirtualization candidate
bool doesMethodHaveFatPointer()
{
@@ -6156,6 +6163,27 @@ public:
void addFatPointerCandidate(GenTreeCall* call);
+ bool doesMethodHaveGuardedDevirtualization()
+ {
+ return (optMethodFlags & OMF_HAS_GUARDEDDEVIRT) != 0;
+ }
+
+ void setMethodHasGuardedDevirtualization()
+ {
+ optMethodFlags |= OMF_HAS_GUARDEDDEVIRT;
+ }
+
+ void clearMethodHasGuardedDevirtualization()
+ {
+ optMethodFlags &= ~OMF_HAS_GUARDEDDEVIRT;
+ }
+
+ void addGuardedDevirtualizationCandidate(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE methodHandle,
+ CORINFO_CLASS_HANDLE classHandle,
+ unsigned methodAttr,
+ unsigned classAttr);
+
unsigned optMethodFlags;
// Recursion bound controls how far we can go backwards tracking for a SSA value.
diff --git a/src/jit/flowgraph.cpp b/src/jit/flowgraph.cpp
index dd184539fd..ad89317159 100644
--- a/src/jit/flowgraph.cpp
+++ b/src/jit/flowgraph.cpp
@@ -5892,6 +5892,21 @@ void Compiler::fgFindBasicBlocks()
{
// This temp should already have the type of the return value.
JITDUMP("\nInliner: re-using pre-existing spill temp V%02u\n", lvaInlineeReturnSpillTemp);
+
+ if (info.compRetType == TYP_REF)
+ {
+ // We may have co-opted an existing temp for the return spill.
+ // We likely assumed it was single-def at the time, but now
+ // we can see it has multiple definitions.
+ if ((retBlocks > 1) && (lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef == 1))
+ {
+ // Make sure it is no longer marked single def. This is only safe
+ // to do if we haven't ever updated the type.
+ assert(!lvaTable[lvaInlineeReturnSpillTemp].lvClassInfoUpdated);
+ JITDUMP("Marked return spill temp V%02u as NOT single def temp\n", lvaInlineeReturnSpillTemp);
+ lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef = 0;
+ }
+ }
}
else
{
@@ -21859,43 +21874,58 @@ void Compiler::fgInline()
do
{
- /* Make the current basic block address available globally */
-
+ // Make the current basic block address available globally
compCurBB = block;
- GenTreeStmt* stmt;
- GenTree* expr;
-
- for (stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
+ for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
{
- expr = stmt->gtStmtExpr;
- // See if we can expand the inline candidate
- if ((expr->gtOper == GT_CALL) && ((expr->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0))
+#ifdef DEBUG
+ // In debug builds we want the inline tree to show all failed
+ // inlines. Some inlines may fail very early and never make it to
+ // candidate stage. So scan the tree looking for those early failures.
+ fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt);
+#endif
+
+ GenTree* expr = stmt->gtStmtExpr;
+
+ // The importer ensures that all inline candidates are
+ // statement expressions. So see if we have a call.
+ if (expr->IsCall())
{
GenTreeCall* call = expr->AsCall();
- InlineResult inlineResult(this, call, stmt, "fgInline");
- fgMorphStmt = stmt;
+ // We do. Is it an inline candidate?
+ //
+ // Note we also process GuardedDevirtualizationCandidates here as we've
+ // split off GT_RET_EXPRs for them even when they are not inline candidates
+ // as we need similar processing to ensure they get patched back to where
+ // they belong.
+ if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate())
+ {
+ InlineResult inlineResult(this, call, stmt, "fgInline");
+
+ fgMorphStmt = stmt;
- fgMorphCallInline(call, &inlineResult);
+ fgMorphCallInline(call, &inlineResult);
- if (stmt->gtStmtExpr->IsNothingNode())
- {
- fgRemoveStmt(block, stmt);
- continue;
+ // fgMorphCallInline may have updated the
+ // statement expression to a GT_NOP if the
+ // call returned a value, regardless of
+ // whether the inline succeeded or failed.
+ //
+ // If so, remove the GT_NOP and continue
+ // on with the next statement.
+ if (stmt->gtStmtExpr->IsNothingNode())
+ {
+ fgRemoveStmt(block, stmt);
+ continue;
+ }
}
}
- else
- {
-#ifdef DEBUG
- // Look for non-candidates.
- fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt);
-#endif
- }
- // See if we need to replace the return value place holder.
- // Also, see if this update enables further devirtualization.
+ // See if we need to replace some return value place holders.
+ // Also, see if this replacement enables further devirtualization.
//
// Note we have both preorder and postorder callbacks here.
//
@@ -22008,6 +22038,11 @@ Compiler::fgWalkResult Compiler::fgFindNonInlineCandidate(GenTree** pTree, fgWal
void Compiler::fgNoteNonInlineCandidate(GenTreeStmt* stmt, GenTreeCall* call)
{
+ if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate())
+ {
+ return;
+ }
+
InlineResult inlineResult(this, call, nullptr, "fgNotInlineCandidate");
InlineObservation currentObservation = InlineObservation::CALLSITE_NOT_CANDIDATE;
@@ -22471,10 +22506,11 @@ Compiler::fgWalkResult Compiler::fgLateDevirtualization(GenTree** pTree, fgWalkD
}
#endif // DEBUG
- CORINFO_METHOD_HANDLE method = call->gtCallMethHnd;
- unsigned methodFlags = 0;
- CORINFO_CONTEXT_HANDLE context = nullptr;
- comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr);
+ CORINFO_METHOD_HANDLE method = call->gtCallMethHnd;
+ unsigned methodFlags = 0;
+ CORINFO_CONTEXT_HANDLE context = nullptr;
+ const bool isLateDevirtualization = true;
+ comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr, isLateDevirtualization);
}
}
else if (tree->OperGet() == GT_ASG)
@@ -25507,8 +25543,10 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock* block,
return true;
}
-// FatCalliTransformer transforms calli that can use fat function pointer.
-// Fat function pointer is pointer with the second least significant bit set,
+// IndirectCallTransformer transforms indirect calls that involve fat pointers
+// or guarded devirtualization.
+//
+// A fat function pointer is a pointer with the second least significant bit set,
// if the bit is set, the pointer (after clearing the bit) actually points to
// a tuple <method pointer, instantiation argument pointer> where
// instantiationArgument is a hidden first argument required by method pointer.
@@ -25557,38 +25595,60 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock* block,
// subsequent statements
// }
//
-class FatCalliTransformer
+class IndirectCallTransformer
{
public:
- FatCalliTransformer(Compiler* compiler) : compiler(compiler)
+ IndirectCallTransformer(Compiler* compiler) : compiler(compiler)
{
}
//------------------------------------------------------------------------
// Run: run transformation for each block.
//
- void Run()
+ // Returns:
+ // Count of calls transformed.
+ int Run()
{
+ int count = 0;
+
for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
{
- TransformBlock(block);
+ count += TransformBlock(block);
}
+
+ return count;
}
private:
//------------------------------------------------------------------------
- // TransformBlock: look through statements and transform statements with fat pointer calls.
+ // TransformBlock: look through statements and transform statements with
+ // particular indirect calls
//
- void TransformBlock(BasicBlock* block)
+ // Returns:
+ // Count of calls transformed.
+ //
+ int TransformBlock(BasicBlock* block)
{
+ int count = 0;
+
for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
{
if (ContainsFatCalli(stmt))
{
- StatementTransformer stmtTransformer(compiler, block, stmt);
- stmtTransformer.Run();
+ FatPointerCallTransformer transformer(compiler, block, stmt);
+ transformer.Run();
+ count++;
+ }
+
+ if (ContainsGuardedDevirtualizationCandidate(stmt))
+ {
+ GuardedDevirtualizationTransformer transformer(compiler, block, stmt);
+ transformer.Run();
+ count++;
}
}
+
+ return count;
}
//------------------------------------------------------------------------
@@ -25609,39 +25669,158 @@ private:
return fatPointerCandidate->IsCall() && fatPointerCandidate->AsCall()->IsFatPointerCandidate();
}
- class StatementTransformer
+ //------------------------------------------------------------------------
+ // ContainsGuardedDevirtualizationCandidate: check does this statement contain a virtual
+ // call that we'd like to guardedly devirtualize?
+ //
+ // Return Value:
+ // true if contains, false otherwise.
+ //
+ // Notes:
+ // calls are hoisted to top level ... (we hope)
+ bool ContainsGuardedDevirtualizationCandidate(GenTreeStmt* stmt)
+ {
+ GenTree* candidate = stmt->gtStmtExpr;
+ return candidate->IsCall() && candidate->AsCall()->IsGuardedDevirtualizationCandidate();
+ }
+
+ class Transformer
{
public:
- StatementTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+ Transformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
: compiler(compiler), currBlock(block), stmt(stmt)
{
- remainderBlock = nullptr;
- checkBlock = nullptr;
- thenBlock = nullptr;
- elseBlock = nullptr;
- doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG);
- origCall = GetCall(stmt);
- fptrAddress = origCall->gtCallAddr;
- pointerType = fptrAddress->TypeGet();
+ remainderBlock = nullptr;
+ checkBlock = nullptr;
+ thenBlock = nullptr;
+ elseBlock = nullptr;
+ origCall = nullptr;
}
//------------------------------------------------------------------------
// Run: transform the statement as described above.
//
- void Run()
+ virtual void Run()
+ {
+ origCall = GetCall(stmt);
+ Transform();
+ }
+
+ void Transform()
{
- ClearFatFlag();
+ JITDUMP("*** %s: transforming [%06u]\n", Name(), compiler->dspTreeID(stmt));
+ FixupRetExpr();
+ ClearFlag();
CreateRemainder();
CreateCheck();
CreateThen();
CreateElse();
-
RemoveOldStatement();
SetWeights();
ChainFlow();
}
- private:
+ protected:
+ virtual const char* Name() = 0;
+ virtual void ClearFlag() = 0;
+ virtual GenTreeCall* GetCall(GenTreeStmt* callStmt) = 0;
+ virtual void FixupRetExpr() = 0;
+
+ //------------------------------------------------------------------------
+ // CreateRemainder: split current block at the call stmt and
+ // insert statements after the call into remainderBlock.
+ //
+ void CreateRemainder()
+ {
+ remainderBlock = compiler->fgSplitBlockAfterStatement(currBlock, stmt);
+ unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT;
+ remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags;
+ }
+
+ virtual void CreateCheck() = 0;
+
+ //------------------------------------------------------------------------
+ // CreateAndInsertBasicBlock: ask compiler to create new basic block.
+ // and insert in into the basic block list.
+ //
+ // Arguments:
+ // jumpKind - jump kind for the new basic block
+ // insertAfter - basic block, after which compiler has to insert the new one.
+ //
+ // Return Value:
+ // new basic block.
+ BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter)
+ {
+ BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true);
+ if ((insertAfter->bbFlags & BBF_INTERNAL) == 0)
+ {
+ block->bbFlags &= ~BBF_INTERNAL;
+ block->bbFlags |= BBF_IMPORTED;
+ }
+ return block;
+ }
+
+ virtual void CreateThen() = 0;
+ virtual void CreateElse() = 0;
+
+ //------------------------------------------------------------------------
+ // RemoveOldStatement: remove original stmt from current block.
+ //
+ void RemoveOldStatement()
+ {
+ compiler->fgRemoveStmt(currBlock, stmt);
+ }
+
+ //------------------------------------------------------------------------
+ // SetWeights: set weights for new blocks.
+ //
+ void SetWeights()
+ {
+ remainderBlock->inheritWeight(currBlock);
+ checkBlock->inheritWeight(currBlock);
+ thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY);
+ elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY);
+ }
+
+ //------------------------------------------------------------------------
+ // ChainFlow: link new blocks into correct cfg.
+ //
+ void ChainFlow()
+ {
+ assert(!compiler->fgComputePredsDone);
+ checkBlock->bbJumpDest = elseBlock;
+ thenBlock->bbJumpDest = remainderBlock;
+ }
+
+ Compiler* compiler;
+ BasicBlock* currBlock;
+ BasicBlock* remainderBlock;
+ BasicBlock* checkBlock;
+ BasicBlock* thenBlock;
+ BasicBlock* elseBlock;
+ GenTreeStmt* stmt;
+ GenTreeCall* origCall;
+
+ const int HIGH_PROBABILITY = 80;
+ };
+
+ class FatPointerCallTransformer final : public Transformer
+ {
+ public:
+ FatPointerCallTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+ : Transformer(compiler, block, stmt)
+ {
+ doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG);
+ fptrAddress = origCall->gtCallAddr;
+ pointerType = fptrAddress->TypeGet();
+ }
+
+ protected:
+ virtual const char* Name()
+ {
+ return "FatPointerCall";
+ }
+
//------------------------------------------------------------------------
// GetCall: find a call in a statement.
//
@@ -25650,7 +25829,7 @@ private:
//
// Return Value:
// call tree node pointer.
- GenTreeCall* GetCall(GenTreeStmt* callStmt)
+ virtual GenTreeCall* GetCall(GenTreeStmt* callStmt)
{
GenTree* tree = callStmt->gtStmtExpr;
GenTreeCall* call = nullptr;
@@ -25667,28 +25846,22 @@ private:
}
//------------------------------------------------------------------------
- // ClearFatFlag: clear fat pointer candidate flag from the original call.
+ // ClearFlag: clear fat pointer candidate flag from the original call.
//
- void ClearFatFlag()
+ virtual void ClearFlag()
{
origCall->ClearFatPointerCandidate();
}
- //------------------------------------------------------------------------
- // CreateRemainder: split current block at the fat call stmt and
- // insert statements after the call into remainderBlock.
- //
- void CreateRemainder()
+ // FixupRetExpr: no action needed as we handle this in the importer.
+ virtual void FixupRetExpr()
{
- remainderBlock = compiler->fgSplitBlockAfterStatement(currBlock, stmt);
- unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT;
- remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags;
}
//------------------------------------------------------------------------
// CreateCheck: create check block, that checks fat pointer bit set.
//
- void CreateCheck()
+ virtual void CreateCheck()
{
checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock);
GenTree* fatPointerMask = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK);
@@ -25702,19 +25875,20 @@ private:
}
//------------------------------------------------------------------------
- // CreateCheck: create then block, that is executed if call address is not fat pointer.
+ // CreateThen: create then block, that is executed if the check succeeds.
+ // This simply executes the original call.
//
- void CreateThen()
+ virtual void CreateThen()
{
- thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
- GenTree* nonFatCallStmt = compiler->gtCloneExpr(stmt)->AsStmt();
- compiler->fgInsertStmtAtEnd(thenBlock, nonFatCallStmt);
+ thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
+ GenTree* copyOfOriginalStmt = compiler->gtCloneExpr(stmt)->AsStmt();
+ compiler->fgInsertStmtAtEnd(thenBlock, copyOfOriginalStmt);
}
//------------------------------------------------------------------------
- // CreateCheck: create else block, that is executed if call address is fat pointer.
+ // CreateElse: create else block, that is executed if call address is fat pointer.
//
- void CreateElse()
+ virtual void CreateElse()
{
elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
@@ -25727,27 +25901,6 @@ private:
}
//------------------------------------------------------------------------
- // CreateAndInsertBasicBlock: ask compiler to create new basic block.
- // and insert in into the basic block list.
- //
- // Arguments:
- // jumpKind - jump kind for the new basic block
- // insertAfter - basic block, after which compiler has to insert the new one.
- //
- // Return Value:
- // new basic block.
- BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter)
- {
- BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true);
- if ((insertAfter->bbFlags & BBF_INTERNAL) == 0)
- {
- block->bbFlags &= ~BBF_INTERNAL;
- block->bbFlags |= BBF_IMPORTED;
- }
- return block;
- }
-
- //------------------------------------------------------------------------
// GetFixedFptrAddress: clear fat pointer bit from fat pointer address.
//
// Return Value:
@@ -25843,49 +25996,306 @@ private:
iterator->Rest() = compiler->gtNewArgList(hiddenArgument);
}
+ private:
+ const int FAT_POINTER_MASK = 0x2;
+
+ GenTree* fptrAddress;
+ var_types pointerType;
+ bool doesReturnValue;
+ };
+
+ class GuardedDevirtualizationTransformer final : public Transformer
+ {
+ public:
+ GuardedDevirtualizationTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+ : Transformer(compiler, block, stmt), returnTemp(BAD_VAR_NUM)
+ {
+ }
+
//------------------------------------------------------------------------
- // RemoveOldStatement: remove original stmt from current block.
+ // Run: transform the statement as described above.
//
- void RemoveOldStatement()
+ virtual void Run()
{
- compiler->fgRemoveStmt(currBlock, stmt);
+ origCall = GetCall(stmt);
+
+ JITDUMP("*** %s contemplating [%06u]\n", Name(), compiler->dspTreeID(origCall));
+
+ // We currently need inline candidate info to do guarded devirt.
+ if (!origCall->IsInlineCandidate())
+ {
+ JITDUMP("*** %s Bailing on [%06u] -- not an inline candidate\n", Name(), compiler->dspTreeID(origCall));
+ ClearFlag();
+ return;
+ }
+
+ // For now, bail on transforming calls that still appear
+ // to return structs by value as there is deferred work
+ // needed to fix up the return type.
+ //
+ // See for instance fgUpdateInlineReturnExpressionPlaceHolder.
+ if (origCall->TypeGet() == TYP_STRUCT)
+ {
+ JITDUMP("*** %s Bailing on [%06u] -- can't handle by-value struct returns yet\n", Name(),
+ compiler->dspTreeID(origCall));
+ ClearFlag();
+
+ // For stub calls restore the stub address
+ if (origCall->IsVirtualStub())
+ {
+ origCall->gtStubCallStubAddr = origCall->gtInlineCandidateInfo->stubAddr;
+ }
+ return;
+ }
+
+ Transform();
+ }
+
+ protected:
+ virtual const char* Name()
+ {
+ return "GuardedDevirtualization";
}
//------------------------------------------------------------------------
- // SetWeights: set weights for new blocks.
+ // GetCall: find a call in a statement.
//
- void SetWeights()
+ // Arguments:
+ // callStmt - the statement with the call inside.
+ //
+ // Return Value:
+ // call tree node pointer.
+ virtual GenTreeCall* GetCall(GenTreeStmt* callStmt)
{
- remainderBlock->inheritWeight(currBlock);
- checkBlock->inheritWeight(currBlock);
- thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY);
- elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY);
+ GenTree* tree = callStmt->gtStmtExpr;
+ assert(tree->IsCall());
+ GenTreeCall* call = tree->AsCall();
+ return call;
}
//------------------------------------------------------------------------
- // ChainFlow: link new blocks into correct cfg.
+ // ClearFlag: clear guarded devirtualization candidate flag from the original call.
//
- void ChainFlow()
+ virtual void ClearFlag()
{
- assert(!compiler->fgComputePredsDone);
- checkBlock->bbJumpDest = elseBlock;
- thenBlock->bbJumpDest = remainderBlock;
+ origCall->ClearGuardedDevirtualizationCandidate();
}
- Compiler* compiler;
- BasicBlock* currBlock;
- BasicBlock* remainderBlock;
- BasicBlock* checkBlock;
- BasicBlock* thenBlock;
- BasicBlock* elseBlock;
- GenTreeStmt* stmt;
- GenTreeCall* origCall;
- GenTree* fptrAddress;
- var_types pointerType;
- bool doesReturnValue;
+ //------------------------------------------------------------------------
+ // CreateCheck: create check block and check method table
+ //
+ virtual void CreateCheck()
+ {
+ checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock);
- const int FAT_POINTER_MASK = 0x2;
- const int HIGH_PROBABILITY = 80;
+ // Fetch method table from object arg to call.
+ GenTree* thisTree = compiler->gtCloneExpr(origCall->gtCallObjp);
+
+ // Create temp for this if the tree is costly.
+ if (!thisTree->IsLocal())
+ {
+ const unsigned thisTempNum = compiler->lvaGrabTemp(true DEBUGARG("guarded devirt this temp"));
+ // lvaSetClass(thisTempNum, ...);
+ GenTree* asgTree = compiler->gtNewTempAssign(thisTempNum, thisTree);
+ GenTree* asgStmt = compiler->fgNewStmtFromTree(asgTree, stmt->gtStmt.gtStmtILoffsx);
+ compiler->fgInsertStmtAtEnd(checkBlock, asgStmt);
+
+ thisTree = compiler->gtNewLclvNode(thisTempNum, TYP_REF);
+
+ // Propagate the new this to the call. Must be a new expr as the call
+ // will live on in the else block and thisTree is used below.
+ origCall->gtCallObjp = compiler->gtNewLclvNode(thisTempNum, TYP_REF);
+ }
+
+ GenTree* methodTable = compiler->gtNewIndir(TYP_I_IMPL, thisTree);
+ methodTable->gtFlags |= GTF_IND_INVARIANT;
+
+ // Find target method table
+ GuardedDevirtualizationCandidateInfo* guardedInfo = origCall->gtGuardedDevirtualizationCandidateInfo;
+ CORINFO_CLASS_HANDLE clsHnd = guardedInfo->guardedClassHandle;
+ GenTree* targetMethodTable = compiler->gtNewIconEmbClsHndNode(clsHnd);
+
+ // Compare and jump to else (which does the indirect call) if NOT equal
+ GenTree* methodTableCompare = compiler->gtNewOperNode(GT_NE, TYP_INT, targetMethodTable, methodTable);
+ GenTree* jmpTree = compiler->gtNewOperNode(GT_JTRUE, TYP_VOID, methodTableCompare);
+ GenTree* jmpStmt = compiler->fgNewStmtFromTree(jmpTree, stmt->gtStmt.gtStmtILoffsx);
+ compiler->fgInsertStmtAtEnd(checkBlock, jmpStmt);
+ }
+
+ //------------------------------------------------------------------------
+ // FixupRetExpr: set up to repair return value placeholder from call
+ //
+ virtual void FixupRetExpr()
+ {
+ // If call returns a value, we need to copy it to a temp, and
+ // update the associated GT_RET_EXPR to refer to the temp instead
+ // of the call.
+ //
+ // Note implicit by-ref returns should have already been converted
+ // so any struct copy we induce here should be cheap.
+ //
+ // Todo: make sure we understand how this interacts with return type
+ // munging for small structs.
+ InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo;
+ GenTree* retExpr = inlineInfo->retExpr;
+
+ // Sanity check the ret expr if non-null: it should refer to the original call.
+ if (retExpr != nullptr)
+ {
+ assert(retExpr->gtRetExpr.gtInlineCandidate == origCall);
+ }
+
+ if (origCall->TypeGet() != TYP_VOID)
+ {
+ returnTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt return temp"));
+ JITDUMP("Reworking call(s) to return value via a new temp V%02u\n", returnTemp);
+
+ if (varTypeIsStruct(origCall))
+ {
+ compiler->lvaSetStruct(returnTemp, origCall->gtRetClsHnd, false);
+ }
+
+ GenTree* tempTree = compiler->gtNewLclvNode(returnTemp, origCall->TypeGet());
+
+ JITDUMP("Updating GT_RET_EXPR [%06u] to refer to temp V%02u\n", compiler->dspTreeID(retExpr),
+ returnTemp);
+ retExpr->gtRetExpr.gtInlineCandidate = tempTree;
+ }
+ else if (retExpr != nullptr)
+ {
+ // We still oddly produce GT_RET_EXPRs for some void
+ // returning calls. Just patch the ret expr to a NOP.
+ //
+ // Todo: consider bagging creation of these RET_EXPRs. The only possible
+ // benefit they provide is stitching back larger trees for failed inlines
+ // of void-returning methods. But then the calls likely sit in commas and
+ // the benefit of a larger tree is unclear.
+ JITDUMP("Updating GT_RET_EXPR [%06u] for VOID return to refer to a NOP\n",
+ compiler->dspTreeID(retExpr));
+ GenTree* nopTree = compiler->gtNewNothingNode();
+ retExpr->gtRetExpr.gtInlineCandidate = nopTree;
+ }
+ else
+ {
+ // We do not produce GT_RET_EXPRs for CTOR calls, so there is nothing to patch.
+ }
+ }
+
+ //------------------------------------------------------------------------
+ // CreateThen: create then block with direct call to method
+ //
+ virtual void CreateThen()
+ {
+ thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
+
+ InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo;
+ CORINFO_CLASS_HANDLE clsHnd = inlineInfo->clsHandle;
+
+ // copy 'this' to temp with exact type.
+ const unsigned thisTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt this exact temp"));
+ GenTree* clonedObj = compiler->gtCloneExpr(origCall->gtCallObjp);
+ GenTree* assign = compiler->gtNewTempAssign(thisTemp, clonedObj);
+ compiler->lvaSetClass(thisTemp, clsHnd, true);
+ GenTreeStmt* assignStmt = compiler->gtNewStmt(assign);
+ compiler->fgInsertStmtAtEnd(thenBlock, assignStmt);
+
+ // Clone call
+ GenTreeCall* call = compiler->gtCloneExpr(origCall)->AsCall();
+ call->gtCallObjp = compiler->gtNewLclvNode(thisTemp, TYP_REF);
+ call->SetIsGuarded();
+
+ JITDUMP("Direct call [%06u] in block BB%02u\n", compiler->dspTreeID(call), thenBlock->bbNum);
+
+ // Then invoke impDevirtualizeCall to actually
+ // transform the call for us. It should succeed, as we have
+ // now provided an exactly typed 'this'.
+ CORINFO_METHOD_HANDLE methodHnd = inlineInfo->methInfo.ftn;
+ unsigned methodFlags = inlineInfo->methAttr;
+ CORINFO_CONTEXT_HANDLE context = inlineInfo->exactContextHnd;
+ const bool isLateDevirtualization = true;
+ compiler->impDevirtualizeCall(call, &methodHnd, &methodFlags, &context, nullptr, isLateDevirtualization);
+
+ // Conceivably devirtualization could fail; if so, we should avoid
+ // marking the call as a guarded devirt candidate in the first
+ // place rather than ending up here.
+ assert(!call->IsVirtual());
+
+ // Re-establish this call as an inline candidate.
+ GenTree* oldRetExpr = inlineInfo->retExpr;
+ inlineInfo->clsHandle = clsHnd;
+ inlineInfo->exactContextHnd = context;
+ call->gtInlineCandidateInfo = inlineInfo;
+
+ // Add the call
+ GenTreeStmt* callStmt = compiler->gtNewStmt(call);
+ compiler->fgInsertStmtAtEnd(thenBlock, callStmt);
+
+ // If there was a ret expr for this call, we need to create a new one
+ // and append it just after the call.
+ //
+ // Note the original GT_RET_EXPR is sitting at the join point of the
+ // guarded expansion; for non-void calls it now refers to a temp local.
+ // We set all this up in FixupRetExpr().
+ if (oldRetExpr != nullptr)
+ {
+ GenTree* retExpr = compiler->gtNewInlineCandidateReturnExpr(call, call->TypeGet());
+ inlineInfo->retExpr = retExpr;
+
+ if (returnTemp != BAD_VAR_NUM)
+ {
+ retExpr = compiler->gtNewTempAssign(returnTemp, retExpr);
+ }
+ else
+ {
+ // We should always have a return temp if we return results by value
+ assert(origCall->TypeGet() == TYP_VOID);
+ }
+
+ GenTreeStmt* resultStmt = compiler->gtNewStmt(retExpr);
+ compiler->fgInsertStmtAtEnd(thenBlock, resultStmt);
+ }
+ }
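+
+ // At this point the then block contains: the exact-typed copy of 'this', the
+ // now-direct (and likely inlineable) call, and, for non-void calls, a store of
+ // the call's new GT_RET_EXPR into the shared return temp.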
+
+ //------------------------------------------------------------------------
+ // CreateElse: create else block. This executes the unaltered indirect call.
+ //
+ virtual void CreateElse()
+ {
+ elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
+ GenTreeCall* call = origCall;
+ GenTreeStmt* newStmt = compiler->gtNewStmt(call);
+
+ call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+ call->SetIsGuarded();
+
+ JITDUMP("Residual call [%06u] moved to block BB%02u\n", compiler->dspTreeID(call), elseBlock->bbNum);
+
+ if (returnTemp != BAD_VAR_NUM)
+ {
+ GenTree* assign = compiler->gtNewTempAssign(returnTemp, call);
+ newStmt->gtStmtExpr = assign;
+ }
+
+ // For stub calls, restore the stub address. For everything else,
+ // null out the candidate info field.
+ if (call->IsVirtualStub())
+ {
+ JITDUMP("Restoring stub addr %p from candidate info\n", call->gtInlineCandidateInfo->stubAddr);
+ call->gtStubCallStubAddr = call->gtInlineCandidateInfo->stubAddr;
+ }
+ else
+ {
+ call->gtInlineCandidateInfo = nullptr;
+ }
+
+ compiler->fgInsertStmtAtEnd(elseBlock, newStmt);
+
+ // Set the original statement to a nop.
+ stmt->gtStmtExpr = compiler->gtNewNothingNode();
+ }
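+
+ // Note: once CreateCheck/CreateThen/CreateElse have run, the expansion consists
+ // of the check block (method table test), the then block (guarded direct call)
+ // and the else block (original indirect call), inserted in that order after the
+ // original block; the shared transformer base presumably wires up the jump
+ // targets and the remainder block.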
+
+ private:
+ unsigned returnTemp;
};
Compiler* compiler;
@@ -25894,46 +26304,74 @@ private:
#ifdef DEBUG
//------------------------------------------------------------------------
-// fgDebugCheckFatPointerCandidates: callback to make sure there are no more GTF_CALL_M_FAT_POINTER_CHECK calls.
+// fgDebugCheckForTransformableIndirectCalls: callback to make sure there
+// are no more GTF_CALL_M_FAT_POINTER_CHECK or GTF_CALL_M_GUARDED_DEVIRT
+// calls remaining
//
-Compiler::fgWalkResult Compiler::fgDebugCheckFatPointerCandidates(GenTree** pTree, fgWalkData* data)
+Compiler::fgWalkResult Compiler::fgDebugCheckForTransformableIndirectCalls(GenTree** pTree, fgWalkData* data)
{
GenTree* tree = *pTree;
if (tree->IsCall())
{
assert(!tree->AsCall()->IsFatPointerCandidate());
+ assert(!tree->AsCall()->IsGuardedDevirtualizationCandidate());
}
return WALK_CONTINUE;
}
//------------------------------------------------------------------------
-// CheckNoFatPointerCandidatesLeft: walk through blocks and check that there are no fat pointer candidates left.
+// CheckNoTransformableIndirectCallsRemain: walk through blocks and check
+// that there are no indirect call candidates left to transform.
//
-void Compiler::CheckNoFatPointerCandidatesLeft()
+void Compiler::CheckNoTransformableIndirectCallsRemain()
{
assert(!doesMethodHaveFatPointer());
+ assert(!doesMethodHaveGuardedDevirtualization());
+
for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
{
for (GenTreeStmt* stmt = fgFirstBB->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
{
- fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckFatPointerCandidates);
+ fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckForTransformableIndirectCalls);
}
}
}
#endif
//------------------------------------------------------------------------
-// fgTransformFatCalli: find and transform fat calls.
+// fgTransformIndirectCalls: find and transform various indirect calls
//
-void Compiler::fgTransformFatCalli()
+// These transformations happen post-import because they may introduce
+// control flow.
+//
+void Compiler::fgTransformIndirectCalls()
{
- assert(IsTargetAbi(CORINFO_CORERT_ABI));
- FatCalliTransformer fatCalliTransformer(this);
- fatCalliTransformer.Run();
- clearMethodHasFatPointer();
-#ifdef DEBUG
- CheckNoFatPointerCandidatesLeft();
-#endif
+ JITDUMP("\n*************** in fgTransformIndirectCalls(%s)\n", compIsForInlining() ? "inlinee" : "root");
+
+ if (doesMethodHaveFatPointer() || doesMethodHaveGuardedDevirtualization())
+ {
+ IndirectCallTransformer indirectCallTransformer(this);
+ int count = indirectCallTransformer.Run();
+
+ if (count > 0)
+ {
+ JITDUMP("\n*************** After fgTransformIndirectCalls() [%d calls transformed]\n", count);
+ INDEBUG(if (verbose) { fgDispBasicBlocks(true); });
+ }
+ else
+ {
+ JITDUMP(" -- no transforms done (?)\n");
+ }
+
+ clearMethodHasFatPointer();
+ clearMethodHasGuardedDevirtualization();
+ }
+ else
+ {
+ JITDUMP(" -- no candidates to transform\n");
+ }
+
+ INDEBUG(CheckNoTransformableIndirectCallsRemain(););
}
//------------------------------------------------------------------------
diff --git a/src/jit/gentree.cpp b/src/jit/gentree.cpp
index 90b6bf896f..542a701d65 100644
--- a/src/jit/gentree.cpp
+++ b/src/jit/gentree.cpp
@@ -7343,8 +7343,9 @@ GenTree* Compiler::gtCloneExpr(
copy->gtCall.setEntryPoint(tree->gtCall.gtEntryPoint);
#endif
-#ifdef DEBUG
+#if defined(DEBUG) || defined(INLINE_DATA)
copy->gtCall.gtInlineObservation = tree->gtCall.gtInlineObservation;
+ copy->gtCall.gtRawILOffset = tree->gtCall.gtRawILOffset;
#endif
copy->AsCall()->CopyOtherRegFlags(tree->AsCall());
@@ -9334,9 +9335,22 @@ void Compiler::gtDispNode(GenTree* tree, IndentStack* indentStack, __in __in_z _
goto DASH;
case GT_CALL:
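+ // Print '&' for a call that is both an inline candidate and a guarded
+ // devirtualization candidate, 'I' for a plain inline candidate, and 'G' for
+ // a guarded devirtualization candidate that is not an inline candidate.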
- if (tree->gtFlags & GTF_CALL_INLINE_CANDIDATE)
+ if (tree->gtCall.IsInlineCandidate())
{
- printf("I");
+ if (tree->gtCall.IsGuardedDevirtualizationCandidate())
+ {
+ printf("&");
+ }
+ else
+ {
+ printf("I");
+ }
+ --msgLength;
+ break;
+ }
+ else if (tree->gtCall.IsGuardedDevirtualizationCandidate())
+ {
+ printf("G");
--msgLength;
break;
}
diff --git a/src/jit/gentree.h b/src/jit/gentree.h
index c6c5841e98..360fa1b43c 100644
--- a/src/jit/gentree.h
+++ b/src/jit/gentree.h
@@ -144,8 +144,8 @@ enum gtCallTypes : BYTE
/*****************************************************************************/
struct BasicBlock;
-
struct InlineCandidateInfo;
+struct GuardedDevirtualizationCandidateInfo;
typedef unsigned short AssertionIndex;
@@ -3547,6 +3547,8 @@ struct GenTreeCall final : public GenTree
// the comma result is unused.
#define GTF_CALL_M_DEVIRTUALIZED 0x00040000 // GT_CALL -- this call was devirtualized
#define GTF_CALL_M_UNBOXED 0x00080000 // GT_CALL -- this call was optimized to use the unboxed entry point
+#define GTF_CALL_M_GUARDED_DEVIRT 0x00100000 // GT_CALL -- this call is a candidate for guarded devirtualization
+#define GTF_CALL_M_GUARDED 0x00200000 // GT_CALL -- this call was transformed by guarded devirtualization
// clang-format on
@@ -3739,6 +3741,11 @@ struct GenTreeCall final : public GenTree
return (gtCallMoreFlags & GTF_CALL_M_FAT_POINTER_CHECK) != 0;
}
+ bool IsGuardedDevirtualizationCandidate() const
+ {
+ return (gtCallMoreFlags & GTF_CALL_M_GUARDED_DEVIRT) != 0;
+ }
+
bool IsPure(Compiler* compiler) const;
bool HasSideEffects(Compiler* compiler, bool ignoreExceptions = false, bool ignoreCctors = false) const;
@@ -3758,11 +3765,31 @@ struct GenTreeCall final : public GenTree
return (gtCallMoreFlags & GTF_CALL_M_DEVIRTUALIZED) != 0;
}
+ bool IsGuarded() const
+ {
+ return (gtCallMoreFlags & GTF_CALL_M_GUARDED) != 0;
+ }
+
bool IsUnboxed() const
{
return (gtCallMoreFlags & GTF_CALL_M_UNBOXED) != 0;
}
+ void ClearGuardedDevirtualizationCandidate()
+ {
+ gtCallMoreFlags &= ~GTF_CALL_M_GUARDED_DEVIRT;
+ }
+
+ void SetGuardedDevirtualizationCandidate()
+ {
+ gtCallMoreFlags |= GTF_CALL_M_GUARDED_DEVIRT;
+ }
+
+ void SetIsGuarded()
+ {
+ gtCallMoreFlags |= GTF_CALL_M_GUARDED;
+ }
+
unsigned gtCallMoreFlags; // in addition to gtFlags
unsigned char gtCallType : 3; // value from the gtCallTypes enumeration
@@ -3774,8 +3801,9 @@ struct GenTreeCall final : public GenTree
// only used for CALLI unmanaged calls (CT_INDIRECT)
GenTree* gtCallCookie;
// gtInlineCandidateInfo is only used when inlining methods
- InlineCandidateInfo* gtInlineCandidateInfo;
- void* gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined
+ InlineCandidateInfo* gtInlineCandidateInfo;
+ GuardedDevirtualizationCandidateInfo* gtGuardedDevirtualizationCandidateInfo;
+ void* gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined
CORINFO_GENERIC_HANDLE compileTimeHelperArgumentHandle; // Used to track type handle argument of dynamic helpers
void* gtDirectCallAddress; // Used to pass direct call address between lower and codegen
};
diff --git a/src/jit/importer.cpp b/src/jit/importer.cpp
index eabc3bc3b2..0ddf3b34ac 100644
--- a/src/jit/importer.cpp
+++ b/src/jit/importer.cpp
@@ -7740,8 +7740,7 @@ var_types Compiler::impImportCall(OPCODE opcode,
}
else
{
- // ok, the stub is available at compile type.
-
+ // The stub address is known at compile time
call = gtNewCallNode(CT_USER_FUNC, callInfo->hMethod, callRetTyp, nullptr, ilOffset);
call->gtCall.gtStubCallStubAddr = callInfo->stubLookup.constLookup.addr;
call->gtFlags |= GTF_CALL_VIRT_STUB;
@@ -8388,8 +8387,9 @@ var_types Compiler::impImportCall(OPCODE opcode,
assert(obj->gtType == TYP_REF);
// See if we can devirtualize.
+ const bool isLateDevirtualization = false;
impDevirtualizeCall(call->AsCall(), &callInfo->hMethod, &callInfo->methodFlags, &callInfo->contextHandle,
- &exactContextHnd);
+ &exactContextHnd, isLateDevirtualization);
}
if (impIsThis(obj))
@@ -8717,26 +8717,43 @@ DONE_CALL:
{
// Sometimes "call" is not a GT_CALL (if we imported an intrinsic that didn't turn into a call)
- bool fatPointerCandidate = call->AsCall()->IsFatPointerCandidate();
+ GenTreeCall* origCall = call->AsCall();
+
+ const bool isFatPointerCandidate = origCall->IsFatPointerCandidate();
+ const bool isInlineCandidate = origCall->IsInlineCandidate();
+ const bool isGuardedDevirtualizationCandidate = origCall->IsGuardedDevirtualizationCandidate();
+
if (varTypeIsStruct(callRetTyp))
{
+ // Need to treat all "split tree" cases here, not just inline candidates
call = impFixupCallStructReturn(call->AsCall(), sig->retTypeClass);
}
- if ((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0)
+ // TODO: consider handling fatcalli cases this way too...?
+ if (isInlineCandidate || isGuardedDevirtualizationCandidate)
{
+ // We should not have made any adjustments in impFixupCallStructReturn
+ // as we defer those until we know the fate of the call.
+ assert(call == origCall);
+
assert(opts.OptEnabled(CLFLG_INLINING));
- assert(!fatPointerCandidate); // We should not try to inline calli.
+ assert(!isFatPointerCandidate); // We should not try to inline calli.
// Make the call its own tree (spill the stack if needed).
impAppendTree(call, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
// TODO: Still using the widened type.
- call = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp));
+ GenTree* retExpr = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp));
+
+ // Link the retExpr to the call so if necessary we can manipulate it later.
+ origCall->gtInlineCandidateInfo->retExpr = retExpr;
+
+ // Propagate retExpr as the placeholder for the call.
+ call = retExpr;
}
else
{
- if (fatPointerCandidate)
+ if (isFatPointerCandidate)
{
// fatPointer candidates should be in statements of the form call() or var = call().
// Such form allows to find statements with fat calls without walking through whole trees
@@ -18378,7 +18395,7 @@ void Compiler::impCanInlineIL(CORINFO_METHOD_HANDLE fncHandle,
/*****************************************************************************
*/
-void Compiler::impCheckCanInline(GenTree* call,
+void Compiler::impCheckCanInline(GenTreeCall* call,
CORINFO_METHOD_HANDLE fncHandle,
unsigned methAttr,
CORINFO_CONTEXT_HANDLE exactContextHnd,
@@ -18391,7 +18408,7 @@ void Compiler::impCheckCanInline(GenTree* call,
struct Param
{
Compiler* pThis;
- GenTree* call;
+ GenTreeCall* call;
CORINFO_METHOD_HANDLE fncHandle;
unsigned methAttr;
CORINFO_CONTEXT_HANDLE exactContextHnd;
@@ -18523,22 +18540,42 @@ void Compiler::impCheckCanInline(GenTree* call,
(varTypeIsStruct(fncRetType) && (fncRealRetType == TYP_STRUCT)));
#endif
+ // Allocate an InlineCandidateInfo structure,
//
- // Allocate an InlineCandidateInfo structure
+ // Or, reuse the existing GuardedDevirtualizationCandidateInfo,
+ // which was pre-allocated to have extra room.
//
InlineCandidateInfo* pInfo;
- pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo;
-
- pInfo->dwRestrictions = dwRestrictions;
- pInfo->methInfo = methInfo;
- pInfo->methAttr = pParam->methAttr;
- pInfo->clsHandle = clsHandle;
- pInfo->clsAttr = clsAttr;
- pInfo->fncRetType = fncRetType;
- pInfo->exactContextHnd = pParam->exactContextHnd;
- pInfo->ilCallerHandle = pParam->pThis->info.compMethodHnd;
- pInfo->initClassResult = initClassResult;
- pInfo->preexistingSpillTemp = BAD_VAR_NUM;
+
+ if (pParam->call->IsGuardedDevirtualizationCandidate())
+ {
+ pInfo = pParam->call->gtInlineCandidateInfo;
+ }
+ else
+ {
+ pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo;
+
+ // Null out bits we don't use when we're just inlining
+ pInfo->guardedClassHandle = nullptr;
+ pInfo->guardedMethodHandle = nullptr;
+ pInfo->stubAddr = nullptr;
+ }
+
+ pInfo->methInfo = methInfo;
+ pInfo->ilCallerHandle = pParam->pThis->info.compMethodHnd;
+ pInfo->clsHandle = clsHandle;
+ pInfo->exactContextHnd = pParam->exactContextHnd;
+ pInfo->retExpr = nullptr;
+ pInfo->dwRestrictions = dwRestrictions;
+ pInfo->preexistingSpillTemp = BAD_VAR_NUM;
+ pInfo->clsAttr = clsAttr;
+ pInfo->methAttr = pParam->methAttr;
+ pInfo->initClassResult = initClassResult;
+ pInfo->fncRetType = fncRetType;
+ pInfo->exactContextNeedsRuntimeLookup = false;
+
+ // Note exactContextNeedsRuntimeLookup is reset later on,
+ // over in impMarkInlineCandidate.
*(pParam->ppInlineCandidateInfo) = pInfo;
@@ -19488,6 +19525,60 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* ad
// callInfo -- call info from VM
//
// Notes:
+// Mostly a wrapper for impMarkInlineCandidateHelper that also undoes
+// guarded devirtualization for virtual calls where the method we'd
+// devirtualize to cannot be inlined.
+
+void Compiler::impMarkInlineCandidate(GenTree* callNode,
+ CORINFO_CONTEXT_HANDLE exactContextHnd,
+ bool exactContextNeedsRuntimeLookup,
+ CORINFO_CALL_INFO* callInfo)
+{
+ GenTreeCall* call = callNode->AsCall();
+
+ // Do the actual evaluation
+ impMarkInlineCandidateHelper(call, exactContextHnd, exactContextNeedsRuntimeLookup, callInfo);
+
+ // If this call is an inline candidate or is not a guarded devirtualization
+ // candidate, we're done.
+ if (call->IsInlineCandidate() || !call->IsGuardedDevirtualizationCandidate())
+ {
+ return;
+ }
+
+ // If we can't inline the call we'd guardedly devirtualize to,
+ // we undo the guarded devirtualization, as the benefit from
+ // just guarded devirtualization alone is likely not worth the
+ // extra jit time and code size.
+ //
+ // TODO: it is possibly interesting to allow this, but requires
+ // fixes elsewhere too...
+ JITDUMP("Revoking guarded devirtualization candidacy for call [%06u]: target method can't be inlined\n",
+ dspTreeID(call));
+
+ call->ClearGuardedDevirtualizationCandidate();
+
+ // If we have a stub address, restore it back into the union that it shares
+ // with the candidate info.
+ if (call->IsVirtualStub())
+ {
+ JITDUMP("Restoring stub addr %p from guarded devirt candidate info\n",
+ call->gtGuardedDevirtualizationCandidateInfo->stubAddr);
+ call->gtStubCallStubAddr = call->gtGuardedDevirtualizationCandidateInfo->stubAddr;
+ }
+}
+
+//------------------------------------------------------------------------
+// impMarkInlineCandidateHelper: determine if this call can be subsequently
+// inlined
+//
+// Arguments:
+// callNode -- call under scrutiny
+// exactContextHnd -- context handle for inlining
+// exactContextNeedsRuntimeLookup -- true if context required runtime lookup
+// callInfo -- call info from VM
+//
+// Notes:
// If callNode is an inline candidate, this method sets the flag
// GTF_CALL_INLINE_CANDIDATE, and ensures that helper methods have
// filled in the associated InlineCandidateInfo.
@@ -19497,10 +19588,10 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* ad
// method may be marked as "noinline" to short-circuit any
// future assessments of calls to this method.
-void Compiler::impMarkInlineCandidate(GenTree* callNode,
- CORINFO_CONTEXT_HANDLE exactContextHnd,
- bool exactContextNeedsRuntimeLookup,
- CORINFO_CALL_INFO* callInfo)
+void Compiler::impMarkInlineCandidateHelper(GenTreeCall* call,
+ CORINFO_CONTEXT_HANDLE exactContextHnd,
+ bool exactContextNeedsRuntimeLookup,
+ CORINFO_CALL_INFO* callInfo)
{
// Let the strategy know there's another call
impInlineRoot()->m_inlineStrategy->NoteCall();
@@ -19525,7 +19616,6 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
return;
}
- GenTreeCall* call = callNode->AsCall();
InlineResult inlineResult(this, call, nullptr, "impMarkInlineCandidate");
// Don't inline if not optimizing root method
@@ -19562,14 +19652,20 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
if (call->IsVirtual())
{
- inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT);
- return;
+ // Allow guarded devirt calls to be treated as inline candidates,
+ // but reject all other virtual calls.
+ if (!call->IsGuardedDevirtualizationCandidate())
+ {
+ inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT);
+ return;
+ }
}
/* Ignore helper calls */
if (call->gtCallType == CT_HELPER)
{
+ assert(!call->IsGuardedDevirtualizationCandidate());
inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_CALL_TO_HELPER);
return;
}
@@ -19585,17 +19681,27 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
* restricts the inliner to non-expanding inlines. I removed the check to allow for non-expanding
* inlining in throw blocks. I should consider the same thing for catch and filter regions. */
- CORINFO_METHOD_HANDLE fncHandle = call->gtCallMethHnd;
+ CORINFO_METHOD_HANDLE fncHandle;
unsigned methAttr;
- // Reuse method flags from the original callInfo if possible
- if (fncHandle == callInfo->hMethod)
+ if (call->IsGuardedDevirtualizationCandidate())
{
- methAttr = callInfo->methodFlags;
+ fncHandle = call->gtGuardedDevirtualizationCandidateInfo->guardedMethodHandle;
+ methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
}
else
{
- methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+ fncHandle = call->gtCallMethHnd;
+
+ // Reuse method flags from the original callInfo if possible
+ if (fncHandle == callInfo->hMethod)
+ {
+ methAttr = callInfo->methodFlags;
+ }
+ else
+ {
+ methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+ }
}
#ifdef DEBUG
@@ -19695,14 +19801,13 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
return;
}
- // The old value should be NULL
- assert(call->gtInlineCandidateInfo == nullptr);
+ // The old value should be null OR this call should be a guarded devirtualization candidate.
+ assert((call->gtInlineCandidateInfo == nullptr) || call->IsGuardedDevirtualizationCandidate());
- // The new value should not be NULL.
+ // The new value should not be null.
assert(inlineCandidateInfo != nullptr);
inlineCandidateInfo->exactContextNeedsRuntimeLookup = exactContextNeedsRuntimeLookup;
-
- call->gtInlineCandidateInfo = inlineCandidateInfo;
+ call->gtInlineCandidateInfo = inlineCandidateInfo;
// Mark the call node as inline candidate.
call->gtFlags |= GTF_CALL_INLINE_CANDIDATE;
@@ -19837,6 +19942,7 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
// methodFlags -- [IN/OUT] flags for the method to call. Updated iff call devirtualized.
// contextHandle -- [IN/OUT] context handle for the call. Updated iff call devirtualized.
// exactContextHnd -- [OUT] updated context handle iff call devirtualized
+// isLateDevirtualization -- if devirtualization is happening after importation
//
// Notes:
// Virtual calls in IL will always "invoke" the base class method.
@@ -19861,11 +19967,16 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
// to instead make a local copy. If that is doable, the call is
// updated to invoke the unboxed entry on the local copy.
//
+// When guarded devirtualization is enabled, this method will mark
+// calls as guarded devirtualization candidates if the type of `this`
+// is not exactly known and there is a plausible guess for the type.
+
void Compiler::impDevirtualizeCall(GenTreeCall* call,
CORINFO_METHOD_HANDLE* method,
unsigned* methodFlags,
CORINFO_CONTEXT_HANDLE* contextHandle,
- CORINFO_CONTEXT_HANDLE* exactContextHandle)
+ CORINFO_CONTEXT_HANDLE* exactContextHandle,
+ bool isLateDevirtualization)
{
assert(call != nullptr);
assert(method != nullptr);
@@ -20031,16 +20142,67 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
}
#endif // defined(DEBUG)
- // Bail if obj class is an interface.
+ // See if the jit's best type for `obj` is an interface.
// See for instance System.ValueTuple`8::GetHashCode, where lcl 0 is System.IValueTupleInternal
// IL_021d: ldloc.0
// IL_021e: callvirt instance int32 System.Object::GetHashCode()
+ //
+ // If so, we can't devirtualize, but we may be able to do guarded devirtualization.
if ((objClassAttribs & CORINFO_FLG_INTERFACE) != 0)
{
- JITDUMP("--- obj class is interface, sorry\n");
+ // If we're called during early devirtualization, attempt guarded devirtualization
+ // if there's currently just one implementing class.
+ if (exactContextHandle == nullptr)
+ {
+ JITDUMP("--- obj class is interface...unable to dervirtualize, sorry\n");
+ return;
+ }
+
+ CORINFO_CLASS_HANDLE uniqueImplementingClass = NO_CLASS_HANDLE;
+
+ // info.compCompHnd->getUniqueImplementingClass(objClass);
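+ //
+ // Until the runtime exposes an API along those lines, this guess is never
+ // available, so the interface path currently always bails out just below.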
+
+ if (uniqueImplementingClass == NO_CLASS_HANDLE)
+ {
+ JITDUMP("No unique implementor of interface %p (%s), sorry\n", objClass, objClassName);
+ return;
+ }
+
+ JITDUMP("Only known implementor of interface %p (%s) is %p (%s)!\n", objClass, objClassName,
+ uniqueImplementingClass, eeGetClassName(uniqueImplementingClass));
+
+ bool guessUniqueInterface = true;
+
+ INDEBUG(guessUniqueInterface = (JitConfig.JitGuardedDevirtualizationGuessUniqueInterface() > 0););
+
+ if (!guessUniqueInterface)
+ {
+ JITDUMP("Guarded devirt for unique interface implementor is not enabled, sorry\n");
+ return;
+ }
+
+ // Ask the runtime to determine the method that would be called based on the guessed-for type.
+ CORINFO_CONTEXT_HANDLE ownerType = *contextHandle;
+ CORINFO_METHOD_HANDLE uniqueImplementingMethod =
+ info.compCompHnd->resolveVirtualMethod(baseMethod, uniqueImplementingClass, ownerType);
+
+ if (uniqueImplementingMethod == nullptr)
+ {
+ JITDUMP("Can't figure out which method would be invoked, sorry\n");
+ return;
+ }
+
+ JITDUMP("Interface call would invoke method %s\n", eeGetMethodName(uniqueImplementingMethod, nullptr));
+ DWORD uniqueMethodAttribs = info.compCompHnd->getMethodAttribs(uniqueImplementingMethod);
+ DWORD uniqueClassAttribs = info.compCompHnd->getClassAttribs(uniqueImplementingClass);
+
+ addGuardedDevirtualizationCandidate(call, uniqueImplementingMethod, uniqueImplementingClass,
+ uniqueMethodAttribs, uniqueClassAttribs);
return;
}
+ // If we get this far, the jit has a lower bound class type for the `this` object being used for dispatch.
+ // It may or may not know enough to devirtualize...
if (isInterface)
{
assert(call->IsVirtualStub());
@@ -20054,6 +20216,9 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
// If we failed to get a handle, we can't devirtualize. This can
// happen when prejitting, if the devirtualization crosses
// servicing bubble boundaries.
+ //
+ // Note if we have some way of guessing a better and more likely type we can do something similar to the code
+ // above for the case where the best jit type is an interface type.
if (derivedMethod == nullptr)
{
JITDUMP("--- no derived method, sorry\n");
@@ -20068,7 +20233,7 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
const char* derivedClassName = "?derivedClass";
const char* derivedMethodName = "?derivedMethod";
- const char* note = "speculative";
+ const char* note = "inexact or not final";
if (isExact)
{
note = "exact";
@@ -20093,29 +20258,41 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
}
#endif // defined(DEBUG)
- if (!isExact && !objClassIsFinal && !derivedMethodIsFinal)
- {
- // Type is not exact, and neither class or method is final.
- //
- // We could speculatively devirtualize, but there's no
- // reason to believe the derived method is the one that
- // is likely to be invoked.
- //
- // If there's currently no further overriding (that is, at
- // the time of jitting, objClass has no subclasses that
- // override this method), then perhaps we'd be willing to
- // make a bet...?
- JITDUMP(" Class not final or exact, method not final, no devirtualization\n");
- return;
- }
+ const bool canDevirtualize = isExact || objClassIsFinal || (!isInterface && derivedMethodIsFinal);
- // For interface calls we must have an exact type or final class.
- if (isInterface && !isExact && !objClassIsFinal)
+ if (!canDevirtualize)
{
- JITDUMP(" Class not final or exact for interface, no devirtualization\n");
+ JITDUMP(" Class not final or exact%s\n", isInterface ? "" : ", and method not final");
+
+ // Have we enabled guarded devirtualization by guessing the jit's best class?
+ bool guessJitBestClass = true;
+ INDEBUG(guessJitBestClass = (JitConfig.JitGuardedDevirtualizationGuessBestClass() > 0););
+
+ if (!guessJitBestClass)
+ {
+ JITDUMP("No guarded devirt: guessing for jit best class disabled\n");
+ return;
+ }
+
+ // Don't try guarded devirtualization when we're doing late devirtualization.
+ if (isLateDevirtualization)
+ {
+ JITDUMP("No guarded devirt during late devirtualization\n");
+ return;
+ }
+
+ // We will use the class that introduced the method as our guess
+ // for the runtime class of the object.
+ CORINFO_CLASS_HANDLE derivedClass = info.compCompHnd->getMethodClass(derivedMethod);
+
+ // Try guarded devirtualization.
+ addGuardedDevirtualizationCandidate(call, derivedMethod, derivedClass, derivedMethodAttribs, objClassAttribs);
return;
}
+ // All checks done. Time to transform the call.
+ assert(canDevirtualize);
+
JITDUMP(" %s; can devirtualize\n", note);
// Make the updates.
@@ -20429,10 +20606,25 @@ private:
{
GenTree* retExpr = *pRetExpr;
assert(retExpr->OperGet() == GT_RET_EXPR);
- JITDUMP("Store return expression %u as a local var.\n", retExpr->gtTreeID);
- unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+ const unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+ JITDUMP("Storing return expression [%06u] to a local var V%02u.\n", comp->dspTreeID(retExpr), tmp);
comp->impAssignTempGen(tmp, retExpr, (unsigned)Compiler::CHECK_SPILL_NONE);
*pRetExpr = comp->gtNewLclvNode(tmp, retExpr->TypeGet());
+
+ if (retExpr->TypeGet() == TYP_REF)
+ {
+ assert(comp->lvaTable[tmp].lvSingleDef == 0);
+ comp->lvaTable[tmp].lvSingleDef = 1;
+ JITDUMP("Marked V%02u as a single def temp\n", tmp);
+
+ bool isExact = false;
+ bool isNonNull = false;
+ CORINFO_CLASS_HANDLE retClsHnd = comp->gtGetClassHandle(retExpr, &isExact, &isNonNull);
+ if (retClsHnd != nullptr)
+ {
+ comp->lvaSetClass(tmp, retClsHnd, isExact);
+ }
+ }
}
private:
@@ -20448,8 +20640,108 @@ private:
//
void Compiler::addFatPointerCandidate(GenTreeCall* call)
{
+ JITDUMP("Marking call [%06u] as fat pointer candidate\n", dspTreeID(call));
setMethodHasFatPointer();
call->SetFatPointerCandidate();
SpillRetExprHelper helper(this);
helper.StoreRetExprResultsInArgs(call);
}
+
+//------------------------------------------------------------------------
+// addGuardedDevirtualizationCandidate: potentially mark the call as a guarded
+// devirtualization candidate
+//
+// Notes:
+//
+// We currently do not mark calls as candidates when prejitting. This was done
+// to simplify bringing up the associated transformation. It is worth revisiting
+// if we think we can come up with a good guess for the class when prejitting.
+//
+// Call sites in rare or unoptimized code, and calls that require cookies are
+// also not marked as candidates.
+//
+// As part of marking the candidate, the code spills GT_RET_EXPRs anywhere in any
+// child tree, because we will need to clone all these trees when we clone the call
+// as part of guarded devirtualization, and these IR nodes can't be cloned.
+//
+// Arguments:
+// call - potential guarded devirtualization candidate
+// methodHandle - method that will be invoked if the class test succeeds
+// classHandle - class that will be tested for at runtime
+// methodAttr - attributes of the method
+// classAttr - attributes of the class
+//
+void Compiler::addGuardedDevirtualizationCandidate(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE methodHandle,
+ CORINFO_CLASS_HANDLE classHandle,
+ unsigned methodAttr,
+ unsigned classAttr)
+{
+ // This transformation only makes sense for virtual calls
+ assert(call->IsVirtual());
+
+ // Only mark calls if the feature is enabled.
+ const bool isEnabled = JitConfig.JitEnableGuardedDevirtualization() > 0;
+
+ if (!isEnabled)
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- disabled by jit config\n",
+ dspTreeID(call));
+ return;
+ }
+
+ // Bail when prejitting. We only do this for jitted code.
+ // We should revisit this if we think we can come up with good class guesses when prejitting.
+ if (opts.jitFlags->IsSet(JitFlags::JIT_FLAG_PREJIT))
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- prejitting", dspTreeID(call));
+ return;
+ }
+
+ // Bail if not optimizing or the call site is very likely cold
+ if (compCurBB->isRunRarely() || opts.compDbgCode || opts.MinOpts())
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- rare / dbg / minopts\n",
+ dspTreeID(call));
+ return;
+ }
+
+ // CT_INDIRECT calls may use the cookie, bail if so...
+ //
+ // If transforming these provides a benefit, we could save this off in the same way
+ // we save the stub address below.
+ if ((call->gtCallType == CT_INDIRECT) && (call->gtCall.gtCallCookie != nullptr))
+ {
+ return;
+ }
+
+ // We're all set, proceed with candidate creation.
+ JITDUMP("Marking call [%06u] as guarded devirtualization candidate; will guess for class %s\n", dspTreeID(call),
+ eeGetClassName(classHandle));
+ setMethodHasGuardedDevirtualization();
+ call->SetGuardedDevirtualizationCandidate();
+
+ // Spill off any GT_RET_EXPR subtrees so we can clone the call.
+ SpillRetExprHelper helper(this);
+ helper.StoreRetExprResultsInArgs(call);
+
+ // Gather some information for later. Note we actually allocate InlineCandidateInfo
+ // here, as the devirtualized half of this call will likely become an inline candidate.
+ GuardedDevirtualizationCandidateInfo* pInfo = new (this, CMK_Inlining) InlineCandidateInfo;
+
+ pInfo->guardedMethodHandle = methodHandle;
+ pInfo->guardedClassHandle = classHandle;
+
+ // Save off the stub address since it shares a union with the candidate info.
+ if (call->IsVirtualStub())
+ {
+ JITDUMP("Saving stub addr %p in candidate info\n", call->gtStubCallStubAddr);
+ pInfo->stubAddr = call->gtStubCallStubAddr;
+ }
+ else
+ {
+ pInfo->stubAddr = nullptr;
+ }
+
+ call->gtGuardedDevirtualizationCandidateInfo = pInfo;
+}
diff --git a/src/jit/inline.cpp b/src/jit/inline.cpp
index a1064762ce..f523d2abbe 100644
--- a/src/jit/inline.cpp
+++ b/src/jit/inline.cpp
@@ -337,6 +337,7 @@ InlineContext::InlineContext(InlineStrategy* strategy)
, m_CodeSizeEstimate(0)
, m_Success(true)
, m_Devirtualized(false)
+ , m_Guarded(false)
, m_Unboxed(false)
#if defined(DEBUG) || defined(INLINE_DATA)
, m_Policy(nullptr)
@@ -397,18 +398,19 @@ void InlineContext::Dump(unsigned indent)
const char* inlineReason = InlGetObservationString(m_Observation);
const char* inlineResult = m_Success ? "" : "FAILED: ";
const char* devirtualized = m_Devirtualized ? " devirt" : "";
+ const char* guarded = m_Guarded ? " guarded" : "";
const char* unboxed = m_Unboxed ? " unboxed" : "";
if (m_Offset == BAD_IL_OFFSET)
{
- printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken,
- inlineResult, inlineReason, devirtualized, unboxed, calleeName);
+ printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken,
+ inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName);
}
else
{
IL_OFFSET offset = jitGetILoffs(m_Offset);
- printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID, calleeToken,
- inlineResult, inlineReason, devirtualized, unboxed, calleeName);
+ printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID,
+ calleeToken, inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName);
}
}
@@ -1205,6 +1207,7 @@ InlineContext* InlineStrategy::NewSuccess(InlineInfo* inlineInfo)
calleeContext->m_Observation = inlineInfo->inlineResult->GetObservation();
calleeContext->m_Success = true;
calleeContext->m_Devirtualized = originalCall->IsDevirtualized();
+ calleeContext->m_Guarded = originalCall->IsGuarded();
calleeContext->m_Unboxed = originalCall->IsUnboxed();
#if defined(DEBUG) || defined(INLINE_DATA)
@@ -1266,6 +1269,7 @@ InlineContext* InlineStrategy::NewFailure(GenTreeStmt* stmt, InlineResult* inlin
failedContext->m_Callee = inlineResult->GetCallee();
failedContext->m_Success = false;
failedContext->m_Devirtualized = originalCall->IsDevirtualized();
+ failedContext->m_Guarded = originalCall->IsGuarded();
failedContext->m_Unboxed = originalCall->IsUnboxed();
assert(InlIsValidObservation(failedContext->m_Observation));
diff --git a/src/jit/inline.h b/src/jit/inline.h
index 551341e2e6..e94765a74c 100644
--- a/src/jit/inline.h
+++ b/src/jit/inline.h
@@ -494,22 +494,36 @@ private:
bool m_Reported;
};
+// GuardedDevirtualizationCandidateInfo provides information about
+// a potential target of a virtual call.
+
+struct GuardedDevirtualizationCandidateInfo
+{
+ CORINFO_CLASS_HANDLE guardedClassHandle;
+ CORINFO_METHOD_HANDLE guardedMethodHandle;
+ void* stubAddr;
+};
+
// InlineCandidateInfo provides basic information about a particular
// inline candidate.
+//
+// It is a superset of GuardedDevirtualizationCandidateInfo: calls
+// can start out as GDv candidates and turn into inline candidates
-struct InlineCandidateInfo
+struct InlineCandidateInfo : public GuardedDevirtualizationCandidateInfo
{
- DWORD dwRestrictions;
CORINFO_METHOD_INFO methInfo;
- unsigned methAttr;
+ CORINFO_METHOD_HANDLE ilCallerHandle; // the logical IL caller of this inlinee.
CORINFO_CLASS_HANDLE clsHandle;
+ CORINFO_CONTEXT_HANDLE exactContextHnd;
+ GenTree* retExpr;
+ DWORD dwRestrictions;
+ unsigned preexistingSpillTemp;
unsigned clsAttr;
+ unsigned methAttr;
+ CorInfoInitClassResult initClassResult;
var_types fncRetType;
- CORINFO_METHOD_HANDLE ilCallerHandle; // the logical IL caller of this inlinee.
- CORINFO_CONTEXT_HANDLE exactContextHnd;
bool exactContextNeedsRuntimeLookup;
- CorInfoInitClassResult initClassResult;
- unsigned preexistingSpillTemp;
};
// InlArgInfo describes inline candidate argument properties.
@@ -683,6 +697,11 @@ public:
return m_Devirtualized;
}
+ bool IsGuarded() const
+ {
+ return m_Guarded;
+ }
+
bool IsUnboxed() const
{
return m_Unboxed;
@@ -703,6 +722,7 @@ private:
int m_CodeSizeEstimate; // in bytes * 10
bool m_Success : 1; // true if this was a successful inline
bool m_Devirtualized : 1; // true if this was a devirtualized call
+ bool m_Guarded : 1; // true if this was a guarded call
bool m_Unboxed : 1; // true if this call now invokes the unboxed entry
#if defined(DEBUG) || defined(INLINE_DATA)
diff --git a/src/jit/jitconfigvalues.h b/src/jit/jitconfigvalues.h
index 383264e6a3..649db4ebe6 100644
--- a/src/jit/jitconfigvalues.h
+++ b/src/jit/jitconfigvalues.h
@@ -363,6 +363,15 @@ CONFIG_INTEGER(JitEnableRemoveEmptyTry, W("JitEnableRemoveEmptyTry"), 0)
#endif // defined(FEATURE_CORECLR)
#endif // DEBUG
+// Overall master enable for Guarded Devirtualization. Currently not enabled by default.
+CONFIG_INTEGER(JitEnableGuardedDevirtualization, W("JitEnableGuardedDevirtualization"), 0)
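+// (For experimentation it can be turned on by setting the
+// COMPlus_JitEnableGuardedDevirtualization environment variable to 1.)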
+
+#if defined(DEBUG)
+// Various policies for GuardedDevirtualization
+CONFIG_INTEGER(JitGuardedDevirtualizationGuessUniqueInterface, W("JitGuardedDevirtualizationGuessUniqueInterface"), 1)
+CONFIG_INTEGER(JitGuardedDevirtualizationGuessBestClass, W("JitGuardedDevirtualizationGuessBestClass"), 1)
+#endif // DEBUG
+
#undef CONFIG_INTEGER
#undef CONFIG_STRING
#undef CONFIG_METHODSET
diff --git a/src/jit/lclvars.cpp b/src/jit/lclvars.cpp
index 8ce2b5956b..339f7a5ff7 100644
--- a/src/jit/lclvars.cpp
+++ b/src/jit/lclvars.cpp
@@ -2746,6 +2746,11 @@ void Compiler::lvaUpdateClass(unsigned varNum, CORINFO_CLASS_HANDLE clsHnd, bool
{
varDsc->lvClassHnd = clsHnd;
varDsc->lvClassIsExact = isExact;
+
+#if DEBUG
+ // Note we've modified the type...
+ varDsc->lvClassInfoUpdated = true;
+#endif // DEBUG
}
return;
diff --git a/src/jit/morph.cpp b/src/jit/morph.cpp
index 20e378ed16..566714f276 100644
--- a/src/jit/morph.cpp
+++ b/src/jit/morph.cpp
@@ -6614,42 +6614,59 @@ GenTree* Compiler::fgMorphField(GenTree* tree, MorphAddrContext* mac)
void Compiler::fgMorphCallInline(GenTreeCall* call, InlineResult* inlineResult)
{
- // The call must be a candiate for inlining.
- assert((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0);
+ bool inliningFailed = false;
- // Attempt the inline
- fgMorphCallInlineHelper(call, inlineResult);
+ // Is this call an inline candidate?
+ if (call->IsInlineCandidate())
+ {
+ // Attempt the inline
+ fgMorphCallInlineHelper(call, inlineResult);
- // We should have made up our minds one way or another....
- assert(inlineResult->IsDecided());
+ // We should have made up our minds one way or another....
+ assert(inlineResult->IsDecided());
- // If we failed to inline, we have a bit of work to do to cleanup
- if (inlineResult->IsFailure())
- {
+ // If we failed to inline, we have a bit of work to do to cleanup
+ if (inlineResult->IsFailure())
+ {
#ifdef DEBUG
- // Before we do any cleanup, create a failing InlineContext to
- // capture details of the inlining attempt.
- m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult);
+ // Before we do any cleanup, create a failing InlineContext to
+ // capture details of the inlining attempt.
+ m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult);
#endif
- // It was an inline candidate, but we haven't expanded it.
- if (call->gtCall.gtReturnType != TYP_VOID)
+ inliningFailed = true;
+
+ // Clear the Inline Candidate flag so we can ensure later we tried
+ // inlining all candidates.
+ //
+ call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+ }
+ }
+ else
+ {
+ // This wasn't an inline candidate. So it must be a GDV candidate.
+ assert(call->IsGuardedDevirtualizationCandidate());
+
+ // We already know we can't inline this call, so don't even bother to try.
+ inliningFailed = true;
+ }
+
+ // If we failed to inline (or didn't even try), do some cleanup.
+ if (inliningFailed)
+ {
+ if (call->gtReturnType != TYP_VOID)
{
+ JITDUMP("Inlining [%06u] failed, so bashing [%06u] to NOP\n", dspTreeID(call), dspTreeID(fgMorphStmt));
+
// Detach the GT_CALL tree from the original statement by
// hanging a "nothing" node to it. Later the "nothing" node will be removed
// and the original GT_CALL tree will be picked up by the GT_RET_EXPR node.
-
noway_assert(fgMorphStmt->gtStmtExpr == call);
fgMorphStmt->gtStmtExpr = gtNewNothingNode();
}
-
- // Clear the Inline Candidate flag so we can ensure later we tried
- // inlining all candidates.
- //
- call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
}
}
@@ -6682,6 +6699,13 @@ void Compiler::fgMorphCallInlineHelper(GenTreeCall* call, InlineResult* result)
return;
}
+ // Re-check this because guarded devirtualization may allow these through.
+ if (gtIsRecursiveCall(call) && call->IsImplicitTailCall())
+ {
+ result->NoteFatal(InlineObservation::CALLSITE_IMPLICIT_REC_TAIL_CALL);
+ return;
+ }
+
// impMarkInlineCandidate() is expected not to mark tail prefixed calls
// and recursive tail calls as inline candidates.
noway_assert(!call->IsTailPrefixedCall());