-rw-r--r--  Documentation/design-docs/GuardedDevirtualization.md     533
-rw-r--r--  Documentation/design-docs/ThreeClassesDevirt.JPG          bin 0 -> 111024 bytes
-rw-r--r--  Documentation/design-docs/ThreeClassesDevirtDetail.JPG    bin 0 -> 140846 bytes
-rw-r--r--  Documentation/design-docs/ThreeClassesDevirtFull.JPG      bin 0 -> 113061 bytes
-rw-r--r--  Documentation/design-docs/ThreeClassesInterface.JPG       bin 0 -> 91430 bytes
-rw-r--r--  Documentation/design-docs/TwoClassesBaseline.JPG          bin 0 -> 80619 bytes
-rw-r--r--  Documentation/design-docs/TwoClassesDevirt.JPG            bin 0 -> 84273 bytes
-rw-r--r--  Documentation/design-docs/TwoClassesInterface.JPG         bin 0 -> 56361 bytes
-rw-r--r--  src/jit/compiler.cpp                                      6
-rw-r--r--  src/jit/compiler.h                                        48
-rw-r--r--  src/jit/flowgraph.cpp                                     706
-rw-r--r--  src/jit/gentree.cpp                                       20
-rw-r--r--  src/jit/gentree.h                                         34
-rw-r--r--  src/jit/importer.cpp                                      420
-rw-r--r--  src/jit/inline.cpp                                        12
-rw-r--r--  src/jit/inline.h                                          34
-rw-r--r--  src/jit/jitconfigvalues.h                                 9
-rw-r--r--  src/jit/lclvars.cpp                                       5
-rw-r--r--  src/jit/morph.cpp                                         64
19 files changed, 1642 insertions, 249 deletions
diff --git a/Documentation/design-docs/GuardedDevirtualization.md b/Documentation/design-docs/GuardedDevirtualization.md
new file mode 100644
index 0000000000..ba1f7bf920
--- /dev/null
+++ b/Documentation/design-docs/GuardedDevirtualization.md
@@ -0,0 +1,533 @@
+# Guarded Devirtualization
+
+## Overview
+
+Guarded devirtualization is a proposed new optimization for the JIT in .NET Core
+3.0. This document describes the motivation, initial design sketch, and highlights
+various issues needing further investigation.
+
+## Motivation
+
+The .NET Core JIT is able to do a limited amount of devirtualization for virtual
+and interface calls. This ability was added in .NET Core 2.0. To devirtualize
+the JIT must be able to demonstrate one of two things: either that it knows the
+type of some reference exactly (say because it has seen a `newobj`) or that the
+declared type of the reference is a `final` class (aka `sealed`). For virtual
+calls the JIT can also devirtualize if it can prove the method is marked as `final`.
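+
+For illustration, here is a small example (hypothetical code, not from any real
+test) showing the two patterns the current devirtualization can handle:
+```C#
+class Base
+{
+    public virtual int F() { return 1; }
+}
+
+sealed class Leaf : Base
+{
+    public override int F() { return 2; }
+}
+
+static int Examples(Leaf leaf)
+{
+    Base b = new Base();   // exact type known from the newobj: devirtualizes to Base.F
+    int x = b.F();
+
+    int y = leaf.F();      // declared type Leaf is sealed: devirtualizes to Leaf.F
+    return x + y;
+}
+```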
+
+However, most of the time the JIT is unable to determine exactness or finalness
+and so devirtualization fails. Statistics show that currently only around 15% of
+virtual call sites can be devirtualized. Results are even more pessimistic for
+interface calls, where success rates are around 5%.
+
+There are a variety of reasons for this. The JIT analysis is somewhat weak.
+Historically all the JIT cared about was whether some location held **a** reference
+type, not a specific reference type. So the current type propagation has been
+retrofitted and there are places where types just get lost. The JIT analysis
+happens quite early (during importation) and there is only minimal ability to do
+data flow analysis at this stage. So for current devirtualization the source of
+the type information and the consumption must be fairly close in the code. A
+more detailed accounting of some of the shortcomings can be found in
+[CoreCLR#9908](https://github.com/dotnet/coreclr/issues/9908).
+
+Resolution of these issues will improve the ability of the JIT to devirtualize,
+but even the best analysis possible will still miss out on many cases. Some call
+sites are truly polymorphic. Some others are truly monomorphic but proving this
+would require sophisticated interprocedural analyses that are not practical in
+the JIT or in a system as dynamic as the CLR. And some sites are monomorphic in
+practice but potentially polymorphic.
+
+As an alternative, when devirtualization fails, the JIT can perform *guarded
+devirtualization*. Here the JIT creates an `if-then-else` block set in place of
+a virtual or interface call and inserts a runtime type test (or similar) into
+the `if` -- the "guard". If the guard test succeeds the JIT knows the type of
+the reference, so the `then` block can directly invoke the method corresponding
+to that type. If the test fails then the `else` block is executed and this
+contains the original virtual or interface call.
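+
+At the source level the transformation is conceptually the following (just a
+sketch with hypothetical `Base`/`Derived` classes; C# itself cannot express the
+non-virtual call, which is why the JIT performs this expansion on its IR):
+```C#
+static int CallF(Base obj)   // obj's static type is Base; the JIT predicts Derived
+{
+    int r;
+    if (obj.GetType() == typeof(Derived))   // guard: runtime type (method table) test
+    {
+        r = obj.F();   // here the JIT knows obj is exactly Derived: direct, inlineable call
+    }
+    else
+    {
+        r = obj.F();   // fallback: the original virtual (or interface) call
+    }
+    return r;
+}
+```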
+
+The upshot is that the JIT conditionally gains the benefit of devirtualization at
+the expense of increased code size, longer JIT times, and slightly longer code
+paths around the call. So long as the JIT's guess at the type is somewhat
+reasonable, this optimization can improve performance.
+
+## Opportunity
+
+One might imagine that the JIT's guess about the type of the reference has to be
+pretty good for devirtualization to pay off. Somewhat surprisingly, at least based
+on our initial results, that is not the case.
+
+### Virtual Calls: The Two-Class Case
+
+Given these class declarations:
+```C#
+class B
+{
+ public virtual int F() { return 33; }
+}
+
+class D : B
+{
+ public override int F() { return 44; }
+}
+```
+Suppose we have an array `B[]` that is randomly filled with instances of `B` and
+`D` and each element is class `B` with probability `p`. We time how long
+it takes to invoke `F` on each member of the array (note the JIT will not ever
+be able to devirtualize these calls), and plot the times as a function of `p`.
+The result is something like the following:
+
+![two classes baseline perf](TwoClassesBaseline.JPG)
+
+Modern hardware includes an indirect branch target predictor and we can see it
+in action here. When the array element type is predictable (`p` very close to
+zero or very close to 1) performance is better. When the element type is
+unpredictable (`p` near 0.5) performance is quite a bit worse.
+
+From this we can see that a correctly predicted virtual call requires about
+19 time units and worst case incorrect prediction around 55 time units. There is
+some timing overhead here too so the real costs are a bit lower.
+
+Now imagine we update the JIT to do guarded devirtualization and check if the
+element is indeed type `B`. If so the JIT can call `B.F` directly and in our
+prototype the JIT will also inline the call. So we would expect that if the
+element types are mostly `B`s (that is if `p` is near 1.0) we'd see very good
+performance, and if the element type is mostly `D` (that is `p` near 0.0)
+performance should perhaps be slightly worse than the un-optimized case, as there
+is now extra code to run a check before the call.
+
+![two classes devirt perf](TwoClassesDevirt.JPG)
+
+However, as you can see, the performance of the devirtualized case (blue line) is as
+good or better than the un-optimized case for all values of `p`. This is perhaps
+unexpected and deserves some explanation.
+
+Recall that modern hardware also includes a branch predictor. For small or large
+values of `p` this predictor will correctly guess whether the test added by the
+JIT will resolve to the `then` or `else` case. For small values of `p` the JIT
+guess will be wrong and control will flow to the `else` block. But unlike the
+original example, the indirect call here will only see instances of type `D` and
+so the indirect branch predictor will work extremely well. So the overhead for
+the small `p` case is similar to the well-predicted indirect case without guarded
+devirtualization. As `p` increases the branch predictor starts to mispredict and
+that costs some cycles. But when it mispredicts control reaches the `then` block
+which executes the inlined call. So the cost of misprediction is offset by the
+faster execution and the cost stays relatively flat.
+
+As `p` passes 0.5 the branch predictor flips its prediction to prefer the `then`
+case. As before mispredicts are costly and send us down the `else` path but there
+we still execute a correctly predicted indirect call.
+
+And as `p` approaches 1.0 the cost falls as the branch predictor is almost always
+correct and so the cost is simply that of the inlined call.
+
+So oddly enough the guarded devirtualization case shown here does not require any
+sort of perf tradeoff. The JIT is better off guessing the more likely case but
+even guessing the less likely case can pay off and doesn't hurt performance.
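+
+A crude expected-cost model makes the shape of this curve plausible. This is
+purely illustrative: the 19-unit figure is read off the baseline chart above and
+the 36-unit penalty is derived from the 55 - 19 gap there, while the cost of the
+guard plus inlined call is simply an assumption:
+```C#
+// p              : probability the element is the predicted class B
+// inlinedCost    : assumed cost of the guard plus the inlined B.F (not measured above)
+// indirectCost   : well-predicted indirect call, roughly 19 "time units" per the chart
+// mispredictCost : assumed extra penalty on a guard mispredict (roughly 55 - 19)
+static double GuardedCost(double p, double inlinedCost = 5.0,
+                          double indirectCost = 19.0, double mispredictCost = 36.0)
+{
+    // Crude model: the branch predictor tracks the majority case, so it
+    // mispredicts roughly min(p, 1 - p) of the time.
+    double mispredictRate = System.Math.Min(p, 1 - p);
+    return p * inlinedCost            // then path: guard + inlined call
+         + (1 - p) * indirectCost     // else path: well-predicted indirect call on D
+         + mispredictRate * mispredictCost;
+}
+```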
+
+One might suspect at this point that the two class case is a special case and that
+the results do not hold up in more complex cases. More on that shortly.
+
+Before moving on, we should point out that virtual calls in the current
+CLR are a bit more expensive than in C++, because the CLR uses a two-level method
+table. That is, the indirect call sequence is something like:
+```asm
+000095 mov rax, qword ptr [rcx] ; fetch method table
+000098 mov rax, qword ptr [rax+72] ; fetch proper chunk
+00009C call qword ptr [rax+32]B:F():int:this ; call indirect
+```
+This is a chain of 3 dependent loads and so best-case will require at least 3x
+the best cache latency (plus any indirect prediction overhead).
+
+So the virtual call costs for the CLR are high. The chunked method table design
+was adopted to save space (chunks can be shared by different classes) at the
+expense of some performance. And this apparently makes guarded devirtualization
+pay off over a wider range of class distributions than one might expect.
+
+And for completeness, the full guarded `if-then-else` sequence measured above is:
+```asm
+00007A mov rcx, gword ptr [rsi+8*rcx+16] ; fetch array element
+00007F mov rax, 0x7FFC9CFB4A90 ; B's method table
+000089 cmp qword ptr [rcx], rax ; method table test
+00008C jne SHORT G_M30756_IG06 ; jump if class is not B
+
+00008E mov eax, 33 ; inlined B.F
+000093 jmp SHORT G_M30756_IG07
+
+G_M30756_IG06:
+
+000095 mov rax, qword ptr [rcx] ; fetch method table
+000098 mov rax, qword ptr [rax+72] ; fetch proper chunk
+00009C call qword ptr [rax+32]B:F():int:this ; call indirect
+
+G_M30756_IG07:
+```
+Note there is a redundant load of the method table (hidden in the `cmp`) that
+could be eliminated with a bit more work on the prototype. So guarded
+devirtualization perf could potentially be even better than is shown above,
+especially for smaller values of `p`.
+
+### Virtual Calls: The Three-Class Case
+
+Now to return to the question we asked above: is there something about the two
+class case that made guarded devirtualization especially attractive? Read on.
+
+Suppose we introduce a third class into the mix and repeat the above measurement.
+There are now two probabilities in play: `p`, the probability that the element
+has class `B`, and `p1`, the probability that the element has class `D`, and
+there is a third class `E`. To avoid introducing a 3D plot we'll first simply
+average the results for the various values of `p1` and plot performance as a
+function of `p`:
+
+![three classes devirt perf](ThreeClassesDevirt.JPG)
+
+The right-hand side (`p` near 1.0) looks a lot like the previous chart. This is
+not surprising as there are relatively few instances of that third class. But the
+middle and left hand side differ and are more costly.
+
+For the un-optimized case (orange) the difference is directly attributable to
+the performance of the indirect branch predictor. Even when `p` is small there
+are still two viable branch targets (on average) and some degree of indirect
+misprediction.
+
+For the optimized case we now see that guarded devirtualization performs worse
+than no optimization if the JIT's guess is completely wrong. The penalty is not
+that bad because the JIT-introduced branch is predictable. But even at very
+modest values of `p` guarded devirtualization starts to win out.
+
+Because we've averaged over `p1` you might suspect that we're hiding something.
+The following chart shows the min and max values as well as the average, and also
+shows the two-class result (dashed lines).
+
+![three classes devirt perf ranges](ThreeClassesDevirtFull.JPG)
+
+You can see the minimum values are very similar to the two class case; these
+are cases where `p1` is close to 0 or close to 1. And that makes sense: if only
+two of the three classes actually appear, we'd expect results similar to the case
+where there can only be two classes.
+
+And as noted above, if `p` is high enough then the curves also converge to the
+two class case, as the relative mixture of `D` and `E` doesn't matter: the
+predominance of `B` wins out.
+
+For low values of `p` the actual class at the call site is some mixture of `D`
+and `E`. Here's some detail (the x axis now shows `p1` and `p` as upper and
+lower values respectively).
+
+![three classes devirt perf detail](ThreeClassesDevirtDetail.JPG)
+
+The worst case for both is when the mixture of `D` and `E` is an unpredictable
+50-50 and there are no `B`s. Once we mix in just 10% of `B` then
+guarded devirt performs better no matter what distribution we have for the other
+two classes. Worst case overhead -- where the JIT guesses a class that never
+appears, and the other classes are evenly distributed -- is around 20%.
+
+So it seems reasonable to say that so long as the JIT can make a credible guess
+about the possible class -- say a guess that is right at least 10% of the time
+-- then there is quite likely a performance benefit to guarded
+devirtualization for virtual calls.
+
+We'll need to verify this with more scenarios, but these initial results are
+certainly encouraging.
+
+### Virtual Calls: Testing for Multiple Cases
+
+One might deduce from the above that if there are two likely candidates the JIT
+should test for each. This is certainly a possibility and in C++ compilers that
+do indirect call profiling there are cases where multiple tests are considered
+a good idea. But there's also additional code size and another branch.
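+
+In source terms the expansion would look roughly like this (again only a
+conceptual sketch with hypothetical class names):
+```C#
+static int CallF(Base obj)
+{
+    if (obj.GetType() == typeof(Derived1))        // most likely class
+    {
+        return obj.F();   // direct, inlineable call to Derived1.F
+    }
+    else if (obj.GetType() == typeof(Derived2))   // next most likely class
+    {
+        return obj.F();   // direct, inlineable call to Derived2.F
+    }
+    return obj.F();       // fallback: the original virtual call
+}
+```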
+
+This is something we'll look into further.
+
+### Interface Calls: The Two Class Case
+
+Interface calls on the CLR are implemented via [Virtual Stub Dispatch](
+https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md
+) (aka VSD). Calls are made through an indirection cell that initially points
+at a lookup stub. On the first call, the interface target is identified from the
+object's method table and the lookup stub is replaced with a dispatch stub that
+checks for that specific method table in a manner quite similar to guarded
+devirtualization.
+
+If the method table check fails a counter is incremented, and once the counter
+reaches a threshold the dispatch stub is replaced with a resolve stub that looks
+up the right target in a process-wide hash table.
+
+For interface call sites that are monomorphic, the VSD mechanism (via the dispatch
+stub) executes the following code sequence (here for x64)
+```asm
+; JIT-produced code
+;
+; set up R11 with interface target info
+mov R11, ... ; additional VSD info for call
+mov RCX, ... ; dispatch target object
+cmp [rcx], rcx ; null check (unnecessary)
+call [addr] ; call indirect through indir cell
+
+; dispatch stub
+cmp [RCX], targetMT ; check for right method table
+jne DISPATCH-FAIL ; bail to resolve stub if check fails (uses R11 info)
+jmp targetCode ; else "tail call" the right method
+```
+
+At first glance it might appear that adding guarded devirtualization on top of
+VSD may not provide much benefit for monomorphic sites. However the guarded
+devirtualization test doesn't use an indirection cell and doesn't require R11
+setup, may be able to optimize away the null check, and opens the door for
+inlining. So it should be slightly cheaper on average and significantly cheaper
+in some cases.
+
+(Note [CoreCLR#14222](https://github.com/dotnet/coreclr/issues/14222) indicates
+we should be able to optimize away the null check in any case).
+
+If the guarded test fails we've filtered out one method table, so the dispatch cell
+now works well even if a call site alternates between two classes. So we'd expect
+the combination of guarded devirtualization and VSD to perform well on the two
+class test and only show limitations when faced with mixtures of three or more
+classes.
+
+If the guard test always fails we have the up-front cost for the vtable fetch
+(which should amortize pretty well with the subsequent fetch in the stub) plus
+the predicted-not-taken branch. So we'd expect the cost for the two-class cases
+where the JIT's prediction is always wrong to be a bit higher.
+
+The graph below shows the measured results. To make sure we're not overly impacted
+by residual VSD state we use a fresh call site for each value of `p`. The solid
+orange line is the current cost. The dashed orange line is the corresponding cost
+for a virtual call with the same value of `p`. The solid blue line is the cost with
+an up-front guarded test. As noted there is some slowdown when the JIT always
+guesses the wrong class, but the break-even point (not shown) is at a relatively
+small probability of a correct guess.
+
+![two classes interface devirt](TwoClassesInterface.JPG)
+
+### Interface Calls: The Three Class Case
+
+As with virtual calls you may strongly suspect the two class case for interface
+calls is special. And you'd be right.
+
+If we mix a third class in as we did above, we see similar changes in the
+performance mix for interface calls, as seen below. But also as with virtual calls
+the JIT's guess doesn't have to be all that good to see payoffs. At around 10%
+correct, guessing wins on average, and around 30% correct guessing is always a
+perf win.
+
+![three classes interface devirt](ThreeClassesInterface.JPG)
+
+### Delegate Speculation
+
+While we have been discussing this topic in the context of virtual calls, the
+method is general and can be applied to indirect calls as well. Here the guard
+test may just test for a particular function rather than a type.
+
+`Delegate.Invoke` is a special method that eventually turns into an indirect
+call. The JIT could speculate about the possible target of this call. Choosing
+a good target here would require some kind of indirect call profiling.
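+
+As a sketch of what such a guard might look like at the source level (purely
+illustrative; `Compute` here is just an assumed likely target):
+```C#
+static readonly Func<int, int> s_likelyTarget = Compute;   // assumed likely target
+
+static int Compute(int x) => x * 2;
+
+static int InvokeWithGuess(Func<int, int> f, int x)
+{
+    // Guard: is this the delegate we predicted? (delegate equality compares
+    // the target object and the method)
+    if (f == s_likelyTarget)
+    {
+        return Compute(x);   // direct call, candidate for inlining
+    }
+    return f(x);             // fallback: the ordinary (indirect) Delegate.Invoke
+}
+```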
+
+### Calli Speculation
+
+Indirect calls also arise via the `calli` opcode. As with delegates, choosing a
+target here likely requires specialized profiling.
+
+### Costs
+
+Given the optimistic take on performance, it is important to remember that
+there are also some costs involved in guarded devirtualization: increased code
+size and increased JIT time. There may also be some second-order effects on
+the local code generation as we've introduced control flow into the method where
+it didn't exist previously.
+
+A naive implementation that aggressively performs guarded devirtualization
+increases code size overall by about 5% as measured by PMI. JIT time increase
+was not measured but should be in that same ballpark. Some assemblies see code
+size increasing by as much as 12%.
+
+However, guarded devirtualization only kicks in for about 15% of the methods.
+So the average relative size increase in a method with virtual calls is probably
+more like 33%.
+
+There may be some inefficiencies in the current prototype that can be fixed to
+reduce the code size impact. Aside from the extra method table fetch noted above,
+the duplicated calls have the same sets of arguments, so we might be able to
+amortize argument evaluation costs better. And there are some complexities around
+handling return values (especially for implicit by-reference structures) that
+likewise might be able to be tightened up.
+
+Nevertheless, blindly optimizing all virtual calls with guarded devirtualization
+is not likely the right approach. Something more selective is almost certainly
+needed.
+
+However we have done code-expanding optimizations somewhat blindly before, and
+we could contain the size growth risk by restricting this optimization to Tier1.
+Also PMI can overstate size impact seen in real scenarios as it may over-count
+the impact of changes in methods that are always inlined. So we should look at
+size increases from some actual scenarios.
+
+And perhaps I'll look at the size impact of loop cloning as a precedent.
+
+## Implementation Considerations
+
+To get the data above and a better feel for the challenges involved we have
+implemented a prototype. It is currently located on this branch:
+[GuardedDevirtFoundations](https://github.com/AndyAyersMS/coreclr/tree/GuardedDevirtFoundations).
+
+The prototype can introduce guarded devirtualization for some virtual and
+interface calls. It supports inlining of the directly invoked method. It uses
+the JIT's "best known type" as the class to predict. It also anticipates being
+able to query the runtime for implementing classes of an interface.
+
+### Phase Ordering
+
+For the most part, devirtualization is done very early on in the JIT, during
+importation. This allows devirtualized calls to subsequently be inlined, and for
+devirtualization of call sites in inlinees to take advantage of type information
+propagating down into the inlinee from inlined arguments.
+
+We want those same properties to hold for guarded devirtualization candidates.
+So conceptually the transformation should happen in the same place. However it is
+not possible to introduce new control flow in the importer (ignoring for the moment
+the possibility of using question ops). So the actual transformation must be
+deferred until sometime after the importer runs and before the inliner runs.
+
+This deferral is a bit problematic as some key bits of importer state are needed
+to query the runtime about the properties of a call target. So if we defer the
+transformation we need to somehow capture the data needed for these queries and
+make it available later. The current prototype uses (abuses?) the inline
+candidate information for this. As part of this we require that all speculative
+devirtualization sites be treated as inline candidates, at least initially.
+This has the side effect of hoisting the call to be a top level (statement)
+expression and introducing a return value placeholder.
+
+We currently already have a similar transformation in the JIT, the "fat calli"
+transformation needed on CoreRT. This transformation runs at the right time --
+after the importer and before the inliner -- and introduces the right kind of
+`if-then-else` control flow structure. So the thought is to generalize this to
+handle guarded devirtualization as well.
+
+### Recognition
+
+In the prototype, candidates are recognized during the initial importer driven
+call to `impDevirtualizeCall`. If the only reason devirtualization fails is lack
+of exactness, then the call is marked as a guarded devirtualization candidate.
+
+### Devirtualization
+
+To produce the direct call the prototype updates the `this` passed in the `then`
+version of the call so it has the exact predicted type. It then re-invokes
+`impDevirtualizeCall` which should now succeed as the type is now exactly
+known. The benefit of reuse here is that certain special cases of devirtualization
+are now more likely to be handled.
+
+### Inline Candidacy
+
+The prototype currently sets up all virtual and interface calls as potential
+inline candidates. One open question is whether it is worth doing guarded
+devirtualization simply to introduce a direct call. As an alternative we could
+insist that the directly called method also be something that is potentially
+inlineable. One can argue that call overhead matters much more for small methods
+that are also likely good inline candidates.
+
+The inline candidate info is based on the apparent method invoked at the virtual
+site. This is the base method, the one that introduces the virtual slot. So if we
+speculatively check for some class and that class overrides the method, we need to somehow
+update the inline info. How to best do this is unclear.
+
+### Return Values
+
+Because the candidate calls are handled as inline candidates, the JIT hoists the
+call to a top level expression (which is good) during importation and introduces
+a return value placeholder into the place the call occupied in its original tree.
+(Oddly we introduce return value placeholders for some calls that don't return
+a value -- we should fix this). The placeholder points back at the call.
+
+When we split the call into two calls we can't keep this structure intact as there
+needs to be a 1-1 relationship between call and placeholder. So the prototype
+needs to save the return value in a new local and then update the placeholder to
+refer to that local. This can be tricky because in some cases we haven't yet settled
+on what the actual type of the return value is.
+
+The handling of return values in the early stages of the JIT (arguably, in the entire
+JIT) is quite messy. The ABI details bleed through quite early and do so somewhat
+unevenly. This mostly impacts methods that return structures as different ABIs have
+quite different conventions, and the IR is transformed to reflect those conventions
+at different times for un-inlined calls, inlineable calls that end up not getting
+inlined, and for calls that get inlined. In particular, structures that are small
+enough to be returned by value (in a register or set of registers) need careful
+handling.
+
+The prototype skips over such by-value-returning struct methods today. Some of
+the logic found in `fgUpdateInlineReturnExpressionPlaceHolder` needs to be pulled
+in to properly type the call return value so we can properly type the temp. Or
+perhaps we could leverage some of the importer-time transformations that are done for
+the fat calli cases.
+
+For larger structs we should arrange so that the call(s) write their return values
+directly into the new temp, instead of copying the value from wherever they
+return it into a temp, to avoid one level of struct copy. Doing so may require
+upstream zero init of the return value struct and this should only happen in one
+place.
+
+## Open Issues
+
+Here are some of the issues that need to be looked into more carefully.
+
+### Policy
+
+- what is the best mechanism for guessing which class to test for?
+ - instrument Tier0 code?
+ - look at types of arguments?
+ - ask runtime for set of known classes?
+ - harvest info from runtime caches (VSD)?
+ - add instrumenting Tier1 to collect data and Tier2 to optimize?
+- is there some efficient way to test for class ranges? Currently the JIT is
+doing an exact type test. But we really care more about what method is going to
+be invoked. So if there is a range of types `D1...DN` that all will invoke some
+particular method can we test for them all somehow?
+- or should we test the method after the method lookup? (This is possibly a worse
+tradeoff because of the chunked method table arrangement, and is also tricky as a
+method can have multiple addresses over time.) Since many types can share a chunk this
+might allow devirtualization over a wider set of classes (good) but we'd lose
+knowledge of exact types (bad). Not clear how these tradeoffs play out.
+- interaction of guarded devirt with VSD? For interface calls we are sort of
+inlining the first level of the VSD into the JITted code.
+- revocation or reworking of the guard if the JIT's prediction turns out to be bad?
+- improve regular devirtualization to reduce need for guarded
+devirtualization.
+- should we enable this for preJITted code? In preJITted code the target method
+table is not a JIT-time constant and must be looked up.
+- in the prototype, guarded devirtualization and late devirtualization sometimes
+conflict. Say we fail to devirtualize a site, and so expand via guarded devirtualization
+guessing some class X. The residual virtual call then may be optimizable via late
+devirtualization, and this may discover the actual class. In that case the guarded
+devirtualization is not needed. But currently it can't be undone.
+- we probably don't want to bother with guarded devirtualization if we can't also
+inline. But it takes us several evaluation steps to determine if a call can
+be inlined, some of these happening *after* we've done the guarded expansion.
+Again this expansion can't be undone.
+- so perhaps we need to build an undo capability for the cases where guarded
+devirtualization doesn't lead to inlining and/or where late devirtualization also
+applies.
+
+### Implementation
+
+- avoid re-fetching method table for latent virtual call (should reduce code
+size and improve overall perf win)
+- look at how effectively we are sharing argument setup (might reduce code size
+and JIT time impact) -- perhaps implement head merging?
+- handle return values in full generality
+- il offsets
+- flag residual calls as not needing null checks
+- properly establish inline candidacy
+- decide if the refactoring of `InlineCandidateInfo` is the right way to pass
+information from importer to the indirect transform phase
+
+### Futures
+
+- can we cover multiple calls with one test? This can happen already if the
+subsequent call is introduced via inlining of the directly called method, as we
+know the exact type along that path. But for back to back calls to virtual
+methods off of the same object it would be nice to do just one test.
+- should we test for multiple types? Once we've peeled off the "most likely" case
+if the conditional probability of the next most likely case is high it is probably
+worth testing for it too. I believe the C++ compiler will test up to 3 candidates
+this way... but that's a lot of code expansion.
\ No newline at end of file
diff --git a/Documentation/design-docs/ThreeClassesDevirt.JPG b/Documentation/design-docs/ThreeClassesDevirt.JPG
new file mode 100644
index 0000000000..1ae19baab4
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesDevirt.JPG
Binary files differ
diff --git a/Documentation/design-docs/ThreeClassesDevirtDetail.JPG b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG
new file mode 100644
index 0000000000..0edba7479e
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG
Binary files differ
diff --git a/Documentation/design-docs/ThreeClassesDevirtFull.JPG b/Documentation/design-docs/ThreeClassesDevirtFull.JPG
new file mode 100644
index 0000000000..2c09920a7c
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesDevirtFull.JPG
Binary files differ
diff --git a/Documentation/design-docs/ThreeClassesInterface.JPG b/Documentation/design-docs/ThreeClassesInterface.JPG
new file mode 100644
index 0000000000..cbd3551f74
--- /dev/null
+++ b/Documentation/design-docs/ThreeClassesInterface.JPG
Binary files differ
diff --git a/Documentation/design-docs/TwoClassesBaseline.JPG b/Documentation/design-docs/TwoClassesBaseline.JPG
new file mode 100644
index 0000000000..3a8b4b21e8
--- /dev/null
+++ b/Documentation/design-docs/TwoClassesBaseline.JPG
Binary files differ
diff --git a/Documentation/design-docs/TwoClassesDevirt.JPG b/Documentation/design-docs/TwoClassesDevirt.JPG
new file mode 100644
index 0000000000..7c48264eef
--- /dev/null
+++ b/Documentation/design-docs/TwoClassesDevirt.JPG
Binary files differ
diff --git a/Documentation/design-docs/TwoClassesInterface.JPG b/Documentation/design-docs/TwoClassesInterface.JPG
new file mode 100644
index 0000000000..69063dc28a
--- /dev/null
+++ b/Documentation/design-docs/TwoClassesInterface.JPG
Binary files differ
diff --git a/src/jit/compiler.cpp b/src/jit/compiler.cpp
index 86a85cec52..2851f3d854 100644
--- a/src/jit/compiler.cpp
+++ b/src/jit/compiler.cpp
@@ -4561,10 +4561,8 @@ void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, JitFlags
fgRemovePreds();
}
- if (IsTargetAbi(CORINFO_CORERT_ABI) && doesMethodHaveFatPointer())
- {
- fgTransformFatCalli();
- }
+ // Transform indirect calls that require control flow expansion.
+ fgTransformIndirectCalls();
EndPhase(PHASE_IMPORTATION);
diff --git a/src/jit/compiler.h b/src/jit/compiler.h
index a1309621b4..a3f46bda86 100644
--- a/src/jit/compiler.h
+++ b/src/jit/compiler.h
@@ -3298,6 +3298,13 @@ public:
return IsTargetAbi(CORINFO_CORERT_ABI) ? TYP_I_IMPL : TYP_REF;
}
+ void impDevirtualizeCall(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE* method,
+ unsigned* methodFlags,
+ CORINFO_CONTEXT_HANDLE* contextHandle,
+ CORINFO_CONTEXT_HANDLE* exactContextHandle,
+ bool isLateDevirtualization);
+
//=========================================================================
// PROTECTED
//=========================================================================
@@ -3355,12 +3362,6 @@ protected:
CORINFO_CALL_INFO* callInfo,
IL_OFFSET rawILOffset);
- void impDevirtualizeCall(GenTreeCall* call,
- CORINFO_METHOD_HANDLE* method,
- unsigned* methodFlags,
- CORINFO_CONTEXT_HANDLE* contextHandle,
- CORINFO_CONTEXT_HANDLE* exactContextHandle);
-
CORINFO_CLASS_HANDLE impGetSpecialIntrinsicExactReturnType(CORINFO_METHOD_HANDLE specialIntrinsicHandle);
bool impMethodInfo_hasRetBuffArg(CORINFO_METHOD_INFO* methInfo);
@@ -3865,7 +3866,7 @@ private:
bool forceInline,
InlineResult* inlineResult);
- void impCheckCanInline(GenTree* call,
+ void impCheckCanInline(GenTreeCall* call,
CORINFO_METHOD_HANDLE fncHandle,
unsigned methAttr,
CORINFO_CONTEXT_HANDLE exactContextHnd,
@@ -3894,6 +3895,11 @@ private:
bool exactContextNeedsRuntimeLookup,
CORINFO_CALL_INFO* callInfo);
+ void impMarkInlineCandidateHelper(GenTreeCall* call,
+ CORINFO_CONTEXT_HANDLE exactContextHnd,
+ bool exactContextNeedsRuntimeLookup,
+ CORINFO_CALL_INFO* callInfo);
+
bool impTailCallRetTypeCompatible(var_types callerRetType,
CORINFO_CLASS_HANDLE callerRetTypeClass,
var_types calleeRetType,
@@ -4133,7 +4139,7 @@ public:
void fgImport();
- void fgTransformFatCalli();
+ void fgTransformIndirectCalls();
void fgInline();
@@ -5354,8 +5360,8 @@ private:
#ifdef DEBUG
static fgWalkPreFn fgDebugCheckInlineCandidates;
- void CheckNoFatPointerCandidatesLeft();
- static fgWalkPreFn fgDebugCheckFatPointerCandidates;
+ void CheckNoTransformableIndirectCallsRemain();
+ static fgWalkPreFn fgDebugCheckForTransformableIndirectCalls;
#endif
void fgPromoteStructs();
@@ -6138,6 +6144,7 @@ public:
#define OMF_HAS_NULLCHECK 0x00000010 // Method contains null check.
#define OMF_HAS_FATPOINTER 0x00000020 // Method contains call, that needs fat pointer transformation.
#define OMF_HAS_OBJSTACKALLOC 0x00000040 // Method contains an object allocated on the stack.
+#define OMF_HAS_GUARDEDDEVIRT 0x00000080 // Method contains guarded devirtualization candidate
bool doesMethodHaveFatPointer()
{
@@ -6156,6 +6163,27 @@ public:
void addFatPointerCandidate(GenTreeCall* call);
+ bool doesMethodHaveGuardedDevirtualization()
+ {
+ return (optMethodFlags & OMF_HAS_GUARDEDDEVIRT) != 0;
+ }
+
+ void setMethodHasGuardedDevirtualization()
+ {
+ optMethodFlags |= OMF_HAS_GUARDEDDEVIRT;
+ }
+
+ void clearMethodHasGuardedDevirtualization()
+ {
+ optMethodFlags &= ~OMF_HAS_GUARDEDDEVIRT;
+ }
+
+ void addGuardedDevirtualizationCandidate(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE methodHandle,
+ CORINFO_CLASS_HANDLE classHandle,
+ unsigned methodAttr,
+ unsigned classAttr);
+
unsigned optMethodFlags;
// Recursion bound controls how far we can go backwards tracking for a SSA value.
diff --git a/src/jit/flowgraph.cpp b/src/jit/flowgraph.cpp
index dd184539fd..ad89317159 100644
--- a/src/jit/flowgraph.cpp
+++ b/src/jit/flowgraph.cpp
@@ -5892,6 +5892,21 @@ void Compiler::fgFindBasicBlocks()
{
// This temp should already have the type of the return value.
JITDUMP("\nInliner: re-using pre-existing spill temp V%02u\n", lvaInlineeReturnSpillTemp);
+
+ if (info.compRetType == TYP_REF)
+ {
+ // We may have co-opted an existing temp for the return spill.
+ // We likely assumed it was single-def at the time, but now
+ // we can see it has multiple definitions.
+ if ((retBlocks > 1) && (lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef == 1))
+ {
+ // Make sure it is no longer marked single def. This is only safe
+ // to do if we haven't ever updated the type.
+ assert(!lvaTable[lvaInlineeReturnSpillTemp].lvClassInfoUpdated);
+ JITDUMP("Marked return spill temp V%02u as NOT single def temp\n", lvaInlineeReturnSpillTemp);
+ lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef = 0;
+ }
+ }
}
else
{
@@ -21859,43 +21874,58 @@ void Compiler::fgInline()
do
{
- /* Make the current basic block address available globally */
-
+ // Make the current basic block address available globally
compCurBB = block;
- GenTreeStmt* stmt;
- GenTree* expr;
-
- for (stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
+ for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
{
- expr = stmt->gtStmtExpr;
- // See if we can expand the inline candidate
- if ((expr->gtOper == GT_CALL) && ((expr->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0))
+#ifdef DEBUG
+ // In debug builds we want the inline tree to show all failed
+ // inlines. Some inlines may fail very early and never make it to
+ // candidate stage. So scan the tree looking for those early failures.
+ fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt);
+#endif
+
+ GenTree* expr = stmt->gtStmtExpr;
+
+ // The importer ensures that all inline candidates are
+ // statement expressions. So see if we have a call.
+ if (expr->IsCall())
{
GenTreeCall* call = expr->AsCall();
- InlineResult inlineResult(this, call, stmt, "fgInline");
- fgMorphStmt = stmt;
+ // We do. Is it an inline candidate?
+ //
+ // Note we also process GuardedDevirtualizationCandidates here as we've
+ // split off GT_RET_EXPRs for them even when they are not inline candidates
+ // as we need similar processing to ensure they get patched back to where
+ // they belong.
+ if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate())
+ {
+ InlineResult inlineResult(this, call, stmt, "fgInline");
+
+ fgMorphStmt = stmt;
- fgMorphCallInline(call, &inlineResult);
+ fgMorphCallInline(call, &inlineResult);
- if (stmt->gtStmtExpr->IsNothingNode())
- {
- fgRemoveStmt(block, stmt);
- continue;
+ // fgMorphCallInline may have updated the
+ // statement expression to a GT_NOP if the
+ // call returned a value, regardless of
+ // whether the inline succeeded or failed.
+ //
+ // If so, remove the GT_NOP and continue
+ // on with the next statement.
+ if (stmt->gtStmtExpr->IsNothingNode())
+ {
+ fgRemoveStmt(block, stmt);
+ continue;
+ }
}
}
- else
- {
-#ifdef DEBUG
- // Look for non-candidates.
- fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt);
-#endif
- }
- // See if we need to replace the return value place holder.
- // Also, see if this update enables further devirtualization.
+ // See if we need to replace some return value place holders.
+ // Also, see if this replacement enables further devirtualization.
//
// Note we have both preorder and postorder callbacks here.
//
@@ -22008,6 +22038,11 @@ Compiler::fgWalkResult Compiler::fgFindNonInlineCandidate(GenTree** pTree, fgWal
void Compiler::fgNoteNonInlineCandidate(GenTreeStmt* stmt, GenTreeCall* call)
{
+ if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate())
+ {
+ return;
+ }
+
InlineResult inlineResult(this, call, nullptr, "fgNotInlineCandidate");
InlineObservation currentObservation = InlineObservation::CALLSITE_NOT_CANDIDATE;
@@ -22471,10 +22506,11 @@ Compiler::fgWalkResult Compiler::fgLateDevirtualization(GenTree** pTree, fgWalkD
}
#endif // DEBUG
- CORINFO_METHOD_HANDLE method = call->gtCallMethHnd;
- unsigned methodFlags = 0;
- CORINFO_CONTEXT_HANDLE context = nullptr;
- comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr);
+ CORINFO_METHOD_HANDLE method = call->gtCallMethHnd;
+ unsigned methodFlags = 0;
+ CORINFO_CONTEXT_HANDLE context = nullptr;
+ const bool isLateDevirtualization = true;
+ comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr, isLateDevirtualization);
}
}
else if (tree->OperGet() == GT_ASG)
@@ -25507,8 +25543,10 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock* block,
return true;
}
-// FatCalliTransformer transforms calli that can use fat function pointer.
-// Fat function pointer is pointer with the second least significant bit set,
+// IndirectCallTransformer transforms indirect calls that involve fat pointers
+// or guarded devirtualization.
+//
+// A fat function pointer is a pointer with the second least significant bit set,
// if the bit is set, the pointer (after clearing the bit) actually points to
// a tuple <method pointer, instantiation argument pointer> where
// instantiationArgument is a hidden first argument required by method pointer.
@@ -25557,38 +25595,60 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock* block,
// subsequent statements
// }
//
-class FatCalliTransformer
+class IndirectCallTransformer
{
public:
- FatCalliTransformer(Compiler* compiler) : compiler(compiler)
+ IndirectCallTransformer(Compiler* compiler) : compiler(compiler)
{
}
//------------------------------------------------------------------------
// Run: run transformation for each block.
//
- void Run()
+ // Returns:
+ // Count of calls transformed.
+ int Run()
{
+ int count = 0;
+
for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
{
- TransformBlock(block);
+ count += TransformBlock(block);
}
+
+ return count;
}
private:
//------------------------------------------------------------------------
- // TransformBlock: look through statements and transform statements with fat pointer calls.
+ // TransformBlock: look through statements and transform statements with
+ // particular indirect calls
//
- void TransformBlock(BasicBlock* block)
+ // Returns:
+ // Count of calls transformed.
+ //
+ int TransformBlock(BasicBlock* block)
{
+ int count = 0;
+
for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
{
if (ContainsFatCalli(stmt))
{
- StatementTransformer stmtTransformer(compiler, block, stmt);
- stmtTransformer.Run();
+ FatPointerCallTransformer transformer(compiler, block, stmt);
+ transformer.Run();
+ count++;
+ }
+
+ if (ContainsGuardedDevirtualizationCandidate(stmt))
+ {
+ GuardedDevirtualizationTransformer transformer(compiler, block, stmt);
+ transformer.Run();
+ count++;
}
}
+
+ return count;
}
//------------------------------------------------------------------------
@@ -25609,39 +25669,158 @@ private:
return fatPointerCandidate->IsCall() && fatPointerCandidate->AsCall()->IsFatPointerCandidate();
}
- class StatementTransformer
+ //------------------------------------------------------------------------
+ // ContainsGuardedDevirtualizationCandidate: check does this statement contain a virtual
+ // call that we'd like to guardedly devirtualize?
+ //
+ // Return Value:
+ // true if contains, false otherwise.
+ //
+ // Notes:
+ // calls are hoisted to top level ... (we hope)
+ bool ContainsGuardedDevirtualizationCandidate(GenTreeStmt* stmt)
+ {
+ GenTree* candidate = stmt->gtStmtExpr;
+ return candidate->IsCall() && candidate->AsCall()->IsGuardedDevirtualizationCandidate();
+ }
+
+ class Transformer
{
public:
- StatementTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+ Transformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
: compiler(compiler), currBlock(block), stmt(stmt)
{
- remainderBlock = nullptr;
- checkBlock = nullptr;
- thenBlock = nullptr;
- elseBlock = nullptr;
- doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG);
- origCall = GetCall(stmt);
- fptrAddress = origCall->gtCallAddr;
- pointerType = fptrAddress->TypeGet();
+ remainderBlock = nullptr;
+ checkBlock = nullptr;
+ thenBlock = nullptr;
+ elseBlock = nullptr;
+ origCall = nullptr;
}
//------------------------------------------------------------------------
// Run: transform the statement as described above.
//
- void Run()
+ virtual void Run()
+ {
+ origCall = GetCall(stmt);
+ Transform();
+ }
+
+ void Transform()
{
- ClearFatFlag();
+ JITDUMP("*** %s: transforming [%06u]\n", Name(), compiler->dspTreeID(stmt));
+ FixupRetExpr();
+ ClearFlag();
CreateRemainder();
CreateCheck();
CreateThen();
CreateElse();
-
RemoveOldStatement();
SetWeights();
ChainFlow();
}
- private:
+ protected:
+ virtual const char* Name() = 0;
+ virtual void ClearFlag() = 0;
+ virtual GenTreeCall* GetCall(GenTreeStmt* callStmt) = 0;
+ virtual void FixupRetExpr() = 0;
+
+ //------------------------------------------------------------------------
+ // CreateRemainder: split current block at the call stmt and
+ // insert statements after the call into remainderBlock.
+ //
+ void CreateRemainder()
+ {
+ remainderBlock = compiler->fgSplitBlockAfterStatement(currBlock, stmt);
+ unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT;
+ remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags;
+ }
+
+ virtual void CreateCheck() = 0;
+
+ //------------------------------------------------------------------------
+ // CreateAndInsertBasicBlock: ask compiler to create new basic block.
+ // and insert in into the basic block list.
+ //
+ // Arguments:
+ // jumpKind - jump kind for the new basic block
+ // insertAfter - basic block, after which compiler has to insert the new one.
+ //
+ // Return Value:
+ // new basic block.
+ BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter)
+ {
+ BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true);
+ if ((insertAfter->bbFlags & BBF_INTERNAL) == 0)
+ {
+ block->bbFlags &= ~BBF_INTERNAL;
+ block->bbFlags |= BBF_IMPORTED;
+ }
+ return block;
+ }
+
+ virtual void CreateThen() = 0;
+ virtual void CreateElse() = 0;
+
+ //------------------------------------------------------------------------
+ // RemoveOldStatement: remove original stmt from current block.
+ //
+ void RemoveOldStatement()
+ {
+ compiler->fgRemoveStmt(currBlock, stmt);
+ }
+
+ //------------------------------------------------------------------------
+ // SetWeights: set weights for new blocks.
+ //
+ void SetWeights()
+ {
+ remainderBlock->inheritWeight(currBlock);
+ checkBlock->inheritWeight(currBlock);
+ thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY);
+ elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY);
+ }
+
+ //------------------------------------------------------------------------
+ // ChainFlow: link new blocks into correct cfg.
+ //
+ void ChainFlow()
+ {
+ assert(!compiler->fgComputePredsDone);
+ checkBlock->bbJumpDest = elseBlock;
+ thenBlock->bbJumpDest = remainderBlock;
+ }
+
+ Compiler* compiler;
+ BasicBlock* currBlock;
+ BasicBlock* remainderBlock;
+ BasicBlock* checkBlock;
+ BasicBlock* thenBlock;
+ BasicBlock* elseBlock;
+ GenTreeStmt* stmt;
+ GenTreeCall* origCall;
+
+ const int HIGH_PROBABILITY = 80;
+ };
+
+ class FatPointerCallTransformer final : public Transformer
+ {
+ public:
+ FatPointerCallTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+ : Transformer(compiler, block, stmt)
+ {
+ doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG);
+ fptrAddress = origCall->gtCallAddr;
+ pointerType = fptrAddress->TypeGet();
+ }
+
+ protected:
+ virtual const char* Name()
+ {
+ return "FatPointerCall";
+ }
+
//------------------------------------------------------------------------
// GetCall: find a call in a statement.
//
@@ -25650,7 +25829,7 @@ private:
//
// Return Value:
// call tree node pointer.
- GenTreeCall* GetCall(GenTreeStmt* callStmt)
+ virtual GenTreeCall* GetCall(GenTreeStmt* callStmt)
{
GenTree* tree = callStmt->gtStmtExpr;
GenTreeCall* call = nullptr;
@@ -25667,28 +25846,22 @@ private:
}
//------------------------------------------------------------------------
- // ClearFatFlag: clear fat pointer candidate flag from the original call.
+ // ClearFlag: clear fat pointer candidate flag from the original call.
//
- void ClearFatFlag()
+ virtual void ClearFlag()
{
origCall->ClearFatPointerCandidate();
}
- //------------------------------------------------------------------------
- // CreateRemainder: split current block at the fat call stmt and
- // insert statements after the call into remainderBlock.
- //
- void CreateRemainder()
+ // FixupRetExpr: no action needed as we handle this in the importer.
+ virtual void FixupRetExpr()
{
- remainderBlock = compiler->fgSplitBlockAfterStatement(currBlock, stmt);
- unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT;
- remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags;
}
//------------------------------------------------------------------------
// CreateCheck: create check block, that checks fat pointer bit set.
//
- void CreateCheck()
+ virtual void CreateCheck()
{
checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock);
GenTree* fatPointerMask = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK);
@@ -25702,19 +25875,20 @@ private:
}
//------------------------------------------------------------------------
- // CreateCheck: create then block, that is executed if call address is not fat pointer.
+ // CreateThen: create then block, that is executed if the check succeeds.
+ // This simply executes the original call.
//
- void CreateThen()
+ virtual void CreateThen()
{
- thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
- GenTree* nonFatCallStmt = compiler->gtCloneExpr(stmt)->AsStmt();
- compiler->fgInsertStmtAtEnd(thenBlock, nonFatCallStmt);
+ thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
+ GenTree* copyOfOriginalStmt = compiler->gtCloneExpr(stmt)->AsStmt();
+ compiler->fgInsertStmtAtEnd(thenBlock, copyOfOriginalStmt);
}
//------------------------------------------------------------------------
- // CreateCheck: create else block, that is executed if call address is fat pointer.
+ // CreateElse: create else block, that is executed if call address is fat pointer.
//
- void CreateElse()
+ virtual void CreateElse()
{
elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
@@ -25727,27 +25901,6 @@ private:
}
//------------------------------------------------------------------------
- // CreateAndInsertBasicBlock: ask compiler to create new basic block.
- // and insert in into the basic block list.
- //
- // Arguments:
- // jumpKind - jump kind for the new basic block
- // insertAfter - basic block, after which compiler has to insert the new one.
- //
- // Return Value:
- // new basic block.
- BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter)
- {
- BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true);
- if ((insertAfter->bbFlags & BBF_INTERNAL) == 0)
- {
- block->bbFlags &= ~BBF_INTERNAL;
- block->bbFlags |= BBF_IMPORTED;
- }
- return block;
- }
-
- //------------------------------------------------------------------------
// GetFixedFptrAddress: clear fat pointer bit from fat pointer address.
//
// Return Value:
@@ -25843,49 +25996,306 @@ private:
iterator->Rest() = compiler->gtNewArgList(hiddenArgument);
}
+ private:
+ const int FAT_POINTER_MASK = 0x2;
+
+ GenTree* fptrAddress;
+ var_types pointerType;
+ bool doesReturnValue;
+ };
+
+ class GuardedDevirtualizationTransformer final : public Transformer
+ {
+ public:
+ GuardedDevirtualizationTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+ : Transformer(compiler, block, stmt), returnTemp(BAD_VAR_NUM)
+ {
+ }
+
//------------------------------------------------------------------------
- // RemoveOldStatement: remove original stmt from current block.
+ // Run: transform the statement as described above.
//
- void RemoveOldStatement()
+ virtual void Run()
{
- compiler->fgRemoveStmt(currBlock, stmt);
+ origCall = GetCall(stmt);
+
+ JITDUMP("*** %s contemplating [%06u]\n", Name(), compiler->dspTreeID(origCall));
+
+ // We currently need inline candidate info to do guarded devirt.
+ if (!origCall->IsInlineCandidate())
+ {
+ JITDUMP("*** %s Bailing on [%06u] -- not an inline candidate\n", Name(), compiler->dspTreeID(origCall));
+ ClearFlag();
+ return;
+ }
+
+ // For now, bail on transforming calls that still appear
+ // to return structs by value as there is deferred work
+ // needed to fix up the return type.
+ //
+ // See for instance fgUpdateInlineReturnExpressionPlaceHolder.
+ if (origCall->TypeGet() == TYP_STRUCT)
+ {
+ JITDUMP("*** %s Bailing on [%06u] -- can't handle by-value struct returns yet\n", Name(),
+ compiler->dspTreeID(origCall));
+ ClearFlag();
+
+ // For stub calls restore the stub address
+ if (origCall->IsVirtualStub())
+ {
+ origCall->gtStubCallStubAddr = origCall->gtInlineCandidateInfo->stubAddr;
+ }
+ return;
+ }
+
+ Transform();
+ }
+
+ protected:
+ virtual const char* Name()
+ {
+ return "GuardedDevirtualization";
}
//------------------------------------------------------------------------
- // SetWeights: set weights for new blocks.
+ // GetCall: find a call in a statement.
//
- void SetWeights()
+ // Arguments:
+ // callStmt - the statement with the call inside.
+ //
+ // Return Value:
+ // call tree node pointer.
+ virtual GenTreeCall* GetCall(GenTreeStmt* callStmt)
{
- remainderBlock->inheritWeight(currBlock);
- checkBlock->inheritWeight(currBlock);
- thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY);
- elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY);
+ GenTree* tree = callStmt->gtStmtExpr;
+ assert(tree->IsCall());
+ GenTreeCall* call = tree->AsCall();
+ return call;
}
//------------------------------------------------------------------------
- // ChainFlow: link new blocks into correct cfg.
+ // ClearFlag: clear guarded devirtualization candidate flag from the original call.
//
- void ChainFlow()
+ virtual void ClearFlag()
{
- assert(!compiler->fgComputePredsDone);
- checkBlock->bbJumpDest = elseBlock;
- thenBlock->bbJumpDest = remainderBlock;
+ origCall->ClearGuardedDevirtualizationCandidate();
}
- Compiler* compiler;
- BasicBlock* currBlock;
- BasicBlock* remainderBlock;
- BasicBlock* checkBlock;
- BasicBlock* thenBlock;
- BasicBlock* elseBlock;
- GenTreeStmt* stmt;
- GenTreeCall* origCall;
- GenTree* fptrAddress;
- var_types pointerType;
- bool doesReturnValue;
+ //------------------------------------------------------------------------
+ // CreateCheck: create check block and check method table
+ //
+ virtual void CreateCheck()
+ {
+ checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock);
- const int FAT_POINTER_MASK = 0x2;
- const int HIGH_PROBABILITY = 80;
+ // Fetch method table from object arg to call.
+ GenTree* thisTree = compiler->gtCloneExpr(origCall->gtCallObjp);
+
+ // Create temp for this if the tree is costly.
+ if (!thisTree->IsLocal())
+ {
+ const unsigned thisTempNum = compiler->lvaGrabTemp(true DEBUGARG("guarded devirt this temp"));
+ // lvaSetClass(thisTempNum, ...);
+ GenTree* asgTree = compiler->gtNewTempAssign(thisTempNum, thisTree);
+ GenTree* asgStmt = compiler->fgNewStmtFromTree(asgTree, stmt->gtStmt.gtStmtILoffsx);
+ compiler->fgInsertStmtAtEnd(checkBlock, asgStmt);
+
+ thisTree = compiler->gtNewLclvNode(thisTempNum, TYP_REF);
+
+ // Propagate the new this to the call. Must be a new expr as the call
+ // will live on in the else block and thisTree is used below.
+ origCall->gtCallObjp = compiler->gtNewLclvNode(thisTempNum, TYP_REF);
+ }
+
+ GenTree* methodTable = compiler->gtNewIndir(TYP_I_IMPL, thisTree);
+ methodTable->gtFlags |= GTF_IND_INVARIANT;
+
+ // Find target method table
+ GuardedDevirtualizationCandidateInfo* guardedInfo = origCall->gtGuardedDevirtualizationCandidateInfo;
+ CORINFO_CLASS_HANDLE clsHnd = guardedInfo->guardedClassHandle;
+ GenTree* targetMethodTable = compiler->gtNewIconEmbClsHndNode(clsHnd);
+
+ // Compare and jump to else (which does the indirect call) if NOT equal
+ GenTree* methodTableCompare = compiler->gtNewOperNode(GT_NE, TYP_INT, targetMethodTable, methodTable);
+ GenTree* jmpTree = compiler->gtNewOperNode(GT_JTRUE, TYP_VOID, methodTableCompare);
+ GenTree* jmpStmt = compiler->fgNewStmtFromTree(jmpTree, stmt->gtStmt.gtStmtILoffsx);
+ compiler->fgInsertStmtAtEnd(checkBlock, jmpStmt);
+ }
+
+ //------------------------------------------------------------------------
+ // FixupRetExpr: set up to repair return value placeholder from call
+ //
+ virtual void FixupRetExpr()
+ {
+ // If call returns a value, we need to copy it to a temp, and
+ // update the associated GT_RET_EXPR to refer to the temp instead
+ // of the call.
+ //
+ // Note implicit by-ref returns should have already been converted
+ // so any struct copy we induce here should be cheap.
+ //
+ // Todo: make sure we understand how this interacts with return type
+ // munging for small structs.
+ InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo;
+ GenTree* retExpr = inlineInfo->retExpr;
+
+ // Sanity check the ret expr if non-null: it should refer to the original call.
+ if (retExpr != nullptr)
+ {
+ assert(retExpr->gtRetExpr.gtInlineCandidate == origCall);
+ }
+
+ if (origCall->TypeGet() != TYP_VOID)
+ {
+ returnTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt return temp"));
+ JITDUMP("Reworking call(s) to return value via a new temp V%02u\n", returnTemp);
+
+ if (varTypeIsStruct(origCall))
+ {
+ compiler->lvaSetStruct(returnTemp, origCall->gtRetClsHnd, false);
+ }
+
+ GenTree* tempTree = compiler->gtNewLclvNode(returnTemp, origCall->TypeGet());
+
+ JITDUMP("Updating GT_RET_EXPR [%06u] to refer to temp V%02u\n", compiler->dspTreeID(retExpr),
+ returnTemp);
+ retExpr->gtRetExpr.gtInlineCandidate = tempTree;
+ }
+ else if (retExpr != nullptr)
+ {
+ // We still oddly produce GT_RET_EXPRs for some void
+ // returning calls. Just patch the ret expr to a NOP.
+ //
+ // Todo: consider bagging creation of these RET_EXPRs. The only possible
+ // benefit they provide is stitching back larger trees for failed inlines
+ // of void-returning methods. But then the calls likely sit in commas and
+ // the benefit of a larger tree is unclear.
+ JITDUMP("Updating GT_RET_EXPR [%06u] for VOID return to refer to a NOP\n",
+ compiler->dspTreeID(retExpr));
+ GenTree* nopTree = compiler->gtNewNothingNode();
+ retExpr->gtRetExpr.gtInlineCandidate = nopTree;
+ }
+ else
+ {
+ // We do not produce GT_RET_EXPRs for CTOR calls, so there is nothing to patch.
+ }
+ }
+
+ //------------------------------------------------------------------------
+ // CreateThen: create then block with direct call to method
+ //
+ virtual void CreateThen()
+ {
+ thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
+
+ InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo;
+ CORINFO_CLASS_HANDLE clsHnd = inlineInfo->clsHandle;
+
+ // copy 'this' to temp with exact type.
+ const unsigned thisTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt this exact temp"));
+ GenTree* clonedObj = compiler->gtCloneExpr(origCall->gtCallObjp);
+ GenTree* assign = compiler->gtNewTempAssign(thisTemp, clonedObj);
+ compiler->lvaSetClass(thisTemp, clsHnd, true);
+ GenTreeStmt* assignStmt = compiler->gtNewStmt(assign);
+ compiler->fgInsertStmtAtEnd(thenBlock, assignStmt);
+
+ // Clone call
+ GenTreeCall* call = compiler->gtCloneExpr(origCall)->AsCall();
+ call->gtCallObjp = compiler->gtNewLclvNode(thisTemp, TYP_REF);
+ call->SetIsGuarded();
+
+ JITDUMP("Direct call [%06u] in block BB%02u\n", compiler->dspTreeID(call), thenBlock->bbNum);
+
+ // Then invoke impDevirtualizeCall to actually
+ // transform the call for us. It should succeed, as we have
+ // now provided an exactly typed 'this'.
+ CORINFO_METHOD_HANDLE methodHnd = inlineInfo->methInfo.ftn;
+ unsigned methodFlags = inlineInfo->methAttr;
+ CORINFO_CONTEXT_HANDLE context = inlineInfo->exactContextHnd;
+ const bool isLateDevirtualization = true;
+ compiler->impDevirtualizeCall(call, &methodHnd, &methodFlags, &context, nullptr, isLateDevirtualization);
+
+ // Conceivably devirtualization could fail; if so, we should avoid
+ // marking the call as a guarded devirt candidate in the first
+ // place rather than ending up here.
+ assert(!call->IsVirtual());
+
+ // Re-establish this call as an inline candidate.
+ GenTree* oldRetExpr = inlineInfo->retExpr;
+ inlineInfo->clsHandle = clsHnd;
+ inlineInfo->exactContextHnd = context;
+ call->gtInlineCandidateInfo = inlineInfo;
+
+ // Add the call
+ GenTreeStmt* callStmt = compiler->gtNewStmt(call);
+ compiler->fgInsertStmtAtEnd(thenBlock, callStmt);
+
+ // If there was a ret expr for this call, we need to create a new one
+ // and append it just after the call.
+ //
+ // Note the original GT_RET_EXPR is sitting at the join point of the
+ // guarded expansion; for non-void calls it now refers to a temp local.
+ // We set all this up in FixupRetExpr().
+ if (oldRetExpr != nullptr)
+ {
+ GenTree* retExpr = compiler->gtNewInlineCandidateReturnExpr(call, call->TypeGet());
+ inlineInfo->retExpr = retExpr;
+
+ if (returnTemp != BAD_VAR_NUM)
+ {
+ retExpr = compiler->gtNewTempAssign(returnTemp, retExpr);
+ }
+ else
+ {
+ // We should always have a return temp if we return results by value
+ assert(origCall->TypeGet() == TYP_VOID);
+ }
+
+ GenTreeStmt* resultStmt = compiler->gtNewStmt(retExpr);
+ compiler->fgInsertStmtAtEnd(thenBlock, resultStmt);
+ }
+ }
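+
+ // At this point the then block contains: the exact-typed copy of 'this', the
+ // now-direct (and likely inlineable) call, and, for non-void calls, a store of
+ // the call's new GT_RET_EXPR into the shared return temp.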
+
+ //------------------------------------------------------------------------
+ // CreateElse: create else block. This executes the unaltered indirect call.
+ //
+ virtual void CreateElse()
+ {
+ elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
+ GenTreeCall* call = origCall;
+ GenTreeStmt* newStmt = compiler->gtNewStmt(call);
+
+ call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+ call->SetIsGuarded();
+
+ JITDUMP("Residual call [%06u] moved to block BB%02u\n", compiler->dspTreeID(call), elseBlock->bbNum);
+
+ if (returnTemp != BAD_VAR_NUM)
+ {
+ GenTree* assign = compiler->gtNewTempAssign(returnTemp, call);
+ newStmt->gtStmtExpr = assign;
+ }
+
+ // For stub calls, restore the stub address. For everything else,
+ // null out the candidate info field.
+ if (call->IsVirtualStub())
+ {
+ JITDUMP("Restoring stub addr %p from candidate info\n", call->gtInlineCandidateInfo->stubAddr);
+ call->gtStubCallStubAddr = call->gtInlineCandidateInfo->stubAddr;
+ }
+ else
+ {
+ call->gtInlineCandidateInfo = nullptr;
+ }
+
+ compiler->fgInsertStmtAtEnd(elseBlock, newStmt);
+
+ // Set the original statement to a nop.
+ stmt->gtStmtExpr = compiler->gtNewNothingNode();
+ }
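+
+ // Note: once CreateCheck/CreateThen/CreateElse have run, the expansion consists
+ // of the check block (method table test), the then block (guarded direct call)
+ // and the else block (original indirect call), inserted in that order after the
+ // original block; the shared transformer base presumably wires up the jump
+ // targets and the remainder block.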
+
+ private:
+ unsigned returnTemp;
};
Compiler* compiler;
@@ -25894,46 +26304,74 @@ private:
#ifdef DEBUG
//------------------------------------------------------------------------
-// fgDebugCheckFatPointerCandidates: callback to make sure there are no more GTF_CALL_M_FAT_POINTER_CHECK calls.
+// fgDebugCheckForTransformableIndirectCalls: callback to make sure there
+// are no more GTF_CALL_M_FAT_POINTER_CHECK or GTF_CALL_M_GUARDED_DEVIRT
+// calls remaining
//
-Compiler::fgWalkResult Compiler::fgDebugCheckFatPointerCandidates(GenTree** pTree, fgWalkData* data)
+Compiler::fgWalkResult Compiler::fgDebugCheckForTransformableIndirectCalls(GenTree** pTree, fgWalkData* data)
{
GenTree* tree = *pTree;
if (tree->IsCall())
{
assert(!tree->AsCall()->IsFatPointerCandidate());
+ assert(!tree->AsCall()->IsGuardedDevirtualizationCandidate());
}
return WALK_CONTINUE;
}
//------------------------------------------------------------------------
-// CheckNoFatPointerCandidatesLeft: walk through blocks and check that there are no fat pointer candidates left.
+// CheckNoTransformableIndirectCallsRemain: walk through blocks and check
+// that there are no indirect call candidates left to transform.
//
-void Compiler::CheckNoFatPointerCandidatesLeft()
+void Compiler::CheckNoTransformableIndirectCallsRemain()
{
assert(!doesMethodHaveFatPointer());
+ assert(!doesMethodHaveGuardedDevirtualization());
+
for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
{
for (GenTreeStmt* stmt = fgFirstBB->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
{
- fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckFatPointerCandidates);
+ fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckForTransformableIndirectCalls);
}
}
}
#endif
//------------------------------------------------------------------------
-// fgTransformFatCalli: find and transform fat calls.
+// fgTransformIndirectCalls: find and transform various indirect calls
//
-void Compiler::fgTransformFatCalli()
+// These transformations happen post-import because they may introduce
+// control flow.
+//
+void Compiler::fgTransformIndirectCalls()
{
- assert(IsTargetAbi(CORINFO_CORERT_ABI));
- FatCalliTransformer fatCalliTransformer(this);
- fatCalliTransformer.Run();
- clearMethodHasFatPointer();
-#ifdef DEBUG
- CheckNoFatPointerCandidatesLeft();
-#endif
+ JITDUMP("\n*************** in fgTransformIndirectCalls(%s)\n", compIsForInlining() ? "inlinee" : "root");
+
+ if (doesMethodHaveFatPointer() || doesMethodHaveGuardedDevirtualization())
+ {
+ IndirectCallTransformer indirectCallTransformer(this);
+ int count = indirectCallTransformer.Run();
+
+ if (count > 0)
+ {
+ JITDUMP("\n*************** After fgTransformIndirectCalls() [%d calls transformed]\n", count);
+ INDEBUG(if (verbose) { fgDispBasicBlocks(true); });
+ }
+ else
+ {
+ JITDUMP(" -- no transforms done (?)\n");
+ }
+
+ clearMethodHasFatPointer();
+ clearMethodHasGuardedDevirtualization();
+ }
+ else
+ {
+ JITDUMP(" -- no candidates to transform\n");
+ }
+
+ INDEBUG(CheckNoTransformableIndirectCallsRemain(););
}
//------------------------------------------------------------------------
diff --git a/src/jit/gentree.cpp b/src/jit/gentree.cpp
index 90b6bf896f..542a701d65 100644
--- a/src/jit/gentree.cpp
+++ b/src/jit/gentree.cpp
@@ -7343,8 +7343,9 @@ GenTree* Compiler::gtCloneExpr(
copy->gtCall.setEntryPoint(tree->gtCall.gtEntryPoint);
#endif
-#ifdef DEBUG
+#if defined(DEBUG) || defined(INLINE_DATA)
copy->gtCall.gtInlineObservation = tree->gtCall.gtInlineObservation;
+ copy->gtCall.gtRawILOffset = tree->gtCall.gtRawILOffset;
#endif
copy->AsCall()->CopyOtherRegFlags(tree->AsCall());
@@ -9334,9 +9335,22 @@ void Compiler::gtDispNode(GenTree* tree, IndentStack* indentStack, __in __in_z _
goto DASH;
case GT_CALL:
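+ // Print '&' for a call that is both an inline candidate and a guarded
+ // devirtualization candidate, 'I' for a plain inline candidate, and 'G' for
+ // a guarded devirtualization candidate that is not an inline candidate.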
- if (tree->gtFlags & GTF_CALL_INLINE_CANDIDATE)
+ if (tree->gtCall.IsInlineCandidate())
{
- printf("I");
+ if (tree->gtCall.IsGuardedDevirtualizationCandidate())
+ {
+ printf("&");
+ }
+ else
+ {
+ printf("I");
+ }
+ --msgLength;
+ break;
+ }
+ else if (tree->gtCall.IsGuardedDevirtualizationCandidate())
+ {
+ printf("G");
--msgLength;
break;
}
diff --git a/src/jit/gentree.h b/src/jit/gentree.h
index c6c5841e98..360fa1b43c 100644
--- a/src/jit/gentree.h
+++ b/src/jit/gentree.h
@@ -144,8 +144,8 @@ enum gtCallTypes : BYTE
/*****************************************************************************/
struct BasicBlock;
-
struct InlineCandidateInfo;
+struct GuardedDevirtualizationCandidateInfo;
typedef unsigned short AssertionIndex;
@@ -3547,6 +3547,8 @@ struct GenTreeCall final : public GenTree
// the comma result is unused.
#define GTF_CALL_M_DEVIRTUALIZED 0x00040000 // GT_CALL -- this call was devirtualized
#define GTF_CALL_M_UNBOXED 0x00080000 // GT_CALL -- this call was optimized to use the unboxed entry point
+#define GTF_CALL_M_GUARDED_DEVIRT 0x00100000 // GT_CALL -- this call is a candidate for guarded devirtualization
+#define GTF_CALL_M_GUARDED 0x00200000 // GT_CALL -- this call was transformed by guarded devirtualization
// clang-format on
@@ -3739,6 +3741,11 @@ struct GenTreeCall final : public GenTree
return (gtCallMoreFlags & GTF_CALL_M_FAT_POINTER_CHECK) != 0;
}
+ bool IsGuardedDevirtualizationCandidate() const
+ {
+ return (gtCallMoreFlags & GTF_CALL_M_GUARDED_DEVIRT) != 0;
+ }
+
bool IsPure(Compiler* compiler) const;
bool HasSideEffects(Compiler* compiler, bool ignoreExceptions = false, bool ignoreCctors = false) const;
@@ -3758,11 +3765,31 @@ struct GenTreeCall final : public GenTree
return (gtCallMoreFlags & GTF_CALL_M_DEVIRTUALIZED) != 0;
}
+ bool IsGuarded() const
+ {
+ return (gtCallMoreFlags & GTF_CALL_M_GUARDED) != 0;
+ }
+
bool IsUnboxed() const
{
return (gtCallMoreFlags & GTF_CALL_M_UNBOXED) != 0;
}
+ void ClearGuardedDevirtualizationCandidate()
+ {
+ gtCallMoreFlags &= ~GTF_CALL_M_GUARDED_DEVIRT;
+ }
+
+ void SetGuardedDevirtualizationCandidate()
+ {
+ gtCallMoreFlags |= GTF_CALL_M_GUARDED_DEVIRT;
+ }
+
+ void SetIsGuarded()
+ {
+ gtCallMoreFlags |= GTF_CALL_M_GUARDED;
+ }
+
unsigned gtCallMoreFlags; // in addition to gtFlags
unsigned char gtCallType : 3; // value from the gtCallTypes enumeration
@@ -3774,8 +3801,9 @@ struct GenTreeCall final : public GenTree
// only used for CALLI unmanaged calls (CT_INDIRECT)
GenTree* gtCallCookie;
// gtInlineCandidateInfo is only used when inlining methods
- InlineCandidateInfo* gtInlineCandidateInfo;
- void* gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined
+ InlineCandidateInfo* gtInlineCandidateInfo;
+ GuardedDevirtualizationCandidateInfo* gtGuardedDevirtualizationCandidateInfo;
+ void* gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined
CORINFO_GENERIC_HANDLE compileTimeHelperArgumentHandle; // Used to track type handle argument of dynamic helpers
void* gtDirectCallAddress; // Used to pass direct call address between lower and codegen
};
diff --git a/src/jit/importer.cpp b/src/jit/importer.cpp
index eabc3bc3b2..0ddf3b34ac 100644
--- a/src/jit/importer.cpp
+++ b/src/jit/importer.cpp
@@ -7740,8 +7740,7 @@ var_types Compiler::impImportCall(OPCODE opcode,
}
else
{
- // ok, the stub is available at compile type.
-
+ // The stub address is known at compile time
call = gtNewCallNode(CT_USER_FUNC, callInfo->hMethod, callRetTyp, nullptr, ilOffset);
call->gtCall.gtStubCallStubAddr = callInfo->stubLookup.constLookup.addr;
call->gtFlags |= GTF_CALL_VIRT_STUB;
@@ -8388,8 +8387,9 @@ var_types Compiler::impImportCall(OPCODE opcode,
assert(obj->gtType == TYP_REF);
// See if we can devirtualize.
+ const bool isLateDevirtualization = false;
impDevirtualizeCall(call->AsCall(), &callInfo->hMethod, &callInfo->methodFlags, &callInfo->contextHandle,
- &exactContextHnd);
+ &exactContextHnd, isLateDevirtualization);
}
if (impIsThis(obj))
@@ -8717,26 +8717,43 @@ DONE_CALL:
{
// Sometimes "call" is not a GT_CALL (if we imported an intrinsic that didn't turn into a call)
- bool fatPointerCandidate = call->AsCall()->IsFatPointerCandidate();
+ GenTreeCall* origCall = call->AsCall();
+
+ const bool isFatPointerCandidate = origCall->IsFatPointerCandidate();
+ const bool isInlineCandidate = origCall->IsInlineCandidate();
+ const bool isGuardedDevirtualizationCandidate = origCall->IsGuardedDevirtualizationCandidate();
+
if (varTypeIsStruct(callRetTyp))
{
+ // Need to treat all "split tree" cases here, not just inline candidates
call = impFixupCallStructReturn(call->AsCall(), sig->retTypeClass);
}
- if ((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0)
+ // TODO: consider handling fatcalli cases this way too...?
+ if (isInlineCandidate || isGuardedDevirtualizationCandidate)
{
+ // We should not have made any adjustments in impFixupCallStructReturn
+ // as we defer those until we know the fate of the call.
+ assert(call == origCall);
+
assert(opts.OptEnabled(CLFLG_INLINING));
- assert(!fatPointerCandidate); // We should not try to inline calli.
+ assert(!isFatPointerCandidate); // We should not try to inline calli.
// Make the call its own tree (spill the stack if needed).
impAppendTree(call, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
// TODO: Still using the widened type.
- call = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp));
+ GenTree* retExpr = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp));
+
+ // Link the retExpr to the call so if necessary we can manipulate it later.
+ origCall->gtInlineCandidateInfo->retExpr = retExpr;
+
+ // Propagate retExpr as the placeholder for the call.
+ call = retExpr;
}
else
{
- if (fatPointerCandidate)
+ if (isFatPointerCandidate)
{
// fatPointer candidates should be in statements of the form call() or var = call().
// Such form allows to find statements with fat calls without walking through whole trees
@@ -18378,7 +18395,7 @@ void Compiler::impCanInlineIL(CORINFO_METHOD_HANDLE fncHandle,
/*****************************************************************************
*/
-void Compiler::impCheckCanInline(GenTree* call,
+void Compiler::impCheckCanInline(GenTreeCall* call,
CORINFO_METHOD_HANDLE fncHandle,
unsigned methAttr,
CORINFO_CONTEXT_HANDLE exactContextHnd,
@@ -18391,7 +18408,7 @@ void Compiler::impCheckCanInline(GenTree* call,
struct Param
{
Compiler* pThis;
- GenTree* call;
+ GenTreeCall* call;
CORINFO_METHOD_HANDLE fncHandle;
unsigned methAttr;
CORINFO_CONTEXT_HANDLE exactContextHnd;
@@ -18523,22 +18540,42 @@ void Compiler::impCheckCanInline(GenTree* call,
(varTypeIsStruct(fncRetType) && (fncRealRetType == TYP_STRUCT)));
#endif
+ // Allocate an InlineCandidateInfo structure,
//
- // Allocate an InlineCandidateInfo structure
+ // Or, reuse the existing GuardedDevirtualizationCandidateInfo,
+ // which was pre-allocated to have extra room.
//
InlineCandidateInfo* pInfo;
- pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo;
-
- pInfo->dwRestrictions = dwRestrictions;
- pInfo->methInfo = methInfo;
- pInfo->methAttr = pParam->methAttr;
- pInfo->clsHandle = clsHandle;
- pInfo->clsAttr = clsAttr;
- pInfo->fncRetType = fncRetType;
- pInfo->exactContextHnd = pParam->exactContextHnd;
- pInfo->ilCallerHandle = pParam->pThis->info.compMethodHnd;
- pInfo->initClassResult = initClassResult;
- pInfo->preexistingSpillTemp = BAD_VAR_NUM;
+
+ if (pParam->call->IsGuardedDevirtualizationCandidate())
+ {
+ pInfo = pParam->call->gtInlineCandidateInfo;
+ }
+ else
+ {
+ pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo;
+
+ // Null out bits we don't use when we're just inlining
+ pInfo->guardedClassHandle = nullptr;
+ pInfo->guardedMethodHandle = nullptr;
+ pInfo->stubAddr = nullptr;
+ }
+
+ pInfo->methInfo = methInfo;
+ pInfo->ilCallerHandle = pParam->pThis->info.compMethodHnd;
+ pInfo->clsHandle = clsHandle;
+ pInfo->exactContextHnd = pParam->exactContextHnd;
+ pInfo->retExpr = nullptr;
+ pInfo->dwRestrictions = dwRestrictions;
+ pInfo->preexistingSpillTemp = BAD_VAR_NUM;
+ pInfo->clsAttr = clsAttr;
+ pInfo->methAttr = pParam->methAttr;
+ pInfo->initClassResult = initClassResult;
+ pInfo->fncRetType = fncRetType;
+ pInfo->exactContextNeedsRuntimeLookup = false;
+
+ // Note exactContextNeedsRuntimeLookup is reset later on,
+ // over in impMarkInlineCandidate.
*(pParam->ppInlineCandidateInfo) = pInfo;
@@ -19488,6 +19525,60 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* ad
// callInfo -- call info from VM
//
// Notes:
+// Mostly a wrapper for impMarkInlineCandidateHelper that also undoes
+// guarded devirtualization for virtual calls where the method we'd
+// devirtualize to cannot be inlined.
+
+void Compiler::impMarkInlineCandidate(GenTree* callNode,
+ CORINFO_CONTEXT_HANDLE exactContextHnd,
+ bool exactContextNeedsRuntimeLookup,
+ CORINFO_CALL_INFO* callInfo)
+{
+ GenTreeCall* call = callNode->AsCall();
+
+ // Do the actual evaluation
+ impMarkInlineCandidateHelper(call, exactContextHnd, exactContextNeedsRuntimeLookup, callInfo);
+
+ // If this call is an inline candidate or is not a guarded devirtualization
+ // candidate, we're done.
+ if (call->IsInlineCandidate() || !call->IsGuardedDevirtualizationCandidate())
+ {
+ return;
+ }
+
+ // If we can't inline the call we'd guardedly devirtualize to,
+ // we undo the guarded devirtualization, as the benefit from
+ // just guarded devirtualization alone is likely not worth the
+ // extra jit time and code size.
+ //
+ // TODO: it is possibly interesting to allow this, but requires
+ // fixes elsewhere too...
+ JITDUMP("Revoking guarded devirtualization candidacy for call [%06u]: target method can't be inlined\n",
+ dspTreeID(call));
+
+ call->ClearGuardedDevirtualizationCandidate();
+
+ // If we have a stub address, restore it back into the union that it shares
+ // with the candidate info.
+ if (call->IsVirtualStub())
+ {
+ JITDUMP("Restoring stub addr %p from guarded devirt candidate info\n",
+ call->gtGuardedDevirtualizationCandidateInfo->stubAddr);
+ call->gtStubCallStubAddr = call->gtGuardedDevirtualizationCandidateInfo->stubAddr;
+ }
+}
+
+//------------------------------------------------------------------------
+// impMarkInlineCandidateHelper: determine if this call can be subsequently
+// inlined
+//
+// Arguments:
+// callNode -- call under scrutiny
+// exactContextHnd -- context handle for inlining
+// exactContextNeedsRuntimeLookup -- true if context required runtime lookup
+// callInfo -- call info from VM
+//
+// Notes:
// If callNode is an inline candidate, this method sets the flag
// GTF_CALL_INLINE_CANDIDATE, and ensures that helper methods have
// filled in the associated InlineCandidateInfo.
@@ -19497,10 +19588,10 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* ad
// method may be marked as "noinline" to short-circuit any
// future assessments of calls to this method.
-void Compiler::impMarkInlineCandidate(GenTree* callNode,
- CORINFO_CONTEXT_HANDLE exactContextHnd,
- bool exactContextNeedsRuntimeLookup,
- CORINFO_CALL_INFO* callInfo)
+void Compiler::impMarkInlineCandidateHelper(GenTreeCall* call,
+ CORINFO_CONTEXT_HANDLE exactContextHnd,
+ bool exactContextNeedsRuntimeLookup,
+ CORINFO_CALL_INFO* callInfo)
{
// Let the strategy know there's another call
impInlineRoot()->m_inlineStrategy->NoteCall();
@@ -19525,7 +19616,6 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
return;
}
- GenTreeCall* call = callNode->AsCall();
InlineResult inlineResult(this, call, nullptr, "impMarkInlineCandidate");
// Don't inline if not optimizing root method
@@ -19562,14 +19652,20 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
if (call->IsVirtual())
{
- inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT);
- return;
+ // Allow guarded devirt calls to be treated as inline candidates,
+ // but reject all other virtual calls.
+ if (!call->IsGuardedDevirtualizationCandidate())
+ {
+ inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT);
+ return;
+ }
}
/* Ignore helper calls */
if (call->gtCallType == CT_HELPER)
{
+ assert(!call->IsGuardedDevirtualizationCandidate());
inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_CALL_TO_HELPER);
return;
}
@@ -19585,17 +19681,27 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
* restricts the inliner to non-expanding inlines. I removed the check to allow for non-expanding
* inlining in throw blocks. I should consider the same thing for catch and filter regions. */
- CORINFO_METHOD_HANDLE fncHandle = call->gtCallMethHnd;
+ CORINFO_METHOD_HANDLE fncHandle;
unsigned methAttr;
- // Reuse method flags from the original callInfo if possible
- if (fncHandle == callInfo->hMethod)
+ if (call->IsGuardedDevirtualizationCandidate())
{
- methAttr = callInfo->methodFlags;
+ fncHandle = call->gtGuardedDevirtualizationCandidateInfo->guardedMethodHandle;
+ methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
}
else
{
- methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+ fncHandle = call->gtCallMethHnd;
+
+ // Reuse method flags from the original callInfo if possible
+ if (fncHandle == callInfo->hMethod)
+ {
+ methAttr = callInfo->methodFlags;
+ }
+ else
+ {
+ methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+ }
}
#ifdef DEBUG
@@ -19695,14 +19801,13 @@ void Compiler::impMarkInlineCandidate(GenTree* callNode,
return;
}
- // The old value should be NULL
- assert(call->gtInlineCandidateInfo == nullptr);
+ // The old value should be null OR this call should be a guarded devirtualization candidate.
+ assert((call->gtInlineCandidateInfo == nullptr) || call->IsGuardedDevirtualizationCandidate());
- // The new value should not be NULL.
+ // The new value should not be null.
assert(inlineCandidateInfo != nullptr);
inlineCandidateInfo->exactContextNeedsRuntimeLookup = exactContextNeedsRuntimeLookup;
-
- call->gtInlineCandidateInfo = inlineCandidateInfo;
+ call->gtInlineCandidateInfo = inlineCandidateInfo;
// Mark the call node as inline candidate.
call->gtFlags |= GTF_CALL_INLINE_CANDIDATE;
@@ -19837,6 +19942,7 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
// methodFlags -- [IN/OUT] flags for the method to call. Updated iff call devirtualized.
// contextHandle -- [IN/OUT] context handle for the call. Updated iff call devirtualized.
// exactContextHnd -- [OUT] updated context handle iff call devirtualized
+// isLateDevirtualization -- if devirtualization is happening after importation
//
// Notes:
// Virtual calls in IL will always "invoke" the base class method.
@@ -19861,11 +19967,16 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
// to instead make a local copy. If that is doable, the call is
// updated to invoke the unboxed entry on the local copy.
//
+// When guarded devirtualization is enabled, this method will mark
+// calls as guarded devirtualization candidates if the type of `this`
+// is not exactly known and there is a plausible guess for the type.
+
void Compiler::impDevirtualizeCall(GenTreeCall* call,
CORINFO_METHOD_HANDLE* method,
unsigned* methodFlags,
CORINFO_CONTEXT_HANDLE* contextHandle,
- CORINFO_CONTEXT_HANDLE* exactContextHandle)
+ CORINFO_CONTEXT_HANDLE* exactContextHandle,
+ bool isLateDevirtualization)
{
assert(call != nullptr);
assert(method != nullptr);
@@ -20031,16 +20142,67 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
}
#endif // defined(DEBUG)
- // Bail if obj class is an interface.
+ // See if the jit's best type for `obj` is an interface.
// See for instance System.ValueTuple`8::GetHashCode, where lcl 0 is System.IValueTupleInternal
// IL_021d: ldloc.0
// IL_021e: callvirt instance int32 System.Object::GetHashCode()
+ //
+ // If so, we can't devirtualize, but we may be able to do guarded devirtualization.
if ((objClassAttribs & CORINFO_FLG_INTERFACE) != 0)
{
- JITDUMP("--- obj class is interface, sorry\n");
+ // If we're called during early devirtualization, attempt guarded devirtualization
+ // if there's currently just one implementing class.
+ if (exactContextHandle == nullptr)
+ {
+ JITDUMP("--- obj class is interface...unable to dervirtualize, sorry\n");
+ return;
+ }
+
+ CORINFO_CLASS_HANDLE uniqueImplementingClass = NO_CLASS_HANDLE;
+
+ // info.compCompHnd->getUniqueImplementingClass(objClass);
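+ //
+ // Until the runtime exposes an API along those lines, this guess is never
+ // available, so the interface path currently always bails out just below.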
+
+ if (uniqueImplementingClass == NO_CLASS_HANDLE)
+ {
+ JITDUMP("No unique implementor of interface %p (%s), sorry\n", objClass, objClassName);
+ return;
+ }
+
+ JITDUMP("Only known implementor of interface %p (%s) is %p (%s)!\n", objClass, objClassName,
+ uniqueImplementingClass, eeGetClassName(uniqueImplementingClass));
+
+ bool guessUniqueInterface = true;
+
+ INDEBUG(guessUniqueInterface = (JitConfig.JitGuardedDevirtualizationGuessUniqueInterface() > 0););
+
+ if (!guessUniqueInterface)
+ {
+ JITDUMP("Guarded devirt for unique interface implementor is not enabled, sorry\n");
+ return;
+ }
+
+ // Ask the runtime to determine the method that would be called based on the guessed-for type.
+ CORINFO_CONTEXT_HANDLE ownerType = *contextHandle;
+ CORINFO_METHOD_HANDLE uniqueImplementingMethod =
+ info.compCompHnd->resolveVirtualMethod(baseMethod, uniqueImplementingClass, ownerType);
+
+ if (uniqueImplementingMethod == nullptr)
+ {
+ JITDUMP("Can't figure out which method would be invoked, sorry\n");
+ return;
+ }
+
+ JITDUMP("Interface call would invoke method %s\n", eeGetMethodName(uniqueImplementingMethod, nullptr));
+ DWORD uniqueMethodAttribs = info.compCompHnd->getMethodAttribs(uniqueImplementingMethod);
+ DWORD uniqueClassAttribs = info.compCompHnd->getClassAttribs(uniqueImplementingClass);
+
+ addGuardedDevirtualizationCandidate(call, uniqueImplementingMethod, uniqueImplementingClass,
+ uniqueMethodAttribs, uniqueClassAttribs);
return;
}
+ // If we get this far, the jit has a lower bound class type for the `this` object being used for dispatch.
+ // It may or may not know enough to devirtualize...
if (isInterface)
{
assert(call->IsVirtualStub());
@@ -20054,6 +20216,9 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
// If we failed to get a handle, we can't devirtualize. This can
// happen when prejitting, if the devirtualization crosses
// servicing bubble boundaries.
+ //
+ // Note if we have some way of guessing a better and more likely type we can do something similar to the code
+ // above for the case where the best jit type is an interface type.
if (derivedMethod == nullptr)
{
JITDUMP("--- no derived method, sorry\n");
@@ -20068,7 +20233,7 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
const char* derivedClassName = "?derivedClass";
const char* derivedMethodName = "?derivedMethod";
- const char* note = "speculative";
+ const char* note = "inexact or not final";
if (isExact)
{
note = "exact";
@@ -20093,29 +20258,41 @@ void Compiler::impDevirtualizeCall(GenTreeCall* call,
}
#endif // defined(DEBUG)
- if (!isExact && !objClassIsFinal && !derivedMethodIsFinal)
- {
- // Type is not exact, and neither class or method is final.
- //
- // We could speculatively devirtualize, but there's no
- // reason to believe the derived method is the one that
- // is likely to be invoked.
- //
- // If there's currently no further overriding (that is, at
- // the time of jitting, objClass has no subclasses that
- // override this method), then perhaps we'd be willing to
- // make a bet...?
- JITDUMP(" Class not final or exact, method not final, no devirtualization\n");
- return;
- }
+ const bool canDevirtualize = isExact || objClassIsFinal || (!isInterface && derivedMethodIsFinal);
- // For interface calls we must have an exact type or final class.
- if (isInterface && !isExact && !objClassIsFinal)
+ if (!canDevirtualize)
{
- JITDUMP(" Class not final or exact for interface, no devirtualization\n");
+ JITDUMP(" Class not final or exact%s\n", isInterface ? "" : ", and method not final");
+
+ // Have we enabled guarded devirtualization by guessing the jit's best class?
+ bool guessJitBestClass = true;
+ INDEBUG(guessJitBestClass = (JitConfig.JitGuardedDevirtualizationGuessBestClass() > 0););
+
+ if (!guessJitBestClass)
+ {
+ JITDUMP("No guarded devirt: guessing for jit best class disabled\n");
+ return;
+ }
+
+ // Don't try guarded devirtualization when we're doing late devirtualization.
+ if (isLateDevirtualization)
+ {
+ JITDUMP("No guarded devirt during late devirtualization\n");
+ return;
+ }
+
+ // We will use the class that introduced the method as our guess
+ // for the runtime class of the object.
+ CORINFO_CLASS_HANDLE derivedClass = info.compCompHnd->getMethodClass(derivedMethod);
+
+ // Try guarded devirtualization.
+ addGuardedDevirtualizationCandidate(call, derivedMethod, derivedClass, derivedMethodAttribs, objClassAttribs);
return;
}
+ // All checks done. Time to transform the call.
+ assert(canDevirtualize);
+
JITDUMP(" %s; can devirtualize\n", note);
// Make the updates.
@@ -20429,10 +20606,25 @@ private:
{
GenTree* retExpr = *pRetExpr;
assert(retExpr->OperGet() == GT_RET_EXPR);
- JITDUMP("Store return expression %u as a local var.\n", retExpr->gtTreeID);
- unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+ const unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+ JITDUMP("Storing return expression [%06u] to a local var V%02u.\n", comp->dspTreeID(retExpr), tmp);
comp->impAssignTempGen(tmp, retExpr, (unsigned)Compiler::CHECK_SPILL_NONE);
*pRetExpr = comp->gtNewLclvNode(tmp, retExpr->TypeGet());
+
+ if (retExpr->TypeGet() == TYP_REF)
+ {
+ assert(comp->lvaTable[tmp].lvSingleDef == 0);
+ comp->lvaTable[tmp].lvSingleDef = 1;
+ JITDUMP("Marked V%02u as a single def temp\n", tmp);
+
+ bool isExact = false;
+ bool isNonNull = false;
+ CORINFO_CLASS_HANDLE retClsHnd = comp->gtGetClassHandle(retExpr, &isExact, &isNonNull);
+ if (retClsHnd != nullptr)
+ {
+ comp->lvaSetClass(tmp, retClsHnd, isExact);
+ }
+ }
}
private:
@@ -20448,8 +20640,108 @@ private:
//
void Compiler::addFatPointerCandidate(GenTreeCall* call)
{
+ JITDUMP("Marking call [%06u] as fat pointer candidate\n", dspTreeID(call));
setMethodHasFatPointer();
call->SetFatPointerCandidate();
SpillRetExprHelper helper(this);
helper.StoreRetExprResultsInArgs(call);
}
+
+//------------------------------------------------------------------------
+// addGuardedDevirtualizationCandidate: potentially mark the call as a guarded
+// devirtualization candidate
+//
+// Notes:
+//
+// We currently do not mark calls as candidates when prejitting. This was done
+// to simplify bringing up the associated transformation. It is worth revisiting
+// if we think we can come up with a good guess for the class when prejitting.
+//
+// Call sites in rare or unoptimized code, and calls that require cookies are
+// also not marked as candidates.
+//
+// As part of marking the candidate, the code spills GT_RET_EXPRs anywhere in any
+// child tree, because we will need to clone all these trees when we clone the call
+// as part of guarded devirtualization, and these IR nodes can't be cloned.
+//
+// Arguments:
+// call - potential guarded devirtualization candidate
+// methodHandle - method that will be invoked if the class test succeeds
+// classHandle - class that will be tested for at runtime
+// methodAttr - attributes of the method
+// classAttr - attributes of the class
+//
+void Compiler::addGuardedDevirtualizationCandidate(GenTreeCall* call,
+ CORINFO_METHOD_HANDLE methodHandle,
+ CORINFO_CLASS_HANDLE classHandle,
+ unsigned methodAttr,
+ unsigned classAttr)
+{
+ // This transformation only makes sense for virtual calls
+ assert(call->IsVirtual());
+
+ // Only mark calls if the feature is enabled.
+ const bool isEnabled = JitConfig.JitEnableGuardedDevirtualization() > 0;
+
+ if (!isEnabled)
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- disabled by jit config\n",
+ dspTreeID(call));
+ return;
+ }
+
+ // Bail when prejitting. We only do this for jitted code.
+ // We should revisit this if we think we can come up with good class guesses when prejitting.
+ if (opts.jitFlags->IsSet(JitFlags::JIT_FLAG_PREJIT))
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- prejitting", dspTreeID(call));
+ return;
+ }
+
+ // Bail if not optimizing or the call site is very likely cold
+ if (compCurBB->isRunRarely() || opts.compDbgCode || opts.MinOpts())
+ {
+ JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- rare / dbg / minopts\n",
+ dspTreeID(call));
+ return;
+ }
+
+ // CT_INDIRECT calls may use the cookie, bail if so...
+ //
+ // If transforming these provides a benefit, we could save this off in the same way
+ // we save the stub address below.
+ if ((call->gtCallType == CT_INDIRECT) && (call->gtCall.gtCallCookie != nullptr))
+ {
+ return;
+ }
+
+ // We're all set, proceed with candidate creation.
+ JITDUMP("Marking call [%06u] as guarded devirtualization candidate; will guess for class %s\n", dspTreeID(call),
+ eeGetClassName(classHandle));
+ setMethodHasGuardedDevirtualization();
+ call->SetGuardedDevirtualizationCandidate();
+
+ // Spill off any GT_RET_EXPR subtrees so we can clone the call.
+ SpillRetExprHelper helper(this);
+ helper.StoreRetExprResultsInArgs(call);
+
+ // Gather some information for later. Note we actually allocate InlineCandidateInfo
+ // here, as the devirtualized half of this call will likely become an inline candidate.
+ GuardedDevirtualizationCandidateInfo* pInfo = new (this, CMK_Inlining) InlineCandidateInfo;
+
+ pInfo->guardedMethodHandle = methodHandle;
+ pInfo->guardedClassHandle = classHandle;
+
+ // Save off the stub address since it shares a union with the candidate info.
+ if (call->IsVirtualStub())
+ {
+ JITDUMP("Saving stub addr %p in candidate info\n", call->gtStubCallStubAddr);
+ pInfo->stubAddr = call->gtStubCallStubAddr;
+ }
+ else
+ {
+ pInfo->stubAddr = nullptr;
+ }
+
+ call->gtGuardedDevirtualizationCandidateInfo = pInfo;
+}
diff --git a/src/jit/inline.cpp b/src/jit/inline.cpp
index a1064762ce..f523d2abbe 100644
--- a/src/jit/inline.cpp
+++ b/src/jit/inline.cpp
@@ -337,6 +337,7 @@ InlineContext::InlineContext(InlineStrategy* strategy)
, m_CodeSizeEstimate(0)
, m_Success(true)
, m_Devirtualized(false)
+ , m_Guarded(false)
, m_Unboxed(false)
#if defined(DEBUG) || defined(INLINE_DATA)
, m_Policy(nullptr)
@@ -397,18 +398,19 @@ void InlineContext::Dump(unsigned indent)
const char* inlineReason = InlGetObservationString(m_Observation);
const char* inlineResult = m_Success ? "" : "FAILED: ";
const char* devirtualized = m_Devirtualized ? " devirt" : "";
+ const char* guarded = m_Guarded ? " guarded" : "";
const char* unboxed = m_Unboxed ? " unboxed" : "";
if (m_Offset == BAD_IL_OFFSET)
{
- printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken,
- inlineResult, inlineReason, devirtualized, unboxed, calleeName);
+ printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken,
+ inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName);
}
else
{
IL_OFFSET offset = jitGetILoffs(m_Offset);
- printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID, calleeToken,
- inlineResult, inlineReason, devirtualized, unboxed, calleeName);
+ printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID,
+ calleeToken, inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName);
}
}
@@ -1205,6 +1207,7 @@ InlineContext* InlineStrategy::NewSuccess(InlineInfo* inlineInfo)
calleeContext->m_Observation = inlineInfo->inlineResult->GetObservation();
calleeContext->m_Success = true;
calleeContext->m_Devirtualized = originalCall->IsDevirtualized();
+ calleeContext->m_Guarded = originalCall->IsGuarded();
calleeContext->m_Unboxed = originalCall->IsUnboxed();
#if defined(DEBUG) || defined(INLINE_DATA)
@@ -1266,6 +1269,7 @@ InlineContext* InlineStrategy::NewFailure(GenTreeStmt* stmt, InlineResult* inlin
failedContext->m_Callee = inlineResult->GetCallee();
failedContext->m_Success = false;
failedContext->m_Devirtualized = originalCall->IsDevirtualized();
+ failedContext->m_Guarded = originalCall->IsGuarded();
failedContext->m_Unboxed = originalCall->IsUnboxed();
assert(InlIsValidObservation(failedContext->m_Observation));
diff --git a/src/jit/inline.h b/src/jit/inline.h
index 551341e2e6..e94765a74c 100644
--- a/src/jit/inline.h
+++ b/src/jit/inline.h
@@ -494,22 +494,36 @@ private:
bool m_Reported;
};
+// GuardedDevirtualizationCandidateInfo provides information about
+// a potential target of a virtual call.
+
+struct GuardedDevirtualizationCandidateInfo
+{
+ CORINFO_CLASS_HANDLE guardedClassHandle;
+ CORINFO_METHOD_HANDLE guardedMethodHandle;
+ void* stubAddr;
+};
+
// InlineCandidateInfo provides basic information about a particular
// inline candidate.
+//
+// It is a superset of GuardedDevirtualizationCandidateInfo: calls
+// can start out as GDv candidates and turn into inline candidates
-struct InlineCandidateInfo
+struct InlineCandidateInfo : public GuardedDevirtualizationCandidateInfo
{
- DWORD dwRestrictions;
CORINFO_METHOD_INFO methInfo;
- unsigned methAttr;
+ CORINFO_METHOD_HANDLE ilCallerHandle; // the logical IL caller of this inlinee.
CORINFO_CLASS_HANDLE clsHandle;
+ CORINFO_CONTEXT_HANDLE exactContextHnd;
+ GenTree* retExpr;
+ DWORD dwRestrictions;
+ unsigned preexistingSpillTemp;
unsigned clsAttr;
+ unsigned methAttr;
+ CorInfoInitClassResult initClassResult;
var_types fncRetType;
- CORINFO_METHOD_HANDLE ilCallerHandle; // the logical IL caller of this inlinee.
- CORINFO_CONTEXT_HANDLE exactContextHnd;
bool exactContextNeedsRuntimeLookup;
- CorInfoInitClassResult initClassResult;
- unsigned preexistingSpillTemp;
};
// InlArgInfo describes inline candidate argument properties.
@@ -683,6 +697,11 @@ public:
return m_Devirtualized;
}
+ bool IsGuarded() const
+ {
+ return m_Guarded;
+ }
+
bool IsUnboxed() const
{
return m_Unboxed;
@@ -703,6 +722,7 @@ private:
int m_CodeSizeEstimate; // in bytes * 10
bool m_Success : 1; // true if this was a successful inline
bool m_Devirtualized : 1; // true if this was a devirtualized call
+ bool m_Guarded : 1; // true if this was a guarded call
bool m_Unboxed : 1; // true if this call now invokes the unboxed entry
#if defined(DEBUG) || defined(INLINE_DATA)
diff --git a/src/jit/jitconfigvalues.h b/src/jit/jitconfigvalues.h
index 383264e6a3..649db4ebe6 100644
--- a/src/jit/jitconfigvalues.h
+++ b/src/jit/jitconfigvalues.h
@@ -363,6 +363,15 @@ CONFIG_INTEGER(JitEnableRemoveEmptyTry, W("JitEnableRemoveEmptyTry"), 0)
#endif // defined(FEATURE_CORECLR)
#endif // DEBUG
+// Overall master enable for Guarded Devirtualization. Currently not enabled by default.
+CONFIG_INTEGER(JitEnableGuardedDevirtualization, W("JitEnableGuardedDevirtualization"), 0)
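+// (For experimentation it can be turned on by setting the
+// COMPlus_JitEnableGuardedDevirtualization environment variable to 1.)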
+
+#if defined(DEBUG)
+// Various policies for GuardedDevirtualization
+CONFIG_INTEGER(JitGuardedDevirtualizationGuessUniqueInterface, W("JitGuardedDevirtualizationGuessUniqueInterface"), 1)
+CONFIG_INTEGER(JitGuardedDevirtualizationGuessBestClass, W("JitGuardedDevirtualizationGuessBestClass"), 1)
+#endif // DEBUG
+
#undef CONFIG_INTEGER
#undef CONFIG_STRING
#undef CONFIG_METHODSET
diff --git a/src/jit/lclvars.cpp b/src/jit/lclvars.cpp
index 8ce2b5956b..339f7a5ff7 100644
--- a/src/jit/lclvars.cpp
+++ b/src/jit/lclvars.cpp
@@ -2746,6 +2746,11 @@ void Compiler::lvaUpdateClass(unsigned varNum, CORINFO_CLASS_HANDLE clsHnd, bool
{
varDsc->lvClassHnd = clsHnd;
varDsc->lvClassIsExact = isExact;
+
+#if DEBUG
+ // Note we've modified the type...
+ varDsc->lvClassInfoUpdated = true;
+#endif // DEBUG
}
return;
diff --git a/src/jit/morph.cpp b/src/jit/morph.cpp
index 20e378ed16..566714f276 100644
--- a/src/jit/morph.cpp
+++ b/src/jit/morph.cpp
@@ -6614,42 +6614,59 @@ GenTree* Compiler::fgMorphField(GenTree* tree, MorphAddrContext* mac)
void Compiler::fgMorphCallInline(GenTreeCall* call, InlineResult* inlineResult)
{
- // The call must be a candiate for inlining.
- assert((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0);
+ bool inliningFailed = false;
- // Attempt the inline
- fgMorphCallInlineHelper(call, inlineResult);
+ // Is this call an inline candidate?
+ if (call->IsInlineCandidate())
+ {
+ // Attempt the inline
+ fgMorphCallInlineHelper(call, inlineResult);
- // We should have made up our minds one way or another....
- assert(inlineResult->IsDecided());
+ // We should have made up our minds one way or another....
+ assert(inlineResult->IsDecided());
- // If we failed to inline, we have a bit of work to do to cleanup
- if (inlineResult->IsFailure())
- {
+ // If we failed to inline, we have a bit of work to do to cleanup
+ if (inlineResult->IsFailure())
+ {
#ifdef DEBUG
- // Before we do any cleanup, create a failing InlineContext to
- // capture details of the inlining attempt.
- m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult);
+ // Before we do any cleanup, create a failing InlineContext to
+ // capture details of the inlining attempt.
+ m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult);
#endif
- // It was an inline candidate, but we haven't expanded it.
- if (call->gtCall.gtReturnType != TYP_VOID)
+ inliningFailed = true;
+
+ // Clear the Inline Candidate flag so we can ensure later we tried
+ // inlining all candidates.
+ //
+ call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+ }
+ }
+ else
+ {
+ // This wasn't an inline candidate. So it must be a GDV candidate.
+ assert(call->IsGuardedDevirtualizationCandidate());
+
+ // We already know we can't inline this call, so don't even bother to try.
+ inliningFailed = true;
+ }
+
+ // If we failed to inline (or didn't even try), do some cleanup.
+ if (inliningFailed)
+ {
+ if (call->gtReturnType != TYP_VOID)
{
+ JITDUMP("Inlining [%06u] failed, so bashing [%06u] to NOP\n", dspTreeID(call), dspTreeID(fgMorphStmt));
+
// Detach the GT_CALL tree from the original statement by
// hanging a "nothing" node to it. Later the "nothing" node will be removed
// and the original GT_CALL tree will be picked up by the GT_RET_EXPR node.
-
noway_assert(fgMorphStmt->gtStmtExpr == call);
fgMorphStmt->gtStmtExpr = gtNewNothingNode();
}
-
- // Clear the Inline Candidate flag so we can ensure later we tried
- // inlining all candidates.
- //
- call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
}
}
@@ -6682,6 +6699,13 @@ void Compiler::fgMorphCallInlineHelper(GenTreeCall* call, InlineResult* result)
return;
}
+ // Re-check this because guarded devirtualization may allow these through.
+ if (gtIsRecursiveCall(call) && call->IsImplicitTailCall())
+ {
+ result->NoteFatal(InlineObservation::CALLSITE_IMPLICIT_REC_TAIL_CALL);
+ return;
+ }
+
// impMarkInlineCandidate() is expected not to mark tail prefixed calls
// and recursive tail calls as inline candidates.
noway_assert(!call->IsTailPrefixedCall());