Age | Commit message (Collapse) | Author | Files | Lines |
|
This change is the result of running clang-tidy and clang-format on jit
sources.
|
|
Now that we can trust `optMethodFlags`, note if the root caller has
newarr or newobj. We can't yet tell if these operations might feed
a particular call's arguments but their presence in the root method
alone might be enough to build correlations with performance changes.
Count number of returns in the callee. Note if callee has same this
as the (root) caller. Note when callee is a class constructor.
Add code to dump out the new observations. No policies act on these yet.
|
|
Methods that do not contain return blocks can't ever exit normally, they either throw or loop indefinitely. Inlining is not beneficial in such cases as it increases the code size without providing any speed benefits. In the particular case of throws the inlined code can easily be 10 times larger than the call site.
The call to fgMoreThanOneReturnBlock has been replaced with code that does the same thing but also detects the existence of at least one return block. This avoids walking the basic block list twice.
Note that BBJ_RETURN blocks are also generated for CEE_JMP. Methods exiting via CEE_JMP instead of CEE_RET will continue to be inlined (assuming they were inlined before this change).
|
|
Two related changes to better capture cases where args or constant args
are seen at inline call sites.
On the observation side, the inliner's stack modelling of the callee is
crude and will often overestimate the evaluation stack depth. So when
looking at an opcode that takes just one stack operand (eg BRFALSE) the
inliner's check should be whether the stack has at least one element
instead of checking whether the stack has exactly one element. For
compatibility reasons, the check for exactly one element is still used
when the inline is evaluated via the LegacyPolicy.
On the policy side, instead of keeping a bool flag to note that there
were interesting arg cases, count the number of observations.
LegacyPolicy continues to act as before, checking if count is zero or
nonzero instead of whether the flag was false or true. The count is
available for use in other heuristics and is reported in inline data.
|
|
Updates to bring CS and TP impact of the ModelPolicy into more acceptable
ranges.
For CS, reduce the call site weights to values that are more in keeping
with the legacy policy weights. Local test runs show this does not
drastically alter CQ and brings CS down below LegacyPolicy levels, on
average.
Implement an early out rejection based solely on ILSize. The threshold
value is set by conservatively determining when ILSize alone indicates
the method in question will never be inlined (note the policy itself
does not have an explicit ILSize cutoff). See comments for
`ModelPolicy::NoteInt` for details. Note if we adjust the model's size
and profitability estimates, this threshould will also need updating.
CQ (as measured by the CoreCLR perf tests) continues to show about a
2.5% geomean improvement over LegacyPolicy.
|
|
Make the ModelPolicy available in release builds of the jit. Enable this
policy by setting:
COMPLUS_JitInlinePolicyModel=1
This works for either jitting or prejitting.
Update the ModelPolicy so callee never inline decisions are always
propagated back to the runtime.
|
|
Add block weight as observation. At first blush I suspect this will be
useful only when the jit has IBC data, and even then only in the root
method blocks.
We'll likely need to either build up better profile estimates for the
inliner or else just try and cope without it somehow.
Dump out block weight and the per-call profitability in the observation
schema and data. Among other things this lets us cross-validate how well
the ModelPolicy predictions match the abstract GLMNET models...
|
|
First (and very preliminary) cut at a profitability model, based
on inline data modelling.
ModelPolicy heuristic is updated to use this information to decide
when to inline. The policy has two parameters: one for the local
call site weight and another for the overall size/speed tradeoff
that we find acceptable. The first parameter is temporary as we
should be able to model this weight via observations.
Size/speed tradeoff is a policy decision and is a tuning parameter.
I've set it for now to roughly match the LegacyPolicy behavior, in
aggregate, across the jit perf benchmarks.
|
|
ReplayPolicy only impacts discretionary inlines, so could not control
force inline candidates. This has lead to some confusion during inline
studies.
Since most cases of force inline are performance related, allow replay
to override and control these force inline candidates.
This might be useful down the road to study how likely it is that a force
inline directive is either unnecessary or perhaps even detremental to
performance.
|
|
Replay needs the ability to distinguish among multiple calls to the same
callee. The IL offset of the call site can be used for this, so start
checking for it during replay.
Note there is subsequent follow-up work to ensure that we actually track
offsets for calls.
Refactor the way the ReplayPolicy gets notified of the current inline
context (and the IL offset of the call). Instead of trying to pass
this information as arguments to the factory method that creates
policies, add special notifications for this information that only the
ReplayPolicy will act upon.
|
|
Move CritSecObject into util.h, and use it to lock around reading
and writing inline Xml. Introduce CritSecHolder for RAII management
of the locks.
Add a simple file position cache for methods to speed up replay
when the inline xml file is large.
|
|
Emit the hash for callees, and check it during replay.
Capture the root method name, and do a simple Xml-safe conversion.
Add a missing newline to the banner message announcing that the replay
policy is in effect.
|
|
The ReplayPolicy reads an external script to determine which inlines
to perform. The script is the same Xml syntax that's produced by
the inliner when JitInlineDumpXml is enabled. This format can be edited
by hand or tool to force particular inlining patterns to occur. Methods
or calls sites not mentioned in the script are considered as noinline.
There's a bunch of work still left to make this fully robust, but in
testing it works well enough for my immediate use case that I'll hold
off on further polish until it's needed. But, for future reference,
here's a laundry list:
* Need better ways to identify methods. Token and hash are not enough.
* Need better ways to identify call sites. Callee token is not enough.
* Consider preparsing or mapping the script into memory.
* Consider caching node positions in the InlineContexts.
* Make it robust for multithreading.
* Handle the prejit root case somehow.
* Possibly allow overriding of inline attributes.
|
|
Add a new inline policy that tries to inline as much as possible
without increasing method size.
Fix one test that had an expected inline that doesn't happen under
the SizePolicy, by marking a method with AggressiveInline.
|
|
The FullPolicy will be used to grow maximal potential inline trees
for methods. It inlines all legal candidates subject to depth and
size limits. Add a config flag to select this as the policy to use.
Also add a config flag so the inline depth limit becomes adjustable,
and rework the code so that checking this limit is now a matter of
policy. Add in an implementation max depth limit of 1000.
Revise names of some statics to make their intent clearer, and add some
missing header comments.
|
|
Accumulate the per-inline size impact estimates into an overall inline
size impact estimate. No size budgeting (yet); this just provides data
for further study.
|
|
Add a model-based size predictor to the `DiscretionaryPolicy` and
dump its data along with observations. This can be used to assess
the performance of the model. Currently using a linear model that
can explain about 55% of the variability in code size impact for
mscorlib. Whether this holds up for other situations is still to
be determined.
Create a `ModelPolicy` that relies on this code size model to make
inlining decisions. This is very much a work in progress as the code
size models are still not well vetted and there is no real code
quality model yet.
|
|
Have the `DiscretionaryPolicy` do some simple opcode histogramming
to try and provide better observations for code size estimation.
Likewise, look at the types and sizes of arguments and report those.
|
|
Now that we have multiple policies, refactor their implementation
to extract common elements. In particular all the existing
policies have the same legality and ability constraints, so this
logic is extracted to a partial `LegalPolicy` superclass.
Adresses relevant feedback on #3697, where the number of policies
increased from one to two.
No changes in behavior.
|
|
This is an initial cut at collecting the time spent jitting a method
as part of observing the incremental impact of inlines.
It currently only measures the time after inlining has run, because there
are substantial interactions with the EE before this point and the time
spent in those calls is not always going to properly attribute to the
current method. We've had some good internal discussion on other
approaches to this and will likely revisit as time goes on. There was
already some timing support in place for SQM so I've generalized that
and shared it for this case too.
This also fixes an off-by-one issue with the method version which came
up when trying to back-correlate the inline data into the inline trees.
The data dumps now include root method data, and so we capture the IL
size and force inline state of the root method in the data dumps. This
allows us to investiage if observed inliner data is corrlated with the
current size of a method.
Data output is now also supported in RET builds. A custom build that
defines `INLINE_DATA` is required. Output is sent to stderr to work
around issues with crossgen.exe reopening stdout in a non-ascii mode.
|
|
Some initial work to gather data to drive the modelling of inline
code size impact. See #3775 for context.
Update the `DiscretionaryPolicy` to capture more observations.
Add a limit feature `JitInlineLimit` so the jit will stop inlining after
a given number of successful inlines.
Add a `DumpData` method to `InlinePolicy` which will display all the
observations that were made to evaluate an inline. Implement this for the
`DiscretionaryPolicy`. Add a matching `DumpSchema` that writes out
column headers for each of the observations.
Modify `InlineResult` to cache the last successful policy on the root
compiler instance. Use that along with the `JitInlineLimit` and
`DumpData` to display detailed information about the last sucessful
inline along with the code size for the method. All this is displayed
if `JitInlineDumpData ` is enabled.
This allows for isolating code size measurements as follows. Compile or
jit something with `JitInlineLimit = 0` and `JitInlineDumpData =1` (and
for now, `JitInlinePolicyDiscretionary = 1`, since other policies do not
implement any interesting data dumping).
Record the this root set of method sizes and IDs. Repeat with
`JitInlineLimit=1`. For each method where there was an inline, the size
impact of that inline can be computed by comparing this version versus
the version from the root set. Repeat with `JitInlineLimit=2`, comparing
versus the `=1` versions.
Currently the method token is used to identify the root method. This is
not sufficiently unique but unfortunately there currently aren't
substantially better alternatives. See #1957 for some discussion.
|
|
There are still some observations within `fgFindJumpTargets` that are
only made when inlining. This change enables them to also be made for
the prejit root case. This partially addresses the inconsistencies
noted in #3482.
Most notably, the simple stack model is now fully enabled for the
prejit root case, and the policy is able to more broadly observe
things like `CALLEE_LOOKS_LIKE_WRAPPER` and
`CALLEE_ARG_FEEDS_CONSTANT_TEST`. As before, the `LegacyPolicy`
ignores these observations for the prejit root. Forthcoming policies
will likely behave differently.
Two observations that were fatal -- `CALLEE_UNSUPPORTED_OPCODE` and
`CALLEE_STORES_TO_ARGUMENT` -- are now conditionally fatal, since they
do not cause failure for the prejit root case (though arguably they
should).
The `CALLEE_IS_MOSTLY_LOAD_STORE` observation has been removed, and
the conditions under which it fired have been encapsulated within the
`LegacyPolicy`.
No change in behavior is expected. I verified crossgen of mscorlib
still makes all the same inline decisions for the same reasons.
|
|
The `DiscretionaryPolicy` is similar to the `LegacyPolicy`, but does not
use size limits. So there is no "always" inline class and no "too big to
inline" class. Also, discretionary failures do not trigger noinline
stamping.
It is installed if `JitInlinePolicyDiscretionary` is set nonzero.
See #3775 for some background on where this new policy will be used.
This is a first cut and further refinement is likely.
Also removed the unused `NoteDouble` methods since the double-valued
observations now are kept internally within `LegacyPolicy` and it's
not very likely we'll introduce new cases where doubles are needed.
|
|
When constructing an InlineResult (and subsequent InlinePolicy)
always use the root compiler instance, so that all policy evaluations
done in a method are consistent.
Add an assert that the initial fact replay during `fgFindJumpTargets`
(force inline state and code size) doesn't change the candidacy when
inlining -- it should just be re-establishing what we knew in the
initial candidate scan.
Closes #3760.
|
|
This change creates an inline RandomPolicy which assigns a random
benefit to each discretionary inline candidate. Force inlines are
honored as they are with LegacyPolicy. The pool of discretionary
candidates is widened to include the "always inline" cases from the
LegacyPolicy. Discretionary inlines are accepted with a likelihood
that decreases with IL size, in a manner that mirrors the acceptance
probability of the LegacyPolicy.
The random inliner is intended to be used mainly as a stress testing
tool.
The RandomPolicy is enabled in non-retail builds when `JitStress` is
set to a nonzero value. By default the RandomPolicy will be used for
50% of the method being compiled, and the LegacyPolicy (in stress mode)
for the other 50%. Which policy is used for a given method depends on
the stress value. The stress value also forms part of the random seed,
so different random patterns can be invoked by varying the `JitStress`
value.
To enable the RandomPolicy for all methods, set `JitStress` to something
nonzero, then set `JitStressModeName=STRESS_RANDOM_INLINE`.
To do this without also enabling other stress modes, set `JitStressModeNameOnly=1'.
|
|
LegacyPolicy now encapsulates all the computations needed to evaluate
whether an inline is profitable or not.
This completes the main objective of the refactoring effort, which
was to preserve and encapsulate the current inliner's behavior.
|
|
Per the jit coding conventions, rename all class member functions to be
CamelCase. Rename all class data members to begin with `m_`.
|
|
Move the native call site estimation logic into `LegacyPolicy`. Rework
the remaining per-callsite bonus multipliers so that they fit into the
observation framework. Remove `hasNativeSizeEstimate` and instead
check that the inline is a candidate and its observation is a
discretionary inline. Fix header comment in `fgFindJumpTargets`.
|
|
The state machine used to estimate callee native code size is now an
internal part of the LegacyPolicy.
During `fgFindJumpTargets` the policy determines if the state machine
is need, and if so, sets one up. `fgFindJumpTargets` then notifies the
policy of the various opcodes in the method, and the policy uses this
to drive the state machine forward.
When the IL scan is completed, the `fgFindJumpTargets` notifies the
policy, then, for the discretionary inline case, grabs the native size
estimate, and uses that to evaluate inlinability.
The knowledge of when inlines are "always", "forced", or
"discretionary" is now encapsulated by the policy instead of being
distributed in various places in the code.
|
|
Rework the code to determine the key aspects of candidacy up front
rather than figuring it out eventually. In particular this ensures
that the two stages of evaulation for inlines both start at the same
point for candidates.
This will help streamline subsequent work to move the state machine
into the policy, since the state machine is only needed for the
case where the candidate is a discretionary inline, and the policy
now tracks this directly.
For the `LegacyPolicy`, if an inline is both smaller than the always
inline size and marked force inline, attribute its inlining to its
small size.
Remove bulk of the of the "evaluation in progress" candidate
observations since they don't add much value. As a result `noteCandidate`
is not needed as an external API, and can be repurposed internally as
`setCandidate`.
Change how the prejit root case is tracked to make it explicit
from the context rather than a callback to the compiler.
|
|
This change updates the inlining code to use observations in place of
the InlineHints and hint-like things (eg "HasSimd"). A number of new
observations were added in support of this.
The `compInlineeHints` member of the compiler object was removed as
the same information is now tracked by the inline policy. The policy
also now contains most of the weights and combining logic used to
compute the profitabiliy boost for the candidate.
There is one subtle and tricky aspect to the change. For the most part
the hints were ignored during the prejit-root analysis, but the mostly
load-store hint was propagated and influenced the inlines done when
crossgenning mscorlib. See #3482 for more on this particular quirk.
I've preserved this behavior by "passing" the prejit inline result
into `fgFindJumpTargets`. In actuality the result is passed by
temporarily setting the `compInlineResult` compiler member variable.
Then, in `fgFindJumpTargets`, the load-store observation is the only
one not guarded by `compIsForInlining`. In a subsequent change I'll redo
the guards at other observation sites and put the policy aspects of these
observations into the policy object.
This addresses another piece of the work outlined in #3371.
|
|
The LegacyPolicy now allows repeated observations leading to never
or failing inlines, provided the policy is used as part of the prejit
scan and the observations all have the same impact (that is, all nevers
or all failures). Only the first bad observation is remembered and it
is used as the reason for badness.
This addresses another of the items in issue #3371.
Also, clean up a few things:
Fix a few copy and paste errors in comment headers.
Fix broken "disallow copy-assignment" pattern for InlineResult
and uncomment and fix same for InlinePolicy.
Remove an accidentally duplicated comment block (original is still
there in inline.h).
|
|
Rework the logic for force inline, basic block count, il size,
and maxstack so that the policy decides when these values
should inhibit inlining.
|
|
Move inline policies to their own header and cpp file.
Add a method to the policy class to indicate if the policy wants newly
discovered `Never` inline cases to change the callee method attributes
to Noinline. This is an existing optimization that saves time when the
jit sees calls to this callee elsewhere as a possible inline candidates.
For example, in the trace below, at for the call at offset 31, the jit
determines that `ToInt32` is too large to be inlined and so marks it as
noinline. Then when it sees another call to `ToInt31` at offset 44 it
immediately fails the inline attempt.
```
Inlines into RegistryTimeZoneInformation:.ctor(ref):this
...
[IL=0031 TR=000040] [FAILED: too many il bytes] System.BitConverter:ToInt32(ref,int):int
[IL=0044 TR=000049] [FAILED: noinline per IL/cached result] System.BitConverter:ToInt32(ref,int):int
[IL=0057 TR=000058] [FAILED: noinline per IL/cached result] System.BitConverter:ToInt32(ref,int):int
```
Diagnostic and experimental policies may choose to disable this
optimization to make it easier to locally reason about failed inlines.
There were 5 calls to `setMethodAttribs` passing `CORINFO_FLG_BAD_INLINEE`.
This change consolidates 4 of them into the inlining code. The remaining
call is for a method with verification errors. I've left it as is.
|