Age | Commit message (Collapse) | Author | Files | Lines |
|
* oportunistically -> opportunistically
* oppportunity -> opportunity
* Oppporunity -> Opportunity
* optinal -> optional
* optimisitic -> optimistic
* optionaly -> optionally
* origianl -> original
* orignally -> originally
* otheriwse -> otherwise
* otherrwise -> otherwise
|
|
|
|
* acquringing -> acquiring
* Activ -> Active
* activley -> actively
* acutal -> actual
* bIncomingIPAdddefed -> bIncomingIPAddRefed
* adddr -> addr
* readding -> reading
* Addfunction -> AddFunction
* additionnal -> additional
* Additonal -> Additional
* Additonally -> Additionally
* Addresss -> Address
* addtion -> addition
* aded -> added
* aditional -> additional
* adjustements -> adjustments
* Adress -> Address
* afer -> after
* aformentioned -> aforementioned
* afte -> after
* agains -> against
* agaisnt -> against
* aggresively -> aggressively
* aggreates -> aggregates
* aggregious -> egregious
* aginst -> against
* agregates -> aggregates
* Agressive -> Aggressive
* ahve -> have
* ajdust -> adjust
* ajust -> adjust
* alement -> element
* algoritm -> algorithm
* alighnment -> alignment
* alignmant -> alignment
* constraits -> constraints
* Allcator -> Allocator
* alllocate -> allocate
* alloacted -> allocated
* allocatate -> allocate
* allocatoror -> allocator
* alloctaed -> allocated
* alloction -> allocation
* alloted -> allotted
* allt he -> all the
* alltogether -> altogether
* alocate -> allocate
* alocated -> allocated
* Alocates -> Allocates
* alogrithm -> algorithm
* aloocate -> allocate
* alot -> a lot
* alwasy -> always
* alwyas -> always
* alwys -> always
|
|
Allow CALLEE_IS_FORCE_INLINE precedent over CALLEE_DOES_NOT_RETURN
|
|
In DEBUG/CHECK builds the jit tries to keep track of failed inlines.
Because inlines can be rejected "early" (when the parent method
is being imported) as well as "late" (when their call site is encountered
by the inliner) there is a tracking mechanism to convey the early observations
that cause failures to be resurrected later on.
These observations sometimes didn't end up in the inline context, leading
to assertions when dumping methods.
Fix is to add a new way to propagate the earlier observation to the context
that bypasses some of the policy sanity checks and simply record the reason
that the inline failed.
|
|
|
|
When prejitting, the jit assesses whether each root method is a potential
inline candidate for any possible caller. Methods deemed un-inlinable in any
caller are marked in the runtime as "noinline" to save the jit some work
later on when it sees calls to these methods.
This assessment was too conservative and led to prejit-ordering dependences
for inlines. It also meant that prejitting was missing some inlines that
would have happened had we not done the prejit root assessment.
This change removes some of the prejit observation blockers. These mostly
will enable more prejit methods to become candidates. We also now track when
a method argument reaches a test.
When we are assessing profitability for a prejit root, assume the call site
best possible case.
Also, update the inline XML to capture the prejit assessments.
This increases the number of inlines considered and performed when prejitting
and will also lead to slightly more inlining when jitting. However we do not
expect a large througput impact when jitting -- the main impact of this change
is that inlining when prejitting is now just as aggressive as inlining when
jitting, and the decisions no longer depend on the order in which methods are
prejitted.
Closes #14441.
Closes #3482.
|
|
Merge the LegacyPolicy and EnhancedLegacyPolicy into a unified policy that
behaves like the EnhancedLegacyPolicy. Rename this policy to the DefaultPolicy
since it is in fact the default inline policy.
We had been keeping the LegacyPolicy around in case we ever needed to revert
back to the initial RyuJit inline behavior, but that safeguard no longer seems
necessary.
Remove some of the checks in flowgraph.cpp that alter behavior based on policy
as they are no longer needed.
Remove the jit config setting that allowed selection of the LegacyPolicy.
This is the first stage in fixing #14441.
|
|
Optimize fixed sized locallocs of 32 bytes or less to use local buffers,
if the localloc is not in a loop.
Also "optimize" the degenerate 0 byte case.
Allow inline candidates containing localloc, but fail inlining if any
of a candidate's locallocs do not convert to local buffers.
The 32 byte size threshold was arrived at empirically; larger values did
not enable many more cases and started seeinge size bloat because of
larger stack offsets.
We can revise this threshold if we are willing to reorder locals and see
fixed sized cases larger than 32 bytes.
Closes #8542.
Also add missing handler for the callsite is in try region, this was
an oversight.
|
|
* Remove sscanf
* Remove sprintf
|
|
An inline pinvoke is a pinvoke where the managed/native transition
overhead is reduced by inlining parts of the transition bookkeeping
around the call site. A normal pinvoke does this bookkeeping in
a stub method that interposes between the managed caller and the
native callee.
Previously the jit would not allow pinvoke calls that came from inlines
to be optimized via inline pinvoke. This sometimes caused performance
surprises for users who wrap DLL imports with managed methods. See for
instance #2373.
This change lifts this limitation. Pinvokes from inlined method bodies
are now given the same treatment as pinvokes in the root method. The
legality check for inline pinvokes has been streamlined slightly to
remove a redundant check. Inline pinvokes introduced by inlining are
handled by accumulating the unmanaged method count with the value from
inlinees, and deferring insertion of the special basic blocks until after
inlining, so that if the only inline pinvokes come from inline instances
they are still properly processed.
Inline pinvokes are still disallowed in try and handler regions
(catches, filters, and finallies).
X87 liveness tracking was updated to handle the implicit inline frame
var references. This was a pre-existing issue that now can show up more
frequently. Added a test case that fails with the stock legacy jit
(and also with the new enhancements to pinvoke). Now both the original
failing case and this case pass.
Inline pinvokes are also now suppressed in rarely executed blocks,
for instance blocks leading up to throws or similar.
The inliner is now also changed to preferentially report inline
reasons as forced instead of always when both are applicable.
This change adds a new test case that shows the variety of
situations that can occur with pinvoke, inlining, and EH.
|
|
Fix issues with RandomPolicy setup in release builds with -DINLINE_DATA.
Reparent RandomPolicy on top of DiscretionaryPolicy to enable inline
data dumps from random inline runs. Remove some now-unneeded overrides.
Add a full dump mode to JitInlineDumpData that reports the inliner-visible
data for all inlines, as opposed to inliner-visible and post-inline
data for just the most recent inline. Small mods to the dumper code to
adjust comma placement for this new mode.
Have the RandomPolicy make the full set of profitability observations
for each accepted inline so they can be dumped.
|
|
Cache the random state on the InlineStrategy instead of on the compiler
instance so that the state is reinitialized and private to each jit request.
That way the random policy evaluations made compiling one method won't
alter the evaluations for subsequent methods. This should make it somewhat
easier to do A/B comparisons under jit stress when changing the jit.
Make it possible to enable the RandomPolicy outside of stress. This may
prove useful in various randomized inline performance studies or as
a simple stress mode on its own.
Random state seed is built from an external seed value (via JitStress or
JitInlinePolicyRandom) and an internal seed value (method hash), so that
random sequences potentially differ for each method but are deterministic
across runs and changes to the jit.
|
|
The inliner now can inline a subset of methods with pinned locals --
namely those where the language compiler (eg CSC) can determine that a
try/finally is not necessary to ensure unpinning, and where the jit
likewise an determines that a try/finally is likewise not needed in
the root method.
Generally speaking this allows inlining in cases where the inline method
is fairly simple and does not any contain exception handling, and the
call site is not within a try region.
When inlining methods that have pinned locals and also return a value,
the jit ensures that the return value is spilled to a temp so that the
unpins can be placed just after the inlined method body and won't alter
the return value.
Mark FixedAddressValueType as GCStressIncompatible since the "unfixed"
class static may get re-allocated at the same address. This seems to
happen even without these changes but happens much more frequently with
them.
Closes #7774.
|
|
Rebase the DiscretionaryPolicy on the EnhancedLegacyPolicy, and
capture and report some of the new observations that have been added
recently.
This change adds a new HAS_GC_STRUCT observation in addition to the
more specific RARE_GC_STRUCT, so that the DiscretionaryPolicy can
be notified of GC struct locals and temps for all candidates, not
just ones invoked from rare call sites.
No changes in codegen.
|
|
Closes #7792.
Since this observation isn't always fatal, change its impact to INFORMATION.
Also, refresh the assertion checking around fatal observations so that the
EnhancedLegacyPolicy (which is the current default) will catch this kind of
error going forward.
|
|
If an inline introduces a local or temp GC struct, the initialization
of that struct is done in the root method prolog. This can hurt
performance if the inline call site is cold. Detect this during
importation and update the inline policy to back out of the inline if
the inline is an always or discretionary candidate.
Jit diffs shows size wins in the core library with no regressions.
```
Total bytes of diff: -8789 (-0.29 % of base)
diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
Base had 0 unique methods, 0 unique bytes
Diff had 0 unique methods, 0 unique bytes
Top file improvements by size (bytes):
-8789 : System.Private.CoreLib.dasm (-0.29 % of base)
1 total files with size differences.
Top method improvements by size (bytes):
-320 : System.Private.CoreLib.dasm - FrameSecurityDescriptor:CheckDemand2(ref,ref,long,bool):bool:this
-224 : System.Private.CoreLib.dasm - RuntimeConstructorInfo:CheckCanCreateInstance(ref,bool)
-214 : System.Private.CoreLib.dasm - RuntimeType:GetMethodBase(ref,long):ref
-150 : System.Private.CoreLib.dasm - CultureInfo:GetCultureInfoByIetfLanguageTag(ref):ref
-146 : System.Private.CoreLib.dasm - GC:RegisterForFullGCNotification(int,int)
103 total methods with size differences.
```
Desktop testing shows similar wins in a number of places and only
a few sparse small regressions.
Added a perf test. Thanks to @hypersw for noticing this.
Closes #7569.
|
|
This change is the result of running clang-tidy and clang-format on jit
sources.
|
|
Now that we can trust `optMethodFlags`, note if the root caller has
newarr or newobj. We can't yet tell if these operations might feed
a particular call's arguments but their presence in the root method
alone might be enough to build correlations with performance changes.
Count number of returns in the callee. Note if callee has same this
as the (root) caller. Note when callee is a class constructor.
Add code to dump out the new observations. No policies act on these yet.
|
|
Do not inline methods that never return
|
|
The inliner's restriction that methods that make use of the `starg` opcode
cannot be inlined is somewhat artificial. The comments indicate that the
reason for this excluion is because the inliner forward-substitutes
argument expressions in place of parameter references, but the codepaths
required to force evaluating an argument expression at the callsite and
reference that result in place of parameter references is already in place
to handle `ldarga` and complex argument expressions.
This change
- adds an `argHasStargOp` flag to the `inlArgInfo` type, similar
to the existing `argHasLdargaOp` flag
- removes the `CALLEE_STORES_TO_ARGUMENT` inline observation and instead
sets the new flag for `starg` cases
- updates `impInlineFetchArg` to force evaluation to a temp for arguments
that are marked `argHasStargOp`
- updates `starg` processing in the importer to store to the
corresponding temp when importing for an inlinee
Resolves #6014.
|
|
Methods that do not contain return blocks can't ever exit normally, they either throw or loop indefinitely. Inlining is not beneficial in such cases as it increases the code size without providing any speed benefits. In the particular case of throws the inlined code can easily be 10 times larger than the call site.
The call to fgMoreThanOneReturnBlock has been replaced with code that does the same thing but also detects the existence of at least one return block. This avoids walking the basic block list twice.
Note that BBJ_RETURN blocks are also generated for CEE_JMP. Methods exiting via CEE_JMP instead of CEE_RET will continue to be inlined (assuming they were inlined before this change).
|
|
This change starts the process of updating the jit code to make it ready
for being formatted by clang-format. Changes mostly include reflowing
comments that go past our column limit and moving comments around ifdefs
so clang-format does not modify the indentation. Additionally, some
header files are manually reformatted for pointer alignment and marked
as clang-format off so that we do not lose the current formatting.
|
|
Two related changes to better capture cases where args or constant args
are seen at inline call sites.
On the observation side, the inliner's stack modelling of the callee is
crude and will often overestimate the evaluation stack depth. So when
looking at an opcode that takes just one stack operand (eg BRFALSE) the
inliner's check should be whether the stack has at least one element
instead of checking whether the stack has exactly one element. For
compatibility reasons, the check for exactly one element is still used
when the inline is evaluated via the LegacyPolicy.
On the policy side, instead of keeping a bool flag to note that there
were interesting arg cases, count the number of observations.
LegacyPolicy continues to act as before, checking if count is zero or
nonzero instead of whether the flag was false or true. The count is
available for use in other heuristics and is reported in inline data.
|
|
When using ReplayPolicy and dumping data, capture the profitability
estimate made by the ModelPolicy, so we can more easily cross-validate
it against externally measured profitability.
|
|
The inliner currently will record detailed data about the last successful
inline performed (given a build with DEBUG or INLINE_DATA defined).
However, for purposes of inline profitability analysis we might be more
interested in the data from an earlier inline.
This change creates a mechanism where the replay log can flag one inline
per method as the target of data collection. The inliner checks for this
attribute during replay and captures that inline's data.
|
|
Give the ModelPolicy the standard depth bailout logic. This mainly
serves to prevent runaway inlining for recursive methods marked with
AggressiveInlining attributes.
|
|
Updates to bring CS and TP impact of the ModelPolicy into more acceptable
ranges.
For CS, reduce the call site weights to values that are more in keeping
with the legacy policy weights. Local test runs show this does not
drastically alter CQ and brings CS down below LegacyPolicy levels, on
average.
Implement an early out rejection based solely on ILSize. The threshold
value is set by conservatively determining when ILSize alone indicates
the method in question will never be inlined (note the policy itself
does not have an explicit ILSize cutoff). See comments for
`ModelPolicy::NoteInt` for details. Note if we adjust the model's size
and profitability estimates, this threshould will also need updating.
CQ (as measured by the CoreCLR perf tests) continues to show about a
2.5% geomean improvement over LegacyPolicy.
|
|
Make the ModelPolicy available in release builds of the jit. Enable this
policy by setting:
COMPLUS_JitInlinePolicyModel=1
This works for either jitting or prejitting.
Update the ModelPolicy so callee never inline decisions are always
propagated back to the runtime.
|
|
Add block weight as observation. At first blush I suspect this will be
useful only when the jit has IBC data, and even then only in the root
method blocks.
We'll likely need to either build up better profile estimates for the
inliner or else just try and cope without it somehow.
Dump out block weight and the per-call profitability in the observation
schema and data. Among other things this lets us cross-validate how well
the ModelPolicy predictions match the abstract GLMNET models...
|
|
First (and very preliminary) cut at a profitability model, based
on inline data modelling.
ModelPolicy heuristic is updated to use this information to decide
when to inline. The policy has two parameters: one for the local
call site weight and another for the overall size/speed tradeoff
that we find acceptable. The first parameter is temporary as we
should be able to model this weight via observations.
Size/speed tradeoff is a policy decision and is a tuning parameter.
I've set it for now to roughly match the LegacyPolicy behavior, in
aggregate, across the jit perf benchmarks.
|
|
ReplayPolicy only impacts discretionary inlines, so could not control
force inline candidates. This has lead to some confusion during inline
studies.
Since most cases of force inline are performance related, allow replay
to override and control these force inline candidates.
This might be useful down the road to study how likely it is that a force
inline directive is either unnecessary or perhaps even detremental to
performance.
|
|
Replay needs the ability to distinguish among multiple calls to the same
callee. The IL offset of the call site can be used for this, so start
checking for it during replay.
Note there is subsequent follow-up work to ensure that we actually track
offsets for calls.
Refactor the way the ReplayPolicy gets notified of the current inline
context (and the IL offset of the call). Instead of trying to pass
this information as arguments to the factory method that creates
policies, add special notifications for this information that only the
ReplayPolicy will act upon.
|
|
Refactor the inline data dumps (the dumps that capture the observations
made by the inliner when evaluating an inline) so that the data can be
added to the inline Xml or dumped as standalone data.
I haven't tried to Xml-ify the actual data. Just dumping it as a large
comma delimited string is ok for now.
|
|
Move CritSecObject into util.h, and use it to lock around reading
and writing inline Xml. Introduce CritSecHolder for RAII management
of the locks.
Add a simple file position cache for methods to speed up replay
when the inline xml file is large.
|
|
Inliner: small changes to inline xml
|
|
Emit the hash for callees, and check it during replay.
Capture the root method name, and do a simple Xml-safe conversion.
Add a missing newline to the banner message announcing that the replay
policy is in effect.
|
|
Prevents RandomPolicy from being overly aggressive with small methods
that can lead to excessive inlining.
Closes #4711. Undoes #4704.
|
|
The ReplayPolicy reads an external script to determine which inlines
to perform. The script is the same Xml syntax that's produced by
the inliner when JitInlineDumpXml is enabled. This format can be edited
by hand or tool to force particular inlining patterns to occur. Methods
or calls sites not mentioned in the script are considered as noinline.
There's a bunch of work still left to make this fully robust, but in
testing it works well enough for my immediate use case that I'll hold
off on further polish until it's needed. But, for future reference,
here's a laundry list:
* Need better ways to identify methods. Token and hash are not enough.
* Need better ways to identify call sites. Callee token is not enough.
* Consider preparsing or mapping the script into memory.
* Consider caching node positions in the InlineContexts.
* Make it robust for multithreading.
* Handle the prejit root case somehow.
* Possibly allow overriding of inline attributes.
|
|
Inliner: add JitInlineLimit check to LegacyPolicy under debug
|
|
Adds the ability to limit the number of inlines done by the LegacyPolicy.
Useful in trying to binary search for inlines that cause correctness
issues.
This limit impacts inlining in all methods, so effective isolation may
also require use of JitNoInlineRange to hone in on the root method that
is the source of problems.
|
|
Lost some implict casts to float when I updated SIZE_SCALE to an enum
constant, so now make the casts explicit.
Just impacts dump output.
Also, add a newline to LegacyPolicy dump to keep messages distinct.
|
|
Add a new inline policy that tries to inline as much as possible
without increasing method size.
Fix one test that had an expected inline that doesn't happen under
the SizePolicy, by marking a method with AggressiveInline.
|
|
The FullPolicy will be used to grow maximal potential inline trees
for methods. It inlines all legal candidates subject to depth and
size limits. Add a config flag to select this as the policy to use.
Also add a config flag so the inline depth limit becomes adjustable,
and rework the code so that checking this limit is now a matter of
policy. Add in an implementation max depth limit of 1000.
Revise names of some statics to make their intent clearer, and add some
missing header comments.
|
|
I zeroed out the jit time measurement when refactoring the data dumping
code to live in the InlineStrategy class, because it required adding a
helper to the Compiler class. Didn't mean to leave it this way, but did.
So, fixing it how I'd meant to all along.
Also, move a few of the inlining related constants and setup computation
over to InlineStrategy. I'd move `DEFAULT_MAX_INLINE_SIZE` too, but it
ends up getting used for some non-inlining purposes, so will hold off on
cleaning that part up.
|
|
Accumulate the per-inline size impact estimates into an overall inline
size impact estimate. No size budgeting (yet); this just provides data
for further study.
|
|
Use the time budget and time estimates to stop inlining once the
overall jit time increase estimate for the method is 10x the initial
jit time estimate.
This is implemented as part of `LegacyPolicy` and so can impact
the current inline behavior.
The budget is intentionally set quite high so that it only kicks in
for very rare cases where the call tree below the root is deep and wide
with many small methods. In extended testing on desktop this limit
fires in exactly two cases, both HFA tests.
In CoreCLR tests 12 of the HFA tests hit this limit. I've added
a directed test case here that came from the original bug report.
Closes #2472.
|
|
Create the `InlineStrategy` class to hold inline-relevant data that
spans multiple inlines. It tracks the number of inline candidates,
attempts, successes, last successful policy, and holds onto the root
context, and so on. The strategy is responsible for creating contexts
and reporting overall results for a method (either tree-based or
data-based).
The strategy also includes a simple jit time estimate. From various
observations, post-inline jit time can be modelled acceptably by simple
linear relationships on the input IL size. The strategy uses these models
to estimate the initial and current jit time. Estimate units are roughly
microseconds.
The strategy creates a time budget to flag cases where the estimated
jit time increase due to inlining is unreasonbly large. The budget
is initially based on the root method size, and may increase if there
are non-discretionary force inlines.
Once we have a bit more vetting of the budgeting mechanism, policies
will use it to limit inlining in a small number of runaway inlining cases
(see for example #2472).
Remove code under MEASURE_INLINING since the strategy plus the context
tree (optionally extended via policies) contains all that data and more.
Likewise, consolidate a number of the compiler's inlining-related member
variables into the strategy.
|
|
Add a model-based size predictor to the `DiscretionaryPolicy` and
dump its data along with observations. This can be used to assess
the performance of the model. Currently using a linear model that
can explain about 55% of the variability in code size impact for
mscorlib. Whether this holds up for other situations is still to
be determined.
Create a `ModelPolicy` that relies on this code size model to make
inlining decisions. This is very much a work in progress as the code
size models are still not well vetted and there is no real code
quality model yet.
|
|
Have the `DiscretionaryPolicy` do some simple opcode histogramming
to try and provide better observations for code size estimation.
Likewise, look at the types and sizes of arguments and report those.
|