Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix inconsistent handling of zero extending casts
|
|
|
|
There was a bug when creating profiling images using /Tuning.
We computed the wrong value for addrOfBlockCount at the end of this method.
This would result in a method having proper block counts
but still not getting marked as having been executed.
This regressed in PR #14640.
|
|
Added test case
|
|
And add a new range-check IR for x86 imm-intrinsics.
|
|
* create a new phase: StackLevelSetter
* add repro
* Fix grammar mistakes
* use the default hash
* delete values from the map.
* create gentree::OperIsPutArgStkOrSplit
* fix more comments
* delete an extra condition that is always true
* use GTSTRUCT_2_SPECIAL for PutArgStk
* extract fgUseThrowHelperBlocks
* optimize memory for amd64 and additional checks for x86
* change checks
The previous version was wrong, because morph can call fgAddCodeRef several times for the same instruction during different phases.
* fix comments
* fix genJumpToThrowHlpBlk
* small ref in genJumpToThrowHlpBlk
* fix rebase problems.
* use fgUseThrowHelperBlocks instead of !opts.compDbgCode
* add throwHelperBlocksUsed for throughput.
|
|
For casts that are supposed to zero extend, the GTF_UNSIGNED flag must always be set (and obeyed).
Some code failed to set the flag (e.g. when importing add.ovf.un instructions having native int and int32 operands) and some other code failed to check the flag (e.g. x64 codegen, gtFoldExprConst) and instead decided to zero extend based on the cast destination type.
This resulted in discrepancies between ARM64 and x64 codegen and between constant folding performed by gtFoldExprConst and VN's EvalCastForConstantArgs.
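A minimal sketch of why the two rules disagree, written in Python rather than the JIT's own C++. The function names and the `unsigned_flag` parameter are hypothetical; `unsigned_flag` only stands in for GTF_UNSIGNED, and nothing here is taken from the JIT sources.

```python
# Hypothetical model of a widening cast from a 32-bit value to 64 bits.
# "unsigned_flag" stands in for GTF_UNSIGNED; none of these names come from the JIT.

def widen_32_to_64(value, unsigned_flag):
    """Return the 64-bit pattern produced by widening a 32-bit value."""
    bits = value & 0xFFFFFFFF          # keep the low 32 bits
    if unsigned_flag or bits < 0x80000000:
        return bits                    # zero extension (or non-negative input)
    return bits | 0xFFFFFFFF00000000   # sign extension


def widen_by_destination(value, dest_is_unsigned):
    """The inconsistent rule: ignore the flag and look only at the destination type."""
    return widen_32_to_64(value, dest_is_unsigned)


# A cast marked as zero extending whose destination is a signed 64-bit type.
flag_based = widen_32_to_64(-1, unsigned_flag=True)
dest_based = widen_by_destination(-1, dest_is_unsigned=False)
```

Here `flag_based` and `dest_based` differ, which is the kind of discrepancy the message describes when one component honors the flag and another looks only at the destination type.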
|
|
Bug in the CSE phase when a CSE use contains a CSE def
For the error cases we could unmark or discard a CSE def. This CSE def might be needed when
performing a subsequent CSE replacement. This could lead to an uninitialized value being
used for the CSE use.
Changed optUnmarkCSE method to return a bool instead of a void.
Changed optValuenumCSE_UnmarkCSEs to take a write-back mutable wbKeepList argument
Removed an unreachable code block in fgRemoveStmt and replaced it with a noway_assert
Added method headers for optUnmarkCSE and optUnmarkCSEs
Removed the code that allowed unmarking of a CSE def; instead we return false
Retyped the second arguments to optUnmarkCSEs and optValueNumCSE_UnmarkCSEs
to support appending new nodes to keep using wbKeepList
Code modifications to PerformCSE() to support collecting both persistent side effects and
any nested CSE defs found in CSE uses. Move the location of the call to
optValueNumCSE_UnmarkCSEs so that it comes before the decision to create a GT_COMMA node
for the cse use.
|
|
When using profile weights, fgComputeEdgeWeights expects the first non-internal block to have profile weight.
Make sure we don't break that invariant when removing an empty block.
This fixes VSO 546031.
|
|
* jit sources: Each local pointer variable must be declared on its own line.
Implement https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#101-pointer-declarations
Each local pointer variable must be declared on its own line.
* add constGenTreePtr
* delete GenTreePtr
* delete constGenTreePtr
* fix arm
|
|
|
|
StoreHigh, StoreLow, and StoreScalar intrinsics
|
|
* delete fgArgInfoPtr
* delete treeLstPtr
* delete treeStmtLstPtr
* delete fgArgTabEntryPtr
* delete BasicBlockPtr
|
|
If the jit sees that an inlinee has multiple return sites or has gc ref locals
it will choose to return the inline result via a temp. The jit was not assigning
a type to that temp and so losing track of some type information.
So, for inlinees returning ref types, initially type the return spill temp with
the declared return type of the method.
When importing we may discover that particular return sites will return more
specific types. If all discovered return sites agree, we can update the return
type for the spill temp to match the consensus improved type.
This can lead to removal of some type checks and also to devirtualization.
Addresses issues discussed in #9908 and #15743.
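The consensus rule described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the JIT's implementation: the class hierarchy is modeled as a plain child-to-parent dictionary, and `is_subclass` and `spill_temp_type` are hypothetical names.

```python
# Hypothetical sketch: pick the type for an inlinee's return spill temp.
# The hierarchy maps each class name to its parent (None for the root).

def is_subclass(hierarchy, cls, base):
    """Walk parent links from cls and report whether base is reached."""
    while cls is not None:
        if cls == base:
            return True
        cls = hierarchy.get(cls)
    return False


def spill_temp_type(hierarchy, declared, site_types):
    """Start from the declared return type; sharpen it only when every
    discovered return site agrees on one more specific type."""
    if not site_types:
        return declared
    first = site_types[0]
    all_agree = all(t == first for t in site_types)
    if all_agree and is_subclass(hierarchy, first, declared):
        return first
    return declared


hierarchy = {"object": None, "Base": "object", "Derived": "Base", "Other": "Base"}
```

With this hierarchy, two return sites that both yield `Derived` sharpen a declared `Base` to `Derived`, while sites that disagree leave the temp at the declared type.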
|
|
In DEBUG/CHECK builds the jit tries to keep track of failed inlines.
Because inlines can be rejected "early" (when the parent method
is being imported) as well as "late" (when their call site is encountered
by the inliner) there is a tracking mechanism to convey the early observations
that cause failures to be resurrected later on.
These observations sometimes didn't end up in the inline context, leading
to assertions when dumping methods.
The fix is to add a new way to propagate the earlier observation to the context
that bypasses some of the policy sanity checks and simply records the reason
that the inline failed.
|
|
* add ifdef for fgThrowHlpBlkStkLevel
* fgFindExcptnTarget should not be called for dbg code compilation mode.
|
|
several returns. (#14945)
* add fgNeedReturnSpillTemp
* fix the issue
* add the repro without stress mode
|
|
If an inline candidate has args that come from calls that are inline
candidates, the arg trees for the candidate may be GT_RET_EXPRs. Since
the jit inlines in preorder, any arg inlines will have been resolved
by the time we get to the candidate. So, the jit can look back through
the GT_RET_EXPR to get a better handle on the actual arg tree.
Doing this has several small benefits:
* It short-circuits parts of the subsequent GT_RET_EXPR update and so
saves a little bit of throughput.
* It potentially allows for more streamlined arg passing since the actual
node side effects may be less constraining than the placeholder effects on
the GT_RET_EXPR (which must be "worst case").
* It may unblock inlines as actual arg values can be seen when evaluating
the candidate for profitability.
During testing I found cases where looking at the actual arg seemed to
degrade information. This turned out to be a misuse of `argHasLdargaOp`
to indicate that a caller-supplied argument was potentially locally
aliased.
This misuse was blocking type propagation when the actual arg tree was seen to
be an aliasable local reference instead of the formerly opaque GT_RET_EXPR.
To fix this, I split out the aliased arg case with a different flag. All
that the jit needs to ensure in the aliased case is that it does not directly
substitute the arg; the arg must be evaluated at the point of the call. The
type of the actual can safely propagate to the arg as long as the arg itself
is not modifiable.
Added a utility for walking back through GT_RET_EXPRs and updated the
other case where the jit was doing this to use the utility as well.
Closes #14174.
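A rough sketch of the "walk back through GT_RET_EXPRs" utility, in Python for brevity. The `Node` class and its `inline_candidate` field are hypothetical stand-ins for the JIT's tree nodes; the only point illustrated is that placeholders may chain and the walk continues until a non-placeholder node is reached.

```python
# Hypothetical tree node: "oper" names the node kind, and a GT_RET_EXPR
# placeholder points at the tree that replaced the inlined call.

class Node:
    def __init__(self, oper, inline_candidate=None):
        self.oper = oper
        self.inline_candidate = inline_candidate


def skip_ret_exprs(node):
    """Follow GT_RET_EXPR placeholders until reaching the actual arg tree."""
    while node.oper == "GT_RET_EXPR":
        node = node.inline_candidate
    return node


actual = Node("GT_LCL_VAR")
placeholder = Node("GT_RET_EXPR", Node("GT_RET_EXPR", actual))
```

Calling `skip_ret_exprs(placeholder)` yields `actual`, and a node that is not a placeholder is returned unchanged.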
|
|
When prejitting, the jit assesses whether each root method is a potential
inline candidate for any possible caller. Methods deemed un-inlinable in any
caller are marked in the runtime as "noinline" to save the jit some work
later on when it sees calls to these methods.
This assessment was too conservative and led to prejit-ordering dependences
for inlines. It also meant that prejitting was missing some inlines that
would have happened had we not done the prejit root assessment.
This change removes some of the prejit observation blockers. These mostly
will enable more prejit methods to become candidates. We also now track when
a method argument reaches a test.
When we are assessing profitability for a prejit root, assume the best possible
case for the call site.
Also, update the inline XML to capture the prejit assessments.
This increases the number of inlines considered and performed when prejitting
and will also lead to slightly more inlining when jitting. However we do not
expect a large throughput impact when jitting -- the main impact of this change
is that inlining when prejitting is now just as aggressive as inlining when
jitting, and the decisions no longer depend on the order in which methods are
prejitted.
Closes #14441.
Closes #3482.
|
|
* Delete OldStyleClearD from flowgraph
BlockSetOps::Assign requires sets to have the same size, so there is no
additional risk in using ClearD.
* delete OldStyleClearD from regalloc
regalloc doesn't create new local variables and doesn't change epoch.
* Delete OldStyleClearD from copyprop
CopyProp doesn't create new vars.
Also `VarSetOps::Assign(this, compCurLife, block->bbLiveIn); ` before
the loop requires epoch to be the same.
* Delete OldStyleClearD from assertionprop
Assertion prop doesn't change epoch.
* Delete OldStyleClearD from the emitter
Because it doesn't create new local vars.
* Delete declarations
|
|
Improved the dump when fgOptimizeBranch clones statements
|
|
Streamline JIT memory allocation
|
|
|
|
Merge the LegacyPolicy and EnhancedLegacyPolicy into a unified policy that
behaves like the EnhancedLegacyPolicy. Rename this policy to the DefaultPolicy
since it is in fact the default inline policy.
We had been keeping the LegacyPolicy around in case we ever needed to revert
back to the initial RyuJit inline behavior, but that safeguard no longer seems
necessary.
Remove some of the checks in flowgraph.cpp that alter behavior based on policy
as they are no longer needed.
Remove the jit config setting that allowed selection of the LegacyPolicy.
This is the first stage in fixing #14441.
|
|
Optimize fixed sized locallocs of 32 bytes or less to use local buffers,
if the localloc is not in a loop.
Also "optimize" the degenerate 0 byte case.
Allow inline candidates containing localloc, but fail inlining if any
of a candidate's locallocs do not convert to local buffers.
The 32 byte size threshold was arrived at empirically; larger values did
not enable many more cases and we started seeing size bloat because of
larger stack offsets.
We can revise this threshold if we are willing to reorder locals and see
fixed sized cases larger than 32 bytes.
Closes #8542.
Also add the missing handling for the case where the call site is in a try
region; this was an oversight.
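The decision described in this message can be sketched roughly as below. This is a Python illustration, not the JIT code: the function names and the strategy labels are hypothetical, checking the zero-size case before the loop test is an assumption, and treating the zero-size case as acceptable for inlining is also an assumption. Only the 32 byte threshold and the "not in a loop" condition come from the text.

```python
# Hypothetical sketch of the localloc decision.
LOCAL_BUFFER_MAX_BYTES = 32  # threshold quoted in the commit message


def localloc_strategy(size, in_loop):
    """size is the constant byte count, or None when the size is not fixed."""
    if size is None:
        return "dynamic"           # size unknown at jit time
    if size == 0:
        return "zero_size"         # degenerate case, handled specially (assumed ordering)
    if size <= LOCAL_BUFFER_MAX_BYTES and not in_loop:
        return "local_buffer"      # fixed, small, and not in a loop
    return "dynamic"


def inline_allowed(locallocs):
    """Fail the inline if any localloc would remain a dynamic allocation.
    Each entry is a (size, in_loop) pair."""
    return all(localloc_strategy(size, in_loop) != "dynamic"
               for size, in_loop in locallocs)
```

For example, `inline_allowed([(16, False), (32, False)])` holds, while a candidate containing a 64 byte localloc, or a small one inside a loop, is rejected.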
|
|
skip commas and nops.
|
|
Enable Crc32, Popcnt, Lzcnt intrinsics
|
|
|
|
|
|
|
|
Renamed GTF_ICON_CIDMID_HDL to GTF_ICON_MID_HDL
Added a standard function header comment for gtNewIndOfIconHandleNode
Made compiletimeHandle a required argument instead of an optional argument for gtNewIconEmbHndNode
Added new isInvariant arg to gtNewIndOfIconHandleNode, it will set GTF_IND_INVARIANT when true
Converted IBC Block Counts to use new gtNewIndOfIconHandleNode method
Simplified gtNewStringLiteralNode
|
|
|
|
If we merge constant returns into a common point we may lose track of sequence
points. So inhibit this when we are generating debuggable code.
Fixes #14339.
|
|
* Ifdef out legacy uses of GT_ASG_op
GT_ASG_op nodes are only generated when the legacy backend is used.
* Address feedback
* Cleanup gtOverflow/gtOverflowEx
|
|
|
|
|
|
|
|
Cleanup of Lowering & LsraInfo
|
|
This tends to generate one virtual call for every allocation...
|
|
* compGetMemA and compGetMemArrayA are useless because ArenaAllocator always returns aligned memory
* compGetMemCallback is not used anywhere
* compFreeMem is not used anywhere. The function is not used anywhere either but let's keep it for consistency.
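To illustrate why separate "aligned" allocation entry points add nothing when the allocator already aligns every request, here is a minimal bump-allocator sketch in Python. It is not the ArenaAllocator implementation; the `BumpArena` class and the 8 byte alignment are assumptions made for the example.

```python
# Hypothetical bump allocator that rounds every request up to ALIGN bytes,
# so each returned offset is aligned without a separate aligned API.
ALIGN = 8  # assumed alignment, chosen only for illustration


class BumpArena:
    def __init__(self):
        self.offset = 0

    def allocate(self, size):
        """Return the start offset of a block of at least `size` bytes."""
        start = self.offset
        # Round the size up to the next multiple of ALIGN.
        self.offset += (size + ALIGN - 1) & ~(ALIGN - 1)
        return start


arena = BumpArena()
offsets = [arena.allocate(n) for n in (3, 5, 16, 1)]
```

Every entry in `offsets` is a multiple of `ALIGN`, regardless of the sizes requested.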
|
|
These are preparatory changes for eliminating gtLsraInfo.
Register requirements should never be set on contained nodes. This includes setting isDelayFree and restricting to byte registers for x86.
- This results in net positive diffs for the framework (eliminating incorrect setting of hasDelayFreeSrc), though a net regression for the tests on x86 (including many instances of effectively the same code).
- The regressions are largely related to issue #11274.
Improve consistency of IsValue():
- Any node that can be contained should produce a value, and have a type (e.g. GT_FIELD_LIST).
- Some value nodes (GTK_NOVALUE isn't set) are allowed to have TYP_VOID, in which case IsValue() should return false.
- This simplifies IsValue().
- Any node that can be assigned a register should return true for IsValue() (e.g. GT_LOCKADD).
- PUTARG_STK doesn't produce a value; get type from its operand.
- This requires some fixing up of SIMD12 operands.
- Unused GT_LONG nodes shouldn't define any registers
Eliminate isNoRegCompare, by setting type of JTRUE operand to TYP_VOID
- Set GTF_SET_FLAGS on the operand to ensure it is not eliminated as dead code.
|
|
GT_CATCH_ARG should not be marked as MayThrow.
|
|
[Arm64] Add GT_JCMP node
|
|
Mark `EqualityComparer<T>.Default`'s getter as `[Intrinsic]` so
the jit knows there is something special about it. Extend the jit's
named intrinsic recognizer to recognize this method.
Add a new jit interface method to determine the exact type returned
by `EqualityComparer<T>.Default`, given `T`. Compute the return type by
mirroring the logic used in the actual implementation. Bail out when
`T` is not final as those cases won't simplify down much and lead to
code bloat.
Invoke this interface method when trying to devirtualize calls where
the 'this' object in the call comes from `EqualityComparer<T>.Default`.
The devirtualized methods can then be inlined. Since the specific comparer
`Equals` and `GetHashCode` methods look more complicated in IL than they
really are, mark them with AggressiveInlining attributes.
If devirtualization and inlining happen, it is quite likely that the value
of the comparer object itself is not used in the body of the comparer. This
value comes from a static field cache on the comparer helper.
When the comparer value is ignored, the jit removes the field access since it
is non-faulting. It also removes the class init helper that is there to
ensure that the (no-longer accessed) field is properly initialized. This helper
has relatively high overhead even in the fast case where the class has been
initialized already.
Add a perf test.
Closes #6688.
|
|
Create new node type GT_JCMP to represent a
fused Relop + JTrue which does not set flags
Add lowering code to create GT_JCMP when
Arm64 could use cbz, cbnz, tbz, or tbnz
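A rough sketch of how a fused compare-and-branch might be matched to these instructions, written in Python. The function name, its parameters, and the string encoding of operators are hypothetical and do not reflect the actual lowering code; the sketch relies only on the Arm64 semantics that cbz/cbnz branch on a register being zero or nonzero, and tbz/tbnz branch on a single bit being zero or nonzero.

```python
# Hypothetical matcher: given a relop against a constant, and optionally an
# AND mask applied to the first operand, pick an Arm64 branch instruction.

def pick_compare_branch(relop, constant, and_mask=None):
    """Return (instruction, bit) or None when no fused form applies.
    bit is None for cbz/cbnz and the tested bit index for tbz/tbnz."""
    if relop not in ("EQ", "NE") or constant != 0:
        return None                      # only equality tests against zero
    if and_mask is None:
        return ("cbz" if relop == "EQ" else "cbnz", None)
    # A mask with exactly one bit set is a single-bit test.
    if and_mask > 0 and (and_mask & (and_mask - 1)) == 0:
        bit = and_mask.bit_length() - 1
        return ("tbz" if relop == "EQ" else "tbnz", bit)
    return None
```

So `x == 0` maps to cbz, `x != 0` to cbnz, and `(x & 0x10) != 0` to tbnz on bit 4; a comparison such as `x < 0` or a multi-bit mask returns `None` and would keep the separate compare and branch.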
|
|
|
|
|
|
|
|
|
|
Improve const return block placement
|