|
Implement long to float cast for x86
|
|
Convert long to float/double casts to helper calls. Despite the call, this version is 2x faster than JIT32's FILD implementation, which suffers a significant store-forwarding stall penalty.
|
|
Fix some timing code bit rot, plus minor cleanup.
|
|
Model the kill set for ASSIGN_BYREF and stop generating movsq on x86.
|
|
Fix #7092
|
|
Fix #7093
|
|
Fix #7100
|
|
The problem is that genCodeForInitBlkUnroll() is trying to generate:
```
mov byte ptr [edi], si
```
which is not legal: ebp/esi/edi/esp don't have byte register versions on x86.
This change requires the source register to be an x86 byteable register in
this case.
Note that the legacy JIT generates stosb for this test.
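The byteable-register constraint described above can be sketched roughly as follows. This is an illustration only, not the JIT's code: the `Reg` enum, `kByteableRegs`, `isByteable`, `srcCandidatesForByteStore`, and `pickByteStoreSource` are hypothetical names, and the register numbering is assumed to follow the x86 encoding order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register numbering for illustration (x86 encoding order).
enum Reg : uint8_t { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

// On 32-bit x86 only eax, ecx, edx, and ebx have low-byte forms (al, cl, dl, bl).
constexpr uint32_t kByteableRegs =
    (1u << EAX) | (1u << ECX) | (1u << EDX) | (1u << EBX);

// True if the register can be used as the source of a 1-byte store.
constexpr bool isByteable(Reg r) { return (kByteableRegs & (1u << r)) != 0; }

// Narrow a candidate-register mask to the registers usable for a byte store.
constexpr uint32_t srcCandidatesForByteStore(uint32_t candidates) {
    return candidates & kByteableRegs;
}

// Pick the lowest-numbered byteable register from the candidates.
Reg pickByteStoreSource(uint32_t candidates) {
    uint32_t usable = srcCandidatesForByteStore(candidates);
    assert(usable != 0 && "no byteable register available");
    for (int r = EAX; r <= EBX; ++r) {
        if (usable & (1u << r)) return static_cast<Reg>(r);
    }
    return EAX; // not reached when the assert above holds
}
```

With a candidate set of {esi, edx}, this sketch selects edx, since esi has no byte form on x86.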
|
|
This instruction is only available when targeting amd64.
|
|
This helper kills esi, edi, and ecx.
|
|
Fix to #7087 - x86 RyuJIT JitstressRegs=2 assert failure
|
|
This is an assert due to how STRESS_64RSLT_MUL is implemented. This
stress mode converts:
```
/--* lclVar int V01 loc0
* * int
\--* lclVar int V01 loc0
```
to:
```
* cast int <- long
| /--* cast long <- int
| | \--* lclVar int V01 loc0
\--* * long
\--* cast long <- long
\--* nop long
\--* lclVar int V01 loc0
```
Thus, the long 'nop' node is above an 'int' operand node. This led to an assert
in genCodeForTreeLng() that the lclVar was type long, which it isn't.
I added yet another cast under the 'nop' to fix this typing problem:
```
* cast int <- long
| /--* cast long <- int
| | \--* lclVar int V01 loc0
\--* * long
\--* cast long <- long
\--* nop long
\--* cast long <- int
\--* lclVar int V01 loc0
```
|
|
|
|
This issue is the following assert:
```
Assert failure '!emitComp->opts.compReloc || memBase->IsIconHandle()'
```
while ngen'ing mscorlib on desktop.
We're trying to generate
```
cmp ebx, dword ptr [0000H]
```
(due to inlining). However, emitInsBinary() doesn't expect a zero address that
is not a relocatable handle.
The fix is simply to change the assert to allow this. The code then generates the correct
zero base address.
|
|
|
|
Make GT_LIST processing non-recursive to avoid StackOverflow.
|
|
|
|
|
|
Fix #7089.
|
|
Improve decomp throughput.
|
|
The RyuJIT backend does not use regpair types, so this method should
simply return false when that backend is in use. In the future we
should consider removing this method (and other regpair-related code).
Fixes #7089.
|
|
Implement integer to long/ulong casts for x86
|
|
Don't find uses unless necessary. This saves about 50 milliseconds while
crossgen'ing mscorlib.
|
|
Fix BBF_HAS_NULLCHECK test
|
|
Tracked by #7038
|
|
|
|
Fix IMGREL32 static field addr value-num blindspot
|
|
disable gtFoldExprConst for conversion from negative double to unsigned long
|
|
|
|
These intrinsics were supported by JIT32 using the corresponding
x87 instructions. There are a few downsides to doing the same in
RyuJIT, however, because it uses SSE for all other floating-point
math:
- The mix of precisions used by the combination of SSE + x87
instructions matches neither the JIT32 nor the RyuJIT/x64
behavior.
- Using the x87 instructions is more expensive, as the result of
the instruction may need to be moved via memory to an SSE
register.
- The RA would need to be updated in order to better understand
that each x87 instruction may require a spill temp in order to
retrieve the result.
Issue #7097 tracks investigating a custom calling convention for
the appropriate helper calls if necessary.
|
|
We had some internal cases where crossgen failed with a StackOverflow exception
when compiling huge methods. In particular, the methods had GT_PHI nodes with
a huge number of arguments.
StackOverflow was happening in multiple places. Recent LIR changes eliminated two of those places;
these changes eliminate two more: gtSetEvalOrder and fgDebugCheckFlags (debug only). We already had
gtSetListOrder, but it was only used for call arg lists. I made gtSetListOrder non-recursive and also generalized
it to handle other GT_LIST nodes.
With these changes, the huge repros can now be crossgen'd.
I verified no assembly diffs in SuperPMI.
I'm verifying the overall throughput effect.
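For illustration, the difference between recursive and non-recursive list processing looks roughly like this. It is a minimal sketch, not the actual gtSetListOrder code: `ListNode`, `sumCostRecursive`, `sumCostIterative`, and `linkChain` are hypothetical stand-ins, and the per-item "cost" is an assumption made only to give the walk something to compute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GT_LIST node: one item plus a link to the rest.
struct ListNode {
    int       cost; // illustrative per-item value
    ListNode* rest; // next list node, or nullptr at the end of the list
};

// Recursive walk: one stack frame per element, so a very long list
// (for example, a phi with a huge number of arguments) can overflow the stack.
int sumCostRecursive(const ListNode* list) {
    return list == nullptr ? 0 : list->cost + sumCostRecursive(list->rest);
}

// Iterative walk: constant stack depth regardless of the list length.
int sumCostIterative(const ListNode* list) {
    int total = 0;
    for (const ListNode* n = list; n != nullptr; n = n->rest) {
        total += n->cost;
    }
    return total;
}

// Helper for experiments: link the vector's elements into one chain.
ListNode* linkChain(std::vector<ListNode>& nodes) {
    assert(!nodes.empty());
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        nodes[i].rest = (i + 1 < nodes.size()) ? &nodes[i + 1] : nullptr;
    }
    return &nodes[0];
}
```

Both walks return the same result on short lists; only the iterative one stays safe as the list grows.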
|
|
Enable long multiply
|
|
Fix Arm64 build breaks
|
|
Most long multiplies are converted in morph to helper calls. However,
GT_MULs of the form long = (long)int * (long)int are passed through to be
handled in codegen. In decompose, we convert these multiplies to
GT_MUL_HIs to be handled in a similar manner to GT_MULHI. Since mul and
imul take two ints and return a long in edx:eax, we can handle them
similarly to GT_CALLs, where we save them to lclvars if they aren't
already saved. In codegen, we generate a mul (or imul for signed
multiply), which produces the output in edx:eax, and then we store those
locations when we handle the storelclvar.
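The arithmetic behind this can be sketched as follows: a 32x32 multiply widened to 64 bits, with the result split into two halves the way mul/imul leave it in edx:eax. `RegPair`, `mulSigned`, `mulUnsigned`, and `joinSigned` are hypothetical names used only for this illustration; the JIT emits the instruction directly rather than calling anything like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the edx:eax result pair.
struct RegPair {
    uint32_t lo; // what eax would hold
    uint32_t hi; // what edx would hold
};

// Models imul: long = (long)int * (long)int, signed.
RegPair mulSigned(int32_t a, int32_t b) {
    int64_t  product = static_cast<int64_t>(a) * static_cast<int64_t>(b);
    uint64_t bits    = static_cast<uint64_t>(product);
    return { static_cast<uint32_t>(bits), static_cast<uint32_t>(bits >> 32) };
}

// Models mul: ulong = (ulong)uint * (ulong)uint, unsigned.
RegPair mulUnsigned(uint32_t a, uint32_t b) {
    uint64_t bits = static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
    return { static_cast<uint32_t>(bits), static_cast<uint32_t>(bits >> 32) };
}

// Reassemble the two halves into a signed 64-bit value, as storing both
// halves into a long local would.
int64_t joinSigned(RegPair r) {
    return static_cast<int64_t>((static_cast<uint64_t>(r.hi) << 32) | r.lo);
}
```

Because both operands are known to fit in 32 bits, the full 64-bit product is available from a single instruction, which is why these multiplies need no helper call.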
|
|
|
|
|
|
Fix GTF_ flag collision
|
|
The `GT_STORE_OBJ` case in codegen was missing a break.
`getStructHandleIfPresent` wasn't correctly handling `GT_ASG`.
The `srcCount` was not being set correctly for block ops with a constant size.
And I lost some code in `fgMakeTmpArgNode` with a merge.
Also, I had introduced a dumping issue in gentree.cpp when I made `GT_OBJ` a `GTK_UNOP` as a result of some PR comments.
|
|
We don't have to keep genIncRegBy() in codegen.h anymore,
since genIncRegBy() is defined in codegenlegacy.cpp and only used in LEGACY_BACKEND code.
Signed-off-by: Hyung-Kyu Choi <hk0110.choi@samsung.com>
|
|
Remove `ARM_HAZARD_AVOIDANCE` as it is a rare case.
For #7003
|
|
|
|
Define flag `GTF_IS_IN_CSE` (which is used only to communicate in a call to
`gtNodeHasSideEffects` that the caller wants to disregard the side-effect
of running a .cctor) as `GTF_BOOLEAN` rather than `GTF_MAKE_CSE` -- as the
comment at the definition of `GTF_IS_IN_CSE` indicates, the only
requirement on its value is that it not conflict with one of the
side-effect flags, but `gtNodeHasSideEffects` does in fact check for
`GTF_MAKE_CSE`, so the current value conflicts.
Also update the code in `gtNodeHasSideEffects` that is dictated by
`GTF_IS_IN_CSE` to check just for that bit, instead of checking that the
caller passed in exactly `GTF_PERSISTENT_SIDE_EFFECTS_IN_CSE` -- no need
to entangle conceptually independent parameters.
Fixes #7042.
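The kind of collision described above, and the single-bit check, can be sketched like this. All names and bit values here (`FLAG_*`, `NODE_CCTOR_ONLY`, `CHECKED_SIDE_EFFECTS`, `hasSideEffects`) are hypothetical and chosen for illustration; they are not the real `GTF_` definitions or the real `gtNodeHasSideEffects` logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical side-effect bits that the query function inspects.
constexpr uint32_t FLAG_ASG      = 0x01;
constexpr uint32_t FLAG_CALL     = 0x02;
constexpr uint32_t FLAG_EXCEPT   = 0x04;
constexpr uint32_t FLAG_MAKE_CSE = 0x08;
constexpr uint32_t CHECKED_SIDE_EFFECTS =
    FLAG_ASG | FLAG_CALL | FLAG_EXCEPT | FLAG_MAKE_CSE;

// Hypothetical unrelated bit that the query function never inspects.
constexpr uint32_t FLAG_BOOLEAN = 0x10;

// The "caller is CSE" marker reuses a bit outside the checked set.
constexpr uint32_t FLAG_IS_IN_CSE = FLAG_BOOLEAN;
static_assert((FLAG_IS_IN_CSE & CHECKED_SIDE_EFFECTS) == 0,
              "marker bit must not alias a side-effect flag");

// Hypothetical node property: the call's only effect is running a .cctor.
constexpr uint32_t NODE_CCTOR_ONLY = 0x20;

bool hasSideEffects(uint32_t nodeFlags, uint32_t query) {
    // Test only the marker bit, not the exact combination the caller passed.
    bool ignoreCctor = (query & FLAG_IS_IN_CSE) != 0;
    uint32_t effects = nodeFlags & query & CHECKED_SIDE_EFFECTS;
    if (ignoreCctor && (nodeFlags & NODE_CCTOR_ONLY) != 0) {
        effects &= ~FLAG_CALL; // disregard the .cctor-triggering call
    }
    return effects != 0;
}
```

Had the marker shared a bit with `FLAG_MAKE_CSE`, the `static_assert` would fail at compile time, which is one way to guard against this class of bug.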
|
|
|
|
1st Class Struct Block Assignments
|
|
|
|
Enable FunctionEnter/FunctionLeave callbacks on ARM
|
|
|
|
|
|
|
|
|