Diffstat (limited to 'Documentation/design-docs/first-class-structs.md')
-rw-r--r-- | Documentation/design-docs/first-class-structs.md | 651
1 file changed, 651 insertions, 0 deletions
diff --git a/Documentation/design-docs/first-class-structs.md b/Documentation/design-docs/first-class-structs.md
new file mode 100644
index 0000000000..fd6a3762c4
--- /dev/null
+++ b/Documentation/design-docs/first-class-structs.md
@@ -0,0 +1,651 @@

First Class Structs
===================

Objectives
----------
Primary Objectives
* Avoid forcing structs to the stack if they are only assigned to/from, or passed to/returned
  from a call or intrinsic
  - Including SIMD types as well as other pointer-sized-or-less struct types
  - Enable enregistration of structs that have no field accesses
* Optimize these types as effectively as any other basic type
  - Value numbering, especially for types that are used in intrinsics (e.g. SIMD)
  - Register allocation

Secondary Objectives
* No “swizzling” or lying about struct types – they are always struct types
  - No confusing use of GT_LCL_FLD to refer to the entire struct as a different type

Struct-Related Issues in RyuJIT
-------------------------------
The following issues illustrate some of the motivation for improving the handling of value types
(structs) in RyuJIT:

* VSO Bug 98404: .NET JIT x86 - poor code generated for value type initialization
  * This is a simple test case that should simply generate `xor eax, eax; ret` on x86 and x64, but
    instead generates many unnecessary copies. It is addressed by full enregistration of
    structs that fit into a register:

```C#
struct foo { public byte b1, b2, b3, b4; }
static foo getfoo() { return new foo(); }
```

* [\#1133 JIT: Excessive copies when inlining](https://github.com/dotnet/coreclr/issues/1133)
  * The scenario given in this issue involves a struct that is larger than 8 bytes, so
    it is not impacted by the fixed-size types. However, by enabling assertion propagation
    for struct types (which, in turn, is made easier by using normal assignments), the
    excess copies can be eliminated.
  * Note that these copies are not generated when passing and returning scalar types,
    and it may be worth considering (in future) whether we can avoid adding them
    in the first place.

* [\#1161 RyuJIT properly optimizes structs with a single field if the field type is int but not if it is double](https://github.com/dotnet/coreclr/issues/1161)
  * This issue arises because we never promote a struct with a single double field, since
    such a struct may be passed or returned in a general purpose register.
    This issue could be addressed independently, but should "fall out" of improved heuristics
    for when to promote and enregister structs.

* [\#1636 Add optimization to avoid copying a struct if passed by reference and there are no
  writes to and no reads after passed to a callee](https://github.com/dotnet/coreclr/issues/1636).
  * This issue is nearly the same as the above, except that in this case the desire is to
    eliminate unneeded copies locally (i.e. not just due to inlining), in the case where
    the struct may or may not be passed or returned directly.
  * Unfortunately, there is not currently a scenario or test case for this issue.

* [\#3144 Avoid marking tmp as DoNotEnregister in tmp=GT_CALL() where call returns a
  enregisterable struct in two return registers](https://github.com/dotnet/coreclr/issues/3144)
  * This issue could be addressed without First Class Structs.
    However, it will be easier with struct assignments that are normalized as regular assignments, and
    should be done along with the streamlining of the handling of ABI-specific struct passing
    and return values.

* [\#3539 RyuJIT: Poor code quality for tight generic loop with many inlineable calls](https://github.com/dotnet/coreclr/issues/3539)
  (factor x8 slower than non-generic few calls loop).
  * I am still investigating this issue.

* [\#5556 RuyJIT: structs in parameters and enregistering](https://github.com/dotnet/coreclr/issues/5556)
  * This also requires further investigation, but depends on our ability to "Add support in prolog to extract fields, and
    remove the restriction of not promoting incoming reg structs that have more than one field" - see [Dependent Work Items](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md#dependent-work-items)

Normalizing Struct Types
------------------------
We would like to facilitate full enregistration of structs with the following properties:
1. Its fields are infrequently accessed, and
2. The entire struct fits into a register, and
3. Its value is used or defined in a register
   (i.e. as an argument to or return value from calls or intrinsics).

In RyuJIT, the concept of a type is very simplistic (which helps support the high throughput
of the JIT). Rather than a symbol table to hold the properties of a type, RyuJIT primarily
deals with types as simple values of an enumeration. When more detailed information is
required about the structure of a type, we query the type system across the JIT/EE interface.
This is generally done only during the importer (translation from MSIL to the RyuJIT IR), and
during struct promotion analysis. As a result, struct types are treated as an opaque type
(TYP_STRUCT) of unknown size and structure.

In order to treat fully-enregisterable struct types as "first class" types in RyuJIT, we
create new types with fixed size and structure:
* TYP_SIMD8, TYP_SIMD12, TYP_SIMD16 and (where supported by the target) TYP_SIMD32
  - These types already exist, and represent some already-completed steps toward First Class Structs.
* TYP_STRUCT1, TYP_STRUCT2, TYP_STRUCT4, TYP_STRUCT8 (on 64-bit systems)
  - These types are new, and will be used where struct types of the given size are passed and/or
    returned in registers.

We want to identify and normalize these types early in the compiler, before any decisions are
made regarding whether they are constrained to live on the stack and whether and how they are
promoted (scalar replaced) or copied.

One issue that arises is that it becomes necessary to know the size of any struct type that
we encounter, even if we may not actually need to know the size in order to generate code.
The major cause of additional queries seems to be for field references. It is possible to
defer some of these cases. I don't know what the throughput impact of always doing the
normalization will be, but in principle I think it is worth doing because the alternative would be
to transform the types later (e.g. during morph) and use a contextual tree walk to see if we
care about the size of the struct. That would likely be a messier analysis.
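
To make the proposed normalization concrete, the following is a minimal sketch of the size-to-type
mapping that the importer might perform. It is illustrative only: the method name
`impNormalizeStructType` is invented here, `getClassSize` is the existing JIT/EE query, and the
`TYP_STRUCT1`/`TYP_STRUCT2`/`TYP_STRUCT4`/`TYP_STRUCT8` values are the new types proposed above
(SIMD types are assumed to be normalized separately, as they are today).

```C++
// Illustrative sketch only (hypothetical helper name): map a struct's size, obtained
// across the JIT/EE interface, to one of the proposed fixed-size struct types.
var_types Compiler::impNormalizeStructType(CORINFO_CLASS_HANDLE clsHnd)
{
    unsigned size = info.compCompHnd->getClassSize(clsHnd);
    switch (size)
    {
        case 1:
            return TYP_STRUCT1;
        case 2:
            return TYP_STRUCT2;
        case 4:
            return TYP_STRUCT4;
#ifdef _TARGET_64BIT_
        case 8:
            return TYP_STRUCT8;
#endif
        default:
            // Not an enregisterable size; leave it as an opaque struct type.
            return TYP_STRUCT;
    }
}
```

Whether this query is performed eagerly for every struct type, or deferred for cases such as
field references, is exactly the throughput question raised above.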

Current Struct IR Phase Transitions
-----------------------------------

There are three phases in the JIT that make changes to the representation of struct tree
nodes and lclVars:

* Importer
  * All struct type lclVars have TYP_STRUCT
  * All struct assignments/inits are block ops
  * All struct call args are ldobj
  * Other struct nodes have TYP_STRUCT
* Struct promotion
  * Fields of promoted structs become separate lclVars (scalar promoted) with primitive types
* Global morph
  * All struct nodes are transformed to block ops
    - Except for call args
  * Some promoted structs are forced to stack
    - Become “dependently promoted”
  * Call args
    - Morphed to GT_LCL_FLD if passed in a register
    - Treated in various ways otherwise (inconsistent)

Proposed Approach
-----------------
The most fundamental change with first class structs is that struct assignments become
just a special case of assignment. The existing block ops (GT_INITBLK, GT_COPYBLK,
GT_COPYOBJ, GT_LDOBJ) are eliminated. Instead, the block operations in the incoming MSIL
are translated into assignments to or from a new GT_OBJ node.

New fixed-size struct types are added (TYP_STRUCT[1|2|4|8]), which are somewhat similar
to the (existing) SIMD types (TYP_SIMD[8|16|32]). As is currently done for the SIMD types,
these types are normalized in the importer.

Conceptually, struct nodes refer to the object, not the address. This is important, as
the existing block operations all take address operands, meaning that any lclVar involved
in an assignment (including initialization) will be in an address-taken context in the JIT,
requiring special analysis to identify the cases where the address is only taken in order
to assign to or from the lclVar. This further allows for consistency in the treatment of
structs and simple types - even potentially enabling optimizations of non-enregisterable
structs.

### Struct promotion

* Struct promotion analysis
  * Aggressively promote pointer-sized fields of structs used as args or returns
  * Consider FULL promotion of pointer-sized structs
    * If there are fewer field references than calls or returns

### Assignments
* Struct assignments look like any other assignment
* GenTreeAsg (GT_ASG) extends GenTreeOp with:

```C++
// True if this assignment is a volatile memory operation.
bool IsVolatile() const { return (gtFlags & GTF_BLK_VOLATILE) != 0; }
bool gtAsgGcUnsafe;

// What code sequence we will be using to encode this operation.
enum
{
    AsgKindInvalid,
    AsgKindDirect,
    AsgKindHelper,
    AsgKindRepInstr,
    AsgKindUnroll,
} gtAsgKind;
```

### Struct “objects” as lvalues
* Lhs of a struct assignment is a block node or lclVar
* Block nodes represent the address and “shape” info formerly on the block copy:
  * GT_BLK and GT_STORE_BLK (GenTreeBlk)
    * Has a (non-tree node) size field
    * Addr() is op1
    * Data() is op2
  * GT_OBJ and GT_STORE_OBJ (GenTreeObj extends GenTreeBlk)
    * gtClass, gtGcPtrs, gtGcPtrCount, gtSlots
  * GT_DYN_BLK and GT_STORE_DYN_BLK (GenTreeDynBlk extends GenTreeBlk)
    * Additional child gtDynamicSize

### Struct “objects” as rvalues
After morph, structs on rhs of assignment are either:
* The tree node for the object: e.g. call, retExpr
* GT_IND of an address (e.g. GT_LEA)

The lhs provides the “shape” for the assignment. Note: it has been suggested that these could
remain as GT_BLK nodes, but I have not given that any deep consideration.
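
As a purely illustrative sketch of the importer translation proposed above, a struct copy coming
in as MSIL `cpobj`/`cpblk` might be imported roughly as follows. The helper name
`impImportStructCopy` is invented for this example, and `gtNewObjNode`/`gtNewAssignNode` are
assumed to exist in roughly this form; the point is only the shape of the resulting tree.

```C++
// Illustrative sketch (helper names are assumptions): import an MSIL struct copy as a
// normal assignment between GT_OBJ nodes, rather than as a GT_COPYOBJ block op that
// takes address operands.
GenTreePtr Compiler::impImportStructCopy(CORINFO_CLASS_HANDLE clsHnd,
                                         GenTreePtr           destAddr,
                                         GenTreePtr           srcAddr)
{
    // The GT_OBJ nodes refer to the struct values themselves, so a lclVar whose address
    // feeds one of them does not need to be treated as address-taken.
    GenTreePtr dest = gtNewObjNode(clsHnd, destAddr); // lhs: provides the "shape"
    GenTreePtr src  = gtNewObjNode(clsHnd, srcAddr);  // rhs: the source struct value
    return gtNewAssignNode(dest, src);                // GT_ASG, like any other assignment
}
```

Because both sides of the resulting GT_ASG are struct-typed values rather than addresses, such
assignments can participate in copy propagation, assertion propagation and dead store removal
like any other assignment, which is what enables the simplifications shown in the Issue 1133
example below.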

### Preserving Struct Types in Trees

Prior to morphing, all nodes that may represent a struct type will have a class handle.
After morphing, some will become GT_IND.

### Structs As Call Arguments

All struct args are imported as GT_OBJ, and are transformed as follows during morph:
* P_FULL promoted locals:
  * Remain as GT_LCL_VAR nodes, with the appropriate fixed-size struct type.
  * Note that these may or may not be passed in registers.
* P_INDEP promoted locals:
  * These are the ones where the fields don’t match the reg types.
    Use GT_STRUCT (or something similar) for aggregating multiple fields into a single register:
    * Op1 is a lclVar for the first promoted field
    * Op2 is the lclVar for the next field, OR another GT_STRUCT
    * Bit offset for the second child
* All other cases (non-locals, OR P_DEP or non-promoted locals):
  * GT_LIST of GT_IND for each half

### Struct Return

The return of a struct value from the current method is represented as follows:
* GT_RET(GT_OBJ) initially
* GT_OBJ morphed, and then transformed similarly to call args

Proposed Struct IR Phase Transitions
------------------------------------

* Importer
  * Struct assignments are imported as GT_ASG
  * Struct type is normalized to TYP_STRUCT* or TYP_SIMD*
* Struct promotion
  * Fields of promoted structs become separate lclVars (as is)
  * Enregisterable structs (including Pair Types) may be promoted to P_FULL (i.e. fully enregistered)
  * As a future optimization, we may "restructure" multi-register argument or return values as a
    synthesized struct of appropriately typed fields, and then promote them in the normal manner.
* Global morph
  * All struct type local variables remain as simple GT_LCL_VAR nodes.
  * All other struct nodes are transformed to GT_IND (rhs of assignment) or remain as GT_OBJ
    * In Lowering, GT_OBJ will be changed to GT_BLK if there are no gc ptrs. This could be done
      earlier, but there are some places where the object pointer is desired.
    * It is not actually clear if there is a great deal of value in the GT_BLK, but it was added
      to be more compatible with existing code that expects block copies with gc pointers to be
      distinguished from those without.
  * Promoted structs are forced to stack ONLY if address taken
  * Call arguments
    * Fixed-size enregisterable structs: GT_LCL_VAR or GT_OBJ of appropriate type.
    * Multi-register arguments: GT_LIST of register-sized operands:
      * GT_LCL_VAR if there is a promoted field that exactly matches the register size and type
        (note that, if we have performed the optimization mentioned above in struct promotion,
        we may have a GT_LCL_VAR of a synthesized struct field).
      * GT_LCL_FLD if there is a matching field in the struct that has not been promoted.
      * GT_IND otherwise. Note that if this is a struct local that does not have a matching field,
        this will force the local to live on the stack.
* Lowering
  * Pair types (e.g. TYP_LONG on 32-bit targets) are decomposed as needed to expose register requirements.
    Although these are not strictly structs, their handling is similar.
  * Computations are decomposed into their constituent parts when they independently write
    separate registers.
  * TYP_LONG lclVars (and TYP_DOUBLE on ARM) are split (similar to promotion/scalar replacement of
    structs) if and only if they are register candidates.
+ * Other TYP_LONG/TYP_DOUBLE lclVars are loaded into independent registers either via: + * Single GT_LCL_VAR that will translate into a pair load instruction (ldp), with two register + targets, or + * GT_LCL_FLD (current approach) or GT_IND (probaby a better approach) + * Calls and loads that target multiple registers + * Existing gtLsraInfo has the capability to specify multiple destination registers + * Additional work is required in LSRA to handle these correctly + * If HFAs can be return values (not just call args), then we may need to support up to 4 destination + registers for LSRA + +Sample IR +--------- +### Bug 98404 +#### Before + +The `getfoo` method initializes a struct of 4 bytes. +The dump of the (single) local variable is included to show the change from `struct (8)` to +`struct4`, as the "exact size" of the struct is 4 bytes. +Here is the IR after Import: + +``` +; V00 loc0 struct ( 8) + + ▌ stmtExpr void (top level) (IL 0x000... ???) + │ ┌──▌ const int 4 + └──▌ initBlk void + │ ┌──▌ const int 0 + └──▌ <list> void + └──▌ addr byref + └──▌ lclVar struct V00 loc0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + └──▌ return int + └──▌ lclFld int V00 loc0 [+0] +``` +This is how it currently looks just before code generation: +``` + ▌ stmtExpr void (top level) (IL 0x000...0x003) + │ ┌──▌ const int 0 REG rax $81 + │ ├──▌ &lclVar byref V00 loc0 d:3 REG NA + └──▌ storeIndir int REG NA + + ▌ stmtExpr void (top level) (IL 0x008...0x009) + │ ┌──▌ lclFld int V00 loc0 u:3[+0] (last use) REG rax $180 + └──▌ return int REG NA $181 +``` +And here is the resulting code: +``` + push rax + xor rax, rax + mov qword ptr [V00 rsp], rax + xor eax, eax + mov dword ptr [V00 rsp], eax + mov eax, dword ptr [V00 rsp] + add rsp, 8 + ret +``` +#### After +Here is the IR after Import with the prototype First Class Struct changes. +Note that the fixed-size struct variable is assigned and returned just as for a scalar type. + +``` +; V00 loc0 struct4 + + ▌ stmtExpr void (top level) (IL 0x000... ???) + │ ┌──▌ const int 0 + └──▌ = struct4 (init) + └──▌ lclVar struct4 V00 loc0 + + + ▌ stmtExpr void (top level) (IL 0x008... ???) + └──▌ return struct4 + └──▌ lclVar struct4 V00 loc0 +``` +And Here is the resulting code just prior to code generation: +``` + ▌ stmtExpr void (top level) (IL 0x008...0x009) + │ ┌──▌ const struct4 0 REG rax $81 + └──▌ return struct4 REG NA $140 +``` +Finally, here is the resulting code that we were hoping to acheive: +``` + xor eax, eax +``` + +### Issue 1133: +#### Before + +Here is the IR after Inlining for the `TestValueTypesInInlinedMethods` method that invokes a +sequence of methods that are inlined, creating a sequence of copies. +Because this struct type does not fit into a single register, the types do not change (and +therefore the local variable table is not shown). + +``` + ▌ stmtExpr void (top level) (IL 0x000...0x003) + │ ┌──▌ const int 16 + └──▌ initBlk void + │ ┌──▌ const int 0 + └──▌ <list> void + └──▌ addr byref + └──▌ lclVar struct V00 loc0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const int 16 + └──▌ copyBlk void + │ ┌──▌ addr byref + │ │ └──▌ lclVar struct V00 loc0 + └──▌ <list> void + └──▌ addr byref + └──▌ lclVar struct V01 tmp0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const int 16 + └──▌ copyBlk void + │ ┌──▌ addr byref + │ │ └──▌ lclVar struct V01 tmp0 + └──▌ <list> void + └──▌ addr byref + └──▌ lclVar struct V02 tmp1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) 
+ │ ┌──▌ const int 16 + └──▌ copyBlk void + │ ┌──▌ addr byref + │ │ └──▌ lclVar struct V02 tmp1 + └──▌ <list> void + └──▌ addr byref + └──▌ lclVar struct V03 tmp2 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE + ├──▌ const long 0x7ff918494e10 + └──▌ const int 1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const int 16 + └──▌ copyBlk void + │ ┌──▌ addr byref + │ │ └──▌ lclVar struct V03 tmp2 + └──▌ <list> void + │ ┌──▌ const long 8 Fseq[#FirstElem] + └──▌ + byref + └──▌ field ref s_dt + + ▌ stmtExpr void (top level) (IL 0x00E... ???) + └──▌ return void +``` +And here is the resulting code: +``` +sub rsp, 104 +xor rax, rax +mov qword ptr [V00 rsp+58H], rax +mov qword ptr [V00+0x8 rsp+60H], rax +xor rcx, rcx +lea rdx, bword ptr [V00 rsp+58H] +vxorpd ymm0, ymm0 +vmovdqu qword ptr [rdx], ymm0 +vmovdqu ymm0, qword ptr [V00 rsp+58H] +vmovdqu qword ptr [V01 rsp+48H]ymm0, qword ptr +vmovdqu ymm0, qword ptr [V01 rsp+48H] +vmovdqu qword ptr [V02 rsp+38H]ymm0, qword ptr +vmovdqu ymm0, qword ptr [V02 rsp+38H] +vmovdqu qword ptr [V03 rsp+28H]ymm0, qword ptr +mov rcx, 0x7FF918494E10 +mov edx, 1 +call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE +mov rax, 0x1FAC6EB29C8 +mov rax, gword ptr [rax] +add rax, 8 +vmovdqu ymm0, qword ptr [V03 rsp+28H] +vmovdqu qword ptr [rax], ymm0 +add rsp, 104 +ret +``` + +#### After +After fginline: +(note that the obj node will become a blk node downstream). +``` + ▌ stmtExpr void (top level) (IL 0x000...0x003) + │ ┌──▌ const int 0 + └──▌ = struct (init) + └──▌ lclVar struct V00 loc0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ lclVar struct V00 loc0 + └──▌ = struct (copy) + └──▌ lclVar struct V01 tmp0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ lclVar struct V01 tmp0 + └──▌ = struct (copy) + └──▌ lclVar struct V02 tmp1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ lclVar struct V02 tmp1 + └──▌ = struct (copy) + └──▌ lclVar struct V03 tmp2 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE + ├──▌ const long 0x7ff9184b4e10 + └──▌ const int 1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ lclVar struct V03 tmp2 + └──▌ = struct (copy) + └──▌ obj(16) struct + │ ┌──▌ const long 8 Fseq[#FirstElem] + └──▌ + byref + └──▌ field ref s_dt + + ▌ stmtExpr void (top level) (IL 0x00E... ???) + └──▌ return void +``` +Here is the IR after fgMorph: +Note that copy propagation has propagated the zero initialization through to the final store. +``` + ▌ stmtExpr void (top level) (IL 0x000...0x003) + │ ┌──▌ const int 0 + └──▌ = struct (init) + └──▌ lclVar struct V00 loc0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const struct 0 + └──▌ = struct (init) + └──▌ lclVar struct V01 tmp0 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const struct 0 + └──▌ = struct (init) + └──▌ lclVar struct V02 tmp1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const struct 0 + └──▌ = struct (init) + └──▌ lclVar struct V03 tmp2 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE + ├──▌ const long 0x7ffc8bbb4e10 + └──▌ const int 1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const struct 0 + └──▌ = struct (init) + └──▌ obj(16) struct + │ ┌──▌ const long 8 Fseq[#FirstElem] + └──▌ + byref + └──▌ indir ref + └──▌ const(h) long 0x2425b6229c8 static Fseq[s_dt] + + ▌ stmtExpr void (top level) (IL 0x00E... 
???) + └──▌ return void + +``` +After liveness analysis the dead stores have been eliminated: +``` + ▌ stmtExpr void (top level) (IL 0x008... ???) + └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE + ├──▌ const long 0x7ffc8bbb4e10 + └──▌ const int 1 + + ▌ stmtExpr void (top level) (IL 0x008... ???) + │ ┌──▌ const struct 0 + └──▌ = struct (init) + └──▌ obj(16) struct + │ ┌──▌ const long 8 Fseq[#FirstElem] + └──▌ + byref + └──▌ indir ref + └──▌ const(h) long 0x2425b6229c8 static Fseq[s_dt] + + ▌ stmtExpr void (top level) (IL 0x00E... ???) + └──▌ return void +``` +And here is the resulting code, going from a code size of 129 bytes down to 58. +``` +sub rsp, 40 +mov rcx, 0x7FFC8BBB4E10 +mov edx, 1 +call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE +xor rax, rax +mov rdx, 0x2425B6229C8 +mov rdx, gword ptr [rdx] +add rdx, 8 +vxorpd ymm0, ymm0 +vmovdqu qword ptr [rdx], ymm0 +add rsp, 40 +ret +``` + +Work Items +---------- +This is a preliminary breakdown of the work into somewhat separable tasks. Those whose descriptions +are prefaced by '*' have been prototyped in an earlier version of the JIT, and that work is now +being re-integrated and tested, but may require some cleanup and/or phasing with other work items +before a PR is submitted. + +### Mostly-Independent work items +1. *Replace block ops with assignments & new nodes. + +2. *Add new fixed-size types, and normalize them in the importer (might be best to do this with or after #1, but not really dependent) + +3. LSRA + * Enable support for multiple destination regs, call nodes that return a struct in multiple + registers (for x64/ux, and for arm) + * Handle multiple destination regs for ldp on arm64 (could be done before or concurrently with the above). + Note that this work item is specifically intended for call arguments. It is likely the case that + utilizing ldp for general-purpose code sequences would be handled separately. + +4. X64/ux: aggressively promote lclVar struct incoming or outgoing args with two 8-byte fields + +5. X64/ux: + * modify the handling of multireg struct args to use GT_LIST of GT_IND + * remove the restriction to NOT promote things that are multi-reg args, as long as they match (i.e. two 8-byte fields). + Pass those using GT_LIST of GT_LCL_VAR. + * stop adding extra lclVar copies + +6. Arm64: + * Promote 16-byte struct lclVars that are incoming or outgoing register arguments only if they have 2 8-byte fields (DONE). + Pass those using GT_LIST of GT_LCL_VAR (as above for x64/ux). + Note that, if the types do not match, e.g. a TYP_DOUBLE field that will be passed in an integer register, + it will require special handling in Lowering and LSRA, as is currently done in the TYP_SIMD8 case. + * For other cases, pass as GT_LIST of GT_IND (DONE) + * The GT_LIST would be created in fgMorphArgs(). Then in Lower, putarg_reg nodes will be inserted between + the GT_LIST and the list item (GT_LCL_VAR or GT_IND). (DONE) + * Add support for HFAs. + + ### Dependent work items: + +7. *(Depends on 1 & 2): Fully enregister TYP_STRUCT[1|2|3|4|8] with no field accesses. + +8. *(Depends on 1 & 2): Enable value numbering and assertion propagation for struct types. + +9. (Depends on 1 & 2, mostly to avoid conflicts): Add support in prolog to extract fields, and + remove the restriction of not promoting incoming reg structs that have more than one field. + Note that SIMD types are already reassembled in the prolog. + +10. 
(Not really dependent, but probably best done after 1, 2, 5, 6): Add support for assembling
    non-matching fields into registers for call args and returns. This includes producing the
    appropriate IR, which may simply be shifts and or's of the appropriate fields (see the sketch
    at the end of this document).
    This would either be done during `fgMorphArgs()` and the `GT_RETURN` case of `fgMorphSmpOp()`
    or as described below in
    [Extracting and Assembling Structs](#Extract-Assemble).

11. (Not really dependent, but probably best done after 1, 2, 5, 6): Add support for extracting the fields for the
    returned struct value of a call, producing the appropriate IR, which may simply be shifts and
    and's.
    This would either be done during the morphing of the call itself, or as described below in
    [Extracting and Assembling Structs](#Extract-Assemble).

12. (Depends on 3, may replace the second part of 6): For arm64, add support for loading non-promoted
    or non-local structs with ldp
    * Either using TYP_STRUCT and special case handling, OR adding TYP_STRUCT16

13. (Depends on 7, 9, 10, 11): Enhance struct promotion to allow full enregistration of structs,
    even if some fields are accessed, if there are more call/return references than field references.
    This work item should address issue #1161, by removing the automatic non-promotion
    of structs with a single double field, and adding appropriate heuristics for when it
    should be allowed.

Related Work Item
-----------------
These changes are somewhat orthogonal, though will likely have merge issues if done in parallel with any of
the above:
* Unified API for ABI info (a rough sketch of such a descriptor appears at the end of this document)
  * Pass/Return info:
    * Num regs used for passing
    * Per-slot location (reg num / REG_STK)
    * Per-slot type (for reg "slots")
    * Starting stack slot offset (if passed on stack)
    * By reference?
    * etc.
  * We should be able to unify HFA handling into this model
  * For arg passing, the API for creating the argEntry should take an arg state that keeps track of
    what regs have been used, and handles the backfilling case for ARM

Open Design Questions
---------------------
### <a name="Extract-Assemble"/>Extracting and Assembling Structs

Should the IR for extracting and assembling struct arguments from or to argument or return registers
be generated directly during the morphing of call arguments and returns, or should this capability
be handled in a more general fashion in `fgMorphCopyBlock()`?
The latter seems desirable for its general applicability.

One way to handle this might be:

1. Whenever you have a case of mismatched structs (call args, call node, or return node),
   create a promoted temp of the "fake struct type", e.g. for arm you would introduce three
   new temps: one for the struct, and one for each of its TYP_LONG promoted fields.
2. Add an assignment to or from the temp (e.g. as a setup arg node), BUT the structs on
   both sides of that assignment can now be promoted.
3. Add code to fgMorphCopyBlock to handle the extraction and assembling of structs.
4. The promoted fields of the temp would be preferenced to the appropriate argument or return registers.
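
For illustration, the "shifts and or's" assembly mentioned in work items 10 and 11 (and in step 3
above) might build IR along the following lines. This is a sketch only: the helper name
`fgAssembleTwoFields`, the assumed layout (two promoted TYP_INT fields forming one 8-byte register
value), and the exact forms of the `gtNew*` helpers used here are assumptions made for the example.

```C++
// Illustrative sketch only: combine two promoted 4-byte fields into a single 8-byte
// register-sized value using shift and or. Helper name and field layout are assumed.
GenTreePtr Compiler::fgAssembleTwoFields(unsigned loFieldLclNum, unsigned hiFieldLclNum)
{
    GenTreePtr loField = gtNewLclvNode(loFieldLclNum, TYP_INT);
    GenTreePtr hiField = gtNewLclvNode(hiFieldLclNum, TYP_INT);

    // Widen each field to the full register width.
    // NOTE: a real implementation must zero-extend (not sign-extend) the low field
    // before the 'or', so that its upper bits do not clobber the high field.
    GenTreePtr loWide = gtNewCastNode(TYP_LONG, loField, TYP_LONG);
    GenTreePtr hiWide = gtNewCastNode(TYP_LONG, hiField, TYP_LONG);

    // Position the upper field and combine: loWide | (hiWide << 32)
    GenTreePtr hiShifted = gtNewOperNode(GT_LSH, TYP_LONG, hiWide, gtNewIconNode(32));
    return gtNewOperNode(GT_OR, TYP_LONG, loWide, hiShifted);
}
```

The reverse direction (work item 11, extracting the fields of a multi-register return value) would
be the mirror image: shifts and and's (or casts) of the value in each return register.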
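
Similarly, the "Unified API for ABI info" item above might be served by a per-argument descriptor
along these lines. Every name in this sketch is invented for illustration (only `regNumber`,
`REG_STK` and `var_types` are existing JIT types), and the maximum number of register "slots" is a
placeholder:

```C++
// Rough, purely illustrative sketch of a unified ABI descriptor; it simply mirrors the
// bullets under "Unified API for ABI info" above. All names here are hypothetical.
struct ABIPassingInfo
{
    static const unsigned MaxRegSlots = 4; // placeholder; e.g. large enough for a 4-element HFA

    unsigned  numRegs;                 // number of registers used for passing (0 if fully on stack)
    regNumber regNums[MaxRegSlots];    // per-slot location: a register number, or REG_STK
    var_types regTypes[MaxRegSlots];   // per-slot type, for register "slots"
    unsigned  stackOffset;             // starting stack slot offset, if (partly) passed on the stack
    bool      passedByRef;             // true if the struct is passed by reference
    bool      isHfa;                   // HFA handling folded into the same model
};
```

For argument passing, whatever constructs this descriptor would take the running argument state as
input, so that registers already used by earlier arguments and the ARM backfilling case can be
accounted for, as noted in the item above.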