Morphing of call nodes in RyuJIT ========================= Overview -------- In C# and IL, and unlike C/C++, the evaluation order of the arguments for calls is strictly defined. Basically this is left to right in C# or the IL instruction ordering for IL. One issue that must be addressed is the problem of nested calls. Consider `Foo(x[i], Bar(y))`. We first must evaluate `x[i]` and possibly set up the first argument for `Foo()`. But immediately after that, we must set up `y` as first argument of `Bar()`. Thus, when we evaluate `x[i]` we will need to hold that value someplace while we set up and call `Bar().` Arguments that contain an assignment are another issue that we need to address. Most cases of this are rare except for post/pre increment, perhaps like this: `Foo(j, a[j++])`. Here `j` is updated via assignment when the second arg is evaluated, so the earlier uses of `j` would need to be evaluated and saved in a new LclVar.   One simple approach would be to create new single definition, single use LclVars for every argument that is passed. This would preserve the evaluation order. However, it would potentially create hundreds of LclVar for moderately sized methods and that would overflow the limited number of tracked local variables in the JIT. One observation is that many arguments to methods are either constants or LclVars and can be set up anytime we want. They usually will not need a new LclVar to preserve the order of evaluation rule.   Each argument is an arbitrary expression tree. The JIT tracks a summary of observable side-effects using a set of five bit flags in every GenTree node: `GTF_ASG`, `GTF_CALL`, `GTF_EXCEPT`, `GTF_GLOB_REF`, and `GTF_ORDER_SIDEEFF`. These flags are propagated up the tree so that the top node has a particular flag set if any of its child nodes has the flag set. Decisions about whether to evaluate arguments into temp LclVars are made by examining these flags on each of the arguments. *Our design goal for call sites is to create a few temp LclVars as possible, while preserving the order of evaluation rules of IL and C#.* Data Structures ------------ The most important data structure is the `GenTreeCall` node which represents a single call site and is created by the Importer when it sees a call in the IL. It is also used for internal calls that the JIT needs such as helper calls. Every `GT_CALL` node should have a `GTF_CALL` flag set on it. Nodes that may be implemented using a function call also should have the `GTF_CALL` flag set on them. The arguments for a single call site are held by three fields in the `GenTreeCall`: `gtCallObjp`, `gtCallArgs`, and `gtCallLateArgs`. The first one, `gtCallObjp`, contains the instance pointer ("this" pointer) when you are calling a method that takes an instance pointer, otherwise it is null. The `gtCallArgs` contains all of the normal arguments in a null terminated `GT_LIST` format. When the `GenTreeCall` is first created the `gtCallLateArgs` is null and is set up later when we call `fgMorphArgs()` during the global Morph of all nodes. To accurately record and track all of the information about call site arguments we create a `fgArgInfo` that records information and decisions that we make about how each argument for this call site is handled. It has a dynamically sized array member `argTable` that contains details about each argument. This per-argument information is contained in the `fgArgTabEntry` struct. `FEATURE_FIXED_OUT_ARGS` ----------------- All architectures support passing a limited number of arguments in registers and the additional arguments must be passed on the stack. There are two distinctly different approaches for passing arguments of the stack. For the x86 architecture a push instruction is typically used to pass a stack based argument. For all other architectures that we support we allocate the `lvaOutgoingArgSpaceVar`, which is a variable-sized stack based LclVar. Its size is determined by the call site that has the largest requirement for stack based arguments. The define `FEATURE_FIXED_OUT_ARGS` is 1 for architectures that use the outgoing arg space to pass their stack based arguments. There is only one outgoing argument space and any values stored there are considered to be killed by the very next call even if the next call doesn't take any stack based arguments. For x86, we can push some arguments on the stack for one call and leave them there while pushing some new arguments for a nested call. Thus we allow nested calls for x86 but do not allow them for the other architectures. Rules for when Arguments must be evaluated into temp LclVars ----------------- During the first Morph phase known as global Morph we call `fgArgInfo::ArgsComplete()` after we have completed building the `argTable` for `fgArgInfo` struct. This method applies the following rules: 1. When an argument is marked as containing an assignment using `GTF_ASG`, then we force all previous non-constant arguments to be evaluated into temps. This is very conservative, but at this phase of the JIT it is rare to have an assignment subtree as part of an argument. 2. When an argument is marked as containing a call using the `GTF_CALL` flag, then we force that argument and any previous argument that is marked with any of the `GTF_ALL_EFFECT` flags into temps. * Additionally, for `FEATURE_FIXED_OUT_ARGS`, any previous stack based args that we haven't marked as needing a temp but still need to store in the outgoing args area is marked as needing a placeholder temp using `needPlace`. 3. We force any arguments that use `localloc` to be evaluated into temps. 4. We mark any address taken locals with the `GTF_GLOB_REF` flag. For two special cases we call `EvalToTmp()` and set up the temp in `fgMorphArgs`. `EvalToTmp` records the tmpNum used and sets `isTmp` so that we handle it like the other temps. The special cases are for `GT_MKREFANY` and for a `TYP_STRUCT` argument passed by value when we can't optimize away the extra copy. Rules use to determine the order of argument evaluation ----------------- After calling `ArgsComplete()` the `SortArgs()` method is called to determine the optimal way to evaluate the arguments. This sorting controls the order that we place the nodes in the `gtCallLateArgs` list. 1. We iterate over the arguments and move any constant arguments to be evaluated last and remove them from further consideration by marking them as processed. 2. We iterate over the arguments and move any arguments that contain calls to be evaluated first and remove them from further consideration by marking them as processed. 3. We iterate over the arguments and move arguments that must be evaluated into temp LclVars to be after the ones that contain calls. 4. We iterate over the arguments and move arguments that are simple LclVar or LclFlds and put them before the constant args. 5. If there are any remaining arguments, we evaluate them from the most complex to the least complex. Evaluating Args into new LclVar temps and the creation of the LateArgs ----------------- After calling `SortArgs()`, the `EvalArgsToTemps()` method is called to create the temp assignments and to populate the LateArgs list. Arguments that are marked with `needTmp == true`. ----------------- 1. We create an assignment using `gtNewTempAssign`. This assignment replaces the original argument in the `gtCallArgs` list. After we create the assignment the argument is marked as `isTmp`. The new assignment is marked with the `GTF_LATE_ARG` flag. 2. Arguments that are already marked with `isTmp` are treated similarly as above except we don't create an assignment for them. 3. A `TYP_STRUCT` argument passed by value will have `isTmp` set to true and will use a `GT_COPYBLK` or a `GT_COPYOBJ` to perform the assignment of the temp. 4. The assignment node or the CopyBlock node is referred to as `arg1 SETUP` in the JitDump. Argument that are marked with `needTmp == false`. ----------------- 1. If this is an argument that is passed in a register, then the existing node is moved to the `gtCallLateArgs` list and a new `GT_ARGPLACE` (placeholder) node replaces it in the `gtArgList` list. 2. Additionally, if `needPlace` is true (only for `FEATURE_FIXED_OUT_ARGS`) then the existing node is moved to the `gtCallLateArgs` list and a new `GT_ARGPLACE` (placeholder) node replaces it in the `gtArgList` list. 3. Otherwise the argument is left in the `gtCallArgs` and it will be evaluated into the outgoing arg area or pushed on the stack. After the Call node is fully morphed the LateArgs list will contain the arguments passed in registers as well as additional ones for `needPlace` marked arguments whenever we have a nested call for a stack based argument. When `needTmp` is true the LateArg will be a LclVar that was created to evaluate the arg (single-def/single-use). When `needTmp` is false the LateArg can be an arbitrary expression tree.