Documentation/botr/porting-ryujit.md



# RyuJIT: Porting to different platforms

## What is a Platform?
* Target instruction set and pointer size
* Target calling convention
* Runtime data structures (not really covered here)
* GC encoding
  * So far there are only two variants: JIT32_GCENCODER and everything else
* Debug info (so far mostly the same for all targets?)
* EH info (not really covered here)

One advantage of the CLR is that the VM (mostly) hides the (non-ABI) OS differences.

## The Very High Level View
* 32 vs. 64 bits
  * This work is not yet complete in the backend, but should be sharable
* Instruction set architecture:
  * instrsXXX.h, emitXXX.cpp and targetXXX.cpp
  * lowerXXX.cpp
  * codeGenXXX.cpp and simdcodegenXXX.cpp
  * unwindXXX.cpp
* Calling Convention: all over the place

## Front-end changes
* Calling Convention
  * Struct args and returns seem to be the most complex differences 
    * Importer and morph are highly aware of these
      * E.g. fgMorphArgs(), fgFixupStructReturn(), fgMorphCall(), fgPromoteStructs() and the various struct assignment morphing methods
  * HFAs on ARM 
* Tail calls are target-dependent, but probably should be less so
* Intrinsics: each platform recognizes different methods as intrinsics (e.g. Sin only for x86, Round everywhere BUT amd64)
* Target-specific morphs such as for mul, mod and div

## Backend Changes
* Lowering: fully expose control flow and register requirements
* Code Generation: traverse blocks in layout order, generating code (InstrDescs) based on register assignments on nodes
  * Then, generate prolog & epilog, as well as GC, EH and scope tables
* ABI changes:
  * Calling convention register requirements
    * Lowering of calls and returns
    * Code sequences for prologs & epilogs
  * Allocation & layout of frame

## Target ISA "Configuration"
* Conditional compilation (set in jit.h, based on incoming define, e.g. #ifdef X86)
```C++
_TARGET_64BIT_ // a 32-bit target is simply !_TARGET_64BIT_
_TARGET_XARCH_, _TARGET_ARMARCH_
_TARGET_AMD64_, _TARGET_X86_, _TARGET_ARM64_, _TARGET_ARM_
```
* target.h
* instrsXXX.h
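
The layering of these defines can be illustrated with a small standalone sketch. It is not the contents of jit.h: the `DEMO_TARGET_*` macros are hypothetical stand-ins for the `_TARGET_*_` macros above, and the AMD64 default exists only so the sketch compiles on its own.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the _TARGET_*_ macros. A real build would receive
// exactly one architecture define from the build system.
#if !defined(DEMO_TARGET_AMD64) && !defined(DEMO_TARGET_X86) && \
    !defined(DEMO_TARGET_ARM64) && !defined(DEMO_TARGET_ARM)
#define DEMO_TARGET_AMD64 1 // assumption: default chosen only for this sketch
#endif

// Derive the architecture "family" macros from the specific architecture.
#if defined(DEMO_TARGET_AMD64) || defined(DEMO_TARGET_X86)
#define DEMO_TARGET_XARCH 1
#endif
#if defined(DEMO_TARGET_ARM64) || defined(DEMO_TARGET_ARM)
#define DEMO_TARGET_ARMARCH 1
#endif

// Derive the pointer-size macro; a 32-bit target is simply "not 64-bit".
#if defined(DEMO_TARGET_AMD64) || defined(DEMO_TARGET_ARM64)
#define DEMO_TARGET_64BIT 1
#endif

// Target pointer size in bytes, selected at compile time.
constexpr std::size_t demoTargetPointerSize()
{
#ifdef DEMO_TARGET_64BIT
    return 8;
#else
    return 4;
#endif
}

// Name of the instruction-set family selected at compile time.
constexpr const char* demoTargetFamily()
{
#ifdef DEMO_TARGET_XARCH
    return "xarch";
#else
    return "armarch";
#endif
}

static_assert(demoTargetPointerSize() == 4 || demoTargetPointerSize() == 8,
              "only 32-bit and 64-bit targets are modeled");
```

Code that is shared by a family (for example, all x86-like targets) can then test the family macro, while pointer-size-dependent code tests only the 64-bit macro.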

## Instruction Encoding
* The instrDesc is the data structure used for encoding
  * It is initialized with the opcode bits, and has fields for immediates and register numbers.
  * instrDescs are collected into groups
  * A label may only occur at the beginning of a group
* The emitter is called to:
  * Create new instructions (instrDescs), during CodeGen
  * Emit the bits from the instrDescs after CodeGen is complete
  * Update GC info (live GC vars & safe points)
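
The two-phase shape described above (record descriptors during CodeGen, emit bits afterwards) can be sketched with a toy model. All of the `Toy*` types and the fixed-width byte layout are assumptions made for illustration. They are far simpler than the JIT's real instrDesc and instruction groups, and GC info tracking is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy descriptor: opcode bits plus fields for a register number and an immediate.
struct ToyInstrDesc
{
    std::uint8_t opcode; // opcode bits, fixed when the descriptor is created
    std::uint8_t reg;    // register number field
    std::int32_t imm;    // immediate field
};

// Toy instruction group: a label can only name the start of a group.
struct ToyInsGroup
{
    int                       label;
    std::vector<ToyInstrDesc> instrs;
};

class ToyEmitter
{
public:
    // Starting a new group is the only way to introduce a label.
    void beginGroup(int label)
    {
        groups.push_back(ToyInsGroup{label, {}});
    }

    // Phase 1 (during code generation): record a descriptor, emit nothing yet.
    void newInstr(std::uint8_t opcode, std::uint8_t reg, std::int32_t imm)
    {
        assert(!groups.empty() && "an instruction must belong to a group");
        groups.back().instrs.push_back(ToyInstrDesc{opcode, reg, imm});
    }

    // Phase 2 (after code generation): turn every descriptor into bytes.
    // Made-up encoding: opcode, reg, then a 32-bit little-endian immediate.
    std::vector<std::uint8_t> emitAll() const
    {
        std::vector<std::uint8_t> out;
        for (const ToyInsGroup& group : groups)
        {
            for (const ToyInstrDesc& id : group.instrs)
            {
                out.push_back(id.opcode);
                out.push_back(id.reg);
                const std::uint32_t bits = static_cast<std::uint32_t>(id.imm);
                for (int shift = 0; shift < 32; shift += 8)
                {
                    out.push_back(static_cast<std::uint8_t>(bits >> shift));
                }
            }
        }
        return out;
    }

private:
    std::vector<ToyInsGroup> groups;
};
```

Deferring the byte emission is what lets a real emitter decide sizes and branch distances once all instructions are known; the toy model only hints at that by keeping the two phases separate.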

## Adding Encodings
* The instruction encodings are captured in instrsXXX.h. These are the opcode bits for each instruction
* The structure of each instruction's encoding is target-dependent
* An "instruction" is just the representation of the opcode
* An instance of "instrDesc" represents the instruction to be emitted
* For each "type" of instruction, emit methods need to be implemented. These follow a pattern but a target may have unique ones, e.g.
```C++
emitter::emitInsMov(instruction ins, emitAttr attr, GenTree* node)
emitter::emitIns_R_I(instruction ins, emitAttr attr, regNumber reg, ssize_t val)
emitter::emitInsTernary(instruction ins, emitAttr attr, GenTree* dst, GenTree* src1, GenTree* src2) // currently Arm64 only
```
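
One common way to keep an opcode table and the instruction enum in sync is a macro-driven table that is expanded more than once. The sketch below shows that idea only; the `TOY_*` names and the opcode bits are invented and do not come from any instrsXXX.h file, and a real table would normally live in its own header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instruction table: one entry per instruction, giving a name and
// made-up opcode bits.
#define TOY_INSTRUCTIONS(INST) \
    INST(mov, 0x8B)            \
    INST(add, 0x03)            \
    INST(sub, 0x2B)

// First expansion: the "instruction" enum, which only names the opcode.
enum ToyInstruction
{
#define TOY_ENUM(name, bits) TOY_INS_##name,
    TOY_INSTRUCTIONS(TOY_ENUM)
#undef TOY_ENUM
    TOY_INS_count
};

// Second expansion: the opcode bits, indexed by the enum above.
static const std::uint8_t toyOpcodeBits[] = {
#define TOY_BITS(name, bits) bits,
    TOY_INSTRUCTIONS(TOY_BITS)
#undef TOY_BITS
};

static_assert(sizeof(toyOpcodeBits) / sizeof(toyOpcodeBits[0]) == TOY_INS_count,
              "table and enum must stay in sync");

// Look up the opcode bits for an instruction.
inline std::uint8_t toyInsCode(ToyInstruction ins)
{
    assert(ins < TOY_INS_count);
    return toyOpcodeBits[ins];
}
```

Adding an instruction then means adding one line to the table; the enum and the opcode array pick it up automatically.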

## Lowering
* Lowering ensures that all register requirements are exposed for the register allocator
  * Use count, def count, "internal" reg count, and any special register requirements
  * Does half the work of code generation, since all computation is made explicit
    * But it is NOT necessarily a 1:1 mapping from lowered tree nodes to target instructions
  * Its first pass does a tree walk, transforming the instructions. Some of this is target-independent. Notable exceptions:
    * Calls and arguments
    * Switch lowering
    * LEA transformation
  * Its second pass walks the nodes in execution order
    * Sets register requirements
      * Sometimes changes the register requirements of children (which have already been traversed)
    * Sets the block order and node locations for LSRA
      * LinearScan::startBlockSequence() and LinearScan::moveToNextBlock()
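
The second pass can be pictured with a toy walk over nodes in execution order. This is a sketch under heavy assumptions: the `Toy*` types, the single "add with immediate" rule, and the 8-bit immediate limit are all hypothetical, and the real Lowering code tracks much more. It does show how visiting a parent can change the requirements of a child that was already traversed.

```cpp
#include <cassert>
#include <vector>

enum ToyOper { TOY_CONST, TOY_LCL_VAR, TOY_ADD };

struct ToyNode
{
    ToyOper oper;
    int     op1;       // index of the first operand in the list, or -1
    int     op2;       // index of the second operand in the list, or -1
    long    value;     // constant value (TOY_CONST only)
    int     srcCount;  // registers consumed ("use count")
    int     dstCount;  // registers produced ("def count")
    bool    contained; // folded into the parent's instruction, needs no register
};

// Hypothetical target rule: immediates must fit in a signed 8-bit field.
inline bool toyFitsInImmediate(long v)
{
    return v >= -128 && v <= 127;
}

// Walk nodes in execution order (operands always precede their user) and fill
// in the register requirements of each node.
inline void toySetRegisterRequirements(std::vector<ToyNode>& nodes)
{
    for (ToyNode& node : nodes)
    {
        node.contained = false;
        node.srcCount  = 0;
        node.dstCount  = 1; // by default every node defines one register
        if (node.oper != TOY_ADD)
        {
            continue;
        }

        assert(node.op1 >= 0 && node.op2 >= 0);
        node.srcCount = 2;

        ToyNode& rhs = nodes[node.op2];
        if (rhs.oper == TOY_CONST && toyFitsInImmediate(rhs.value))
        {
            // Revisit a child that was already traversed: it becomes part of
            // the add instruction and no longer needs a register of its own.
            rhs.contained = true;
            rhs.dstCount  = 0;
            node.srcCount = 1;
        }
    }
}
```

After the walk, a register allocator would only need the counts and the containment flags, not any knowledge of the target's immediate forms.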

## Register Allocation
* Register allocation is largely target-independent
  * The second phase of Lowering does nearly all the target-dependent work
* Register candidates are determined in the front-end
  * Local variables or temps, or fields of local variables or temps
  * Not address-taken, plus a few other restrictions
  * Sorted by lvaSortByRefCount(), and marked "lvTracked"
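
As a rough illustration of candidate selection, the sketch below filters out address-taken locals, sorts the rest by reference count, and marks the top entries as tracked. It is not how lvaSortByRefCount() is implemented: the `ToyLocal` type, the plain (unweighted) reference count, and the `maxTracked` cap are assumptions of the sketch, and the "few other restrictions" are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyLocal
{
    int      lclNum;       // local variable number
    unsigned refCount;     // how often the local is referenced
    bool     addressTaken; // address-taken locals cannot live in registers
    bool     tracked;      // set for locals chosen as register candidates
};

// Marks up to maxTracked eligible locals as tracked and returns their numbers,
// most-referenced first.
inline std::vector<int> toyPickCandidates(std::vector<ToyLocal>& locals, std::size_t maxTracked)
{
    std::vector<ToyLocal*> eligible;
    for (ToyLocal& local : locals)
    {
        local.tracked = false;
        if (!local.addressTaken)
        {
            eligible.push_back(&local);
        }
    }

    // Highest reference count first; stable so ties keep declaration order.
    std::stable_sort(eligible.begin(), eligible.end(),
                     [](const ToyLocal* a, const ToyLocal* b) { return a->refCount > b->refCount; });

    std::vector<int> result;
    for (std::size_t i = 0; i < eligible.size() && i < maxTracked; i++)
    {
        eligible[i]->tracked = true;
        result.push_back(eligible[i]->lclNum);
    }
    assert(result.size() <= maxTracked);
    return result;
}
```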

## Addressing Modes
* The code to find and capture addressing modes is particularly poorly abstracted
* genCreateAddrMode(), in CodeGenCommon.cpp, traverses the tree looking for an addressing mode, then captures its constituent elements (base, index, scale & offset) in "out parameters"
  * It optionally generates code
  * For RyuJIT, it NEVER generates code, and is only used by gtSetEvalOrder, and by lowering
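
The "out parameters" style can be sketched with a toy matcher. The `Toy*` tree type and `toyCreateAddrMode` are hypothetical and recognize only a couple of tree shapes (`base + index*scale + offset` and simpler forms); the real genCreateAddrMode() handles many more cases and has a different signature. Like the RyuJIT usage described above, the sketch only analyzes the tree and never generates code.

```cpp
#include <cassert>

enum ToyKind { TK_LEAF, TK_CONST, TK_ADD, TK_MUL };

struct ToyTree
{
    ToyKind  kind;
    long     value; // TK_CONST only
    ToyTree* op1;
    ToyTree* op2;
};

// Scales assumed to be encodable by the hypothetical target.
inline bool toyIsScale(const ToyTree* t)
{
    return t->kind == TK_CONST && (t->value == 1 || t->value == 2 || t->value == 4 || t->value == 8);
}

// Tries to view 'addr' as base + index*scale + offset. On success returns true
// and fills in the out parameters.
inline bool toyCreateAddrMode(ToyTree* addr, ToyTree** basePtr, ToyTree** indexPtr,
                              unsigned* scalePtr, long* offsetPtr)
{
    assert(addr != nullptr);
    *basePtr   = nullptr;
    *indexPtr  = nullptr;
    *scalePtr  = 1;
    *offsetPtr = 0;

    if (addr->kind != TK_ADD)
    {
        return false; // not an address computation this sketch understands
    }

    // Peel a trailing constant: (x + offset).
    if (addr->op2->kind == TK_CONST)
    {
        *offsetPtr = addr->op2->value;
        addr       = addr->op1;
    }

    if (addr->kind == TK_ADD)
    {
        // (base + index*scale) or (base + index).
        *basePtr     = addr->op1;
        ToyTree* rhs = addr->op2;
        if (rhs->kind == TK_MUL && toyIsScale(rhs->op2))
        {
            *indexPtr = rhs->op1;
            *scalePtr = static_cast<unsigned>(rhs->op2->value);
        }
        else
        {
            *indexPtr = rhs;
        }
    }
    else
    {
        *basePtr = addr; // just (base + offset)
    }
    return true;
}
```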

## Code Generation
* For the most part, the code generation method structure is the same for all architectures
  * Most code generation methods start with "gen"
* Theoretically, CodeGenCommon.cpp contains code "mostly" common to all targets (this factoring is imperfect)
  * Method prolog, epilog, 
* genCodeForBBList
  * walks the trees in execution order, calling genCodeForTreeNode, which needs to handle all nodes that are not "contained"
  * generates control flow code (branches, EH) for the block
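
The overall shape of this loop can be sketched as follows. Everything here is a toy: the `Toy*` types, the textual "instructions", and the single unconditional jump per block are assumptions for illustration, not the structure of the real genCodeForBBList. The point is the traversal order and the skipping of contained nodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyGenNode
{
    std::string name;      // stands in for the instruction this node produces
    bool        contained; // contained nodes are handled by their parent
};

struct ToyBlock
{
    int                     label;
    std::vector<ToyGenNode> nodes;      // in execution order
    int                     jumpTarget; // label to branch to at block end, or -1 to fall through
};

// Generates "code" for one node that is not contained.
inline void toyGenCodeForTreeNode(const ToyGenNode& node, std::vector<std::string>& out)
{
    assert(!node.contained);
    out.push_back("  " + node.name);
}

// Walks blocks in layout order and nodes in execution order, then adds the
// block's control flow.
inline std::vector<std::string> toyGenCodeForBBList(const std::vector<ToyBlock>& blocks)
{
    std::vector<std::string> out;
    for (const ToyBlock& block : blocks)
    {
        out.push_back("L" + std::to_string(block.label) + ":");
        for (const ToyGenNode& node : block.nodes)
        {
            if (node.contained)
            {
                continue; // no code of its own; the parent's instruction covers it
            }
            toyGenCodeForTreeNode(node, out);
        }
        if (block.jumpTarget >= 0)
        {
            out.push_back("  jmp L" + std::to_string(block.jumpTarget));
        }
    }
    return out;
}
```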