# R2RDump

Parses and outputs the contents of a ReadyToRun image.

## Usage

`dotnet R2RDump.dll --in <path to ReadyToRun image>`

* `-o, --out <arg>`
	- Output file path. Dumps everything except the help message and error messages to the specified file
* `-x, --xml`
	- Output in XML format
* `--raw`
	- Dump the raw bytes of each section or runtime function
* `--header`
	- Dump R2R header
* `-d, --disasm`
	- Show disassembly of methods or runtime functions
* `-q, --query <arg>...`
	- Query method by exact name, signature, row id or token
* `-k, --keyword <arg>...`
	- Search method by keyword
* `-r, --runtimefunction <arg>...`
	- Get one runtime function by id or relative virtual address
* `-s, --section <arg>...`
	- Get section by keyword
* `--unwind`
	- Dump unwindInfo
* `--gc`
	- Dump gcInfo and slot table
* `--sc`
	- Dump section contents
* `-v, --verbose`
	- Dump disassembly, unwindInfo, gcInfo and section contents
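
For scripted runs, the options above can be assembled into a command line. The following is a minimal Python sketch, not part of R2RDump: the helper name `r2rdump_command`, its parameters and the `app.dll` path are hypothetical, and it assumes that `<arg>...` means several values may follow a single flag.

```python
from typing import List, Optional, Sequence


def r2rdump_command(
    image: str,
    dll: str = "R2RDump.dll",
    out: Optional[str] = None,
    xml: bool = False,
    header: bool = False,
    disasm: bool = False,
    queries: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for R2RDump from a few of the documented options."""
    cmd = ["dotnet", dll, "--in", image]
    if out is not None:
        cmd += ["--out", out]
    if xml:
        cmd.append("--xml")
    if header:
        cmd.append("--header")
    if disasm:
        cmd.append("--disasm")
    if queries:
        # Assumption: "<arg>..." allows several values after one flag.
        cmd += ["--query", *queries]
    return cmd
```

The resulting list can be handed to `subprocess.run`; for example `r2rdump_command("app.dll", header=True, disasm=True)` corresponds to `dotnet R2RDump.dll --in app.dll --header --disasm`.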

## Architectures Supported

### R2RDump Architectures

|             | x64 | x86       | ARM | ARM64 |
| ----------- | --- | --------- | --- | ----- |
| **Windows** | yes | no disasm |     |       |
| **Linux**   | yes |           |     |       |
| **OSX**     | yes | -         | -   | -     |

### Input Image Architectures

|             | x64 | x86 | ARM       | ARM64 |
| ----------- | --- | --- | --------- | ----- |
| **Windows** | yes | yes | no disasm | yes   |
| **Linux**   | yes | yes | no disasm |       |
| **OSX**     | yes | -   | -         | -     |    

## ReadyToRun Format

![R2RFormat](R2RFormat.png)

### System.Reflection.Metadata

Used for getting method and type signatures from tokens (see: http://jilc.sourceforge.net/ecma_p2_cil.shtml)

### READYTORUN_SECTION_COMPILER_IDENTIFIER

A string describing the compiler.

E.g. "CoreCLR 4.5.30319.0 __BUILDMACHINE__"

### READYTORUN_SECTION_IMPORT_SECTIONS

A struct described in [READYTORUN_IMPORT_SECTION](../../inc/readytorun.h). R2RDump currently does not parse it correctly.

### READYTORUN_SECTION_RUNTIME_FUNCTIONS

An array of RVAs. For x64, each RuntimeFunction has RVAs to the start of the assembly code, the end of the assembly code, and the start of the UnwindInfo. For x86/Arm/Arm64, each RuntimeFunction has RVAs to the start of the assembly code and the start of the UnwindInfo.
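
The difference in entry size can be illustrated with a short Python sketch. It is not R2RDump's own parser: the names `RuntimeFunction` and `parse_runtime_functions` are made up for this example, and it assumes each RVA is stored as a 32-bit little-endian unsigned integer.

```python
import struct
from typing import List, NamedTuple, Optional


class RuntimeFunction(NamedTuple):
    start_rva: int
    end_rva: Optional[int]  # only stored in the table on x64
    unwind_rva: int


def parse_runtime_functions(section: bytes, is_x64: bool) -> List[RuntimeFunction]:
    """Split the raw section bytes into RuntimeFunction entries."""
    fields = 3 if is_x64 else 2  # x64 entries carry an extra end RVA
    entry_size = 4 * fields      # assumption: 4 bytes per RVA
    if len(section) % entry_size != 0:
        raise ValueError("section size is not a multiple of the entry size")

    functions = []
    for offset in range(0, len(section), entry_size):
        values = struct.unpack_from(f"<{fields}I", section, offset)
        if is_x64:
            start, end, unwind = values
        else:
            start, unwind = values
            end = None
        functions.append(RuntimeFunction(start, end, unwind))
    return functions
```

The same 24 bytes therefore decode as two entries on x64 and as three entries on the other architectures.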

### READYTORUN_SECTION_METHODDEF_ENTRYPOINTS

A [NativeArray](NativeArray.cs) used for finding the index of the entrypoint RuntimeFunction for each method. The NativeArray is indexed by the rowId-1 of a method. Each element in the NativeArray is an offset pointing to the RuntimeFunction index.
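
The rowId-1 indexing can be sketched in Python as follows. This is an illustration only: a plain list stands in for an already decoded NativeArray (with `None` for methods that have no entrypoint), and the helper names are hypothetical. The only outside fact used is that a MethodDef metadata token carries table id 0x06 in its top byte and the rowId in its lower 24 bits.

```python
from typing import List, Optional

METHODDEF_TABLE = 0x06  # metadata table id in the top byte of a MethodDef token


def row_id_from_token(token: int) -> int:
    """Extract the rowId from a MethodDef token such as 0x06000001."""
    if token >> 24 != METHODDEF_TABLE:
        raise ValueError(f"0x{token:08X} is not a MethodDef token")
    return token & 0x00FFFFFF


def entrypoint_index(entrypoints: List[Optional[int]], token: int) -> Optional[int]:
    """Look up the RuntimeFunction index for a method; rowIds start at 1."""
    row_id = row_id_from_token(token)
    if not 1 <= row_id <= len(entrypoints):
        return None
    return entrypoints[row_id - 1]
```

A method with rowId 1 is therefore found in element 0 of the array.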

### READYTORUN_SECTION_AVAILABLE_TYPES

A [NativeHashtable](NativeHashtable.cs) mapping type hashcodes of types defined in the program to their rowIds. The hashcode is calculated with [ComputeNameHashCode](../../vm/typehashingalgorithms.h)(namespace) ^ [ComputeNameHashCode](../../vm/typehashingalgorithms.h)(name).
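
The XOR combination can be sketched in Python as shown here. Note that `toy_name_hash` is a placeholder and is not the runtime's ComputeNameHashCode; the real function from typehashingalgorithms.h would have to be ported and passed in as `name_hash`. All names in the sketch are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def toy_name_hash(text: str) -> int:
    """Placeholder hash, NOT the runtime's ComputeNameHashCode."""
    value = 0
    for byte in text.encode("utf-8"):
        value = (value * 31 + byte) & 0xFFFFFFFF
    return value


def type_hashcode(
    namespace: str, name: str, name_hash: Callable[[str], int] = toy_name_hash
) -> int:
    """Combine the namespace hash and the name hash with XOR."""
    return (name_hash(namespace) ^ name_hash(name)) & 0xFFFFFFFF


def build_available_types(
    types: Dict[int, Tuple[str, str]],
    name_hash: Callable[[str], int] = toy_name_hash,
) -> Dict[int, List[int]]:
    """Map each hashcode to the rowIds of the types that produce it."""
    table: Dict[int, List[int]] = {}
    for row_id, (namespace, name) in types.items():
        table.setdefault(type_hashcode(namespace, name, name_hash), []).append(row_id)
    return table
```

The sketch keeps a list of rowIds per hashcode because two different types can produce the same hashcode.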

### READYTORUN_SECTION_ATTRIBUTEPRESENCE

A [NativeCuckooFilter](NativeHashtable.cs) used to discover which tokens have which "System.Runtime." prefixed attributes. The System.Runtime.CompilerServices.NullableAttribute is not used in this calculation. Each filter entry is a hash of the type name, computed with [ComputeNameHashCode](../../vm/typehashingalgorithms.h)(namespace + name), combined with a hash of each token that produced it. In addition, the upper 16 bits are used as the fingerprint in the filter.

### READYTORUN_SECTION_INSTANCE_METHOD_ENTRYPOINTS

A [NativeHashtable](NativeHashtable.cs) mapping type hashcodes of generic instances to the (methodFlags, methodRowId, list of types, runtimeFunctionId). Each type in the list of types corresponds to a generic type in the method.

E.g. GenericMethod<S, T>(T arg1, S arg2) instantiated for <int, UserDefinedStruct> is in the hashtable as:

(hashcode) -> (methodFlags) (methodRowId) (number of generic types) (Int32) (ValueType) (RowId of UserDefinedStruct) (offset to RuntimeFunctionId)

### UnwindInfo

A struct described in [_UNWIND_INFO](../../inc/win64unwind.h). Each RuntimeFunction has its own UnwindInfo.

For x86, it contains only an encoded function length.

For x64, Arm and Arm64, it contains a bit field followed by an array of unwind codes ([_UNWIND_CODE](../../inc/win64unwind.h)) and finally padding to make it byte aligned.

The unwind data info structure is used to record the effects a function has on the stack pointer and where the nonvolatile registers are saved on the stack (see https://msdn.microsoft.com/en-us/library/0kd71y96.aspx).
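
For the x64 case, the bit field and the unwind code array can be decoded along the following lines. This Python sketch follows the UNWIND_INFO and UNWIND_CODE layout in Microsoft's x64 exception handling documentation rather than R2RDump's own code, covers x64 only, and uses class and function names invented for this example. It returns raw unwind code slots; some operations span more than one slot, which the sketch does not merge.

```python
import struct
from typing import List, NamedTuple


class UnwindCode(NamedTuple):
    code_offset: int  # offset in the prolog
    unwind_op: int    # low 4 bits of the second byte
    op_info: int      # high 4 bits of the second byte


class UnwindInfo(NamedTuple):
    version: int
    flags: int
    size_of_prolog: int
    frame_register: int
    frame_offset: int
    codes: List[UnwindCode]


def parse_x64_unwind_info(data: bytes, offset: int = 0) -> UnwindInfo:
    """Decode the 4-byte header and the raw unwind code slots."""
    byte0, size_of_prolog, count, byte3 = struct.unpack_from("<4B", data, offset)
    codes = []
    pos = offset + 4
    for _ in range(count):
        code_offset, op_byte = struct.unpack_from("<2B", data, pos)
        codes.append(UnwindCode(code_offset, op_byte & 0x0F, op_byte >> 4))
        pos += 2
    return UnwindInfo(
        version=byte0 & 0x07,       # 3-bit version
        flags=byte0 >> 3,           # 5-bit flags
        size_of_prolog=size_of_prolog,
        frame_register=byte3 & 0x0F,
        frame_offset=byte3 >> 4,
        codes=codes,
    )
```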

### GcInfo

Written into the ReadyToRun image right after UnwindInfo. Contains a header, GcSlots and GcTransitions (register liveness).

The x64/Arm/Arm64 GcInfo is written in crossgen by [GcInfoEncoder::Build](../../gcinfo/gcinfoencoder.cpp) and decoded similarly to [GcInfoDecoder::EnumerateLiveSlots](../../vm/gcinfodecoder.cpp). The x86 gcInfo is written by [GCInfo::gcMakeRegPtrTable](../../jit/gcencode.cpp) and decoded similarly to [GCDump::DumpGCTable](../../gcdump/i386/gcdumpx86.cpp).

Contains the code length followed by the header, GcSlots, and finally GcTransitions.

The header contains flags indicating which properties are in the GcInfo. GcSlots give details on the registers or stack pointer offsets that are used in the method. GcTransitions give the CodeOffsets (which line in the assembly code) where GcSlots (excluding untracked slots) become live or dead.

In x64/Arm/Arm64, GcTransitions are grouped into chunks where each chunk covers NUM_NORM_CODE_OFFSETS_PER_CHUNK lines of assembly code. The following format is used:
> Array of offsets pointing to each chunk

> Padding to make it byte aligned

> For each chunk:
>> 1 bit indicating if it's RLE encoded

>> Array of bits indicating if each slot changed state in the chunk (i.e. false if the slot is not used in the chunk). R2RDump uses this to calculate NumCouldBeLiveSlots and obtain slotIds

>> Array of bits indicating if each slot is live at the end of the chunk

>> For each slot that changed state in the chunk:
>>> Array of elements consisting of a bit set to 1 and the normCodeOffsetDelta indicating all the code offsets where the slot changed state in the chunk. CodeOffset = normCodeOffsetDelta + normChunkBaseCodeOffset + currentRangeStartOffset - cumInterruptibleLength, where normChunkBaseCodeOffset is the sum of the sizes of all preceding chunks, currentRangeStartOffset is the start offset of the interruptible range that the transition falls under, and cumInterruptibleLength is the sum of the lengths of the interruptible ranges that came before it
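
The two calculations described above, the could-be-live slot ids and the CodeOffset formula, can be sketched in Python as follows. This illustrates the arithmetic only and is not the decoder itself. The function names are hypothetical. The sketch assumes the method has at least one interruptible range, that offsets need no further denormalization, and that a transition belongs to the first range whose cumulative length exceeds its normalized offset.

```python
from typing import List, Sequence, Tuple


def could_be_live_slot_ids(changed_bits: Sequence[int]) -> List[int]:
    """Slot ids whose bit is set; the list length is NumCouldBeLiveSlots."""
    return [slot_id for slot_id, bit in enumerate(changed_bits) if bit]


def code_offset(
    norm_code_offset_delta: int,
    norm_chunk_base_code_offset: int,
    interruptible_ranges: Sequence[Tuple[int, int]],
) -> int:
    """Apply CodeOffset = delta + chunkBase + rangeStart - cumInterruptibleLength.

    interruptible_ranges holds (start, stop) code offsets, sorted and
    non-overlapping, with stop exclusive (an assumption of this sketch).
    """
    norm_offset = norm_code_offset_delta + norm_chunk_base_code_offset
    cum_interruptible_length = 0
    for range_start, range_stop in interruptible_ranges:
        length = range_stop - range_start
        if norm_offset < cum_interruptible_length + length:
            return norm_offset + range_start - cum_interruptible_length
        cum_interruptible_length += length
    raise ValueError("offset lies beyond the last interruptible range")
```

With interruptible ranges (10, 20) and (40, 60), a delta of 2 in a chunk whose base is 10 falls into the second range and gives 2 + 10 + 40 - 10 = 42.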

## Todo

* Support R2RDump on ARM and ARM64 (https://github.com/dotnet/coreclr/issues/19089)

* Parse R2RSections: READYTORUN_SECTION_EXCEPTION_INFO, READYTORUN_SECTION_DEBUG_INFO, READYTORUN_SECTION_DELAYLOAD_METHODCALL_THUNKS, READYTORUN_SECTION_INLINING_INFO, READYTORUN_SECTION_PROFILEDATA_INFO (https://github.com/dotnet/coreclr/issues/19616)

* Re-enable R2RDumpTests after making them less fragile

* Fix issues with disasm on Arm (https://github.com/dotnet/coreclr/issues/19637) and disasm using x86 coredistools (https://github.com/dotnet/coreclr/issues/19564)

* Test R2RDump on more test cases to make sure it runs reliably and verify that the output is accurate (list of failing inputs: https://github.com/dotnet/coreclr/issues/19642)