author Bruce Forstall <brucefo@microsoft.com> 2016-07-01 17:57:57 -0700
committer Bruce Forstall <brucefo@microsoft.com> 2016-07-01 17:57:57 -0700
commit 74db9d7c20e4927b2a129f762c3006de8e5bb49a
tree 1f3c1e098df1d1b20b255c6c7ea239be6d8b5091 /Documentation/botr
parent ce97aac437beb855a794155147d08ab94f60c1fe
Add information on x86 EH to CLR ABI
Diffstat (limited to 'Documentation/botr')
-rw-r--r-- Documentation/botr/clr-abi.md 117
1 files changed, 109 insertions, 8 deletions
diff --git a/Documentation/botr/clr-abi.md b/Documentation/botr/clr-abi.md
index aa663336ac..cbd5fc903c 100644
--- a/Documentation/botr/clr-abi.md
+++ b/Documentation/botr/clr-abi.md
@@ -1,6 +1,6 @@
# CLR ABI
-This document describes the .NET Common Language Runtime (CLR) software conventions (or ABI, "Application Binary Interface"). It focusses on the ABI for the x64 (aka, AMD64), ARM (aka, ARM32 or Thumb-2), and ARM64 processor architectures. This document currently does not describe the x86 ABI.
+This document describes the .NET Common Language Runtime (CLR) software conventions (or ABI, "Application Binary Interface"). It focusses on the ABI for the x64 (aka, AMD64), ARM (aka, ARM32 or Thumb-2), and ARM64 processor architectures. Documentation for the x86 ABI is somewhat scant.
It describes requirements that the Just-In-Time (JIT) compiler imposes on the VM and vice-versa.
@@ -131,11 +131,17 @@ Saving/restoring all the non-volatile registers helps by preventing any register
For IL stubs, the Frame chain isn't popped at the call site, so instead it must be popped right before the epilog and right before any jmp calls. It looks like we do not support tail calls from PInvoke IL stubs?
-# Exception handling and funclets
+# Exception handling
-All managed exception handling (EH) handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched.
+This section describes the conventions the JIT needs to follow when generating code to implement managed exception handling (EH). The JIT and VM must agree on these conventions for a correct implementation.
-The only way to enter a handler funclet is via a call. In the case of an exception, the call is from the VM's EH subsystem as part of exception dispatch/unwind. In the non-exceptional case, this is called local unwind or a non-local exit. In C# this is accomplished by simply falling-through/out of a try body or an explicit goto. In IL this is always accomplished via a leave opcode, within a try body, targeting an IL offset outside the try body. In such cases the call is from the JITed code of the parent function.
+## Funclets
+
+For non-x86 platforms, all managed EH handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched.
+
+The only way to enter a handler funclet is via a call. In the case of an exception, the call is from the VM's EH subsystem as part of exception dispatch/unwind. In the non-exceptional case, this is called local unwind or a non-local exit. In C# this is accomplished by simply falling-through/out of a try body or an explicit goto. In IL this is always accomplished via a LEAVE opcode, within a try body, targeting an IL offset outside the try body. In such cases the call is from the JITed code of the parent function.
+
+For x86, all handlers are generated within the method body, typically in lexical order. A nested try/catch is generated completely within the EH region in which it is nested. These handlers are essentially "in-line funclets", but they do not look like normal functions: they do not have a normal prolog or epilog, although they do have special entry/exit and register conventions. Also, nested handlers are not un-nested as for funclets: the code for a nested handler is generated within the handler in which it is nested.
## Cloned finallys
@@ -263,7 +269,7 @@ Note that JIT64 does not implement this properly. The C# compiler used to always
## The PSPSym and funclet parameters
-The name *PSPSym* stands for Previous Stack Pointer Symbol. It is how a funclet accesses locals from the main function body.
+The name *PSPSym* stands for Previous Stack Pointer Symbol. It is how a funclet accesses locals from the main function body. This is not used for x86: the frame pointer on x86 is always preserved when the handlers are invoked.
First, two definitions.
@@ -287,10 +293,9 @@ Catch, Filter, and Filter-handlers also get an Exception object (GC ref) as an a
## Funclet Return Values
-The filter funclet returns a simple boolean value in the normal return register (AMD64: `RAX`, ARM/ARM64: `R0`). Non-zero indicates to the VM/EH subsystem that the corresponding filter-handler will handle the exception (i.e. begin the second pass). Zero indicates to the VM/EH subsystem that the exception is **not** handled, and it should continue looking for another filter or catch.
+The filter funclet returns a simple boolean value in the normal return register (x86: `EAX`, AMD64: `RAX`, ARM/ARM64: `R0`). Non-zero indicates to the VM/EH subsystem that the corresponding filter-handler will handle the exception (i.e. begin the second pass). Zero indicates to the VM/EH subsystem that the exception is **not** handled, and it should continue looking for another filter or catch.
-The catch and filter-handler funclets return a code address in the normal return register that indicates where the VM should resume execution after unwinding the stack and cleaning up from the exception. This address should be somewhere in the parent funclet (or main function if the catch or filter-handler is not nested within any other funclet). Because an IL 'leave' opcode can exit out of arbitrary nesting of funclets and try bodies, the JIT is often required to inject step blocks. These are intermediate branch target(s) that then branch to the next outermost target until the real target can be directly reached via the native ABI constraints. These step blocks can also invoke finallys (see
-*Invoking Finallys/Non-local exits*).
+The catch and filter-handler funclets return a code address in the normal return register that indicates where the VM should resume execution after unwinding the stack and cleaning up from the exception. This address should be somewhere in the parent funclet (or main function if the catch or filter-handler is not nested within any other funclet). Because an IL 'leave' opcode can exit out of arbitrary nesting of funclets and try bodies, the JIT is often required to inject step blocks. These are intermediate branch target(s) that then branch to the next outermost target until the real target can be directly reached via the native ABI constraints. These step blocks can also invoke finallys (see *Invoking Finallys/Non-local exits*).
Finally and fault funclets do not have a return value.
@@ -322,6 +327,100 @@ When a funclet finishes execution, and the VM returns execution to the function
Any register value changes made in the funclet are lost. If a funclet wants to make a variable change known to the main function (or the funclet that contains the "try" region), that variable change needs to be made to the shared main function stack frame.
+## x86 EH considerations
+
+The x86 model differs somewhat from the non-x86 model. x86-specific concerns are described here.
+
+### catch / filter-handler regions
+
+When leaving a `catch` or `filter-handler` region, the JIT calls the helper `CORINFO_JIT_ENDCATCH` (implemented in the VM by the `JIT_EndCatch` function) before transferring control to the target location. The code that calls `CORINFO_JIT_ENDCATCH` is within the catch region itself.
+
+### finally / fault regions
+
+"finally" clauses are invoked in the non-exceptional code by the generated JIT code, and in the exceptional case by the VM. "fault" clauses are only executed in exceptional cases by the VM.
+
+On entry to the finally or fault, the top of the stack is the address that should be jumped to on exit from the finally, using a "pop eax; jmp eax" sequence. A simple 'ret' could be used, but we avoid it because it could unbalance the processor's call/ret buffer stack and degrade call/ret prediction.
+
+There are no register or other stack arguments to a 'finally' or 'fault'.
+
+### ShadowSP slots
+
+X86 exception handlers (e.g., catch, finally) do not establish their own frames. They don't (really) have prologs and epilogs. However, they do use the stack, and need to restore the stack pointer of the enclosing exception handling region when the handler completes executing.
+
+To implement this requirement, for any function with EH, we create a frame-local variable to store a stack of "Shadow SP" values, or ShadowSP slots. In the JIT, the local var is called lvaShadowSPslotsVar, and in dumps it is called "EHSlots". The variable is created in lvaMarkLocalVars() and is sized as follows:
+1. 1 slot is reserved for the VM (for ICodeManager::FixContext(ppEndRegion)).
+2. 1 slot for each handler nesting level (total: ehMaxHndNestingCount).
+3. 1 slot for a filter (we do this even if there aren't any filters; size optimization opportunity to not do this if there are no filters?)
+4. 1 slot for zero termination
+
+Note that since a slot on x86 is 4 bytes, the minimum size is 16 bytes. The idea is to have 1 slot for each handler that could possibly be invoked at the same time. For example, for:
+
+```
+ try {
+ ...
+ } catch {
+ try {
+ ...
+ } catch {
+ ...
+ }
+ }
+```
+
+When the inner 'catch' is running, the outer 'catch' is also conceptually "on the stack", or in the middle of execution. So the maximum handler nesting count would be 2.
+
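The sizing rule above can be sketched as simple arithmetic. This is a minimal illustration, not the JIT's code: the function and constant names are hypothetical, and only the four slot categories and the 4-byte x86 slot size are taken from the text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; only the slot categories and the
// 4-byte slot size come from the document.
constexpr std::size_t kX86SlotBytes = 4;

// Number of ShadowSP slots for a function whose maximum handler nesting
// count is maxHandlerNesting (ehMaxHndNestingCount in the text).
constexpr std::size_t shadowSPSlotCount(std::size_t maxHandlerNesting)
{
    return 1                    // reserved for the VM
           + maxHandlerNesting  // one per handler nesting level
           + 1                  // filter slot, allocated even without filters
           + 1;                 // zero terminator
}

// Total size in bytes of the ShadowSP slots variable on x86.
constexpr std::size_t shadowSPSlotBytes(std::size_t maxHandlerNesting)
{
    return shadowSPSlotCount(maxHandlerNesting) * kX86SlotBytes;
}
```

Under these assumptions, a function with a single handler level needs 4 slots (the 16-byte minimum), and the nested catch example above, with a nesting count of 2, needs 5 slots.
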
+The ShadowSP slots are filled in from the highest address downwards to the lowest address. The highest slot is reserved. The first address with a zero is a zero terminator. So, we always zero terminate by setting the second-to-highest slot to zero in the function prolog (if we didn't zero initialize all locals anyway).
+
+When calling a finally, we set the appropriate level to 0xFC (aka "finally call") and zero terminate the next-lower address.
+
+Thus, calling a finally from JIT generated code looks like:
+
+```
+ mov dword ptr [L_02+0x4 ebp-10H], 0 // This must happen before the 0xFC is written
+ mov dword ptr [L_02+0x8 ebp-0CH], 252 // 0xFC
+ push G_M52300_IG07
+ jmp SHORT G_M52300_IG04
+```
+
+In this case, `G_M52300_IG07` is not the address after the 'jmp', so a simple 'call' wouldn't work.
+
+The code this finally returns to looks like this:
+
+```
+ mov dword ptr [L_02+0x8 ebp-0CH], 0
+ jmp SHORT G_M52300_IG05
+```
+
+In this case, it zeros out the ShadowSP slot that it previously set to 0xFC, then jumps to the address that is the actual target of the leave from the finally.
+
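The slot updates around a JIT-generated finally call can be modeled as follows. This is a sketch under stated assumptions, not the JIT or VM implementation: the type and member names are hypothetical, the index mapping is inferred from the offsets in the listings above (`L_02+0x8` marked, `L_02+0x4` zeroed), and the example frame is assumed to hold the minimum four slots with the finally at nesting level 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 0xFC is the "finally call" marker named in the text.
constexpr std::uint32_t kFinallyCallMark = 0xFC;

// Hypothetical model of the ShadowSP slots variable.
// Index 0 is the lowest address; the highest index is the reserved slot.
struct ShadowSPSlots
{
    std::vector<std::uint32_t> slots;

    explicit ShadowSPSlots(std::size_t count) : slots(count, 0) {}

    std::size_t reservedIndex() const { return slots.size() - 1; }

    // Assumed mapping: level 0 sits just below the reserved slot, and
    // deeper levels occupy successively lower addresses.
    std::size_t levelIndex(std::size_t level) const
    {
        return reservedIndex() - (level + 1);
    }

    // Mirrors the two 'mov' instructions before the push/jmp:
    // zero terminate the next-lower slot first, then write the marker.
    void beforeFinallyCall(std::size_t level)
    {
        slots[levelIndex(level) - 1] = 0;
        slots[levelIndex(level)] = kFinallyCallMark;
    }

    // Mirrors the 'mov' at the finally's return target.
    void afterFinallyReturn(std::size_t level)
    {
        slots[levelIndex(level)] = 0;
    }
};
```

With four slots and level 0, `levelIndex(0)` is 2, which is byte offset 8 from the start of the variable, and the terminator lands at byte offset 4, matching the two stores in the first listing.
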
+The JIT does this "end finally restore" by creating a GT_END_LFIN tree node, with the appropriate stack level as an operand, that generates this code.
+
+In the case of an exceptional 'finally' invocation, the VM sets up the 'return address' to whatever address it wants the JIT to return to.
+
+For catch handlers, the VM is completely in control of filling and reading the ShadowSP slots; the JIT just makes sure there is enough space.
+
+### ShadowSP slots frame location
+
+The ShadowSP slots are required to live in a very particular location, reported via the GC info header. Note that the GC info header does not contain an actual pointer or offset to the ShadowSP slots variable. Instead, the VM calculates the location from other data that does exist in the GC info header, as a negative offset from the EBP frame pointer (which must be established in functions with EH) using the function `GetFirstBaseSPslotPtr()` / `GetStartShadowSPSlotsOffset()`. The VM thus assumes the following frame layout:
+
+1. callee-saved registers <= EBP points to the top of this range
+2. GS cookie
+3. 1 slot if localloc is used (Saved localloc SP?)
+4. 1 slot for CORINFO_GENERICS_CTXT_FROM_PARAMTYPEARG -- assumed for any function with EH, to avoid adding a flag to the GC info about whether it exists or not.
+5. ShadowSP slots
+
+(Note: these don't have to be in this order for this calculation, but they may need to be in this order for other calculations.) See also `GetEndShadowSPSlotsOffset()`.
+
+The VM walks the ShadowSP slots in the function `GetHandlerFrameInfo()`, and sets them in various functions such as `EECodeManager::FixContext()`.
+
+### JIT implementation: finally
+
+An aside on the JIT implementation for x86.
+
+The JIT creates BBJ_CALLFINALLY/BBJ_ALWAYS pairs for calling the 'finally' clause. The BBJ_CALLFINALLY block will have a series of CORINFO_JIT_ENDCATCH calls appended at the end, if we need to "leave" a series of nested catches before calling the finally handler (due to a single 'leave' opcode attempting to leave multiple levels of different types of handlers). Then, a GT_END_LFIN statement with the finally clause handler nesting level as an argument is added to the step block where the finally returns to. This is used to generate code to zero out the appropriate level of the ShadowSP slot array after the finally has been executed. The BBJ_CALLFINALLY block itself generates the code to insert the 0xFC value into the ShadowSP slot array. If the 'finally' is invoked by the VM, in exceptional cases, then the VM itself updates the ShadowSP slot array before invoking the 'finally'.
+
+At the end of a finally or filter, a GT_RETFILT is inserted. For a finally, this is a TYP_VOID which is just a placeholder. For a filter, it takes an argument which evaluates to the return value from the filter. On legacy JIT, this tree triggers the generation of both the return value load (for filters) and the "funclet" exit sequence, which is either a "pop eax; jmp eax" for a finally, or a "ret" for a filter. When processing the BBJ_EHFINALLYRET or BBJ_EHFILTERRET block itself (at the end of code generation for the block), nothing is generated. In RyuJIT, the GT_RETFILT only loads up the return value (for filters) and does nothing for finally, and the block type processing after all the tree processing triggers the exit sequence to be generated. There is no real difference between these, except to centralize all "exit sequence" generation in the same place.
+
# EH Info, GC Info, and Hot & Cold Splitting
All GC info offsets and EH info offsets treat the function and funclets as if it was one big method body. Thus all offsets are relative to the start of the main method. Funclets are assumed to always be at the end of (after) all of the main function code. Thus if the main function has any cold code, all funclets must be cold. Or conversely, if there is any hot funclet code, all of the main method must be hot.
@@ -369,6 +468,8 @@ Filters are invoked in the 1st pass of EH processing and as such execution might
Duplicated clauses are a special set of entries in the EH tables to assist the VM. Specifically, if handler 'A' is also protected by an outer EH clause 'B', then the JIT must emit a duplicated clause, a duplicate of 'B', that marks the whole handler 'A' (which is now lexically disjoint for the range of code for the corresponding try body 'A') as being protected by the handler for 'B'.
+Duplicated clauses are not needed for x86.
+
During exception dispatch the VM uses these duplicated clauses to know when to skip any frames between the handler and its parent function. After skipping to the parent function, due to a duplicated clause, the VM searches for a regular/non-duplicate clause in the parent function. The order of duplicated clauses is important. They should appear after all of the main function clauses. They should still follow the normal sorting rules (inner-to-outer, top-to-bottom), but because the try-start/try-end will all be the same for a given handler, they should maintain the ordering, regarding inner-to-outer, as the corresponding original clause.
Example: