authorJiyoung Yun <jy910.yun@samsung.com>2016-11-23 10:09:09 (GMT)
committerJiyoung Yun <jy910.yun@samsung.com>2016-11-23 10:09:09 (GMT)
commit4b4aad7217d3292650e77eec2cf4c198ea9c3b4b (patch)
tree98110734c91668dfdbb126fcc0e15ddbd93738ca /Documentation
parentfa45f57ed55137c75ac870356a1b8f76c84b229c (diff)
downloadcoreclr-4b4aad7217d3292650e77eec2cf4c198ea9c3b4b.zip
coreclr-4b4aad7217d3292650e77eec2cf4c198ea9c3b4b.tar.gz
coreclr-4b4aad7217d3292650e77eec2cf4c198ea9c3b4b.tar.bz2
Imported Upstream version 1.1.0upstream/1.1.0
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/.gitmirrorall1
-rw-r--r--Documentation/README.md84
-rw-r--r--Documentation/botr/_tableOfContents.md34
-rw-r--r--Documentation/botr/botr-faq.md46
-rw-r--r--Documentation/botr/clr-abi.md661
-rw-r--r--Documentation/botr/dac-notes.md213
-rw-r--r--Documentation/botr/exceptions.md299
-rw-r--r--Documentation/botr/garbage-collection.md332
-rw-r--r--Documentation/botr/intro-to-clr.md261
-rw-r--r--Documentation/botr/method-descriptor.md343
-rw-r--r--Documentation/botr/mscorlib.md357
-rw-r--r--Documentation/botr/porting-ryujit.md112
-rw-r--r--Documentation/botr/profilability.md240
-rw-r--r--Documentation/botr/profiling.md513
-rw-r--r--Documentation/botr/readytorun-overview.md335
-rw-r--r--Documentation/botr/ryujit-overview.md558
-rw-r--r--Documentation/botr/stackwalking.md85
-rw-r--r--Documentation/botr/threading.md210
-rw-r--r--Documentation/botr/type-loader.md317
-rw-r--r--Documentation/botr/type-system.md233
-rw-r--r--Documentation/botr/virtual-stub-dispatch.md188
-rw-r--r--Documentation/building/buildinglldb.md87
-rw-r--r--Documentation/building/cross-building.md137
-rw-r--r--Documentation/building/crossgen.md66
-rw-r--r--Documentation/building/debugging-instructions.md145
-rw-r--r--Documentation/building/freebsd-instructions.md282
-rw-r--r--Documentation/building/linux-instructions.md220
-rw-r--r--Documentation/building/netbsd-instructions.md129
-rw-r--r--Documentation/building/osx-instructions.md80
-rw-r--r--Documentation/building/test-configuration.md41
-rw-r--r--Documentation/building/testing-with-corefx.md16
-rw-r--r--Documentation/building/unix-test-instructions.md81
-rw-r--r--Documentation/building/viewing-jit-dumps.md173
-rw-r--r--Documentation/building/windows-instructions.md188
-rw-r--r--Documentation/building/windows-test-instructions.md106
-rw-r--r--Documentation/coding-guidelines/EventLogging.md19
-rw-r--r--Documentation/coding-guidelines/clr-code-guide.md1269
-rw-r--r--Documentation/coding-guidelines/clr-jit-coding-conventions.md2001
-rw-r--r--Documentation/coding-guidelines/cross-platform-performance-and-eventing.md287
-rw-r--r--Documentation/design-docs/first-class-structs.md651
-rw-r--r--Documentation/design-docs/inline-size-estimates.md526
-rw-r--r--Documentation/design-docs/inlining-plans.md438
-rw-r--r--Documentation/design-docs/longs-on-32bit-arch.md117
-rw-r--r--Documentation/design-docs/removing-embedded-statements.md180
-rw-r--r--Documentation/images/dac-overview.pngbin0 -> 8955 bytes
-rw-r--r--Documentation/images/methoddesc-fig1.pngbin0 -> 23050 bytes
-rw-r--r--Documentation/images/methoddesc-fig2.pngbin0 -> 18739 bytes
-rw-r--r--Documentation/images/methoddesc-fig3.pngbin0 -> 16945 bytes
-rw-r--r--Documentation/images/profiling-exception-callback-sequence.pngbin0 -> 234437 bytes
-rw-r--r--Documentation/images/profiling-gc.pngbin0 -> 32961 bytes
-rw-r--r--Documentation/images/profiling-overview.pngbin0 -> 71763 bytes
-rw-r--r--Documentation/images/ryujit-ir-overview.pngbin0 -> 158980 bytes
-rw-r--r--Documentation/images/stack.pngbin0 -> 90302 bytes
-rw-r--r--Documentation/images/type-system-dependencies.pngbin0 -> 28616 bytes
-rw-r--r--Documentation/images/typeloader-fig1.pngbin0 -> 16940 bytes
-rw-r--r--Documentation/images/typeloader-fig2.pngbin0 -> 7093 bytes
-rw-r--r--Documentation/images/typeloader-fig3.pngbin0 -> 15030 bytes
-rw-r--r--Documentation/images/typeloader-fig4.pngbin0 -> 17294 bytes
-rw-r--r--Documentation/images/virtualstubdispatch-fig1.pngbin0 -> 21092 bytes
-rw-r--r--Documentation/images/virtualstubdispatch-fig2.pngbin0 -> 10696 bytes
-rw-r--r--Documentation/images/virtualstubdispatch-fig3.pngbin0 -> 9840 bytes
-rw-r--r--Documentation/images/virtualstubdispatch-fig4.pngbin0 -> 8315 bytes
-rw-r--r--Documentation/project-docs/adding_new_public_apis.md31
-rw-r--r--Documentation/project-docs/ci-trigger-phrases.md343
-rwxr-xr-xDocumentation/project-docs/clr-complus-conf-docgen.sh105
-rw-r--r--Documentation/project-docs/clr-configuration-knobs.md809
-rw-r--r--Documentation/project-docs/contributing-workflow.md107
-rw-r--r--Documentation/project-docs/contributing.md166
-rw-r--r--Documentation/project-docs/developer-guide.md28
-rw-r--r--Documentation/project-docs/dotnet-filenames.md11
-rw-r--r--Documentation/project-docs/dotnet-standards.md69
-rw-r--r--Documentation/project-docs/garbage-collector-guidelines.md37
-rw-r--r--Documentation/project-docs/glossary.md28
-rw-r--r--Documentation/project-docs/jit-testing.md169
-rw-r--r--Documentation/project-docs/linux-performance-tracing.md142
-rw-r--r--Documentation/project-docs/performance-guidelines.md54
-rw-r--r--Documentation/project-docs/profiling-api-status.md80
-rw-r--r--Documentation/project-docs/project-priorities.md27
-rw-r--r--Documentation/project-docs/windows-performance-tracing.md14
79 files changed, 14881 insertions, 0 deletions
diff --git a/Documentation/.gitmirrorall b/Documentation/.gitmirrorall
new file mode 100644
index 0000000..9ee5c57
--- /dev/null
+++ b/Documentation/.gitmirrorall
@@ -0,0 +1 @@
+This folder will be mirrored by the Git-TFS Mirror recursively. \ No newline at end of file
diff --git a/Documentation/README.md b/Documentation/README.md
new file mode 100644
index 0000000..7e29013
--- /dev/null
+++ b/Documentation/README.md
@@ -0,0 +1,84 @@
+Documents Index
+===============
+
+This repo includes several documents that explain both high-level and low-level concepts about the .NET runtime. These are very useful for contributors, to get context that can be very difficult to acquire from just reading code.
+
+Intro to .NET Core
+==================
+
+.NET Core is a self-contained .NET runtime and framework that implements [ECMA 335](project-docs/dotnet-standards.md). It can be (and has been) ported to multiple architectures and platforms. It supports a variety of installation options, having no specific deployment requirements itself.
+
+Getting Started
+===============
+
+- [Installing the .NET Core SDK](https://www.microsoft.com/net/core)
+- [[WIP] Official .NET Core Docs](http://dotnet.github.io/docs/)
+
+Project Docs
+============
+
+- [Developer Guide](project-docs/developer-guide.md)
+- [Project priorities](project-docs/project-priorities.md)
+- [Contributing to .NET Core](project-docs/contributing.md)
+- [Contributing Workflow](project-docs/contributing-workflow.md)
+- [Performance Guidelines](project-docs/performance-guidelines.md)
+- [Garbage Collector Guidelines](project-docs/garbage-collector-guidelines.md)
+- [Adding new public APIs to mscorlib](project-docs/adding_new_public_apis.md)
+- [Project NuGet Dependencies](https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/project-nuget-dependencies.md)
+
+Coding Guidelines
+=================
+
+- [CLR Coding Guide](coding-guidelines/clr-code-guide.md)
+- [CLR JIT Coding Conventions](coding-guidelines/clr-jit-coding-conventions.md)
+- [Cross Platform Performance and Eventing Design](coding-guidelines/cross-platform-performance-and-eventing.md)
+- [Adding New Events to the VM](coding-guidelines/EventLogging.md)
+
+Build CoreCLR from Source
+=========================
+
+- [Building CoreCLR on FreeBSD](building/freebsd-instructions.md)
+- [Building CoreCLR on Linux](building/linux-instructions.md)
+- [Building CoreCLR on OS X](building/osx-instructions.md)
+- [Building CoreCLR on Windows](building/windows-instructions.md)
+
+Testing and Debugging CoreCLR
+=============================
+
+- [Debugging CoreCLR](building/debugging-instructions.md)
+- [Testing Changes on Windows](building/windows-test-instructions.md)
+- [Testing Changes on Linux, OS X, and FreeBSD](building/unix-test-instructions.md)
+- [Testing with CoreFX](building/testing-with-corefx.md)
+- [Performance Tracing on Windows](project-docs/windows-performance-tracing.md)
+- [Performance Tracing on Linux](project-docs/linux-performance-tracing.md)
+- [Creating native images](building/crossgen.md)
+
+Book of the Runtime
+===================
+
+The Book of the Runtime is a set of chapters that go in depth into various
+interesting aspects of the design of the .NET Framework.
+
+- [Book of the Runtime](botr/_tableOfContents.md)
+
+For your convenience, here are a few quick links to popular chapters:
+
+- [Introduction to the Common Language Runtime](botr/intro-to-clr.md)
+- [Garbage Collection Design](botr/garbage-collection.md)
+- [Type System](botr/type-system.md)
+
+Decoder Rings
+=============
+
+- [.NET Core Glossary](project-docs/glossary.md)
+- [.NET Filename Encyclopedia](project-docs/dotnet-filenames.md)
+
+Other Information
+=================
+
+- [CoreFX Repo documentation](https://github.com/dotnet/corefx/tree/master/Documentation)
+- [Porting to .NET Core](https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/support-dotnet-core-instructions.md)
+- [.NET Standards (Ecma)](project-docs/dotnet-standards.md)
+- [CLR Configuration Knobs](project-docs/clr-configuration-knobs.md)
+- [MSDN Entry for the CLR](http://msdn.microsoft.com/library/8bs2ecf4.aspx)
+- [Wikipedia Entry for the CLR](http://en.wikipedia.org/wiki/Common_Language_Runtime)
diff --git a/Documentation/botr/_tableOfContents.md b/Documentation/botr/_tableOfContents.md
new file mode 100644
index 0000000..db4ffc1
--- /dev/null
+++ b/Documentation/botr/_tableOfContents.md
@@ -0,0 +1,34 @@
+
+# The Book of the Runtime
+
+Welcome to the Book of the Runtime (BOTR) for the .NET Runtime. This contains
+a collection of articles about the non-trivial internals of the .NET Runtime. Its
+intended audience is people who are actually modifying the code or who simply wish to
+gain a deep understanding of the runtime.
+
+Below is a table of contents.
+
+- [Book of the Runtime FAQ](botr-faq.md)
+- [Introduction to the Common Language Runtime](intro-to-clr.md)
+- [Garbage Collection Design](garbage-collection.md)
+- [Threading](threading.md)
+- [RyuJIT Overview](ryujit-overview.md)
+ - [Porting RyuJIT to other platforms](porting-ryujit.md)
+- [Type System](type-system.md)
+- [Type Loader](type-loader.md)
+- [Method Descriptor](method-descriptor.md)
+- [Virtual Stub Dispatch](virtual-stub-dispatch.md)
+- [Stack Walking](stackwalking.md)
+- [Mscorlib and Calling Into the Runtime](mscorlib.md)
+- [Data Access Component (DAC) Notes](dac-notes.md)
+- [Profiling](profiling.md)
+- [Implementing Profilability](profilability.md)
+- [What Every Dev needs to Know About Exceptions in the Runtime](exceptions.md)
+- [ReadyToRun Overview](readytorun-overview.md)
+- [CLR ABI](clr-abi.md)
+
+
+This table may not be complete. You can get a complete list
+by looking at the directory where all the chapters are stored:
+
+* [All Book of the Runtime (BOTR) chapters on GitHub](../botr)
diff --git a/Documentation/botr/botr-faq.md b/Documentation/botr/botr-faq.md
new file mode 100644
index 0000000..b1a1de4
--- /dev/null
+++ b/Documentation/botr/botr-faq.md
@@ -0,0 +1,46 @@
+Book of the Runtime (BotR) FAQ
+===
+
+# What is the BotR?
+
+The [Book of the Runtime](https://github.com/dotnet/coreclr#learn-about-coreclr) is a set of documents that describe components in the CLR and BCL. They are intended to focus on architecture and invariants rather than to provide an annotated description of the codebase.
+
+It was originally created within Microsoft in ~ 2007, including this document. Developers were responsible for documenting their feature areas. This helped new devs joining the team and also helped share the product architecture across the team.
+
+We realized that the BotR is even more valuable now, with CoreCLR being open source on GitHub. We are publishing BotR chapters to help a new set of CLR developers.
+
+Each of the BoTR documents were written with a [certain perspective](https://github.com/dotnet/coreclr/pull/115), both in terms of the timeframe and the author. We did not think it was right to mutate the documents to make them more "2015". They remain the docs that they were, modulo a few spelling corrections and a conversion to markdown. That said, we'll accept PRs to the docs to improve them.
+
+# Who is the main audience of BotR?
+
+- Developers who are working on bugs that impinge on an area and need a high level overview of the component.
+- Developers working on new features with dependencies on a component need to know enough about it to ensure the new feature will interact correctly with existing components.
+- New developers who need an introduction to a component in order to maintain it.
+
+# What should be in a BotR chapter?
+
+The purpose of Book of the Runtime chapters is to capture information that we cannot easily reconstruct from the functional specification and source code alone, and to enable communication at a high level between team members. It explains concepts and presents a top-down description, and, most importantly, explains why we made the design decisions we made.
+
+# How is this different from a design doc?
+
+A design doc is what you write before you start implementation. A BotR chapter is usually written after a feature is implemented, at which point you have already decided the pros and cons of various design options and settled on one (and perhaps have plans to use an improved design in the future), and have a much better idea about all the details (some of which could be very hard to think of without actually going through the implementation/testing). So you can talk about rationales behind design decisions a lot better.
+
+# I am a new dev and not familiar with any features yet, how can I contribute?
+
+A new dev can be a great contributor to BotR as one of the most important purposes of BotR is to help new devs with getting up to speed. Here are some ways you can contribute:
+
+- Be a reviewer! If you think some things are not clear or could be explained better, do not hesitate to contact the author of the chapter and chat with him/her to see how you can make it more understandable.
+- As you are getting up to speed in your area, look over the BotR chapters for your area and see if there are any errors or anything that requires an update and make the modifications yourself.
+- Volunteer to write a chapter or part of a chapter. This might seem like a daunting task but you can start by just accumulating knowledge - take notes as you learn stuff about your area and gradually mold it into a BotR chapter.
+
+# What are the responsibilities of a BotR reviewer?
+
+As a reviewer you will be expected to give constructive comments on the chapter you are reviewing. You can comment on any aspect, e.g., the technical depth, writing style, or content coverage. Keep in mind that BotR is mostly about design and architectural issues that may not be obvious; it is not meant to walk you through implementation details. Please keep your feedback focused on those design-level concerns.
+
+# I _really_ don't have time to work on a BotR chapter – it seems like I always have other things to do. What do I do?
+
+Here are some approaches that I have found useful when working on BotR chapters.
+
+- Spread the work out; don't make it a work item as in "I will need to spend the next Monday through Thursday to work on my chapter"; think of it more like something you do when you want to take a break from coding or bug fixing, or just a change of scenery. I find it much easier to spend a little time here and there working on a chapter than having to specifically allocate a contiguous number of days, which always seem hard to come by.
+- Have someone else write the chapter or most of the chapter for you. I am not joking. This is actually a very good way to help new devs ramp up. If you will be mentoring a new dev in your area, spend time with them to explain the feature area and encourage them to write a BotR chapter if one doesn't already exist. Of course be a reviewer of it.
+- Use other documentation that is already there. There are MSDN docs and blog posts on .NET features. This can certainly be a base for your BotR chapter as well.
diff --git a/Documentation/botr/clr-abi.md b/Documentation/botr/clr-abi.md
new file mode 100644
index 0000000..cbd5fc9
--- /dev/null
+++ b/Documentation/botr/clr-abi.md
@@ -0,0 +1,661 @@
+# CLR ABI
+
+This document describes the .NET Common Language Runtime (CLR) software conventions (or ABI, "Application Binary Interface"). It focuses on the ABI for the x64 (aka AMD64), ARM (aka ARM32 or Thumb-2), and ARM64 processor architectures. Documentation for the x86 ABI is somewhat scant.
+
+It describes requirements that the Just-In-Time (JIT) compiler imposes on the VM and vice-versa.
+
+A note on the JIT codebases: JIT32 refers to the original JIT codebase, which generated x86 code and was later ported to generate ARM code. Later, it was ported and re-architected to generate AMD64 code (making its name something of a confusing misnomer). This work is referred to as RyuJIT. RyuJIT is being ported to generate ARM64 code. JIT64 refers to the legacy codebase that supports AMD64.
+
+# Getting started
+
+Read the documented Windows ABI for each architecture:
+
+AMD64: See "x64 Software Conventions" on MSDN: https://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx.
+
+ARM: See "Overview of ARM ABI Conventions" on MSDN: https://msdn.microsoft.com/en-us/library/dn736986.aspx.
+
+The CLR follows those basic conventions. This document only describes things that are CLR-specific, or exceptions from those documents.
+
+# General Unwind/Frame Layout
+
+For all non-x86 platforms, all methods must have unwind information so the garbage collector (GC) can unwind them (unlike native code, in which unwind information may be omitted for leaf methods).
+
+ARM and ARM64: Managed methods must always push LR on the stack, and create a minimal frame, so that the method can be properly hijacked using return address hijacking.
+
+# Special/extra parameters
+
+## The "this" pointer
+
+The managed "this" pointer is treated like a new kind of argument not covered by the native ABI, so we chose to always pass it as the first argument in (AMD64) `RCX` or (ARM, ARM64) `R0`.
+
+AMD64-only: Up to .NET Framework 4.5, the managed "this" pointer was treated just like the native "this" pointer (meaning it was the second argument when the call used a return buffer and was passed in RDX instead of RCX). Starting with .NET Framework 4.5, it is always the first argument.
+
+## Varargs
+
+Varargs refers to passing or receiving a variable number of arguments for a call.
+
+C# varargs, using the `params` keyword, are at the IL level just normal calls with a fixed number of parameters.
+
+Managed varargs (using C#'s pseudo-documented "...", `__arglist`, etc.) are implemented almost exactly like C++ varargs. The biggest difference is that the JIT adds a "vararg cookie" after the optional return buffer and the optional "this" pointer, but before any other user arguments. The callee must spill this cookie and all subsequent arguments into their home locations, as they may be addressed via pointer arithmetic starting with the cookie as a base. The cookie happens to be a pointer to a signature that the runtime can parse to (1) report any GC pointers within the variable portion of the arguments or (2) type-check (and properly walk over) any arguments extracted via ArgIterator. This is marked by `IMAGE_CEE_CS_CALLCONV_VARARG`, which should not be confused with `IMAGE_CEE_CS_CALLCONV_NATIVEVARARG`, which really is exactly native varargs (no cookie) and should only appear in PInvoke IL stubs, which properly handle pinning and other GC magic.
+
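As a hedged illustration of the "pointer arithmetic starting with the cookie as a base" idea (all names here are invented for the sketch; this is not the runtime's actual layout), once the callee has spilled the cookie and the trailing arguments into contiguous home slots, any argument is reachable by offset from the cookie's slot:

```cpp
#include <cstdint>

// Hypothetical spill area: slot 0 holds the vararg cookie, and every
// subsequent argument is homed in the slot after it, so an iterator can
// reach argument N purely by offset from the cookie's address.
struct SpilledArgs {            // invented name, for illustration only
    void    *cookie;            // points at the signature the runtime parses
    intptr_t args[2];           // homed user arguments
};

// Fetch the N-th user argument relative to the cookie slot.
static intptr_t NthVarArg(SpilledArgs *area, int n) {
    intptr_t *base = reinterpret_cast<intptr_t *>(&area->cookie);
    return base[1 + n];         // slot 0 is the cookie itself
}
```

This is the essence of what ArgIterator-style walking relies on: the cookie's address plus the parsed signature is enough to locate and type every variable argument.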
+On AMD64, just like native, any floating point arguments passed in floating point registers (including the fixed arguments) will be shadowed (i.e. duplicated) in the integer registers.
+
+On ARM and ARM64, just like native, nothing is put in the floating point registers.
+
+However, unlike native varargs, floating point arguments are not promoted to double (`R8`); instead they retain their original type (`R4` or `R8`) (although this does not preclude an IL generator like managed C++ from explicitly injecting an upcast at the call-site and adjusting the call-site-sig appropriately). This leads to unexpected behavior when native C++ is ported to C# or even just managed via the different flavors of managed C++.
+
+Managed varargs are not supported in .NET Core.
+
+## Generics
+
+*Shared generics*. In cases where the code address does not uniquely identify a generic instantiation of a method, then a 'generic instantiation parameter' is required. Often the "this" pointer can serve dual-purpose as the instantiation parameter. When the "this" pointer is not the generic parameter, the generic parameter is passed as the next argument (after the optional return buffer and the optional "this" pointer, but before any user arguments). For generic methods (where there is a type parameter directly on the method, as compared to the type), the generic parameter currently is a MethodDesc pointer (I believe an InstantiatedMethodDesc). For static methods (where there is no "this" pointer) the generic parameter is a MethodTable pointer/TypeHandle.
+
+Sometimes the VM asks the JIT to report and keep alive the generics parameter. In this case, it must be saved on the stack someplace and kept alive via normal GC reporting (if it was the "this" pointer, as compared to a MethodDesc or MethodTable) for the entire method except the prolog and epilog. Also note that the code to home it must be in the range of code reported as the prolog in the GC info (which probably isn't the same as the range of code reported as the prolog in the unwind info).
+
+There is no defined/enforced/declared ordering between the generic parameter and the varargs cookie because the runtime does not support that combination. There are chunks of code in the VM and JITs that would appear to support that, but other places assert and disallow it, so nothing is tested, and I would assume there are bugs and differences (i.e. one JIT using a different ordering than the other JIT or the VM).
+
+### Example
+```
+call(["this" pointer] [return buffer pointer] [generics context|varargs cookie] [userargs]*)
+```
+
+## AMD64-only: by-value value types
+
+Just like native, AMD64 has implicit-byrefs. Any structure (value type in IL parlance) that is not 1, 2, 4, or 8 bytes in size (i.e., 3, 5, 6, 7, or >= 9 bytes in size) that is declared to be passed by value, is instead passed by reference. For JIT generated code, it follows the native ABI where the passed-in reference is a pointer to a compiler generated temp local on the stack. However, there are some cases within remoting or reflection where apparently stackalloc is too hard, and so they pass in pointers within the GC heap, thus the JITed code must report these implicit byref parameters as interior pointers (BYREFs in JIT parlance), in case the callee is one of these reflection paths. Similarly, all writes must use checked write barriers.
+
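A minimal C++ sketch of the implicit-byref convention (hypothetical names; the real transformation happens inside the JIT): a struct whose size is not 1, 2, 4, or 8 bytes is copied to a caller-side temp, and the callee receives only the temp's address:

```cpp
#include <cstring>

// 12 bytes: not 1/2/4/8, so under the Windows AMD64 convention it is
// passed by reference rather than directly in a register.
struct Vec3 { int a, b, c; };

// What the callee effectively sees under the implicit-byref convention.
static int SumVec3(const Vec3 *byref) {
    return byref->a + byref->b + byref->c;
}

// What a by-value call site compiles to, conceptually: the caller copies
// the value into a compiler-generated temp and passes the temp's address,
// so callee writes cannot alias the caller's original value.
static int CallSite(const Vec3 &value) {
    Vec3 temp;                              // compiler-generated temp local
    std::memcpy(&temp, &value, sizeof temp);
    return SumVec3(&temp);
}
```

The GC-reporting subtlety in the text follows from this shape: because some runtime paths pass a heap address instead of a stack temp's address, the callee must treat the incoming pointer as a possible interior pointer.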
+The AMD64 native calling conventions (Windows 64 and System V) require the return buffer address to be returned by the callee in RAX. The JIT also follows this rule.
+
+## Return buffers
+
+The same applies to some return buffers. See `MethodTable::IsStructRequiringStackAllocRetBuf()`. When that returns false, the return buffer might be on the heap, either due to the reflection/remoting code paths mentioned previously or due to a JIT optimization where a call with a return buffer that then assigns to a field (on the GC heap) is changed into passing the field reference as the return buffer. Conversely, when it returns true, the JIT does not need to use a write barrier when storing to the return buffer, but it is still not guaranteed to be a compiler temp, and as such the JIT should not introduce spurious writes to the return buffer.
+
+NOTE: This optimization is now disabled for all platforms (`IsStructRequiringStackAllocRetBuf()` always returns FALSE).
+
+## Hidden parameters
+
+*Stub dispatch* - when a virtual call uses a VSD stub, rather than back-patching the calling code (or disassembling it), the JIT must place the address of the stub used to load the call target, the "stub indirection cell", in (x86) `EAX` / (AMD64) `R11` / (ARM) `R4` / (ARM64) `R11`. In the JIT, this is `REG_VIRTUAL_STUB_PARAM`.
+
+AMD64-only: Fast Pinvoke - The VM wants a conservative estimate of the size of the stack arguments placed in `R11`. (This is consumed by callout stubs used in SQL hosting).
+
+*Calli Pinvoke* - The VM wants the address of the PInvoke in (AMD64) `R10` / (ARM) `R12` / (ARM64) `R14` (In the JIT: `REG_PINVOKE_TARGET_PARAM`), and the signature (the pinvoke cookie) in (AMD64) `R11` / (ARM) `R4` / (ARM64) `R15` (in the JIT: `REG_PINVOKE_COOKIE_PARAM`).
+
+*Normal PInvoke* - The VM shares IL stubs based on signatures, but wants the right method to show up in call stack and exceptions, so the MethodDesc for the exact PInvoke is passed in the (x86) `EAX` / (AMD64) `R10` / (ARM, ARM64) `R12` (in the JIT: `REG_SECRET_STUB_PARAM`). Then in the IL stub, when the JIT gets `CORJIT_FLG_PUBLISH_SECRET_PARAM`, it must move the register into a compiler temp. The value is returned for the intrinsic `CORINFO_INTRINSIC_StubHelpers_GetStubContext`, and the address of that location is returned for `CORINFO_INTRINSIC_StubHelpers_GetStubContextAddr`.
+
+# PInvokes
+
+The convention is that any method with an InlinedCallFrame (either an IL stub or a normal method with an inlined pinvoke) saves/restores all non-volatile integer registers in its prolog/epilog respectively. This is done so that the InlinedCallFrame can just contain a return address, a stack pointer and a frame pointer. Then using just those three it can start a full stack walk using the normal RtlVirtualUnwind.
+
+For AMD64, a method with an InlinedCallFrame must use RBP as the frame register.
+
+For ARM and ARM64, we will also always use a frame pointer (R11). That is partially due to the frame chaining requirement. However, the VM also requires it for PInvokes with InlinedCallFrames.
+
+For ARM, the VM also has a dependency on `REG_SAVED_LOCALLOC_SP`.
+
+All these dependencies show up in the implementation of `InlinedCallFrame::UpdateRegDisplay`.
+
+JIT32 only generates one epilog (and causes all returns to branch to it) when there are PInvokes/InlinedCallFrame in the current method.
+
+## Per-frame PInvoke initialization
+
+The InlinedCallFrame is initialized once at the head of IL stubs and once in each path that does an inlined PInvoke.
+
+In JIT64 this happens in blocks that actually contain calls, but pushing it out of loops that have landing pads, and then looking for dominator blocks. For IL stubs and methods with EH, we give up and place the initialization in the first block.
+
+In RyuJIT/JIT32 (ARM), all methods are treated like JIT64's IL stubs (meaning the per-frame initialization happens once just after the prolog).
+
+The JIT generates a call to `CORINFO_HELP_INIT_PINVOKE_FRAME` passing the address of the InlinedCallFrame and either NULL or the secret parameter for IL stubs. `JIT_InitPInvokeFrame` initializes the InlinedCallFrame and sets it to point to the current Frame chain top. Then it returns the current thread's native Thread object.
+
+On AMD64, the JIT generates code to save RSP and RBP into the InlinedCallFrame.
+
+For IL stubs only, the per-frame initialization includes setting `Thread->m_pFrame` to the InlinedCallFrame (effectively 'pushing' the Frame).
+
+## Per-call-site PInvoke work
+
+1. For direct calls, the JITed code sets `InlinedCallFrame->m_pDatum` to the MethodDesc of the call target.
+ * For JIT64, indirect calls within IL stubs sets it to the secret parameter (this seems redundant, but it might have changed since the per-frame initialization?).
+ * For JIT32 (ARM) indirect calls, it sets this member to the size of the pushed arguments, according to the comments. The implementation, however, always passes 0.
+2. For JIT64/AMD64 only: Next for non-IL stubs, the InlinedCallFrame is 'pushed' by setting `Thread->m_pFrame` to point to the InlinedCallFrame (recall that the per-frame initialization already set `InlinedCallFrame->m_pNext` to point to the previous top). For IL stubs this step is accomplished in the per-frame initialization.
+3. The Frame is made active by setting `InlinedCallFrame->m_pCallerReturnAddress`.
+4. The code then toggles the GC mode by setting `Thread->m_fPreemptiveGCDisabled = 0`.
+5. Starting now, no GC pointers may be live in registers.
+6. Then comes the actual call/PInvoke.
+7. The GC mode is set back by setting `Thread->m_fPreemptiveGCDisabled = 1`.
+8. Then we check to see if `g_TrapReturningThreads` is set (non-zero). If it is, we call `CORINFO_HELP_STOP_FOR_GC`.
+ * For ARM, this helper call preserves the return register(s): `R0`, `R1`, `S0`, and `D0`.
+ * For AMD64, the generated code must manually preserve the return value of the PInvoke by moving it to a non-volatile register or a stack location.
+9. Starting now, GC pointers may once again be live in registers.
+10. Clear the `InlinedCallFrame->m_pCallerReturnAddress` back to 0.
+11. For JIT64/AMD64 only: For non-IL stubs 'pop' the Frame chain by resetting `Thread->m_pFrame` back to `InlinedCallFrame.m_pNext`.
+
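The steps above can be condensed into a hedged, runnable sketch. Everything here is a simplified stand-in (the real `Thread`, `InlinedCallFrame`, and helper definitions live in the VM); only the field roles, named after the members this document mentions, are taken from the text:

```cpp
struct InlinedCallFrame;

struct Thread {
    bool              preemptiveGCDisabled = true;  // ~ m_fPreemptiveGCDisabled
    InlinedCallFrame *frameTop = nullptr;           // ~ m_pFrame
};

struct InlinedCallFrame {
    int             (*callTarget)() = nullptr;       // ~ m_pDatum (MethodDesc in the real VM)
    const void       *callerReturnAddress = nullptr; // ~ m_pCallerReturnAddress
    InlinedCallFrame *next = nullptr;                // ~ m_pNext
};

static int FakeNativeTarget() { return 7; }  // stands in for the PInvoke target

static int InlinedPInvoke(Thread &t, InlinedCallFrame &f) {
    f.callTarget = &FakeNativeTarget;  // step 1: record the call target
    f.next = t.frameTop;               // set by per-frame initialization in the real code
    t.frameTop = &f;                   // step 2: 'push' the Frame
    f.callerReturnAddress = &f;        // step 3: made active (dummy address in this sketch)
    t.preemptiveGCDisabled = false;    // step 4: switch to preemptive mode
    int result = f.callTarget();       // step 6: the actual call
    t.preemptiveGCDisabled = true;     // step 7: back to cooperative mode
    // step 8 (g_TrapReturningThreads poll + CORINFO_HELP_STOP_FOR_GC) omitted here
    f.callerReturnAddress = nullptr;   // step 10: deactivate the Frame
    t.frameTop = f.next;               // step 11: 'pop' the Frame chain
    return result;
}
```

The sketch makes the invariant easy to see: between steps 4 and 7 the thread is in preemptive mode with an active Frame on the chain, which is exactly the window during which no GC pointers may be live in registers.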
+Saving/restoring all the non-volatile registers helps by preventing any registers that are unused in the current frame from accidentally having a live GC pointer value from a parent frame. The argument and return registers are 'safe' because they cannot be GC refs. Any refs should have been pinned elsewhere and instead passed as native pointers.
+
+For IL stubs, the Frame chain isn't popped at the call site, so instead it must be popped right before the epilog and right before any jmp calls. It looks like we do not support tail calls from PInvoke IL stubs?
+
+# Exception handling
+
+This section describes the conventions the JIT needs to follow when generating code to implement managed exception handling (EH). The JIT and VM must agree on these conventions for a correct implementation.
+
+## Funclets
+
+For non-x86 platforms, all managed EH handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched.
+
+The only way to enter a handler funclet is via a call. In the case of an exception, the call is from the VM's EH subsystem as part of exception dispatch/unwind. In the non-exceptional case, this is called local unwind or a non-local exit. In C# this is accomplished by simply falling-through/out of a try body or an explicit goto. In IL this is always accomplished via a LEAVE opcode, within a try body, targeting an IL offset outside the try body. In such cases the call is from the JITed code of the parent function.
+
+For x86, all handlers are generated within the method body, typically in lexical order. A nested try/catch is generated completely within the EH region in which it is nested. These handlers are essentially "in-line funclets", but they do not look like normal functions: they do not have a normal prolog or epilog, although they do have special entry/exit and register conventions. Also, nested handlers are not un-nested as for funclets: the code for a nested handler is generated within the handler in which it is nested.
+
+## Cloned finallys
+
+JIT64 attempts to speed the normal control flow by 'inlining' a called finally along the 'normal' control flow (i.e., leaving a try body in a non-exceptional manner via C# fall-through). Because the VM semantics for non-rude Thread.Abort dictate that handlers will not be aborted, the JIT must mark these 'inlined' finally bodies. These show up as special entries at the end of the EH tables and are marked with `COR_ILEXCEPTION_CLAUSE_FINALLY | COR_ILEXCEPTION_CLAUSE_DUPLICATED`, and the try_start, try_end, and handler_start are all the same: the start of the cloned finally.
+
+JIT32 and RyuJIT currently do not implement finally cloning.
+
+## Invoking Finallys/Non-local exits
+
+In order to have proper forward progress and `Thread.Abort` semantics, there are restrictions on where a call-to-finally can be, and what the call site must look like. The return address can **NOT** be in the corresponding try body (otherwise the VM would think the finally protects itself). The return address **MUST** be within any outer protected region (so exceptions from the finally body are properly handled).
+
+JIT64, and RyuJIT for AMD64 and ARM64, creates something similar to a jump island: a block of code outside the try body that calls the finally and then branches to the final target of the leave/non-local-exit. This jump island is then marked in the EH tables as if it were a cloned finally. The cloned finally clause prevents a Thread.Abort from firing before entering the handler. By having the return address outside of the try body we satisfy the other constraint.
+
+Note that ARM solves this by not using a call (bl) instruction and instead explicitly places a return address in `LR` and then jumps to the finally. We have not yet implemented this for AMD64 because it might mess up the call-return predictor on the CPU. (So far, performance data on ARM indicates this is not an issue there.)
+
+## ThreadAbortException considerations
+
+There are three kinds of thread abort: (1) rude thread abort, which cannot be stopped and doesn't run (all?) handlers, (2) calls to the `Thread.Abort()` API, and (3) asynchronous thread abort, injected from another thread.
+
+Note that ThreadAbortException is fully available in the desktop framework, and is heavily used in ASP.NET, for example. However, it is not supported in .NET Core, CoreCLR, or the Windows 8 "modern app profile". Nonetheless, the JIT generates ThreadAbort-compatible code on all platforms.
+
+For non-rude thread abort, the VM walks the stack, running any catch handler that catches ThreadAbortException (or a parent, like System.Exception, or System.Object), and running finallys. There is one very particular characteristic of ThreadAbortException: if a catch handler has caught ThreadAbortException, and the handler returns from handling the exception without calling Thread.ResetAbort(), then the VM *automatically re-raises ThreadAbortException*. To do so, it uses the resume address that the catch handler returned as the effective address where the re-raise is considered to have been raised. This is the address of the label that is specified by a LEAVE opcode within the catch handler. There are cases where the JIT must insert synthetic "step blocks" such that this label is within an appropriate enclosing "try" region, to ensure that the re-raise can be caught by an enclosing catch handler.
+
+For example:
+
+```
+try { // try 1
+ try { // try 2
+ System.Threading.Thread.CurrentThread.Abort();
+ } catch (System.Threading.ThreadAbortException) { // catch 2
+ ...
+ LEAVE L;
+ }
+} catch (System.Exception) { // catch 1
+ ...
+}
+L:
+```
+
+In this case, if the address returned in catch 2 corresponding to label L is outside try 1, then the ThreadAbortException re-raised by the VM will not be caught by catch 1, even though it should be. The JIT needs to insert a block such that this is the effective code generation:
+
+```
+try { // try 1
+ try { // try 2
+ System.Threading.Thread.CurrentThread.Abort();
+ } catch (System.Threading.ThreadAbortException) { // catch 2
+ ...
+ LEAVE L';
+ }
+ L': LEAVE L;
+} catch (System.Exception) { // catch 1
+ ...
+}
+L:
+```
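+
+The containment check motivating such synthetic step blocks can be sketched as follows. This is an illustrative sketch only: the real JIT operates on its flow graph and EH descriptor tables, and the names and offsets here are made up.
+
+```python
+def needs_step_block(leave_target, enclosing_try):
+    """Return True if a catch handler's LEAVE target lies outside the
+    enclosing try region, in which case the VM's automatic
+    ThreadAbortException re-raise would escape the outer catch, and the
+    JIT must route the leave through a synthetic label inside that try.
+
+    'enclosing_try' is a half-open [start, end) range of code offsets."""
+    try_start, try_end = enclosing_try
+    return not (try_start <= leave_target < try_end)
+```
+
+In the example above, label L lies after the end of try 1, so the JIT introduces L' inside try 1 and targets that instead.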
+
+Similarly, the automatic re-raise address for a ThreadAbortException can't be within a finally handler, or the VM will abort the re-raise and swallow the exception. This can happen due to call-to-finally thunks marked as "cloned finally", as described above. For example (this is pseudo-assembly-code, not C#):
+
+```
+try { // try 1
+ try { // try 2
+ System.Threading.Thread.CurrentThread.Abort();
+ } catch (System.Threading.ThreadAbortException) { // catch 2
+ ...
+ LEAVE L;
+ }
+} finally { // finally 1
+ ...
+}
+L:
+```
+
+This would generate something like:
+
+```
+ // beginning of 'try 1'
+ // beginning of 'try 2'
+ System.Threading.Thread.CurrentThread.Abort();
+ // end of 'try 2'
+ // beginning of call-to-finally 'cloned finally' region
+L1: call finally1
+ nop
+ // end of call-to-finally 'cloned finally' region
+ // end of 'try 1'
+ // function epilog
+ ret
+
+Catch2:
+ // do something
+ lea rax, &L1; // load up resume address
+ ret
+
+Finally1:
+ // do something
+ ret
+```
+
+Note that the JIT must already insert a "step" block so the finally will be called. However, this isn't sufficient to support ThreadAbortException processing, because "L1" is marked as "cloned finally". In this case, the JIT must insert another step block that is within "try 1" but outside the cloned finally block, that will allow for correct re-raise semantics. For example:
+
+```
+ // beginning of 'try 1'
+ // beginning of 'try 2'
+ System.Threading.Thread.CurrentThread.Abort();
+ // end of 'try 2'
+L1': nop
+ // beginning of call-to-finally 'cloned finally' region
+L1: call finally1
+ nop
+ // end of call-to-finally 'cloned finally' region
+ // end of 'try 1'
+ // function epilog
+ ret
+
+Catch2:
+ // do something
+ lea rax, &L1'; // load up resume address
+ ret
+
+Finally1:
+ // do something
+ ret
+```
+
+Note that JIT64 does not implement this properly. The C# compiler used to always insert all necessary "step" blocks. The Roslyn C# compiler at one point did not, but was then changed to once again insert them.
+
+## The PSPSym and funclet parameters
+
+The name *PSPSym* stands for Previous Stack Pointer Symbol. It is how a funclet accesses locals from the main function body. This is not used for x86: the frame pointer on x86 is always preserved when the handlers are invoked.
+
+First, two definitions.
+
+*Caller-SP* is the value of the stack pointer in a function's caller before the call instruction is executed. That is, when function A calls function B, Caller-SP for B is the value of the stack pointer immediately before the call instruction in A (calling B) was executed. Note that this definition holds for both AMD64, which pushes the return address when a call instruction is executed, and for ARM, which doesn't. For AMD64, Caller-SP is the address above the call return address.
+
+*Initial-SP* is the initial value of the stack pointer after the fixed-size portion of the frame has been allocated. That is, before any "alloca"-type allocations.
+
+The PSPSym is a pointer-sized local variable in the frame of the main function and of each funclet. The value stored in PSPSym is the value of Initial-SP for AMD64 or Caller-SP for other platforms, for the main function. The stack offset of the PSPSym is reported to the VM in the GC information header. The value reported in the GC information is the offset of the PSPSym from Initial-SP for AMD64 or Caller-SP for other platforms. (Note that both the value stored, and the way the value is reported to the VM, differs between architectures. In particular, note that most things in the GC information header are reported as offsets relative to Caller-SP, but PSPSym on AMD64 is one exception, and maybe the only exception.)
+
+The VM uses the PSPSym to find other locals it cares about (such as the generics context in a funclet frame). The JIT uses it to re-establish the frame pointer register, so that the frame pointer is the same value in a funclet as it is in the main function body.
+
+When a funclet is called, it is passed the *Establisher Frame Pointer*. For AMD64 this is true for all funclets and it is passed as the first argument in RCX, but for ARM and ARM64 this is only true for first pass funclets (currently just filters) and it is passed as the second argument in R1. The Establisher Frame Pointer is a stack pointer of an interesting "parent" frame in the exception processing system. For the CLR, it points either to the main function frame or a dynamically enclosing funclet frame from the same function, for the funclet being invoked. The value of the Establisher Frame Pointer is Initial-SP on AMD64, Caller-SP on ARM and ARM64.
+
+Using the establisher frame, the funclet wants to load the value of the PSPSym. Since we don't know if the Establisher Frame is from the main function or a funclet, we design the main function and funclet frame layouts to place the PSPSym at an identical, small, constant offset from the Establisher Frame in each case. (This is also required because we only report a single offset to the PSPSym in the GC information, and that offset must be valid for the main function and all of its funclets). Then, the funclet uses this known offset to compute the PSPSym address and read its value. From this, it can compute the value of the frame pointer (which is a constant offset from the PSPSym value) and set the frame register to be the same as the parent function. Also, the funclet writes the value of the PSPSym to its own frame's PSPSym. This "copying" of the PSPSym happens for every funclet invocation, in particular, for every nested funclet invocation.
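+
+The funclet prolog logic described above can be sketched as follows. The concrete offsets, the flat memory model, and the function names are illustrative assumptions, not the actual frame layout or JIT code.
+
+```python
+# Hypothetical constants: the single PSPSym offset reported in the GC info
+# (identical in the main frame and every funclet frame), and an assumed
+# fixed distance between the PSPSym value and the frame pointer.
+PSPSYM_OFFSET = 8
+FP_DELTA = 16
+
+def funclet_prolog(establisher_frame, funclet_frame, memory):
+    """Simulate a funclet re-establishing the parent frame pointer.
+
+    'memory' is a dict mapping addresses to values. Whether the
+    establisher frame belongs to the main function or to an enclosing
+    funclet, the PSPSym lives at the same small constant offset."""
+    # 1. Load the PSPSym from the establisher frame.
+    psp = memory[establisher_frame + PSPSYM_OFFSET]
+    # 2. Recompute the frame pointer as a constant offset from the PSPSym
+    #    value, so FP matches the parent function's FP.
+    frame_pointer = psp - FP_DELTA
+    # 3. Copy the PSPSym into this funclet's own frame, so any funclet
+    #    nested inside this one can repeat the same steps.
+    memory[funclet_frame + PSPSYM_OFFSET] = psp
+    return frame_pointer
+```
+
+Because step 3 runs on every funclet invocation, arbitrarily deep funclet nesting still finds the same PSPSym value at the same offset.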
+
+On ARM and ARM64, for all second pass funclets (finally, fault, catch, and filter-handler) the VM restores all non-volatile registers to their values within the parent frame. This includes the frame register (`R11`). Thus, the PSPSym is not used to recompute the frame pointer register in this case, though the PSPSym is copied to the funclet's frame, as for all funclets.
+
+Catch, Filter, and Filter-handlers also get an Exception object (GC ref) as an argument (`REG_EXCEPTION_OBJECT`). On AMD64 it is the second argument and thus passed in RDX. On ARM and ARM64 this is the first argument and passed in R0.
+
+(Note that the JIT64 source code contains a comment that says, "The current CLR doesn't always pass the correct establisher frame to the funclet. Funclet may receive establisher frame of funclet when expecting that of original routine." It indicates this is the reason that a PSPSym is required in all funclets as well as the main function, whereas if the establisher frame was correctly reported, the PSPSym could be omitted in some cases.)
+
+## Funclet Return Values
+
+The filter funclet returns a simple boolean value in the normal return register (x86: `EAX`, AMD64: `RAX`, ARM/ARM64: `R0`). Non-zero indicates to the VM/EH subsystem that the corresponding filter-handler will handle the exception (i.e. begin the second pass). Zero indicates to the VM/EH subsystem that the exception is **not** handled, and it should continue looking for another filter or catch.
+
+The catch and filter-handler funclets return a code address in the normal return register that indicates where the VM should resume execution after unwinding the stack and cleaning up from the exception. This address should be somewhere in the parent funclet (or main function if the catch or filter-handler is not nested within any other funclet). Because an IL 'leave' opcode can exit out of arbitrary nesting of funclets and try bodies, the JIT is often required to inject step blocks. These are intermediate branch target(s) that then branch to the next outermost target until the real target can be directly reached via the native ABI constraints. These step blocks can also invoke finallys (see *Invoking Finallys/Non-local exits*).
+
+Finally and fault funclets do not have a return value.
+
+## Register values and exception handling
+
+Exception handling imposes certain restrictions on the usage of registers in functions with exception handling.
+
+CoreCLR and "desktop" CLR behave the same way. Windows and non-Windows implementations of the CLR both follow these rules.
+
+Some definitions:
+
+*Non-volatile* (aka *callee-saved* or *preserved*) registers are those defined by the ABI that a function call preserves. Non-volatile registers include the frame pointer and the stack pointer, among others.
+
+*Volatile* (aka *caller-saved* or *trashed*) registers are those defined by the ABI that a function call does not preserve, and thus might have a different value when the function returns.
+
+### Registers on entry to a funclet
+
+When an exception occurs, the VM is invoked to do some processing. If the exception is within a "try" region, it eventually calls a corresponding handler (which also includes calling filters). The exception location within a function might be where a "throw" instruction executes, the point of a processor exception like null pointer dereference or divide by zero, or the point of a call where the callee threw an exception but did not catch it.
+
+On AMD64, all register values that existed at the exception point in the corresponding "try" region are trashed on entry to the funclet. That is, the only registers that have known values are those of the funclet parameters.
+
+On ARM and ARM64, all registers are restored to their values at the exception point.
+
+On x86: TBD.
+
+### Registers on return from a funclet
+
+When a funclet finishes execution, and the VM returns execution to the function (or an enclosing funclet, if there is EH clause nesting), the non-volatile registers are restored to the values they held at the exception point. Note that the volatile registers have been trashed.
+
+Any register value changes made in the funclet are lost. If a funclet wants to make a variable change known to the main function (or the funclet that contains the "try" region), that variable change needs to be made to the shared main function stack frame.
+
+## x86 EH considerations
+
+The x86 model is somewhat different than the non-x86 model. X86-specific concerns are mentioned here.
+
+### catch / filter-handler regions
+
+When leaving a `catch` or `filter-handler` region, the JIT calls the helper `CORINFO_JIT_ENDCATCH` (implemented in the VM by the `JIT_EndCatch` function) before transferring control to the target location. The code that calls `CORINFO_JIT_ENDCATCH` is within the catch region itself.
+
+### finally / fault regions
+
+"finally" clauses are invoked in the non-exceptional code by the generated JIT code, and in the exceptional case by the VM. "fault" clauses are only executed in exceptional cases by the VM.
+
+On entry to the finally or fault, the top of the stack is the address that should be jumped to on exit from the finally, using a "pop eax; jmp eax" sequence. A simple 'ret' could be used, but we avoid it so as not to create an unbalanced processor call/ret buffer stack and mess up call/ret prediction.
+
+There are no register or other stack arguments to a 'finally' or 'fault'.
+
+### ShadowSP slots
+
+X86 exception handlers (e.g., catch, finally) do not establish their own frames. They don't (really) have prologs and epilogs. However, they do use the stack, and need to restore the stack pointer of the enclosing exception handling region when the handler completes executing.
+
+To implement this requirement, for any function with EH, we create a frame-local variable to store a stack of "Shadow SP" values, or ShadowSP slots. In the JIT, the local var is called lvaShadowSPslotsVar, and in dumps it is called "EHSlots". The variable is created in lvaMarkLocalVars() and is sized as follows:
+1. 1 slot is reserved for the VM (for ICodeManager::FixContext(ppEndRegion)).
+2. 1 slot for each handler nesting level (total: ehMaxHndNestingCount).
+3. 1 slot for a filter (we do this even if there aren't any filters; size optimization opportunity to not do this if there are no filters?)
+4. 1 slot for zero termination
+
+Note that since a slot on x86 is 4 bytes, the minimum size is 16 bytes. The idea is to have 1 slot for each handler that could possibly be invoked at the same time. For example, for:
+
+```
+ try {
+ ...
+ } catch {
+ try {
+ ...
+ } catch {
+ ...
+ }
+ }
+```
+
+When the inner 'catch' is running, the outer 'catch' is also conceptually "on the stack", or in the middle of execution. So the maximum handler nesting count would be 2.
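+
+The sizing rules above amount to the following arithmetic. This is a sketch; `eh_max_hnd_nesting_count` mirrors the JIT's `ehMaxHndNestingCount`, but the helper names are invented.
+
+```python
+SLOT_SIZE = 4  # x86 pointer size, in bytes
+
+def shadow_sp_slot_count(eh_max_hnd_nesting_count):
+    """Slots: 1 reserved for the VM (FixContext), 1 per handler nesting
+    level, 1 for a filter (always reserved), and 1 zero terminator."""
+    return 1 + eh_max_hnd_nesting_count + 1 + 1
+
+def shadow_sp_area_bytes(eh_max_hnd_nesting_count):
+    return shadow_sp_slot_count(eh_max_hnd_nesting_count) * SLOT_SIZE
+
+# A function with any EH has nesting count >= 1, giving the 16-byte minimum.
+```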
+
+The ShadowSP slots are filled in from the highest address downwards to the lowest address. The highest slot is reserved. The first address with a zero is a zero terminator. So, we always zero terminate by setting the second-to-highest slot to zero in the function prolog (if we didn't zero initialize all locals anyway).
+
+When calling a finally, we set the appropriate level to 0xFC (aka "finally call") and zero terminate the next-lower address.
+
+Thus, calling a finally from JIT generated code looks like:
+
+```
+ mov dword ptr [L_02+0x4 ebp-10H], 0 // This must happen before the 0xFC is written
+ mov dword ptr [L_02+0x8 ebp-0CH], 252 // 0xFC
+ push G_M52300_IG07
+ jmp SHORT G_M52300_IG04
+```
+
+In this case, `G_M52300_IG07` is not the address after the 'jmp', so a simple 'call' wouldn't work.
+
+The code this finally returns to looks like this:
+
+```
+ mov dword ptr [L_02+0x8 ebp-0CH], 0
+ jmp SHORT G_M52300_IG05
+```
+
+In this case, it zeros out the ShadowSP slot that it previously set to 0xFC, then jumps to the address that is the actual target of the leave from the finally.
+
+The JIT does this "end finally restore" by creating a GT_END_LFIN tree node, with the appropriate stack level as an operand, that generates this code.
+
+In the case of an exceptional 'finally' invocation, the VM sets up the 'return address' to whatever address it wants the JIT to return to.
+
+For catch handlers, the VM is completely in control of filling and reading the ShadowSP slots; the JIT just makes sure there is enough space.
+
+### ShadowSP slots frame location
+
+The ShadowSP slots are required to live in a very particular location, reported via the GC info header. Note that the GC info header does not contain an actual pointer or offset to the ShadowSP slots variable. Instead, the VM calculates the location from other data that does exist in the GC info header, as a negative offset from the EBP frame pointer (which must be established in functions with EH) using the function `GetFirstBaseSPslotPtr()` / `GetStartShadowSPSlotsOffset()`. The VM thus assumes the following frame layout:
+
+1. callee-saved registers <= EBP points to the top of this range
+2. GS cookie
+3. 1 slot if localloc is used (Saved localloc SP?)
+4. 1 slot for CORINFO_GENERICS_CTXT_FROM_PARAMTYPEARG -- assumed for any function with EH, to avoid adding a flag to the GC info about whether it exists or not.
+5. ShadowSP slots
+
+(note, these don't have to be in this order for this calculation, but they possibly do need to be in this order for other calculations.) See also `GetEndShadowSPSlotsOffset()`.
+
+The VM walks the ShadowSP slots in the function `GetHandlerFrameInfo()`, and sets it in various functions such as `EECodeManager::FixContext()`.
+
+### JIT implementation: finally
+
+An aside on the JIT implementation for x86.
+
+The JIT creates BBJ_CALLFINALLY/BBJ_ALWAYS pairs for calling the 'finally' clause. The BBJ_CALLFINALLY block will have a series of CORINFO_JIT_ENDCATCH calls appended at the end, if we need to "leave" a series of nested catches before calling the finally handler (due to a single 'leave' opcode attempting to leave multiple levels of different types of handlers). Then, a GT_END_LFIN statement with the finally clause handler nesting level as an argument is added to the step block where the finally returns to. This is used to generate code to zero out the appropriate level of the ShadowSP slot array after the finally has been executed. The BBJ_CALLFINALLY block itself generates the code to insert the 0xFC value into the ShadowSP slot array. If the 'finally' is invoked by the VM, in exceptional cases, then the VM itself updates the ShadowSP slot array before invoking the 'finally'.
+
+At the end of a finally or filter, a GT_RETFILT is inserted. For a finally, this is a TYP_VOID which is just a placeholder. For a filter, it takes an argument which evaluates to the return value from the filter. On legacy JIT, this tree triggers the generation of both the return value load (for filters) and the "funclet" exit sequence, which is either a "pop eax; jmp eax" for a finally, or a "ret" for a filter. When processing the BBJ_EHFINALLYRET or BBJ_EHFILTERRET block itself (at the end of code generation for the block), nothing is generated. In RyuJIT, the GT_RETFILT only loads up the return value (for filters) and does nothing for finally, and the block type processing after all the tree processing triggers the exit sequence to be generated. There is no real difference between these, except to centralize all "exit sequence" generation in the same place.
+
+# EH Info, GC Info, and Hot & Cold Splitting
+
+All GC info offsets and EH info offsets treat the function and funclets as if they were one big method body. Thus all offsets are relative to the start of the main method. Funclets are assumed to always be at the end of (after) all of the main function code. Thus if the main function has any cold code, all funclets must be cold. Conversely, if there is any hot funclet code, all of the main method must be hot.
+
+## EH clause ordering
+
+EH clauses must be sorted inner-to-outer, first-to-last based on IL offset of the try start/try end pair. The only exceptions are cloned finallys, which always appear at the end.
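+
+One way to realize this ordering is to sort by try-end ascending, breaking ties by placing the later-starting (more deeply nested) clause first. This is an illustrative sketch, not the JIT's actual implementation; clauses are represented simply as `(try_start, try_end)` IL offset pairs.
+
+```python
+def sort_eh_clauses(clauses, cloned_finallys=()):
+    """Sort EH clauses inner-to-outer, first-to-last.
+
+    An inner try ends no later than its enclosing try, so sorting by
+    try_end puts inner clauses first; the -try_start tie-break places
+    the later-starting (nested) clause first when two clauses end at
+    the same offset. Cloned finallys always go at the end."""
+    ordered = sorted(clauses, key=lambda c: (c[1], -c[0]))
+    return ordered + list(cloned_finallys)
+```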
+
+## How EH affects GC info/reporting
+
+Because a main function body will **always** be on the stack when one of its funclets is on the stack, the GC info must be careful not to double-report. JIT64 accomplished this by having all named locals appear in the parent method frame: anything shared between the function and funclets was homed to the stack, and only the parent function reported stack locals (funclets might report local registers). JIT32 and RyuJIT (for AMD64, ARM, and ARM64) take the opposite direction. The leaf-most funclet is responsible for reporting everything that might be live out of a funclet (in the case of a filter, this might resume back in the original method body). This is accomplished with the GC header flag WantsReportOnlyLeaf (JIT32 and RyuJIT set it, JIT64 doesn't) and the VM tracking whether it has already seen a funclet for a given frame. Once JIT64 is fully retired, we should be able to remove this flag from the GC info.
+
+There is one "corner case" in the VM implementation of WantsReportOnlyLeaf model that has implications for the code the JIT is allowed to generate. Consider this function with nested exception handling:
+
+```
+public void runtest() {
+ try {
+ try {
+ throw new UserException3(ThreadId); // 1
+ }
+ catch (UserException3 e){
+ Console.WriteLine("Exception3 was caught");
+ throw new UserException4(ThreadId);
+ }
+ }
+ catch (UserException4 e) { // 2
+ Console.WriteLine("Exception4 was caught");
+ }
+}
+```
+
+When the inner "throw new UserException4" is executed, the exception handling first pass finds that the outer catch handler will handle the exception. The exception handling second pass unwinds stack frames back to the "runtest" frame, and then executes the catch handler. There is a period of time during which the original catch handler ("catch (UserException3 e)") is no longer on the stack, but before the new catch handler is executed. During this time, a GC might occur. In this case, the VM needs to make sure to report GC roots properly for the "runtest" function. The inner catch has been unwound, so we can't report that. We don't want to report at "// 1", which is still on the stack, because that effectively is "going backwards" in execution, and doesn't properly represent what object references are live. We need to report live object references at the next location where execution will occur. This is the "// 2" location. However, we can't report the first location of the catch funclet, as that will be non-interruptible. The VM instead looks forward for the first interruptible point in that handler, and reports live references that the JIT reports for that location. This will be the first location after the handler prolog. There are several implications of this implementation for the JIT. It requires that:
+
+1. Methods which have EH clauses are fully interruptible.
+2. All catch funclets have an interruptible point immediately after the prolog.
+3. The first interruptible point in the catch funclet reports the following live objects on the stack
+ * Only objects that are shared with parent method i.e. no additional stack object which is live only in catch funclet and not live in parent method.
+ * All shared objects which are referenced in catch funclet and any subsequent control flow are reported live.
+
+## Filter GC semantics
+
+Filters are invoked in the 1st pass of EH processing and as such execution might resume back at the faulting address, or in the filter-handler, or someplace else. Because the VM must allow GCs to occur during and after a filter invocation, but before the EH subsystem knows where it will resume, we need to keep everything alive at both the faulting address **and** within the filter. This is accomplished by 3 means: (1) the VM's stackwalker and GCInfoDecoder report as live both the filter frame and its corresponding parent frame, (2) the JIT encodes all stack slots that are live within the filter as being pinned, and (3) the JIT reports as live (and possibly zero-initializes) anything live-out of the filter. Because of (1) it is likely that a stack variable that is live within the filter and the try body will be double reported. During the mark phase of the GC double reporting is not a problem. The problem only arises if the object is relocated: if the same location is reported twice, the GC will try to relocate the address stored at that location twice. Thus we prevent the object from being relocated by pinning it, which leads us to why we must do (2). (3) is done so that after the filter returns, we can still safely incur a GC before executing the filter-handler or any outer handler within the same frame.
+
+## Duplicated Clauses
+
+Duplicated clauses are a special set of entries in the EH tables to assist the VM. Specifically, if handler 'A' is also protected by an outer EH clause 'B', then the JIT must emit a duplicated clause, a duplicate of 'B', that marks the whole handler 'A' (which is now lexically disjoint from the range of code for the corresponding try body 'A') as being protected by the handler for 'B'.
+
+Duplicated clauses are not needed for x86.
+
+During exception dispatch the VM uses these duplicated clauses to know when to skip any frames between the handler and its parent function. After skipping to the parent function, due to a duplicated clause, the VM searches for a regular/non-duplicate clause in the parent function. The order of duplicated clauses is important. They should appear after all of the main function clauses. They should still follow the normal sorting rules (inner-to-outer, top-to-bottom), but because the try-start/try-end will all be the same for a given handler, they should maintain the same relative inner-to-outer ordering as the corresponding original clauses.
+
+Example:
+
+```
+A: try {
+B: ...
+C: try {
+D: ...
+E: try {
+F: ...
+G: }
+H: catch {
+I: ...
+J: }
+K: ...
+L: }
+M: finally {
+N: ...
+O: }
+P: ...
+Q: }
+R: catch {
+S: ...
+T: }
+```
+
+In MSIL this would generate 3 EH clauses:
+
+```
+.try E-G catch H-J
+.try C-L finally M-O
+.try A-Q catch R-T
+```
+
+The native code would be laid out as follows (the order of the handlers is irrelevant except they are after the main method body) with their corresponding (fake) native offsets:
+
+```
+A: -> 1
+B: -> 2
+C: -> 3
+D: -> 4
+E: -> 5
+F: -> 6
+G: -> 7
+K: -> 8
+L: -> 9
+P: -> 10
+Q: -> 11
+H: -> 12
+I: -> 13
+J: -> 14
+M: -> 15
+N: -> 16
+O: -> 17
+R: -> 18
+S: -> 19
+T: -> 20
+```
+
+The native EH clauses would be listed as follows:
+
+```
+1. .try 5-7 catch 12-14 (top-most & inner-most first)
+2. .try 3-9 finally 15-17 (top-most & next inner-most)
+3. .try 1-11 catch 18-20 (top-most & outer-most)
+4. .try 12-14 finally 15-17 duplicated (inner-most because clause 2 is inside clause 3, top-most because handler H-J is first)
+5. .try 12-14 catch 18-20 duplicated
+6. .try 15-17 catch 18-20
+```
+
+If the handlers were in a different order, then clause 6 might appear before clauses 4 and 5, but never in between.
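+
+The construction of duplicated clauses 4-6 from clauses 1-3 above can be sketched as follows. This is illustrative only: clauses are represented as `(try_start, try_end, handler_start, handler_end)` native-offset tuples, and enclosure is tested on the try ranges, which suffices here because the main-body try regions are laid out contiguously.
+
+```python
+def duplicated_clauses(clauses):
+    """For each handler whose original clause is nested inside an outer
+    clause, emit a copy of the outer clause whose try range is the
+    handler's code range. Input must already be sorted inner-to-outer,
+    so the output order matches the rules described above."""
+    dups = []
+    for ts, te, hs, he in clauses:
+        for ots, ote, ohs, ohe in clauses:
+            # The outer clause strictly encloses this clause's try region.
+            if ots <= ts and te <= ote and (ots, ote) != (ts, te):
+                dups.append((hs, he, ohs, ohe))
+    return dups
+
+clauses = [(5, 7, 12, 14),   # .try 5-7 catch 12-14
+           (3, 9, 15, 17),   # .try 3-9 finally 15-17
+           (1, 11, 18, 20)]  # .try 1-11 catch 18-20
+# duplicated_clauses(clauses) yields, in order:
+#   (12, 14, 15, 17)  -> .try 12-14 finally 15-17
+#   (12, 14, 18, 20)  -> .try 12-14 catch 18-20
+#   (15, 17, 18, 20)  -> .try 15-17 catch 18-20
+```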
+
+## GC Interruptibility and EH
+
+The VM assumes that anytime a thread is stopped, it must be at a GC safe point, or the current frame is non-resumable (i.e. a throw that will never be caught in the same frame). Thus effectively all methods with EH must be fully interruptible (or at a minimum all try bodies). Currently the GC info appears to support mixing of partially interruptible and fully-interruptible regions within the same method, but no JIT uses this, so use at your own risk.
+
+The debugger always wants to stop at GC safe points, and thus debuggable code should be fully interruptible to maximize the places where the debugger can safely stop. If the JIT creates non-interruptible regions within fully interruptible code, the code should ensure that each sequence point begins on an interruptible instruction.
+
+AMD64/JIT64 only: The JIT will add an interruptible NOP if needed.
+
+## Security Object
+
+The security object is a GC pointer and must be reported as such, and kept alive the duration of the method.
+
+## GS Cookie
+
+The GS Cookie is not a GC object, but still needs to be reported. It can only have one lifetime due to how it is encoded/reported in the GC info. Since the GS Cookie ceases being valid once we pop the stack, the epilog cannot be part of the live range. Since we only get one live range that means there cannot be any code (except funclets) after the epilog in methods with a GS cookie.
+
+## NOPs and other Padding
+
+### AMD64 padding info
+
+The unwind callbacks don't know if the current frame is a leaf frame or a return address. Consequently, the JIT must ensure that the return address of a call is in the same region as the call. Specifically, the JIT must add a NOP (or some other instruction) after any call that would otherwise directly precede the start of a try body, the end of a try body, or the end of a method.
+
+The OS has an optimization in the unwinder such that if an unwind results in a PC being within (or at the start of) an epilog, it assumes that frame is unimportant and unwinds again. Since the CLR considers every frame important, it does not want this double-unwind behavior and requires the JIT to place a NOP (or other instruction) between any call and any epilog.
+
+### ARM and ARM64 padding info
+
+The OS unwinder uses the `RUNTIME_FUNCTION` extents to determine which function or funclet to unwind out of. The net result is that a call (bl opcode) to `IL_Throw` cannot be the last instruction. So, similar to AMD64, the JIT must inject an opcode (a breakpoint in this case) when the `bl IL_Throw` would otherwise be the last opcode of a function or funclet, the last opcode before the end of the hot section, or (this might be an x86-ism leaking into ARM) the last opcode before a "special throw block".
+
+The CLR unwinder assumes any non-leaf frame was unwound as a result of a call. This is mostly (always?) true except for non-exceptional finally invocations. For those cases, the JIT must place a 2 byte NOP **before** the address set as the finally return address (in the LR register, before jumping to the finally). I believe this is only needed if the preceding 2 bytes would have otherwise been in a different region (i.e. the end or start of a try body, etc.), but currently the JIT always emits the NOP. This is because the stack walker looks at the return address, subtracts 2, and uses that as the PC for the next step of stack walking. Note that the inserted NOP must have correct GC information.
+
+# Profiler Hooks
+
+If the JIT gets passed `CORJIT_FLG_PROF_ENTERLEAVE`, then the JIT might need to insert native entry/exit/tail call probes. To determine for sure, the JIT must call GetProfilingHandle. This API returns, as out parameters, the true dynamic boolean indicating whether the JIT should actually insert the probes, and a parameter to pass to the callbacks (typed as void*), with an optional indirection (used for NGEN). This parameter is always the first argument to all of the call-outs (thus placed in the usual first argument register: `RCX` (AMD64) or `R0` (ARM, ARM64)).
+
+Outside of the prolog (in a GC interruptible location), the JIT injects a call to `CORINFO_HELP_PROF_FCN_ENTER`. For AMD64, all argument registers will be homed into their caller-allocated stack locations (similar to varargs). For ARM and ARM64, all arguments are prespilled (again similar to varargs).
+
+After computing the return value and storing it in the correct register, but before any epilog code (including before a possible GS cookie check), the JIT injects a call to `CORINFO_HELP_PROF_FCN_LEAVE`. For AMD64, this call must preserve the return register: `RAX` or `XMM0`. For ARM, the return value will be moved from `R0` to `R2` (if it was in `R0`); `R1`, `R2`, and `S0/D0` must be preserved by the callee (longs will be in `R2`, `R1` - note the unusual ordering of the registers; floats in `S0`; doubles in `D0`; smaller integrals in `R2`).
+
+TODO: describe ARM64 profile leave conventions.
+
+Before the argument setup (but after any argument side-effects) for any tail calls or jump calls, the JIT injects a call to `CORINFO_HELP_PROF_FCN_TAILCALL`. Note that it is NOT called for self-recursive tail calls turned into loops.
+
+For ARM tail calls, the JIT actually loads the outgoing arguments first, and then just before the profiler call-out, spills the argument in `R0` to another non-volatile register, makes the call (passing the callback parameter in `R0`), and then restores `R0`.
+
+For AMD64, all probes receive a second parameter (passed in `RDX` according to the default argument rules) which is the address of the start of the arguments' home location (equivalent to the value of the caller's stack pointer).
+
+TODO: describe ARM64 tail call convention.
+
+JIT32 only generates one epilog (and causes all returns to branch to it) when there are profiler hooks.
+
+# Synchronized Methods
+
+JIT32/RyuJIT only generates one epilog (and causes all returns to branch to it) when a method is synchronized. See `Compiler::fgAddSyncMethodEnterExit()`. The user code is wrapped in a try/finally. Outside/before the try body, the code initializes a boolean to false. `CORINFO_HELP_MON_ENTER` or `CORINFO_HELP_MON_ENTER_STATIC` are called, passing the lock object (the "this" pointer for instance methods or the Type object for static methods) and the address of the boolean. If the lock is acquired, the boolean is set to true (as an 'atomic' operation in the sense that a Thread.Abort/EH/GC/etc. cannot interrupt the Thread when the boolean does not match the acquired state of the lock). JIT32/RyuJIT follows the exact same logic and arguments for placing the call to `CORINFO_HELP_MON_EXIT` / `CORINFO_HELP_MON_EXIT_STATIC` in the finally.
+
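The wrapping described above corresponds roughly to this sketch. It is a stand-in using `std::mutex`; the real code calls the `CORINFO_HELP_MON_*` helpers, which update the boolean atomically with respect to interruption, rather than setting it in user code as done here.

```cpp
#include <cassert>
#include <mutex>

std::mutex g_lock;           // stand-in for the object's monitor

// Sketch of the JIT's expansion for a synchronized method body.
// 'acquired' plays the role of the boolean the helpers update.
int synchronizedBody(bool& acquired)
{
    acquired = false;
    g_lock.lock();           // CORINFO_HELP_MON_ENTER(obj, &acquired)
    acquired = true;         // the real helper sets this atomically

    // --- user code (the "try" body) ---
    int result = 42;

    // --- finally ---
    if (acquired) {
        g_lock.unlock();     // CORINFO_HELP_MON_EXIT(obj, &acquired)
        acquired = false;
    }
    return result;
}
```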
+# Rejit
+
+For AMD64 to support profiler attach scenarios, the JIT can be required to ensure every generated method is hot patchable (see `CORJIT_FLG_PROF_REJIT_NOPS`). The way we do this is to ensure that the first 5 bytes of code are non-interruptible and there is no branch target within those bytes (includes calls/returns). Thus the VM can stop all threads (like for a GC) and safely replace those 5 bytes with a branch to a new version of the method (presumably instrumented by a profiler). The JIT adds NOPs or increases the size of the prolog reported in the GC info to accomplish these 2 requirements.
+
+In a function with exception handling, only the main function is affected; the funclet prologs are not made hot patchable.
+
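A minimal sketch of the padding computation (names invented; the actual JIT also has to ensure no branch target or interruptible point falls within the patch region, not just pad for size):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: given the size of the generated prolog, how many
// NOP bytes must be added so the first 5 bytes of the method can be
// atomically overwritten with a jump to the rejitted method?
size_t hotPatchPadding(size_t prologSize)
{
    const size_t kPatchSize = 5;   // size of a rel32 jmp on AMD64
    return prologSize >= kPatchSize ? 0 : kPatchSize - prologSize;
}
```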
+# Edit and Continue
+
+Edit and Continue (EnC) is a special flavor of un-optimized code. The debugger has to be able to reliably remap a method state (instruction pointer and local variables) from original method code to edited method code. This puts constraints on the method stack layout performed by the JIT. The key constraint is that the addresses of the existing locals must stay the same after the edit. This constraint is required because the address of the local could have been stored in the method state.
+
+In the current design, the JIT does not have access to the previous versions of the method and so it has to assume the worst case. EnC is designed for simplicity, not for performance of the generated code.
+
+EnC is currently enabled on x86 and x64 only, but the same principles would apply if it is ever enabled on other platforms.
+
+The following sections describe the various Edit and Continue code conventions that must be followed.
+
+## EnC flag in GCInfo
+
+The JIT records the fact that it has followed conventions for EnC code in GC Info. On x64, this flag is implied by recording the size of the stack frame region preserved between EnC edits (`GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea`). For normal methods on JIT64, the size of this region is 2 slots (saved `RBP` and return address). On RyuJIT/AMD64, the size of this region is increased to include `RSI` and `RDI`, so that `rep stos` can be used for block initialization and block moves.
+
+## Allocating local variables backward
+
+This is required to preserve addresses of the existing locals when an EnC edit appends new ones. In other words, the first local must be allocated at the highest stack address. Special care has to be taken to deal with alignment. The total size of the method frame can either grow (more locals added) or shrink (fewer temps needed) after the edit. The VM zeros out newly added locals.
+
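The backward allocation rule can be sketched as follows (an illustration with invented names; alignment handling is omitted and sizes are assumed to be 8-byte multiples): because each local's offset depends only on the locals declared before it, an edit that appends locals leaves existing offsets unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: allocate locals downward from a fixed frame top (e.g. RBP),
// so the first local sits at the highest stack address.
std::vector<int32_t> allocateLocals(const std::vector<int32_t>& sizes)
{
    std::vector<int32_t> offsets;
    int32_t next = 0;              // offset relative to the frame top
    for (int32_t size : sizes) {
        next -= size;              // grow the frame downward
        offsets.push_back(next);
    }
    return offsets;
}
```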
+## Fixed set of callee-saved registers
+
+This eliminates the need to deal with different sets in the VM, and makes preservation of local addresses easier. On x64, we choose to always save `RBP` only. There are plenty of volatile registers and so lack of non-volatile registers does not impact quality of non-optimized code.
+
+## EnC is supported for methods with EH
+
+However, EnC remap is not supported inside funclets. The stack layout of funclets does not matter for EnC.
+
+## Initial RSP == RBP == PSPSym
+
+This invariant allows the VM to compute the new value of `RBP` and PSPSym after the edit without any additional information. The location of the PSPSym is found via GC info.
+
+## Localloc
+
+Localloc is allowed in EnC code, but remap is disallowed after the method has executed a localloc instruction. VM uses the invariant above (`RSP == RBP`) to detect whether localloc was executed by the method.
+
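The detection described above reduces to a comparison of the two registers (a sketch of the VM-side check, with invented names): under the EnC convention the prolog establishes `RSP == RBP`, and localloc is the only thing that moves `RSP` afterwards, so inequality implies localloc has run and remap must be refused.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: remap is allowed only while the EnC frame invariant holds.
bool remapAllowed(uint64_t rsp, uint64_t rbp)
{
    return rsp == rbp;   // localloc has not executed
}
```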
+## Security object
+
+This does not require any special handling by the JIT on x64. (Different from x86). The security object is copied over by the VM during remap if necessary. Location of security object is found via GC info.
+
+## Synchronized methods
+
+The extra state created by the JIT for synchronized methods (original "this" and lock taken flag) must be preserved during remap. The JIT stores this state in the preserved region, and increases the size of the preserved region reported in GC info accordingly.
+
+## Generics
+
+EnC is not supported for generic methods and methods on generic types.
+
+# System V x86_64 support
+
+This section relates mostly to calling conventions on System V systems (such as Ubuntu Linux and Mac OS X).
+The general rules outlined in the System V x86_64 ABI (described at http://www.x86-64.org/documentation/abi.pdf) are followed with a few exceptions, described below:
+
+1. The hidden argument for by-value passed structs is always after the "this" parameter (if there is one). This differs from the System V ABI and affects only the internal JIT calling conventions. For PInvoke calls the hidden argument is always the first parameter since there is no "this" parameter in this case.
+2. Managed structs that have no fields are always passed by-value on the stack.
+3. The JIT proactively generates `RBP`-based frames (with `RBP` as the frame register) in order to aid the native OS tooling for stack unwinding and the like.
+4. All the other internal VM contracts for PInvoke, EH, and generic support remain in place. Please see the relevant sections above for more details. Note, however, that the registers used are different on System V due to the different calling convention. For example, the integer argument registers are, in order, RDI, RSI, RDX, RCX, R8, and R9. Thus, where the first argument (typically, the "this" pointer) on Windows AMD64 goes in RCX, on System V it goes in RDI, and so forth.
+5. Structs with explicit layout are always passed by value on the stack.
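
The register difference called out in item 4 can be captured in a small table (a sketch with register names as strings, purely for illustration; Windows AMD64 has only four integer argument registers):

```cpp
#include <cassert>
#include <cstring>

// Integer argument register order for the two AMD64 conventions
// compared in the text.
const char* const kSysVIntArgRegs[]    = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };
const char* const kWindowsIntArgRegs[] = { "RCX", "RDX", "R8", "R9" };
```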
diff --git a/Documentation/botr/dac-notes.md b/Documentation/botr/dac-notes.md
new file mode 100644
index 0000000..adeb9a8
--- /dev/null
+++ b/Documentation/botr/dac-notes.md
@@ -0,0 +1,213 @@
+Data Access Component (DAC) Notes
+=================================
+
+Date: 2007
+
+Debugging managed code requires special knowledge of managed objects and constructs. For example, objects have various kinds of header information in addition to the data itself. Objects may move in memory as the garbage collector does its work. Getting type information may require help from the loader. Retrieving the correct version of a function that has undergone an edit-and-continue or getting information for a function emitted through reflection requires the debugger to be aware of EnC version numbers and metadata. The debugger must be able to distinguish AppDomains and assemblies. The code in the VM directory embodies the necessary knowledge of these managed constructs. This essentially means that APIs to retrieve information about managed code and data must run some of the same algorithms that the execution engine itself runs.
+
+Debuggers can operate either _in-process_ or _out-of-process_. A debugger that runs in-process requires a live data target (the debuggee). In this case, the runtime has been loaded and the target is running. A helper thread in the debuggee runs code from the execution engine to compute the information the debugger needs. Because the helper thread runs in the target process, it has ready access to the target's address space and the runtime code. All the computation occurs in the target process. This is a simple way to get the information the debugger needs to be able to represent managed constructs in a meaningful way. Nevertheless, an in-process debugger has certain limitations. For example, if the debuggee is not currently running (as is the case when the debuggee is a dump file), the runtime is not loaded (and may not even be available on the machine). In this case, the debugger has no way to execute runtime code to get the information it needs.
+
+Historically, the CLR debugger has operated in process. A debugger extension, SOS (Son of Strike) or Strike (in the early CLR days), can be used to inspect managed code. Starting with .NET Framework 4, the debugger runs out-of-process. The CLR debugger APIs provide much of the functionality of SOS along with other functionality that SOS does not provide. Both SOS and the CLR debugging APIs use the Data Access Component (DAC) to implement out-of-process debugging. The DAC is conceptually a subset of the runtime's execution engine code that runs out-of-process. This means that it can operate on a dump file, even on a machine that has no runtime installed. Its implementation consists mainly of a set of macros and templates, combined with conditional compilation of the execution engine's code. When the runtime is built, both clr.dll and mscordacwks.dll are produced. For CoreCLR builds, the binaries are slightly different: coreclr.dll and msdaccore.dll. The file names also differ when built for other operating systems, like OS X. To inspect the target, the DAC can read its memory to get the inputs for the VM code in mscordacwks. It can then run the appropriate functions in the host to compute the information needed about a managed construct and finally return the results to the debugger.
+
+Notice that the DAC reads _the memory of the target process_. It's important to realize that the debugger and the debuggee are separate processes with separate address spaces. Thus it is important to make a clear distinction between target memory and host memory. Using a target address in code running in the host process would have completely unpredictable and generally incorrect results. When using the DAC to retrieve memory from the target, it is important to be very careful to use addresses from the correct address space. Furthermore, target addresses are sometimes strictly used as data. In this case, it would be just as incorrect to use a host address. For example, to display information about a managed function, we might want to list its starting address and size. Here, it is important to provide the target address. When writing code in the VM that the DAC will run, one needs to correctly choose when to use host and target addresses.
+
+The DAC infrastructure (the macros and templates that control how host or target memory is accessed) supplies certain conventions that distinguish which pointers are host addresses and which are target addresses. When a function is _DACized_ (i.e., uses the DAC infrastructure to make the function work out-of-process), host pointers of type _T_ are declared to be of type _T \*_. Target pointers are of type PTR\__T_. Remember, though, that the concept of host versus target is only meaningful for the DAC. In a non-DAC build, we have only a single address space. The host and the target are the same: the CLR. If we declare a local variable of either type _T \*_ or of type PTR\__T_ in a VM function, it will be a "host pointer." When we are executing code in clr.dll (coreclr.dll), there is absolutely no difference between a local variable of type _T \*_ and a local variable of type PTR\__T_. If we execute the function compiled into mscordacwks.dll (msdaccore.dll) from the same source, the variable declared to be of type _T \*_ will be a true host pointer, with the debugger as the host. If you think about it, this is obvious. Nevertheless, it can become confusing when we start passing these pointers to other VM functions. When we are DACizing a function (i.e., changing _T \*_ to PTR\__T_, as appropriate), we sometimes need to trace a pointer back to its point of origin to determine whether it should be a host or target type.
+
+When one has no understanding of the DAC, it's easy to find the use of the DAC infrastructure annoying. The TADDRs and PTR\_this and dac\_casts, etc. seem to clutter the code and make it harder to understand. With just a little work, though, you'll find that these are not really difficult to learn. Keeping host and target addresses explicitly different is really a form of strong typing. The more diligent we are, the easier it becomes to ensure our code is correct.
+
+Because the DAC potentially operates on a dump, the part of the VM sources we build into mscordacwks.dll (msdaccore.dll) must be non-invasive. Specifically, we usually don't want to do anything that would cause writing to the target's address space, nor can we execute any code that might cause an immediate garbage collection. (If we can defer the GC, it may be possible to allocate.) Note that the _host_ state is always mutated (temporaries, stack or local heap values); it is only mutating the _target_ space that is problematic. To enforce this, we do two things: code factoring and conditional compilation. In an ideal world, we would factor the VM code so that we would strictly isolate invasive actions in functions that are separate from non-invasive functions.
+
+Unfortunately, we have a large code base, most of which we wrote without ever thinking about the DAC at all. We have a significant number of functions with "find or create" semantics and many other functions that have some parts that just do inspection and other parts that write to the target. Sometimes we control this with a flag passed into the function. This is common in loader code, for example. To avoid having to complete the immense job of refactoring all the VM code before we can use the DAC, we have a second method to prevent executing invasive code from out of process. We have a defined pre-processor constant, DACCESS\_COMPILE that we use to control what parts of the code we compile into the DAC. We would like to use the DACCESS\_COMPILE constant as little as we can, so when we DACize a new code path, we prefer to refactor whenever possible. Thus, a function that has "find or create" semantics should become two functions: one that tries to find the information and a wrapper that calls this and creates if the find fails. That way, the DAC code path can call the find function directly and avoid the creation.
+
+How does the DAC work?
+======================
+
+As discussed, the DAC works by marshaling the data it needs and running code in the mscordacwks.dll (msdaccore.dll) module. It marshals data by reading from the target address space to get a target value, and then storing it in the host address space where the functions in mscordacwks can operate on it. This happens only on demand, so if the mscordacwks functions never need a target value, the DAC will not marshal it.
+
+Marshaling Principles
+---------------------
+
+The DAC maintains a cache of data that it reads. This avoids the overhead of reading the same values repeatedly. Of course, if the target is live, the values will potentially change. We can only assume the cached values are valid as long as the debuggee remains stopped. Once we allow the target to continue execution, we must flush the DAC cache. The DAC will retrieve the values again when the debugger stops the target for further inspection. The entries in the DAC cache are of type DAC\_INSTANCE. This contains (among other data) the target address, the size of the data and space for the marshaled data itself. When the DAC marshals data, it returns the address of the marshaled data part of this entry as the host address.
+
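The cache behavior described above can be modeled with a toy class (names and structure invented for illustration; the real cache uses DAC\_INSTANCE entries and reads through the debugger's data target):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

using TADDR = uint64_t;   // a target address is just a number in the host

// Toy model of the DAC cache: entries are keyed by (target address, size),
// so marshaling the same address at the same size returns the same host
// copy, while a different size creates a new entry.
class DacCache {
    std::map<std::pair<TADDR, size_t>, std::vector<uint8_t>> entries_;
public:
    // readTarget stands in for reading the debuggee's address space
    // (e.g. via the data target's ReadVirtual).
    void* marshal(TADDR addr, size_t size,
                  void (*readTarget)(TADDR, void*, size_t)) {
        auto key = std::make_pair(addr, size);
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            std::vector<uint8_t> buf(size);
            readTarget(addr, buf.data(), size);
            it = entries_.emplace(key, std::move(buf)).first;
        }
        return it->second.data();   // host address of the marshaled copy
    }
    void flush() { entries_.clear(); }   // when the debuggee continues
};

uint8_t g_targetMemory[256];             // fake debuggee address space

void fakeRead(TADDR addr, void* dst, size_t size) {
    std::memcpy(dst, g_targetMemory + addr, size);
}
```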
+When the DAC reads a value from the target, it marshals the value as a chunk of bytes of a given size (determined by its type). By keeping the target address as a field in the cache entries, it maintains a mapping between the target address and the host address (the address in the cache). Between any stop and continue of a debugger session, the DAC will marshal each value requested only once, as long as subsequent accesses use the same type. (If we reference the target address by two different types, the size may be different, so the DAC will create a new cache entry for the new type). If the value is already in the cache, the DAC will be able to look it up by its target address. That means we can correctly compare two host pointers for (in)equality as long as we have accessed both pointers using the same type. This identity of pointers does not hold across type conversions, however. Furthermore, we have no guarantee that values marshaled separately will maintain the same spatial relationship in the cache that they do in the target, so it is incorrect to compare two host pointers for less-than or greater-than relationships. Object layout must be identical in host and target, so we can access fields in an object in the cache using the same offsets we use in the target. Remember that any pointer fields in a marshaled object will be target addresses (generally declared as data members of a PTR type). If we need the values at those addresses, the DAC must marshal them to the host before dereferencing them.
+
+Because we build this dll from the same sources that we use to build mscorwks.dll (coreclr.dll), the mscordacwks.dll (msdaccore.dll) build that the debugger uses must match the mscorwks build exactly. You can see that this is obviously true if you consider that between builds we might add or remove a field from a type we use. The size for the object in mscorwks would then be different from the size in mscordacwks and the DAC could not marshal the object correctly. This has a ramification that's obvious when you think about it, but easy to overlook. We cannot have fields in objects that exist only in DAC builds or only in non-DAC builds. Thus, a declaration such as the following would lead to incorrect behavior.
+
+ class Foo
+ {
+ ...
+ int nCount;
+
+ // DON'T DO THIS!! Object layout must match in DAC builds
+ #ifndef DACCESS_COMPILE
+
+ DWORD dwFlags;
+
+ #endif
+
+ PTR_Bar pBar;
+ ...
+ };
+
+Marshaling Specifics
+--------------------
+
+DAC marshaling works through a collection of typedefs, macros and templated types that generally have one meaning in DAC builds and a different meaning in non-DAC builds. You can find these declarations in [src\inc\daccess.h][daccess.h]. You will also find a long comment at the beginning of this file that explains the details necessary to write code that uses the DAC.
+
+[daccess.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/daccess.h
+
+An example may be helpful in understanding how marshaling works. The common debugging scenario is represented in the following block diagram:
+
+![DAC Overview](../images/dac-overview.png)
+
+The debugger in this figure could be Visual Studio, MDbg, WinDbg, etc. The debugger interfaces with the CLR debugger interface (DBI) APIs to get the information it needs. Information that must come from the target goes through the DAC. The debugger implements the data target, which is responsible for implementing a ReadVirtual function to read memory in the target. The dotted line in the diagram represents the process boundary.
+
+Suppose the debugger needs to display the starting address of an ngen'ed method in the managed application that it has gotten from the managed stack. We will assume that the debugger has already gotten an instance of ICorDebugFunction back from the DBI. It will begin by calling the DBI API ICorDebugFunction::GetNativeCode. This calls into the DAC through the DAC/DBI interface function GetNativeCodeInfo, passing in the domain file and metadata token for the function. The following code fragment is a simplification of the actual function, but it illustrates marshaling without introducing extraneous details.
+
+    void DacDbiInterfaceImpl::GetNativeCodeInfo(TADDR taddrDomainFile,
+                                                mdToken functionToken,
+                                                NativeCodeFunctionData * pCodeInfo)
+    {
+        ...
+
+        DomainFile * pDomainFile = dac_cast<PTR_DomainFile>(taddrDomainFile);
+        Module * pModule = pDomainFile->GetCurrentModule();
+
+        MethodDesc * pMethodDesc = pModule->LookupMethodDef(functionToken);
+        pCodeInfo->pNativeCodeMethodDescToken = pMethodDesc;
+
+        // if we are loading a module and trying to bind a previously set breakpoint, we may not have
+        // a method desc yet, so check for that situation
+        if (pMethodDesc != NULL)
+        {
+            pCodeInfo->startAddress = pMethodDesc->GetNativeCode();
+            ...
+        }
+    }
+
+The first step is to get the module in which the managed function resides. The taddrDomainFile parameter we pass in represents a target address, but we will need to be able to dereference it here. This means we need the DAC to marshal the value. The dac\_cast operator will construct a new instance of PTR\_DomainFile with a target address equal to the value of taddrDomainFile. When we assign this to pDomainFile, we have an implicit conversion to the host pointer type. This conversion operator is a member of the PTR type and this is where the marshaling occurs. The DAC first searches its cache for the target address. If it doesn't find it, it reads the data from the target for the marshaled DomainFile instance and copies it to the cache. Finally, it returns the host address of the marshaled value.
+
+Now we can call GetCurrentModule on this host instance of the DomainFile. This function is a simple accessor that returns DomainFile::m\_pModule. Notice that it returns a Module \*, which will be a host address. The value of m\_pModule is a target address (the DAC will have copied the DomainFile instance as raw bytes). The type for the field is PTR\_Module, however, so when the function returns it, the DAC will automatically marshal it as part of the conversion to Module \*. That means the return value is a host address. Now we have the correct module and a method token, so we have all the information we need to get the MethodDesc.
+
+ Module * DomainFile::GetCurrentModule()
+ {
+ LEAF_CONTRACT;
+ SUPPORTS_DAC;
+ return m_pModule;
+ }
+
+In this simplified version of the code, we are assuming that the method token is a method definition. The next step, then, is to call the LookupMethodDef function on the Module instance.
+
+    inline MethodDesc * Module::LookupMethodDef(mdMethodDef token)
+    {
+        WRAPPER_CONTRACT;
+        SUPPORTS_DAC;
+        ...
+        return dac_cast<PTR_MethodDesc>(GetFromRidMap(&m_MethodDefToDescMap,
+                                                      RidFromToken(token)));
+    }
+
+This uses the RidMap to lookup the MethodDesc. If you look at the definition for this function, you will see that it returns a TADDR:
+
+    TADDR GetFromRidMap(LookupMap * pMap, DWORD rid)
+ {
+ ...
+
+ TADDR result = pMap->pTable[rid];
+ ...
+ return result;
+ }
+
+This represents a target address, but it's not really a pointer; it's simply a number (although it represents an address). The problem is that LookupMethodDef needs to return the address of a MethodDesc that we can dereference. To accomplish this, the function uses a dac\_cast to PTR\_MethodDesc to convert the TADDR to a PTR\_MethodDesc. You can think of this as the target address space form of a cast from void \* to MethodDesc \*. In fact, this code would be slightly cleaner if GetFromRidMap returned a PTR\_VOID (with pointer semantics) instead of a TADDR (with integer semantics). Again, the type conversion implicit in the return statement ensures that the DAC marshals the object (if necessary) and returns the host address of the MethodDesc in the DAC cache.
+
+The assignment statement in GetFromRidMap indexes an array to get a particular value. The pMap parameter is the address of a structure field from the Module. As such, the DAC will have copied the entire field into the cache when it marshaled the Module instance. Thus, pMap, which is the address of this struct, is a host pointer. Dereferencing it does not involve the DAC at all. The pTable field, however, is a PTR\_TADDR. What this tells us is that pTable is an array of target addresses, but its type indicates that it is a marshaled type. This means that pTable will be a target address as well. We dereference it with the overloaded indexing operator for the PTR type. This will get the target address of the array and compute the target address of the element we want. The last step of indexing marshals the array element back to a host instance in the DAC cache and returns its value. We assign the element (a TADDR) to the local variable result and return it.
+
+Finally, to get the code address, the DAC/DBI interface function will call MethodDesc::GetNativeCode. This function returns a value of type PCODE. This type is a target address, but one that we cannot dereference (it is just an alias of TADDR) and one that we use specifically to specify a code address. We store this value on the ICorDebugFunction instance and return it to the debugger.
+
+### PTR Types
+
+Because the DAC marshals values from the target address space to the host address space, understanding how the DAC handles target pointers is fundamental. We collectively refer to the fundamental types used for marshaling these as "PTR types." You will see that [daccess.h][daccess.h] defines two classes: \_\_TPtrBase, which has several derived types, and \_\_GlobalPtr. We don't use these types directly; we use them only indirectly through a number of macros. Each of these contains a single data member to give us the target address of the value. For \_\_TPtrBase, this is a full address. For \_\_GlobalPtr, it is a relative address, referenced from a DAC global base location. The "T" in \_\_TPtrBase stands for "target". As you can guess, we use types derived from \_\_TPtrBase for pointers that are data members or locals and we use \_\_GlobalPtr for globals and statics.
+
+In practice, we use these types only through macros. The introductory comment in [daccess.h][daccess.h] has examples of the use of all of these. What is interesting about these macros is that they will expand to declare instantiated types from these marshaling templates in DAC builds, but are no-ops in non-DAC builds. For example, the following definition declares PTR\_MethodTable as a type to represent method table pointers (note that the convention is to name these types with a prefix of PTR\_):
+
+    typedef DPTR(class MethodTable) PTR_MethodTable;
+
+In a DAC build, the DPTR macro will expand to declare a \_\_DPtr<MethodTable> type named PTR\_MethodTable. In a non-DAC build, the macro simply declares PTR\_MethodTable to be MethodTable \*. This implies that the DAC functionality does not result in any behavior change or performance degradation in non-DAC builds.
+
+Even better, in a DAC build, the DAC will automatically marshal variables, data members, or return values declared to be of type PTR\_MethodTable, as we saw in the example in the last section. The marshaling is completely transparent. The \_\_DPtr type has overloaded operator functions to redefine pointer dereferencing and array indexing, and a conversion operator to cast to the host pointer type. These operations determine whether the requested value is already in the cache, from whence the operators will return them immediately, or whether it is necessary to read from the target and load the value into the cache before returning it. If you are interested in understanding the details, the function responsible for these cache operations is DacInstantiateTypeByAddressHelper.
+
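The transparent-marshaling idea can be illustrated with a toy template (heavily simplified and with invented names; the real \_\_DPtr goes through the DAC cache and the data target's read path rather than a flat array):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using TADDR = uint64_t;

uint8_t g_target[64];              // fake debuggee address space

// Toy model of the __DPtr idea: only the target address is stored, and
// the implicit conversion to a host pointer performs the marshaling.
template <typename T>
struct DPtr {
    TADDR addr;                    // target address
    operator T*() const {          // implicit conversion == marshaling
        static T hostCopy;         // stands in for a DAC cache entry
        std::memcpy(&hostCopy, g_target + addr, sizeof(T));
        return &hostCopy;
    }
};

struct MethodTableModel { uint32_t numSlots; };  // invented stand-in type
```

In a DAC build the real macro expands to a template like this; in a non-DAC build it collapses to a plain `T *`, which is why non-DAC builds pay no cost.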
+PTR types defined with DPTR are the most common in the runtime, but we also have PTR types for global and static pointers, restricted-use arrays, pointers to variable-sized objects, and pointers to classes with virtual functions that we may need to call from mscordacwks.dll (msdaccore.dll). Most of these are rare and you can refer to [daccess.h][daccess.h] to learn more about them if you need them.
+
+The GPTR and VPTR macros are common enough to warrant special mention here. Both the way we use these and their external behavior is quite similar to DPTRs. Again, marshaling is automatic and transparent. The VPTR macro declares a marshaled pointer type for a class with virtual functions. This special macro is necessary because the virtual function table is essentially an implicit extra field. The DAC has to marshal this separately, since the function addresses are all target addresses that the DAC must convert to host addresses. Treating these classes in this way means that the DAC automatically instantiates the correct implementation class, making casts between base and derived types unnecessary. When you declare a VPTR type, you must also list it in vptr\_list.h. \_\_GlobalPtr types provide base functionality to marshal both global variables and static data members through the GPTR, GVAL, SPTR and SVAL macros. The implementation of global variables is almost identical to that of static fields (both use the \_\_GlobalPtr class) and require the addition of an entry in [dacvars.h][dacvars.h]. The comments in daccess.h and dacvars.h provide more details about declaring these types.
+
+[dacvars.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/dacvars.h
+
+Global and static values and pointers are interesting because they form the entry points to the target address space (all other uses of the DAC require you to have a target address already). Many of the globals in the runtime are already DACized. It occasionally becomes necessary to make a previously unDACized (or a newly introduced) global available to the DAC. By using the appropriate macros and [dacvars.h][dacvars.h] entry, you enable a post-build step (DacTableGen.exe run by the build in ndp\clr\src\dacupdatedll) to save the address of the global (from clr.pdb) into a table that is embedded into mscordacwks.dll. The DAC uses this table at run-time to determine where to look in the target address space when the code accesses a global.
+
+### VAL Types
+
+In addition to pointer types, the DAC must also marshal static and global values (as opposed to values referenced by static or global pointers). For this we have a collection of macros ?VAL\_\*. We use GVAL\_\* for global values, and SVAL\_\* for static values. The comment in the [daccess.h][daccess.h] file has a table showing how to use the various forms of these and includes instructions for declaring global and static values (and global and static pointers) that we will use in DACized code.
+
+### Pure Addresses
+
+The TADDR and PCODE types we introduced in the example of DAC operation are pure target addresses. These are actually integer types, rather than pointers. This prevents code in the host from incorrectly dereferencing them. The DAC does not treat them as pointers either; because we have no type or size information, no dereferencing or marshaling can occur. We use these primarily in two situations: when we are treating a target address as pure data, and when we need to do pointer arithmetic with target addresses (although we can also do pointer arithmetic with PTR types). Of course, because TADDRs have no type information for the target locations they specify, when we perform address arithmetic, we need to factor in the size explicitly.
+
+We also have one special class of PTRs that don't involve marshaling: PTR\_VOID and PTR\_CVOID. These are the target equivalents of void \* and const void \*, respectively. Because TADDRs are simply numbers, they don't have pointer semantics, which means that if we DACize code by converting void \* to TADDR (as was often the case in the past), we often need extra casts and other changes, even in code that does not compile for the DAC. Using PTR\_VOID makes it easier and cleaner to DACize code that uses void \* by preserving the semantics expected for void \*. If we DACize a function that uses PTR\_VOID or PTR\_CVOID, we can't directly marshal data from these addresses, since we have no idea how much data we would need to read. This means we can't dereference them (or even do pointer arithmetic), but this is identical to the semantics of void \*. As is the case for void \*, we generally cast them to a more specific PTR type when we need to use them. We also have a PTR\_BYTE type, which is a standard marshaled target pointer (that supports pointer arithmetic, etc.). In general, when we DACize code, void \* becomes PTR\_VOID and BYTE \* becomes PTR\_BYTE, just as you would expect. [daccess.h][daccess.h] has explanatory comments that provide more details about the use and semantics of the PTR\_VOID type.
+
+Occasionally, legacy code stores a target address in a host pointer type such as void \*. This is always a bug and makes it extremely difficult to reason about the code. It will also break when we support cross-platform scenarios in which the pointer types are different sizes. In DAC builds, the void \* type is a host pointer which should never contain a target address. Using PTR\_VOID instead allows us to indicate that a void pointer type is a target address. We are trying to eliminate all such uses, but some are quite pervasive in the code and will take a while to eliminate entirely.
+
+### Conversions
+
+In earlier CLR versions, we used C-style type casting, macros, and constructors to cast between types. For example, in MethodIterator::Next, we have the following:
+
+ if (methodCold)
+ {
+ PTR_CORCOMPILE_METHOD_COLD_HEADER methodColdHeader
+ = PTR_CORCOMPILE_METHOD_COLD_HEADER((TADDR)methodCold);
+
+ if (((TADDR)methodCode) == PTR_TO_TADDR(methodColdHeader->hotHeader))
+ {
+ // Matched the cold code
+ m_pCMH = PTR_CORCOMPILE_METHOD_COLD_HEADER((TADDR)methodCold);
+ ...
+
+Both methodCold and methodCode are declared as BYTE \*, but in fact hold target addresses. In line 4, methodCold is cast to a TADDR and used as the argument to the constructor for PTR\_CORCOMPILE\_METHOD\_COLD\_HEADER. At this point, methodColdHeader is explicitly a target address. In line 6, there is another C-style cast for methodCode. The hotHeader field of methodColdHeader is of type PTR\_CORCOMPILE\_METHOD\_HEADER. The macro PTR\_TO\_TADDR extracts the raw target address from this PTR type so that it can be compared with methodCode. Finally, in line 9, another instance of type PTR\_CORCOMPILE\_METHOD\_COLD\_HEADER is constructed. Again, methodCold is cast to a TADDR to pass to this constructor.
+
+If this code seems overly complex and confusing to you, that's good. In fact it is. Worse, it provides no protection for the separation of host and target addresses. From the declarations of methodCold and methodCode, there is no particular reason to interpret them as target addresses at all. If these pointers were dereferenced in DAC builds as if they really were host pointers, the process would probably AV. This snippet demonstrates that any arbitrary pointer type (as opposed to a PTR type) can be cast to a TADDR. Given that these two variables always hold target addresses, they should be of type PTR\_BYTE, rather than BYTE \*.
+
+There is also a disciplined means to cast between different PTR types: dac\_cast. The dac\_cast operator is the DAC-aware version of the C++ static\_cast operator (which the CLR coding conventions stipulate instead of C-style casts when casting pointer types). The dac\_cast operator will do any of the following things:
+
+1. Create a PTR type from a TADDR
+2. Convert one PTR type to another
+3. Create a PTR from a host instance previously marshaled to the DAC cache
+4. Extract the TADDR from a PTR type
+5. Get a TADDR from a host instance previously marshaled to the DAC cache
+
+Now, assuming both methodCold and methodCode are declared to be of type PTR\_BYTE, the code above can be rewritten as follows.
+
+ if (methodCold)
+ {
+ PTR_CORCOMPILE_METHOD_COLD_HEADER methodColdHeader
+ = dac_cast<PTR_CORCOMPILE_METHOD_COLD_HEADER>(methodCold);
+
+ if (methodCode == methodColdHeader->hotHeader)
+ {
+ // Matched the cold code
+ m_pCMH = methodColdHeader;
+
+You might argue that this code still seems complex and confusing, but at least we have significantly reduced the number of casts and constructors. We have also used constructs that maintain the separation between host and target pointers, so we have made the code safer. In particular, dac\_cast will often generate compiler or run-time errors if we try to do the wrong thing. In general, dac\_cast should be used for conversions.
+
+DACizing
+========
+
+When do you need to DACize?
+---------------------------
+
+Whenever you add a new feature, you will need to consider its debuggability needs and DACize the code to support your feature. You must also ensure that any other changes, such as bug fixes or code clean-up, conform to the DAC rules when necessary. Otherwise, the changes will break the debugger or SOS. If you are simply modifying existing code (as opposed to implementing a new feature), you will generally be able to determine that you need to worry about the DAC when a function you modify includes a SUPPORTS\_DAC contract. This contract has a few variants such as SUPPORTS\_DAC\_WRAPPER and LEAF\_DAC\_CONTRACT. You can find comments explaining the differences in [contract.h][contract.h]. If you see a number of DAC-specific types in the function, you should assume the code will run in DAC builds.
+
+[contract.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/contract.h
+
+DACizing ensures that code in the engine will work correctly with the DAC. It is important to use the DAC correctly to marshal values from the target to the host. Target addresses used incorrectly from the host (or vice versa) may reference unmapped addresses. Even if the addresses happen to be mapped, the values read will be completely unrelated to the values expected. As a result, DACizing mostly involves ensuring that we use PTR types for all values that the DAC needs to marshal. Another major task is to ensure that we do not allow invasive code to execute in DAC builds. In practice, this means that we must sometimes refactor code or add DACCESS\_COMPILE preprocessor directives. We also want to be sure that we add the appropriate SUPPORTS\_DAC contract. The use of this contract signals to developers that the function works with the DAC. This is important for two reasons:
+
+1. If we later call it from some other SUPPORTS\_DAC function, we know that it is DAC-safe and we don't need to worry about DACizing it.
+2. If we make modifications to the function, we need to make sure that they are DAC-safe. If we add a call to another function from this one, we also need to ensure that it is DAC-safe or that we only make the call in non-DAC builds.
diff --git a/Documentation/botr/exceptions.md b/Documentation/botr/exceptions.md
new file mode 100644
index 0000000..daa684b
--- /dev/null
+++ b/Documentation/botr/exceptions.md
@@ -0,0 +1,299 @@
+What Every Dev needs to Know About Exceptions in the Runtime
+============================================================
+
+Date: 2005
+
+When talking about "exceptions" in the CLR, there is an important distinction to keep in mind. There are managed exceptions, which are exposed to applications through mechanisms like C#'s try/catch/finally, with all of the runtime machinery to implement them. And then there is the use of exceptions inside the runtime itself. Most runtime developers seldom need to think about how to build and expose the managed exception model. But every runtime developer needs to understand how exceptions are used in the implementation of the runtime. When there is a need to keep the distinction clear, this document will refer to _managed exceptions_ that a managed application may throw or catch, and will refer to the _CLR's internal exceptions_ that are used by the runtime for its own error handling. Mostly, though, this document is about the CLR's internal exceptions.
+
+Where do exceptions matter?
+===========================
+
+Exceptions matter almost everywhere. They matter the most in functions that throw or catch exceptions, because that code must be written explicitly to throw the exception, or to catch and properly handle an exception. Even if a particular function doesn't itself throw an exception, it may well call one that does, and so that particular function must be written to behave correctly when an exception is thrown through it. The judicious use of _holders_ can greatly ease writing such code correctly.
+
+Why are CLR internal exceptions different?
+==========================================
+
+The CLR's internal exceptions are much like C++ exceptions, but not exactly. Rotor can be built for Mac OSX, for BSD, and for Windows. The OS and compiler differences dictate that we can't just use standard C++ try/catch. In addition, the CLR internal exceptions provide features similar to the managed "finally" and "fault".
+
+With the help of some macros, it is possible to write exception handling code that is almost as easy to write and to read as standard C++.
+
+Catching an Exception
+=====================
+
+EX_TRY
+------
+
+The basic macros are, of course, EX_TRY / EX_CATCH / EX_END_CATCH, and in use they look like this:
+
+ EX_TRY
+ // Call some function. Maybe it will throw an exception.
+ Bar();
+ EX_CATCH
+ // If we're here, something failed.
+ m_finalDisposition = terminallyHopeless;
+ EX_END_CATCH(RethrowTransientExceptions)
+
+The EX_TRY macro simply introduces the try block, and is much like the C++ "try", except that it also includes an opening brace, "{".
+
+EX_CATCH
+--------
+
+The EX_CATCH macro ends the try block, including the closing brace, "}", and begins the catch block. Like the EX_TRY, it also starts the catch block with an opening brace.
+
+And here is the big difference from C++ exceptions: the CLR developer doesn't get to specify what to catch. In fact, this set of macros catches everything, including non-C++ exceptions like AV or a managed exception. If a bit of code needs to catch just one exception, or a subset, then it will need to catch, examine the exception, and rethrow anything that isn't relevant.
+
+It bears repeating that the EX_CATCH macro catches everything. This behavior is frequently not what a function needs. The next two sections discuss more about how to deal with exceptions that shouldn't have been caught.
+
+GET_EXCEPTION() & GET_THROWABLE()
+---------------------------------
+
+How, then, does a CLR developer discover just what has been caught, and determine what to do? There are several options, depending on just what the requirement is.
+
+First, whatever the (C++) exception that is caught, it will be delivered as an instance of some class derived from the global Exception class. Some of these derived classes are pretty obvious, like OutOfMemoryException. Some are somewhat domain specific, like EETypeLoadException. And some of these are just wrapper classes around another system's exceptions, like CLRException (has an OBJECTHANDLE to reference any managed exception) or HRException (wraps an HRESULT). If the original exception was not derived from Exception, the macros will wrap it up in something that is. (Note that all of these exceptions are system-provided and well known. _New exception classes shouldn't be added without involving the Core Execution Engine Team!_)
+
+Next, there is always an HRESULT associated with a CLR internal exception. Sometimes, as with HRException, the value came from some COM source, but internal errors and Win32 api failures also have HRESULTS.
+
+Finally, because almost any exception inside the CLR could possibly be delivered back to managed code, there is a mapping from the internal exceptions back to the corresponding managed exceptions. The managed exception won't necessarily be created, but there is always the possibility of obtaining it.
+
+So, given these features, how does the CLR developer categorize an exception?
+
+Frequently, all that is needed to categorize is the HRESULT that corresponds to the exception, and this is extremely easy to get:
+
+ HRESULT hr = GET_EXCEPTION()->GetHR();
+
+More information is often most conveniently available through the managed exception object. And if the exception will be delivered back to managed code, whether immediately, or cached for later, the managed object is, of course, required. And the exception object is just as easy to get. Of course, it is a managed objectref, so all the usual rules apply:
+
+ OBJECTREF throwable = NULL;
+ GCPROTECT_BEGIN(throwable);
+ // . . .
+ EX_TRY
+ // . . . do something that might throw
+ EX_CATCH
+ throwable = GET_THROWABLE();
+ EX_END_CATCH(RethrowTransientExceptions)
+ // . . . do something with throwable
+ GCPROTECT_END()
+
+Sometimes, there is no avoiding a need for the C++ exception object, though this is mostly inside the exception implementation. If it is important exactly what the C++ exception type is, there is a set of lightweight RTTI-like functions that help categorize exceptions. For instance,
+
+ Exception *pEx = GET_EXCEPTION();
+ if (pEx->IsType(CLRException::GetType())) {/* ... */}
+
+would tell whether the exception is (or derives from) CLRException.
+
+EX_END_CATCH(RethrowTransientExceptions)
+----------------------------------------
+
+In the example above, "RethrowTransientExceptions" is an argument to the EX_END_CATCH macro; it is one of three pre-defined macros that can be thought of as "exception dispositions". Here are the macros, and their meanings:
+
+- _SwallowAllExceptions_: This is aptly named, and very simple. As the name suggests, it swallows everything. While simple and appealing, this is often not the right thing to do.
+- _RethrowTerminalExceptions_. A better name would be "RethrowThreadAbort", which is what this macro does.
+- _RethrowTransientExceptions_. The best definition of a "transient" exception is one that might not occur if tried again, possibly in a different context. These are the transient exceptions:
+ - COR_E_THREADABORTED
+ - COR_E_THREADINTERRUPTED
+ - COR_E_THREADSTOP
+ - COR_E_APPDOMAINUNLOADED
+ - E_OUTOFMEMORY
+ - HRESULT_FROM_WIN32(ERROR_COMMITMENT_LIMIT)
+ - HRESULT_FROM_WIN32(ERROR_NOT_ENOUGH_MEMORY)
+ - (HRESULT)STATUS_NO_MEMORY
+ - COR_E_STACKOVERFLOW
+ - MSEE_E_ASSEMBLYLOADINPROGRESS
+
+The CLR developer with doubts about which macro to use should probably pick _RethrowTransientExceptions_.
+
+In every case, however, the developer writing an EX_END_CATCH needs to think hard about which exceptions should be caught, and should catch only those exceptions. And, because the macros catch everything anyway, the only way to not catch an exception is to rethrow it.
+
+If an EX_CATCH / EX_END_CATCH block has properly categorized its exceptions, and has rethrown wherever necessary, then SwallowAllExceptions is the way to tell the macros that no further rethrowing is necessary.
+
+## EX_CATCH_HRESULT
+
+Sometimes all that is needed is the HRESULT corresponding to an exception, particularly in code that implements a COM interface. For these cases, EX_CATCH_HRESULT is simpler than writing a whole EX_CATCH block. A typical case would look like this:
+
+ HRESULT hr;
+ EX_TRY
+ // code
+    EX_CATCH_HRESULT(hr)
+
+ return hr;
+
+_However, while very tempting, it is not always correct_. The EX_CATCH_HRESULT catches all exceptions, saves the HRESULT, and swallows the exception. So, unless that exception swallowing is what the function really needs, EX_CATCH_HRESULT is not appropriate.
+
+EX_RETHROW
+----------
+
+As noted above, the exception macros catch all exceptions; the only way to catch a specific exception is to catch all, and rethrow all but the one(s) of interest. So, if, after an exception is caught, examined, possibly logged, and so forth, it shouldn't be caught, it may be re-thrown. EX_RETHROW will re-raise the same exception.
+
+Not catching an exception
+=========================
+
+It's frequently the case that a bit of code doesn't need to catch an exception, but does need to perform some sort of cleanup or compensating action. Holders are frequently just the thing for this scenario, but not always. For the times that holders aren't adequate, the CLR has two variations on a "finally" block.
+
+EX_TRY_FOR_FINALLY
+------------------
+
+When there is a need for some sort of compensating action as code exits, a finally may be appropriate. There is a set of macros to implement a try/finally in the CLR:
+
+ EX_TRY_FOR_FINALLY
+ // code
+ EX_FINALLY
+ // exit and/or backout code
+ EX_END_FINALLY
+
+**Important** : The EX_TRY_FOR_FINALLY macros are built with SEH, rather than C++ EH, and the C++ compiler doesn't allow SEH and C++ EH to be mixed in the same function. Locals with auto-destructors require C++ EH for their destructor to run. Therefore, any function with EX_TRY_FOR_FINALLY can't have EX_TRY, and can't have any local variable with an auto-destructor.
+
+EX_HOOK
+-------
+
+Frequently there is a need for compensating code, but only when an exception is thrown. For these cases, EX_HOOK is similar to EX_FINALLY, but the "hook" clause only runs when there is an exception. The exception is automatically rethrown at the end of the "hook" clause.
+
+ EX_TRY
+ // code
+ EX_HOOK
+    // code to run when an exception escapes the "code" block.
+ EX_END_HOOK
+
+This construct is somewhat better than simply EX_CATCH with EX_RETHROW, because it will rethrow a non-stack-overflow exception as-is, but will catch a stack overflow exception (and unwind the stack) and then throw a new stack overflow exception.
+
+Throwing an Exception
+=====================
+
+Throwing an Exception in the CLR is generally a matter of calling
+
+ COMPlusThrow ( < args > )
+
+There are a number of overloads, but the idea is to pass the "kind" of the exception to COMPlusThrow. The list of "kinds" is generated by a set of macros operating on [Rexcep.h](https://github.com/dotnet/coreclr/blob/master/src/vm/rexcep.h), and the various "kinds" are kAmbiguousMatchException, kApplicationException, and so forth. Additional arguments (for the overloads) specify resources and substitution text. Generally, the right "kind" is selected by looking for other code that reports a similar error.
+
+There are some pre-defined convenience variations:
+
+COMPlusThrowOOM();
+------------------
+
+Defers to ThrowOutOfMemory(), which throws the C++ OOM exception. This will throw a pre-allocated exception, to avoid the problem of being out of memory trying to throw an out of memory exception!
+
+When getting the managed exception object for this exception, the runtime will first try to allocate a new managed object <sup>[1]</sup>, and if that fails, will return a pre-allocated, shared, global out of memory exception object.
+
+[1] After all, if it was a request for a 2gb array that failed, a simple object may be fine.
+
+COMPlusThrowHR(HRESULT theBadHR);
+---------------------------------
+
+There are a number of overloads, in case you have an IErrorInfo, etc. There is some surprisingly complicated code to figure out what kind of exception corresponds to a particular HRESULT.
+
+COMPlusThrowWin32(); / COMPlusThrowWin32(hr);
+---------------------------------------------
+
+Basically throws an HRESULT_FROM_WIN32(GetLastError())
+
+COMPlusThrowSO();
+-----------------
+
+Throws a Stack Overflow (SO) Exception. Note that this is not a hard SO, but rather an exception we throw when proceeding might lead to a hard SO.
+
+Like OOM, this throws a pre-allocated C++ SO exception object. Unlike OOM, when retrieving the managed object, the runtime always returns the pre-allocated, shared, global stack overflow exception object.
+
+COMPlusThrowArgumentNull()
+--------------------------
+
+A helper for throwing an "argument foo must not be null" exception.
+
+COMPlusThrowArgumentOutOfRange()
+--------------------------------
+
+As it sounds.
+
+COMPlusThrowArgumentException()
+-------------------------------
+
+Yet another flavor of invalid argument exception.
+
+COMPlusThrowInvalidCastException(thFrom, thTo)
+----------------------------------------------
+
+Given type handles for the from and to types of the attempted cast, this helper creates a nicely formatted exception message.
+
+EX_THROW
+--------
+
+This is a low-level throw construct that is not generally needed in normal code. Many of the COMPlusThrowXXX functions use EX_THROW internally, as do other specialized ThrowXXX functions. It is best to minimize direct use of EX_THROW, simply to keep the nitty-gritty details of the exception mechanism as well encapsulated as possible. But when none of the higher-level Throw functions work, it is fine to use EX_THROW.
+
+The macro takes two arguments, the type of exception to be thrown (some sub-type of the C++ Exception class), and a parenthesized list of arguments to the exception type's constructor.
+
+Using SEH directly
+==================
+
+There are a few situations where it is appropriate to use SEH directly. In particular, SEH is the only option if some processing is needed on the first pass, that is, before the stack is unwound. The filter code in an SEH __try/__except can do anything, in addition to deciding whether to handle an exception. Debugger notifications are one area that sometimes needs first-pass handling.
+
+Filter code needs to be written very carefully. In general, the filter code must be prepared for any random, and likely inconsistent, state. Because the filter runs on the first pass, and dtors run on the second pass, holders won't have run yet, and will not have restored their state.
+
+PAL_TRY / PAL_EXCEPT, PAL_EXCEPT_FILTER, PAL_FINALLY / PAL_ENDTRY
+-----------------------------------------------------------------
+
+When a filter is needed, the PAL_TRY family is the portable way to write one in the CLR. Because the filter uses SEH directly, it is incompatible with C++ EH in the same function, and so there can't be any holders in the function.
+
+Again, these should be rare.
+
+__try / __except, __finally
+---------------------------
+
+There isn't a good reason to use these directly in the CLR.
+
+Exceptions and GC mode
+======================
+
+Throwing an exception with COMPlusThrowXXX() doesn't affect the GC mode, and is safe in any mode. As the exception unwinds back to the EX_CATCH, any holders that were on the stack will be unwound, releasing their resources and resetting their state. By the time that execution resumes in the EX_CATCH, the holder-protected state will have been restored to what it was at the time of the EX_TRY.
+
+Transitions
+===========
+
+Considering managed code, the CLR, COM servers, and other native code, there are many possible transitions between calling conventions, memory management, and, of course, exception handling mechanisms. Regarding exceptions, it is fortunate for the CLR developer that most of these transitions are either completely outside of the runtime, or are handled automatically. There are three transitions that are a daily concern for a CLR developer. Anything else is an advanced topic, and those who need to know about them are well aware that they need to know!
+
+Managed code into the runtime
+-----------------------------
+
+This is the "fcall", "jit helper", and so forth. The typical way that the runtime reports errors back to managed code is through a managed exception. So, if an fcall function, directly or indirectly, raises a managed exception, that's perfectly fine. The normal CLR managed exception implementation will "do the right thing" and look for an appropriate managed handler.
+
+On the other hand, if an fcall function can do anything that might throw a CLR internal exception (one of the C++ exceptions), that exception must not be allowed to leak back out to managed code. To handle this case, the CLR has the UnwindAndContinueHandler (UACH), which is a set of code to catch the C++ EH exceptions, and re-raise them as managed exceptions.
+
+Any runtime function that is called from managed code, and might throw a C++ EH exception, must wrap the throwing code in INSTALL_UNWIND_AND_CONTINUE_HANDLER / UNINSTALL_UNWIND_AND_CONTINUE_HANDLER. Installing a HELPER_METHOD_FRAME will automatically install the UACH. There is a non-trivial amount of overhead to installing a UACH, so they shouldn't be used everywhere. One technique that is used in performance critical code is to run without a UACH, and install one just before throwing an exception.
+
+When a C++ exception is thrown, and there is a missing UACH, the typical failure will be a Contract Violation of "GC_TRIGGERS called in a GC_NOTRIGGER region" in CPFH_RealFirstPassHandler. To fix these, look for managed to runtime transitions, and check for INSTALL_UNWIND_AND_CONTINUE_HANDLER or HELPER_METHOD_FRAME_BEGIN_XXX.
+
+Runtime code into managed code
+------------------------------
+
+The transition from the runtime into managed code has highly platform-dependent requirements. On 32-bit Windows platforms, the CLR's managed exception code requires that "COMPlusFrameHandler" is installed just before entering managed code. These transitions are handled by highly specialized helper functions, which take care of installing the appropriate exception handlers. It is very unlikely that any typical new calls into managed code would use any other way in. If the COMPlusFrameHandler were missing, the most likely effect would be that exception handling code in the target managed code simply wouldn't be executed: no finally blocks, and no catch blocks.
+
+Runtime code into external native code
+--------------------------------------
+
+Calls from the runtime into other native code (the OS, the CRT, and other DLLs) may need particular attention. The cases that matter are those in which the external code might cause an exception. The reason that this is a problem comes from the implementation of the EX_TRY macros, and in particular how they translate or wrap non-Exceptions into Exceptions. With C++ EH, it is possible to catch any and all exceptions (via "catch(...)"), but only by giving up all information about what has been caught. When catching an Exception*, the macros have the exception object to examine, but when catching anything else, there is nothing to examine, and the macros must guess what the actual exception is. And when the exception comes from outside of the runtime, the macros will always guess wrong.
+
+The current solution is to wrap the call to external code in a "callout filter". The filter will catch the external exception, and translate it into SEHException, one of the runtime's internal exceptions. This filter is predefined, and is simple to use. However, using a filter means using SEH, which of course precludes using C++ EH in the same function. To add a callout filter to a function that uses C++ EH will require splitting a function in two.
+
+To use the callout filter, instead of this:
+
+ length = SysStringLen(pBSTR);
+
+write this:
+
+ BOOL OneShot = TRUE;
+
+ PAL_TRY
+ {
+ length = SysStringLen(pBSTR);
+ }
+ PAL_EXCEPT_FILTER(CallOutFilter, &OneShot)
+ {
+ _ASSERTE(!"CallOutFilter returned EXECUTE_HANDLER.");
+ }
+ PAL_ENDTRY;
+
+A missing callout filter on a call that raises an exception will always result in the wrong exception being reported in the runtime. The type that is incorrectly reported isn't even always deterministic; if there is already some managed exception "in flight", then that managed exception is what will be reported. If there is no current exception, then OOM will be reported. On a checked build there are asserts that usually fire for a missing callout filter. These assert messages will include the text "The runtime may have lost track of the type of an exception".
+
+Miscellaneous
+=============
+
+There are actually a lot of macros involved in EX_TRY. Most of them should never, ever, be used outside of the macro implementations.
+
+One set, BEGIN_EXCEPTION_GLUE / END_EXCEPTION_GLUE, deserves special mention. These were intended to be transitional macros, and were to be replaced with more appropriate macros in the Whidbey project. Of course, they worked just fine, and so they weren't all replaced. Ideally, all instances will be converted during a "cleanup" milestone, and the macros removed. In the meantime, any CLR dev tempted to use them should resist, and instead write EX_TRY/EX_CATCH/EX_END_CATCH or EX_CATCH_HRESULT.
diff --git a/Documentation/botr/garbage-collection.md b/Documentation/botr/garbage-collection.md
new file mode 100644
index 0000000..9e16131
--- /dev/null
+++ b/Documentation/botr/garbage-collection.md
@@ -0,0 +1,332 @@
+Garbage Collection Design
+=========================
+Author: Maoni Stephens ([@maoni0](https://github.com/maoni0)) - 2015
+
+Note: See _The Garbage Collection Handbook_, referenced in the resources section at the end of this document, to learn more about garbage collection topics.
+
+Component Architecture
+======================
+
+The two components of the GC are the allocator and the collector. The allocator is responsible for getting more memory and triggering the collector when appropriate. The collector reclaims garbage, or the memory of objects that are no longer in use by the program.
+
+There are other ways the collector can be invoked, such as manually calling GC.Collect or the finalizer thread receiving an asynchronous low-memory notification (which triggers a collection).
+
+Design of Allocator
+===================
+
+The allocator gets called by the allocation helpers in the Execution Engine (EE), with the following information:
+
+- Size requested
+- Thread allocation context
+- Flags that indicate things like whether this is a finalizable object or not
+
+The GC does not have special treatment for different kinds of object types. It consults the EE to get the size of an object.
+
+Based on the size, the GC divides objects into 2 categories: small objects (< 85,000 bytes) and large objects (>= 85,000 bytes). In principle, small and large objects can be treated the same way, but since compacting large objects is more expensive, the GC makes this distinction.
+
+When the GC gives out memory to the allocator, it does so in terms of allocation contexts. The size of an allocation context is defined by the allocation quantum.
+
+- **Allocation contexts** are smaller regions of a given heap segment that are each dedicated for use by a given thread. On a single-processor (meaning 1 logical processor) machine, a single context is used, which is the generation 0 allocation context.
+- The **Allocation quantum** is the size of memory that the allocator allocates each time it needs more memory, in order to perform object allocations within an allocation context. The allocation quantum is typically 8k, and the average size of managed objects is around 35 bytes, enabling a single allocation quantum to be used for many object allocations.
+
+Large objects do not use allocation contexts and quantums. A single large object can itself be larger than these smaller regions of memory. Also, the benefits (discussed below) of these regions are specific to smaller objects. Large objects are allocated directly to a heap segment.
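
The fast path of small-object allocation can be pictured as a pointer bump within the thread's allocation context. The sketch below is illustrative only: the field names `alloc_ptr` and `alloc_limit` mirror the real `alloc_context`, but the surrounding logic is invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of a per-thread allocation context (hypothetical layout;
// the real alloc_context lives in the EE and has more fields).
struct alloc_context {
    char* alloc_ptr;    // next free byte in the current allocation quantum
    char* alloc_limit;  // end of the current allocation quantum
};

const size_t quantum_size = 8 * 1024;  // typical allocation quantum (8k)

// Fast-path allocation: bump the pointer if the object fits. No lock is
// needed because the context is owned by exactly one thread.
void* try_fast_alloc(alloc_context* acontext, size_t size) {
    if (acontext->alloc_ptr + size <= acontext->alloc_limit) {
        void* result = acontext->alloc_ptr;
        acontext->alloc_ptr += size;
        return result;  // memory was zeroed when the quantum was handed out
    }
    return nullptr;  // quantum exhausted; the slow path must get a new one
}
```

When `try_fast_alloc` returns `nullptr`, the slow path asks the GC for a fresh quantum, which may in turn trigger a collection if the allocation budget is exhausted.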
+
+The allocator is designed to achieve the following:
+
+- **Triggering a GC when appropriate:** The allocator triggers a GC when the allocation budget (a threshold set by the collector) is exceeded or when the allocator can no longer allocate on a given segment. The allocation budget and managed segments are discussed in more detail later.
+- **Preserving object locality:** Objects allocated together on the same heap segment will be stored at virtual addresses close to each other.
+- **Efficient cache usage:** The allocator allocates memory in _allocation quantum_ units, not on an object-by-object basis. It zeroes out that much memory to warm up the CPU cache because there will be objects immediately allocated in that memory. The allocation quantum is usually 8k.
+- **Efficient locking:** The thread affinity of allocation contexts and quantums guarantee that there is only ever a single thread writing to a given allocation quantum. As a result, there is no need to lock for object allocations, as long as the current allocation context is not exhausted.
+- **Memory integrity:** The GC always zeroes out the memory for newly allocated objects to prevent object references pointing at random memory.
+- **Keeping the heap crawlable:** The allocator makes sure to make a free object out of left over memory in each allocation quantum. For example, if there is 30 bytes left in an allocation quantum and the next object is 40 bytes, the allocator will make the 30 bytes a free object and get a new allocation quantum.
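
The last rule can be illustrated with a small decision function. This is a sketch with invented names; the real allocator actually formats the leftover bytes as a genuine free object on the heap.

```cpp
#include <cassert>
#include <cstddef>

// Result of trying to place the next object in the current quantum
// (hypothetical types, for illustration only).
struct fit_result {
    size_t free_object_size;  // leftover bytes to format as a free object
    bool   need_new_quantum;  // true if a fresh quantum must be fetched
};

// If the next object does not fit, the remainder of the quantum becomes a
// free object so a heap walk never encounters an unformatted gap.
fit_result place_or_pad(size_t bytes_left, size_t next_object_size) {
    if (next_object_size <= bytes_left)
        return { 0, false };      // object fits in the current quantum
    return { bytes_left, true };  // e.g. 30 bytes left, 40-byte object
}
```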
+
+Allocation APIs
+---------------
+
+ Object* GCHeap::Alloc(size_t size, DWORD flags);
+ Object* GCHeap::Alloc(alloc_context* acontext, size_t size, DWORD flags);
+
+The above functions can be used to allocate both small objects and
+large objects. There is also a function to allocate directly on LOH:
+
+ Object* GCHeap::AllocLHeap(size_t size, DWORD flags);
+
+Design of the Collector
+=======================
+
+Goals of the GC
+---------------
+
+The GC strives to manage memory extremely efficiently and
+to require very little effort from people who write "managed code". Efficient means:
+
+- GCs should occur often enough to avoid the managed heap containing a significant amount (by ratio or absolute count) of unused but allocated objects (garbage), and therefore use memory unnecessarily.
+- GCs should happen as infrequently as possible to avoid using otherwise useful CPU time, even though frequent GCs would result in lower memory usage.
+- A GC should be productive. If GC reclaims a small amount of memory, then the GC (including the associated CPU cycles) was wasted.
+- Each GC should be fast. Many workloads have low latency requirements.
+- Managed code developers shouldn’t need to know much about the GC to achieve good memory utilization (relative to their workload).
+- The GC should tune itself to satisfy different memory usage patterns.
+
+Logical representation of the managed heap
+------------------------------------------
+
+The CLR GC is a generational collector which means objects are
+logically divided into generations. When a generation _N_ is collected,
+the surviving objects are marked as belonging to generation _N+1_. This
+process is called promotion. There are exceptions to this when we
+decide to demote or not promote.
+
+For small objects the heap is divided into 3 generations: gen0, gen1
+and gen2. For large objects there’s one generation – gen3. Gen0 and gen1 are referred to as ephemeral (objects lasting for a short time) generations.
+
+For the small object heap, the generation number represents the age – gen0
+being the youngest generation. This doesn’t mean all objects in gen0
+are younger than any objects in gen1 or gen2. There are exceptions
+which will be explained below. Collecting a generation means collecting
+objects in that generation and all its younger generations.
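
The rule that collecting a generation also collects its younger generations can be written as a one-line predicate (an illustrative sketch, not CLR code):

```cpp
#include <cassert>

// An object is collected if it lives in the condemned generation or any
// younger generation (gen0 is the youngest).
bool is_collected(int object_generation, int condemned_generation) {
    return object_generation <= condemned_generation;
}
```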
+
+In principle large objects can be handled the same way as small
+objects but since compacting large objects is very expensive, they are treated differently. There is only one generation for large objects and
+they are always collected with gen2 collections for performance
+reasons. Both gen2 and gen3 can be big, and collecting ephemeral generations (gen0 and gen1) needs to have a bounded cost.
+
+Allocations are made in the youngest generation – for small objects this means always gen0 and for large objects this means gen3 since there’s only one generation.
+
+Physical representation of the managed heap
+-------------------------------------------
+
+The managed heap is a set of managed heap segments. A heap segment is a contiguous block of memory that is acquired by the GC from the OS. The heap segments are
+partitioned into small and large object segments, given the distinction of small and large objects. On each heap the heap segments are chained together. There is at least one small object segment and one large segment - they are reserved when CLR is loaded.
+
+There’s always only one ephemeral segment in each small object heap, which is where gen0 and gen1 live. This segment may or may not include gen2
+objects. In addition to the ephemeral segment, there can be zero, one or more additional segments, which will be gen2 segments since they only contain gen2 objects.
+
+There are one or more segments on the large object heap.
+
+A heap segment is consumed from the lower address to the higher
+address, which means objects of lower addresses on the segment are
+older than those of higher addresses. Again there are exceptions that
+will be described below.
+
+Heap segments can be acquired as needed. They are deleted when they
+don’t contain any live objects, however the initial segment on the heap
+will always exist. For each heap, one segment at a time is acquired,
+which is done during a GC for small objects and during allocation time
+for large objects. This design provides better performance because large objects are only collected with gen2 collections (which are relatively expensive).
+
+Heap segments are chained together in order of when they were acquired. The last segment in the chain is always the ephemeral segment. Collected segments (no live objects) can be reused rather than deleted, becoming the new ephemeral segment. Segment reuse is only implemented for the small object heap. Each time a large object is allocated, the whole large object heap is considered. Small object allocations only consider the ephemeral segment.
+
+The allocation budget
+---------------------
+
+The allocation budget is a logical concept associated with each
+generation. It is a size limit that triggers a GC for that
+generation when it is exceeded.
+
+The budget is a property set on the generation mostly based on the
+survival rate of that generation. If the survival rate is high, the budget is made larger with the expectation that there will be a better ratio of dead to live objects next time there is a GC for that generation.
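
A grossly simplified version of that heuristic might look like the following. The thresholds and scaling factors here are invented for illustration; the real tuning in gc.cpp considers many more inputs.

```cpp
#include <cassert>
#include <cstddef>

// Grow a generation's allocation budget when its survival rate is high, so
// the next GC of that generation sees a better dead-to-live ratio.
size_t next_budget(size_t current_budget, double survival_rate) {
    if (survival_rate > 0.9)
        return current_budget * 2;  // almost everything survived: back off
    if (survival_rate < 0.1)
        return current_budget;      // GCs are productive: keep the budget
    return current_budget + current_budget / 2;
}
```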
+
+Determining which generation to collect
+---------------------------------------
+
+When a GC is triggered, the GC must first determine which generation to collect. Besides the allocation budget there are other factors that must be considered:
+
+- Fragmentation of a generation – if a generation has high fragmentation, collecting that generation is likely to be productive.
+- If the memory load on the machine is too high, the GC may collect
+ more aggressively if that’s likely to yield free space. This is important to
+ prevent unnecessary paging (across the machine).
+- If the ephemeral segment is running out of space, the GC may do more aggressive ephemeral collections (meaning doing more gen1’s) to avoid acquiring a new heap segment.
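
These factors might be combined as in the sketch below. The specific inputs and thresholds are invented; the point is only that the condemned generation can be raised above what the allocation budget alone would suggest.

```cpp
#include <cassert>

// Start from the generation whose budget was exceeded, then escalate to a
// full (gen2) GC if other signals suggest it would be productive.
int choose_condemned_generation(int budget_exceeded_generation,
                                double gen2_fragmentation_ratio,
                                bool machine_memory_load_high) {
    int condemned = budget_exceeded_generation;
    if (machine_memory_load_high || gen2_fragmentation_ratio > 0.4)
        condemned = 2;  // a full GC is likely to free significant space
    return condemned;
}
```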
+
+The flow of a GC
+----------------
+
+Mark phase
+----------
+
+The goal of the mark phase is to find all live objects.
+
+The benefit of a generational collector is the ability to collect just part of
+the heap instead of having to look at all of the objects all the
+time. When collecting the ephemeral generations, the GC needs to find out which objects are live in these generations, which is information reported by the EE. Besides the objects kept live by the EE, objects in older generations
+can also keep objects in younger generations live by making references
+to them.
+
+The GC uses cards for the older generation marking. Cards are set by JIT
+helpers during assignment operations. If the JIT helper sees an
+object in the ephemeral range it will set the byte that contains the
+card representing the source location. During ephemeral collections, the GC can look at the set cards for the rest of the heap and only look at the objects that these cards correspond to.
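
A minimal model of such a card-marking write barrier is shown below. The card size, table size, and function shape are all assumptions for this example; the real barrier is emitted by the JIT and its parameters vary by platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const size_t card_size = 256;  // bytes of heap covered per card (assumed)
uint8_t card_table[1024];      // one byte per card

// Called conceptually on every reference store `*location = ref`. If the
// stored reference points into the ephemeral range, mark the card covering
// the *source* location so an ephemeral GC can find this old-to-young
// reference without scanning the whole older generation.
void write_barrier(uintptr_t location, uintptr_t ref,
                   uintptr_t ephemeral_low, uintptr_t ephemeral_high) {
    if (ref >= ephemeral_low && ref < ephemeral_high)
        card_table[(location / card_size) % sizeof card_table] = 1;
}
```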
+
+Plan phase
+---------
+
+The plan phase simulates a compaction to determine the effective result. If compaction is productive the GC starts an actual compaction; otherwise it sweeps.
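
The decision could be caricatured as follows. The 10% threshold is invented for this illustration; the real plan phase weighs much more than a single fragmentation ratio.

```cpp
#include <cassert>
#include <cstddef>

// Compact only if the simulated compaction recovers enough fragmented space
// to justify the cost of moving objects; otherwise sweep in place.
bool should_compact(size_t fragmented_bytes, size_t generation_size) {
    return generation_size != 0 &&
           fragmented_bytes * 10 >= generation_size;
}
```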
+
+Relocate phase
+--------------
+
+If the GC decides to compact, which will result in moving objects, then references to these objects must be updated. The relocate phase needs to find all references that point to objects that are in the
+generations being collected. In contrast, the mark phase only consults live objects so doesn’t need to consider weak references.
+
+Compact phase
+-------------
+
+This phase is straightforward since the plan phase already
+calculated the new addresses the objects should move to. The compact
+phase will copy the objects there.
+
+Sweep phase
+-----------
+
+The sweep phase looks for the dead space in between live objects. It creates free objects in place of these dead spaces. Adjacent dead objects are made into one free object. It places all of these free objects onto the _freelist_.
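
The coalescing behavior can be modeled with a small sketch (invented object model; the real sweep works over the actual heap layout and threads the free objects onto free lists):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a segment after marking: each entry is an object that is
// either live or dead (unreached).
struct swept_object {
    size_t size;
    bool   live;
};

// Turn each maximal run of adjacent dead objects into one free object and
// return the free-object sizes in segment order (the free list).
std::vector<size_t> sweep(const std::vector<swept_object>& segment) {
    std::vector<size_t> free_list;
    size_t dead_run = 0;
    for (const swept_object& obj : segment) {
        if (obj.live) {
            if (dead_run) {
                free_list.push_back(dead_run);  // coalesced free object
                dead_run = 0;
            }
        } else {
            dead_run += obj.size;  // adjacent dead objects merge
        }
    }
    if (dead_run)
        free_list.push_back(dead_run);
    return free_list;
}
```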
+
+Code Flow
+=========
+
+Terms:
+
+- **WKS GC:** Workstation GC.
+- **SVR GC:** Server GC.
+
+Functional Behavior
+-------------------
+
+### WKS GC with concurrent GC off
+
+1. User thread runs out of allocation budget and triggers a GC.
+2. GC calls SuspendEE to suspend managed threads.
+3. GC decides which generation to condemn.
+4. Mark phase runs.
+5. Plan phase runs and decides if a compacting GC should be done.
+6. If so relocate and compact phase runs. Otherwise, sweep phase runs.
+7. GC calls RestartEE to resume managed threads.
+8. User thread resumes running.
+
+### WKS GC with concurrent GC on
+
+This illustrates how a background GC is done.
+
+1. User thread runs out of allocation budget and triggers a GC.
+2. GC calls SuspendEE to suspend managed threads.
+3. GC decides if background GC should be run.
+4. If so background GC thread is woken up, to do a background
+ GC. Background GC thread calls RestartEE to resume managed threads.
+5. Managed threads continue allocating while the background GC does its work.
+6. User thread may run out of allocation budget and trigger an
+ ephemeral GC (what we call a foreground GC). This is done in the same
+ fashion as the "WKS GC with concurrent GC off" flavor.
+7. Background GC calls SuspendEE again to finish with marking and then
+ calls RestartEE to start the concurrent sweep phase while user threads
+ are running.
+8. Background GC is finished.
+
+### SVR GC with concurrent GC off
+
+1. User thread runs out of allocation budget and triggers a GC.
+2. Server GC threads are woken up and call SuspendEE to suspend
+ managed threads.
+3. Server GC threads do the GC work (same phases as in workstation GC
+ without concurrent GC).
+4. Server GC threads call RestartEE to resume managed threads.
+5. User thread resumes running.
+
+### SVR GC with concurrent GC on
+
+This scenario is the same as WKS GC with concurrent GC on, except the non-background GCs are done on SVR GC threads.
+
+Physical Architecture
+=====================
+
+This section is meant to help you follow the code flow.
+
+User thread runs out of quantum and gets a new quantum via try_allocate_more_space.
+
+try_allocate_more_space calls GarbageCollectGeneration when it needs to trigger a GC.
+
+Given WKS GC with concurrent GC off, GarbageCollectGeneration is done all
+on the user thread that triggered the GC. The code flow is:
+
+ GarbageCollectGeneration()
+ {
+ SuspendEE();
+ garbage_collect();
+ RestartEE();
+ }
+
+ garbage_collect()
+ {
+ generation_to_condemn();
+ gc1();
+ }
+
+ gc1()
+ {
+ mark_phase();
+ plan_phase();
+ }
+
+ plan_phase()
+ {
+ // actual plan phase work to decide to
+ // compact or not
+ if (compact)
+ {
+ relocate_phase();
+ compact_phase();
+ }
+ else
+ make_free_lists();
+ }
+
+Given WKS GC with concurrent GC on (default case), the code flow for a background GC is
+
+ GarbageCollectGeneration()
+ {
+ SuspendEE();
+ garbage_collect();
+ RestartEE();
+ }
+
+ garbage_collect()
+ {
+ generation_to_condemn();
+ // decide to do a background GC
+ // wake up the background GC thread to do the work
+ do_background_gc();
+ }
+
+ do_background_gc()
+ {
+ init_background_gc();
+ start_c_gc ();
+
+ //wait until restarted by the BGC.
+ wait_to_proceed();
+ }
+
+ bgc_thread_function()
+ {
+ while (1)
+ {
+ // wait on an event
+ // wake up
+ gc1();
+ }
+ }
+
+ gc1()
+ {
+ background_mark_phase();
+ background_sweep();
+ }
+
+Resources
+=========
+
+- [.NET CLR GC Implementation](https://raw.githubusercontent.com/dotnet/coreclr/master/src/gc/gc.cpp)
+- [The Garbage Collection Handbook: The Art of Automatic Memory Management](http://www.amazon.com/Garbage-Collection-Handbook-Management-Algorithms/dp/1420082795)
+- [Garbage collection (Wikipedia)](http://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
diff --git a/Documentation/botr/intro-to-clr.md b/Documentation/botr/intro-to-clr.md
new file mode 100644
index 0000000..3ba3d3b
--- /dev/null
+++ b/Documentation/botr/intro-to-clr.md
@@ -0,0 +1,261 @@
+Introduction to the Common Language Runtime (CLR)
+===
+
+By Vance Morrison ([@vancem](https://github.com/vancem)) - 2007
+
+What is the Common Language Runtime (CLR)? To put it succinctly:
+
+> The Common Language Runtime (CLR) is a complete, high level virtual machine designed to support a broad variety of programming languages and interoperation among them.
+
+Phew, that was a mouthful. It also in and of itself is not very illuminating. The statement above _is_ useful however, because it is the first step in taking the large and complicated piece of software known as the [CLR][clr] and grouping its features in an understandable way. It gives us a "10,000 foot" view of the runtime from which we can understand the broad goals and purpose of the runtime. After understanding the CLR at this high level, it is easier to look more deeply into sub-components without as much chance of getting lost in the details.
+
+# The CLR: A (very rare) Complete Programming Platform
+
+Every program has a surprising number of dependencies on its runtime environment. Most obviously, the program is written in a particular programming language, but that is only the first of many assumptions a programmer weaves into the program. All interesting programs need some _runtime library_ that allows them to interact with the other resources of the machine (such as user input, disk files, network communications, etc). The program also needs to be converted in some way (either by interpretation or compilation) to a form that the native hardware can execute directly. These dependencies of a program are so numerous, interdependent and diverse that implementers of programming languages almost always defer to other standards to specify them. For example, the C++ language does not specify the format of a C++ executable. Instead, each C++ compiler is bound to a particular hardware architecture (e.g., X86) and to an operating system environment (e.g., Windows, Linux, or Mac OS), which describes the format of the executable file format and specifies how it will be loaded. Thus, programmers don't make a "C++ executable," but rather a "Windows X86 executable" or a "Power PC Mac OS executable."
+
+While leveraging existing hardware and operating system standards is usually a good thing, it has the disadvantage of tying the specification to the level of abstraction of the existing standards. For example, no common operating system today has the concept of a garbage-collected heap. Thus, there is no way to use existing standards to describe an interface that takes advantage of garbage collection (e.g., passing strings back and forth, without worrying about who is responsible for deleting them). Similarly, a typical executable file format provides just enough information to run a program but not enough information for a compiler to bind other binaries to the executable. For example, C++ programs typically use a standard library (on Windows, called msvcrt.dll) which contains most of the common functionality (e.g., printf), but the existence of that library alone is not enough. Without the matching header files that go along with it (e.g., stdio.h), programmers can't use the library. Thus, existing executable file format standards cannot be used both to describe a file format that can be run and to specify other information or binaries necessary to make the program complete.
+
+The CLR fixes problems like these by defining a [very complete specification][ecma-spec] (standardized by ECMA) containing the details you need for the COMPLETE lifecycle of a program, from construction and binding through deployment and execution. Thus, among other things, the CLR specifies:
+
+- A GC-aware virtual machine with its own instruction set (called the Common Intermediate Language (CIL)) used to specify the primitive operations that programs perform. This means the CLR is not dependent on a particular type of CPU.
+- A rich meta data representation for program declarations (e.g., types, fields, methods, etc), so that compilers generating other executables have the information they need to call functionality from 'outside'.
+- A file format that specifies exactly how to lay the bits down in a file, so that you can properly speak of a CLR EXE that is not tied to a particular operating system or computer hardware.
+- The lifetime semantics of a loaded program, the mechanism by which one CLR EXE file can refer to another CLR EXE and the rules on how the runtime finds the referenced files at execution time.
+- A class library that leverages the features that the CLR provides (e.g., garbage collection, exceptions, or generic types) to give access both to basic functionality (e.g., integers, strings, arrays, lists, or dictionaries) as well as to operating system services (e.g., files, network, or user interaction).
+
+Multi-language Support
+----------------------
+
+Defining, specifying and implementing all of these details is a huge undertaking, which is why complete abstractions like the CLR are very rare. In fact, the vast majority of such reasonably complete abstractions were built for single languages. For example, the Java runtime, the Perl interpreter or the early version of the Visual Basic runtime offer similarly complete abstraction boundaries. What distinguishes the CLR from these earlier efforts is its multi-language nature. With the possible exception of Visual Basic (because it leverages the COM object model), the experience within each of these languages is often very good, but interoperating with programs written in other languages is very difficult at best. Interoperation is difficult because these languages can only communicate with "foreign" languages by using the primitives provided by the operating system. Because the OS abstraction level is so low (e.g., the operating system has no concept of a garbage-collected heap), needlessly complicated techniques are necessary. By providing a COMMON LANGUAGE RUNTIME, the CLR allows languages to communicate with each other with high-level constructs (e.g., GC-collected structures), easing the interoperation burden dramatically.
+
+Because the runtime is shared among _many_ languages, it means that more resources can be put into supporting it well. Building good debuggers and profilers for a language is a lot of work, and thus they exist in a full-featured form only for the most important programming languages. Nevertheless, because languages that are implemented on the CLR can reuse this infrastructure, the burden on any particular language is reduced substantially. Perhaps even more important, any language built on the CLR immediately has access to _all_ the class libraries built on top of the CLR. This large (and growing) body of (debugged and supported) functionality is a huge reason why the CLR has been so successful.
+
+In short, the runtime is a complete specification of the exact bits one has to put in a file to create and run a program. The virtual machine that runs these files is at a high level appropriate for implementing a broad class of programming languages. This virtual machine, along with an ever growing body of class libraries that run on that virtual machine, is what we call the common language runtime (CLR).
+
+# The Primary Goal of the CLR
+
+Now that we have a basic idea of what the CLR is, it is useful to back up just a bit and understand the problem the runtime was meant to solve. At a very high level, the runtime has only one goal:
+
+> The goal of the CLR is to make programming easy.
+
+This statement is useful for two reasons. First, it is a _very_ useful guiding principle as the runtime evolves. For example, fundamentally only simple things can be easy, so adding **user visible** complexity to the runtime should always be viewed with suspicion. More important than the cost/benefit ratio of a feature is its _added exposed complexity/weighted benefit over all scenarios_ ratio. Ideally, this ratio is negative (that is, the new feature reduces complexity by removing restrictions or by generalizing existing special cases); however, more typically it is kept low by minimizing the exposed complexity and maximizing the number of scenarios to which the feature adds value.
+
+The second reason this goal is so important is that **ease of use is the fundamental reason for the CLR's success**. The CLR is not successful because it is faster or smaller than writing native code (in fact, well-written native code often wins). The CLR is not successful because of any particular feature it supports (like garbage collection, platform independence, object-oriented programming or versioning support). The CLR is successful because all of those features, as well as numerous others, combine to make programming significantly easier than it would be otherwise. Some important but often overlooked ease of use features include:
+
+1. Simplified languages (e.g., C# and Visual Basic are significantly simpler than C++)
+2. A dedication to simplicity in the class library (e.g., we only have one string type, and it is immutable; this greatly simplifies any API that uses strings)
+3. Strong consistency in the naming in the class library (e.g., requiring APIs to use whole words and consistent naming conventions)
+4. Great support in the tool chain needed to create an application (e.g., Visual Studio makes building CLR applications very simple, and Intellisense makes finding the right types and methods to create the application very easy).
+
+It is this dedication to ease of use (which goes hand in hand with simplicity of the user model) that stands out as the reason for the success of the CLR. Oddly, some of the most important ease-of-use features are also the most "boring." For example, any programming environment could apply consistent naming conventions, yet actually doing so across a large class library is quite a lot of work. Often such efforts conflict with other goals (such as retaining compatibility with existing interfaces), or they run into significant logistical concerns (such as the cost of renaming a method across a _very_ large code base). It is at times like these that we have to remind ourselves about our number-one overarching goal of the runtime and ensure that we have our priorities straight to reach that goal.
+
+# Fundamental Features of the CLR
+
+The runtime has many features, so it is useful to categorize them as follows:
+
+1. Fundamental features – Features that have broad impact on the design of other features. These include:
+ a. Garbage Collection
+ b. Memory Safety and Type Safety
+ c. High level support for programming languages.
+
+2. Secondary features – Features enabled by the fundamental features that may not be required by many useful programs:
+ a. Program isolation with AppDomains
+ b. Program Security and sandboxing
+
+3. Other Features – Features that all runtime environments need but that do not leverage the fundamental features of the CLR. Instead, they are the result of the desire to create a complete programming environment. Among them are:
+ a. Versioning
+ b. Debugging/Profiling
+ c. Interoperation
+
+## The CLR Garbage Collector (GC)
+
+Of all the features that the CLR provides, the garbage collector deserves special notice. Garbage collection (GC) is the common term for automatic memory reclamation. In a garbage-collected system, user programs no longer need to invoke a special operator to delete memory. Instead the runtime automatically keeps track of all references to memory in the garbage-collected heap, and from time-to-time, it will traverse these references to find out which memory is still reachable by the program. All other memory is _garbage_ and can be reused for new allocations.
+
+Garbage collection is a wonderful user feature because it simplifies programming. The most obvious simplification is that most explicit delete operations are no longer necessary. While removing the delete operations is important, the real value to the programmer is a bit more subtle:
+
+1. Garbage collection simplifies interface design because you no longer have to carefully specify which side of the interface is responsible for deleting objects passed across the interface. For example, CLR interfaces simply return strings; they don't take string buffers and lengths. This means they don't have to deal with the complexity of what happens when the buffers are too small. Thus, garbage collection allows ALL interfaces in the runtime to be simpler than they otherwise would be.
+2. Garbage collection eliminates a whole class of common user mistakes. It is frightfully easy to make mistakes concerning the lifetime of a particular object, either deleting it too soon (leading to memory corruption), or too late (unreachable memory leaks). Since a typical program uses literally MILLIONS of objects, the probability for error is quite high. In addition, tracking down lifetime bugs is very difficult, especially if the object is referenced by many other objects. Making this class of mistakes impossible avoids a lot of grief.
+
+Still, it is not the usefulness of garbage collection that makes it worthy of special note here. More important is the simple requirement it places on the runtime itself:
+
+> Garbage collection requires ALL references to the GC heap to be tracked.
+
+While this is a very simple requirement, it in fact has profound ramifications for the runtime. As you can imagine, knowing where every pointer to an object is at every moment of program execution can be quite difficult. We have one mitigating factor, though. Technically, this requirement only applies to when a GC actually needs to happen (thus, in theory we don't need to know where all GC references are all the time, but only at the time of a GC). In practice, however, this mitigation doesn't completely apply because of another feature of the CLR:
+
+> The CLR supports multiple concurrent threads of execution within a single process.
+
+At any time some other thread of execution might perform an allocation that requires a garbage collection. The exact sequence of operations across concurrently executing threads is non-deterministic. We can't tell exactly what one thread will be doing when another thread requests an allocation that will trigger a GC. Thus, GCs can really happen any time. Now the CLR does NOT need to respond _immediately_ to another thread's desire to do a GC, so the CLR has a little "wiggle room" and doesn't need to track GC references at _all_ points of execution, but it _does_ need to do so at enough places that it can guarantee "timely" response to the need to do a GC caused by an allocation on another thread.
+
+What this means is that the CLR needs to track _all_ references to the GC heap _almost_ all the time. Since GC references may reside in machine registers, in local variables, statics, or other fields, there is quite a bit to track. The most problematic of these locations are machine registers and local variables because they are so intimately related to the actual execution of user code. Effectively, what this means is that the _machine code_ that manipulates GC references has another requirement: it must track all the GC references that it uses. This implies some extra work for the compiler to emit the instructions to track the references.
+
+To learn more, check out the [Garbage Collector design document](garbage-collection.md).
+
+## The Concept of "Managed Code"
+
+Code that does the extra bookkeeping so that it can report all of its live GC references "almost all the time" is called _managed code_ (because it is "managed" by the CLR). Code that does not do this is called _unmanaged code_. Thus all code that existed before the CLR is unmanaged code, and in particular, all operating system code is unmanaged.
+
+### The stack unwinding problem
+
+Clearly, because managed code needs the services of the operating system, there will be times when managed code calls unmanaged code. Similarly, because the operating system originally started the managed code, there are also times when unmanaged code calls into managed code. Thus, in general, if you stop a managed program at an arbitrary location, the call stack will have a mixture of frames created by managed code and frames created by unmanaged code.
+
+The stack frames for unmanaged code have _no_ requirements on them over and above running the program. In particular, there is no requirement that they can be _unwound_ at runtime to find their caller. What this means is that if you stop a program at an arbitrary place, and it happens to be in an unmanaged method, there is no way in general<sup>[1]</sup> to find who the caller was. You can only do this in the debugger because of extra information stored in the symbolic information (PDB file). This information is not guaranteed to be available (which is why you sometimes don't get good stack traces in a debugger). This is quite problematic for managed code, because any stack that can't be unwound might in fact contain managed code frames (which contain GC references that need to be reported).
+
+Managed code has additional requirements on it: not only must it track all the GC references it uses during its execution, but it must also be able to unwind to its caller. Additionally, whenever there is a transition from managed code to unmanaged code (or the reverse), managed code must also do additional bookkeeping to make up for the fact that unmanaged code does not know how to unwind its stack frames. Effectively, managed code links together the parts of the stack that contain managed frames. Thus, while it still may be impossible to unwind the unmanaged stack frames without additional information, it will always be possible to find the chunks of the stack that correspond to managed code and to enumerate the managed frames in those chunks.
+
+[1] More recent platform ABIs (application binary interfaces) define conventions for encoding this information, however there is typically not a strict requirement for all code to follow them.
+
+### The "World" of Managed Code
+
+The result is that special bookkeeping is needed at every transition to and from managed code. Managed code effectively lives in its own "world" where execution can't enter or leave unless the CLR knows about it. The two worlds are in a very real sense distinct from one another (at any point in time the code is in the _managed world_ or the _unmanaged world_). Moreover, because the execution of managed code is specified in a CLR format (with its [Common Intermediate Language][cil-spec] (CIL)), and it is the CLR that converts it to run on the native hardware, the CLR has _much_ more control over exactly what that execution does. For example, the CLR could change the meaning of what it means to fetch a field from an object or call a function. In fact, the CLR does exactly this to support the ability to create MarshalByReference objects. These appear to be ordinary local objects, but in fact may exist on another machine. In short, the managed world of the CLR has a large number of _execution hooks_ that it can use to support powerful features which will be explained in more detail in the coming sections.
+
+In addition, there is another important ramification of managed code that may not be so obvious. In the unmanaged world, GC pointers are not allowed (since they can't be tracked), and there is a bookkeeping cost associated with transitioning from managed to unmanaged code. What this means is that while you _can_ call arbitrary unmanaged functions from managed code, it is often not pleasant to do so. Unmanaged methods don't use GC objects in their arguments and return types, which means that any "objects" or "object handles" that those unmanaged functions create and use need to be explicitly deallocated. This is quite unfortunate. Because these APIs can't take advantage of CLR functionality such as exceptions or inheritance, they tend to have a "mismatched" user experience compared to how the interfaces would have been designed in managed code.
+
+The result of this is that unmanaged interfaces are almost always _wrapped_ before being exposed to managed code developers. For example, when accessing files, you don't use the Win32 CreateFile functions provided by the operating system, but rather the managed System.IO.File class that wraps this functionality. It is in fact extremely rare that unmanaged functionality is exposed to users directly.
+
+While this wrapping may seem to be "bad" in some way (more code that does not seem to do much), it is in fact good because it actually adds quite a bit of value. Remember it was always _possible_ to expose the unmanaged interfaces directly; we _chose_ to wrap the functionality. Why? Because the overarching goal of the runtime is to **make programming easy**, and typically the unmanaged functions are not easy enough. Most often, unmanaged interfaces are _not_ designed with ease of use in mind, but rather are tuned for completeness. Anyone looking at the arguments to CreateFile or CreateProcess would be hard pressed to characterize them as "easy." Luckily, the functionality gets a "facelift" when it enters the managed world, and while this makeover is often very "low tech" (requiring nothing more complex than renaming, simplification, and organizing the functionality), it is also profoundly useful. One of the very important documents created for the CLR is the [Framework Design Guidelines][fx-design-guidelines]. This 800+ page document details best practices in making new managed class libraries.
+
+Thus, we have now seen that managed code (which is intimately involved with the CLR) differs from unmanaged code in two important ways:
+
+1. High Tech: The code lives in a distinct world, where the CLR controls most aspects of program execution at a very fine level (potentially to individual instructions), and the CLR detects when execution enters and exits managed code. This enables a wide variety of useful features.
+2. Low Tech: The fact that there is a transition cost when going from managed to unmanaged code, as well as the fact that unmanaged code cannot use GC objects encourages the practice of wrapping most unmanaged code in a managed façade. This means interfaces can get a "facelift" to simplify them and to conform to a uniform set of naming and design guidelines that produce a level of consistency and discoverability that could have existed in the unmanaged world, but does not.
+
+**Both** of these characteristics are very important to the success of managed code.
+
+## Memory and Type Safety
+
+One of the less obvious but quite far-reaching features that a garbage collector enables is that of memory safety. The invariant of memory safety is very simple: a program is memory safe if it accesses only memory that has been allocated (and not freed). This simply means that you don't have "wild" (dangling) pointers that are pointing at random locations (more precisely, at memory that was freed prematurely). Clearly, memory safety is a property we want all programs to have. Dangling pointers are always bugs, and tracking them down is often quite difficult.
+
+> A GC _is_ necessary to provide memory safety guarantees
+
+One can quickly see how a garbage collector helps in ensuring memory safety because it removes the possibility that users will prematurely free memory (and thus access memory that was not properly allocated). What may not be so obvious is that if you want to guarantee memory safety (that is, make it _impossible_ for programmers to create memory-unsafe programs), practically speaking you can't avoid having a garbage collector. The reason for this is that non-trivial programs need _heap style_ (dynamic) memory allocations, where the lifetime of the objects is essentially under arbitrary program control (unlike stack-allocated, or statically-allocated memory, which has a highly constrained allocation protocol). In such an unconstrained environment, the problem of determining whether a particular explicit delete statement is correct becomes impossible to solve by static program analysis. Effectively, the only way you have to determine if a delete is correct is to check it at runtime. This is exactly what a GC does (checks to see if memory is still live). Thus, for any programs that need heap-style memory allocations, if you want to guarantee memory safety, you _need_ a GC.
+
+While a GC is necessary to ensure memory safety, it is not sufficient. The GC will not prevent the program from indexing off the end of an array or accessing a field off the end of an object (possible if you compute the field's address using a base and offset computation). However, if we do prevent these cases, then we can indeed make it impossible for a programmer to create memory-unsafe programs.
+
+While the [common intermediate language][cil-spec] (CIL) _does_ have operators that can fetch and set arbitrary memory (and thus violate memory safety), it also has the following memory-safe operators and the CLR strongly encourages their use in most programming:
+
+1. Field-fetch operators (LDFLD, STFLD, LDFLDA) that fetch (read), set and take the address of a field by name.
+2. Array-fetch operators (LDELEM, STELEM, LDELEMA) that fetch, set and take the address of an array element by index. All arrays include a tag specifying their length. This facilitates an automatic bounds check before each access.
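+
+The automatic bounds check in item 2 can be modeled with a short C++ sketch. The names here are invented for illustration; this is not the runtime's actual code, only the shape of the check it inserts:
+
+```c++
+#include <cassert>
+#include <cstdint>
+#include <stdexcept>
+
+// Illustrative only: a managed array carries its length, and every
+// element access is preceded by an automatic bounds check, as the
+// LDELEM/STELEM operators require.
+struct ManagedIntArray {
+    int32_t  length;
+    int32_t* data;
+};
+
+int32_t ldelem(const ManagedIntArray& arr, int32_t index) {
+    // The check the runtime performs before the actual load.
+    if (index < 0 || index >= arr.length)
+        throw std::out_of_range("IndexOutOfRangeException");
+    return arr.data[index];
+}
+```
+
+Because the length tag travels with the array, no out-of-range access can slip through, no matter what index the program computes.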
+
+By using these operators instead of the lower-level (and unsafe) _memory-fetch_ operators in user code, as well as avoiding other unsafe [CIL][cil-spec] operators (e.g., those that allow you to jump to arbitrary, and thus possibly bad locations) one could imagine building a system that is memory-safe but nothing more. The CLR does not do this, however. Instead the CLR enforces a stronger invariant: type safety.
+
+For type safety, conceptually each memory allocation is associated with a type. All operators that act on memory locations are also conceptually tagged with the type for which they are valid. Type safety then requires that memory tagged with a particular type can only undergo operations allowed for that type. Not only does this ensure memory safety (no dangling pointers), it also allows additional guarantees for each individual type.
+
+One of the most important of these type-specific guarantees is that the visibility attributes associated with a type (and in particular with fields) are enforced. Thus, if a field is declared to be private (accessible only by the methods of the type), then that privacy will indeed be respected by all other type-safe code. For example, a particular type might declare a count field that represents the count of items in a table. Assuming the fields for the count and the table are private, and assuming that the only code that updates them updates them together, there is now a strong guarantee (across all type-safe code) that the count and the number of items in the table are indeed in sync. When reasoning about programs, programmers use the concept of type safety all the time, whether they know it or not. The CLR elevates type-safety from being simply a programming language/compiler convention, to something that can be strictly enforced at run time.
+
+### Verifiable Code - Enforcing Memory and Type Safety
+
+Conceptually, to enforce type safety, every operation that the program performs has to be checked to ensure that it is operating on memory that was typed in a way that is compatible with the operation. While the system could do this all at runtime, it would be very slow. Instead, the CLR has the concept of [CIL][cil-spec] verification, where a static analysis is done on the [CIL][cil-spec] (before the code is run) to confirm that most operations are indeed type-safe. Only when this static analysis can't do a complete job are runtime checks necessary. In practice, the number of run-time checks needed is actually very small. They include the following operations:
+
+1. Casting a pointer to a base type to be a pointer to a derived type (the opposite direction can be checked statically)
+2. Array bounds checks (just as we saw for memory safety)
+3. Assigning an element in an array of pointers to a new (pointer) value. This particular check is only required because CLR arrays have liberal casting rules (more on that later...)
+
+Note that the need to do these checks places requirements on the runtime. In particular:
+
+1. All memory in the GC heap must be tagged with its type (so the casting operator can be implemented). This type information must be available at runtime, and it must be rich enough to determine if casts are valid (e.g., the runtime needs to know the inheritance hierarchy). In fact, the first field in every object on the GC heap points to a runtime data structure that represents its type.
+2. All arrays must also have their size (for bounds checking).
+3. Arrays must have complete type information about their element type.
+
+Luckily, the most expensive requirement (tagging each heap item) was something that was already necessary to support garbage collection (the GC needs to know what fields in every object contain references that need to be scanned), so the additional cost to provide type safety is low.
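+
+The first requirement can be pictured with a small C++ sketch — a simplified, hypothetical model of the real layout — in which every heap object starts with a pointer to its type description, and a castclass-style check walks the inheritance chain at run time:
+
+```c++
+#include <cassert>
+
+// Sketch of requirement 1: every GC-heap object begins with a pointer to
+// a runtime structure describing its type (the CLR calls this a
+// MethodTable; the fields here are illustrative).
+struct MethodTable {
+    const MethodTable* parent; // inheritance chain, nullptr at the root
+};
+
+struct Object {
+    const MethodTable* methodTable; // always the first field
+};
+
+// castclass-style runtime check: walk the object's actual type up the
+// inheritance chain looking for the target type.
+bool canCastTo(const Object* obj, const MethodTable* target) {
+    for (const MethodTable* mt = obj->methodTable; mt != nullptr; mt = mt->parent)
+        if (mt == target) return true;
+    return false;
+}
+```
+
+The same per-object type pointer that makes this check possible is what the GC uses to find the reference fields it must scan, which is why the marginal cost of type safety is low.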
+
+Thus, by verifying the [CIL][cil-spec] of the code and by doing a few run-time checks, the CLR can ensure type safety (and memory safety). Nevertheless, this extra safety exacts a price in programming flexibility. While the CLR does have general memory fetch operators, these operators can only be used in very constrained ways for the code to be verifiable. In particular, all pointer arithmetic will fail verification today. Thus many classic C or C++ conventions cannot be used in verifiable code; you must use arrays instead. While this constrains programming a bit, it really is not bad (arrays are quite powerful), and the benefits (far fewer "nasty" bugs), are quite real.
+
+The CLR strongly encourages the use of verifiable, type-safe code. Even so, there are times (mostly when dealing with unmanaged code) that unverifiable programming is needed. The CLR allows this, but the best practice here is to try to confine this unsafe code as much as possible. Typical programs have only a very small fraction of their code that needs to be unsafe, and the rest can be type-safe.
+
+## High Level Features
+
+Supporting garbage collection had a profound effect on the runtime because it requires that all code must support extra bookkeeping. The desire for type-safety also had a profound effect, requiring that the description of the program (the [CIL][cil-spec]) be at a high level, where fields and methods have detailed type information. The desire for type safety also forces the [CIL][cil-spec] to support other high-level programming constructs that are type-safe. Expressing these constructs in a type-safe manner also requires runtime support. The two most important of these high-level features are used to support two essential elements of object oriented programming: inheritance and virtual call dispatch.
+
+### Object Oriented Programming
+
+Inheritance is relatively simple in a mechanical sense. The basic idea is that if the fields of type `derived` are a superset of the fields of type `base`, and `derived` lays out its fields so the fields of `base` come first, then any code that expects a pointer to an instance of `base` can be given a pointer to an instance of `derived` and the code will "just work". Thus, type `derived` is said to inherit from `base`, meaning that it can be used anywhere `base` can be used. Code becomes _polymorphic_ because the same code can be used on many distinct types. Because the runtime needs to know what type coercions are possible, the runtime must formalize the way inheritance is specified so it can validate type safety.
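+
+The field-layout rule can be made concrete with a minimal C++ sketch (illustrative only): because `derived` places `base`'s fields first, code compiled against `base` reads the correct offsets when handed a `derived` instance.
+
+```c++
+#include <cassert>
+#include <cstddef>
+
+struct Base {
+    int a;
+};
+
+struct Derived {
+    Base base; // Base's fields come first...
+    int  b;    // ...Derived's own fields follow
+};
+
+// Code written against Base: it only knows about the field 'a'.
+int readA(const Base* p) { return p->a; }
+```
+
+Since the prefix of a `Derived` instance is layout-identical to a `Base`, `readA` "just works" on either, which is the mechanical heart of inheritance polymorphism.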
+
+Virtual call dispatch generalizes inheritance polymorphism. It allows base types to declare methods that will be _overridden_ by derived types. Code that uses variables of type `base` can expect that calls to virtual methods will be dispatched to the correct overridden method based on the actual type of the object at run time. While such _run-time dispatch logic_ could have been implemented using primitive [CIL][cil-spec] instructions without direct support in the runtime, it would have suffered from two important disadvantages:
+
+1. It would not be type safe (mistakes in the dispatch table are catastrophic errors)
+2. Each object-oriented language would likely implement a slightly different way of implementing its virtual dispatch logic. As result, interoperability among languages would suffer (one language could not inherit from a base type implemented in another language).
+
+For this reason, the CLR has direct support for basic object-oriented features. To the degree possible, the CLR tried to make its model of inheritance "language neutral," in the sense that different languages might still share the same inheritance hierarchy. Unfortunately, that was not always possible. In particular, multiple inheritance can be implemented in many different ways. The CLR chose not to support multiple inheritance on types with fields, but does support multiple inheritance from special types (called interfaces) that are constrained not to have fields.
+
+It is important to keep in mind that while the runtime supports these object-oriented concepts, it does not require their use. Languages without the concept of inheritance (e.g., functional languages) simply don't use these facilities.
+
+### Value Types (and Boxing)
+
+A profound, yet subtle aspect of object oriented programming is the concept of object identity: the notion that objects (allocated by separate allocation calls) can be distinguished, even if all their field values are identical. Object identity is strongly related to the fact that objects are accessed by reference (pointer) rather than by value. If two variables hold the same object (their pointers address the same memory), then updates to one of the variables will affect the other variable.
+
+Unfortunately, the concept of object identity is not a good semantic match for all types. In particular, programmers don't generally think of integers as objects. If the number '1' was allocated at two different places, programmers generally want to consider those two items equal, and certainly don't want updates to one of those instances affecting the other. In fact, a broad class of programming languages called _functional languages_ avoids object identity and reference semantics altogether.
+
+While it is possible to have a "pure" object oriented system, where everything (including integers) is an object (Smalltalk-80 does this), a certain amount of implementation "gymnastics" is necessary to undo this uniformity to get an efficient implementation. Other languages (Perl, Java, JavaScript) take a pragmatic view and treat some types (like integers) by value, and others by reference. The CLR also chose a mixed model, but unlike the others, allowed user-defined value types.
+
+The key characteristics of value types are:
+
+1. Each local variable, field, or array element of a value type has a distinct copy of the data in the value.
+2. When one variable, field or array element is assigned to another, the value is copied.
+3. Equality is always defined only in terms of the data in the variable (not its location).
+4. Each value type also has a corresponding reference type which has only one implicit, unnamed field. This is called its boxed value. Boxed value types can participate in inheritance and have object identity (although using the object identity of a boxed value type is strongly discouraged).
+
+Value types very closely model the C (and C++) notion of a struct (or C++ class). Like C you can have pointers to value types, but the pointers are a type distinct from the type of the struct.
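+
+These characteristics can be illustrated with a small C++ sketch. The boxed form is simplified for brevity — in particular, the type header that a real boxed value carries on the GC heap is elided:
+
+```c++
+#include <cassert>
+
+// A value type: just its data. Assignment copies the data, so updates
+// to one copy never affect another (no object identity).
+struct Point {
+    int x, y;
+};
+
+// A "boxed" Point: a heap object whose single unnamed field holds the
+// value, giving it reference semantics and object identity.
+struct BoxedPoint {
+    Point value;
+};
+
+BoxedPoint* box(const Point& p)       { return new BoxedPoint{p}; }
+Point       unbox(const BoxedPoint* b) { return b->value; }
+```
+
+Copying a `Point` yields two independent values, while two pointers to the same `BoxedPoint` share one identity — exactly the distinction the characteristics above describe.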
+
+### Exceptions
+
+Another high-level programming construct that the CLR directly supports is exceptions. Exceptions are a language feature that allow programmers to _throw_ an arbitrary object at the point that a failure occurs. When an object is thrown, the runtime searches the call stack for a method that declares that it can _catch_ the exception. If such a catch declaration is found, execution continues from that point. The usefulness of exceptions is that they avoid the very common mistake of not checking if a called method fails. Given that exceptions help avoid programmer mistakes (thus making programming easier), it is not surprising that the CLR supports them.
+
+As an aside, while exceptions avoid one common error (not checking for failure), they do not prevent another (restoring data structures to a consistent state in the event of a failure). This means that after an exception is caught, it is difficult in general to know if continuing execution will cause additional errors (caused by the first failure). This is an area where the CLR is likely to add value in the future. Even as currently implemented, however, exceptions are a great step forward (we just need to go further).
+
+### Parameterized Types (Generics)
+
+Prior to version 2.0 of the CLR, the only parameterized types were arrays. All other containers (such as hash tables, lists, and queues) operated on a generic Object type. The inability to create `List<ElemT>` or `Dictionary<KeyT, ValueT>` certainly had a negative performance effect because value types needed to be boxed on entry to a collection, and explicit casting was needed on element fetch. Nevertheless, that is not the overriding reason for adding parameterized types to the CLR. The main reason is that **parameterized types make programming easier**.
+
+The reason for this is subtle. The easiest way to see the effect is to imagine what a class library would look like if all types were replaced with a generic Object type. This effect is not unlike what happens in dynamically typed languages like JavaScript. In such a world, there are simply far more ways for a programmer to make incorrect (but type-safe) programs. Is the parameter for that method supposed to be a list? a string? an integer? any of the above? It is no longer obvious from looking at the method's signature. Worse, when a method returns an Object, what other methods can accept it as a parameter? Typical frameworks have hundreds of methods; if they all take parameters of type Object, it becomes very difficult to determine which Object instances are valid for the operations the method will perform. In short, strong typing helps a programmer express intent more clearly, and allows tools (e.g., the compiler) to enforce that intent. This results in a big productivity boost.
+
+These benefits do not disappear just because the type gets put into a List or a Dictionary, so clearly parameterized types have value. The only real question is whether parameterized types are best thought of as a language specific feature which is "compiled out" by the time CIL is generated, or whether this feature should have first class support in the runtime. Either implementation is certainly possible. The CLR team chose first class support because without it, parameterized types would be implemented in different ways by different languages. This would imply that interoperability would be cumbersome at best. In addition, expressing programmer intent for parameterized types is most valuable _at the interface_ of a class library. If the CLR did not officially support parameterized types, then class libraries could not use them, and an important usability feature would be lost.
+
+### Programs as Data (Reflection APIs)
+
+The fundamentals of the CLR are garbage collection, type safety, and high-level language features. These basic characteristics forced the specification of the program (the CIL) to be fairly high level. Once this data existed at run time (something not true for C or C++ programs), it became obvious that it would also be valuable to expose this rich data to end programmers. This idea resulted in the creation of the System.Reflection interfaces (so-called because they allow the program to look at (reflect upon) itself). This interface allows you to explore almost all aspects of a program (what types it has, the inheritance relationship, and what methods and fields are present). In fact, so little information is lost that very good "decompilers" for managed code are possible (e.g., [.NET Reflector](http://www.red-gate.com/products/reflector/)). While those concerned with intellectual property protection are aghast at this capability (which can be fixed by purposefully destroying information through an operation called _obfuscating_ the program), the fact that it is possible is a testament to the richness of the information available at run time in managed code.
+
+In addition to simply inspecting programs at run time, it is also possible to perform operations on them (e.g., invoke methods, set fields, etc.), and perhaps most powerfully, to generate code from scratch at run time (System.Reflection.Emit). In fact, the runtime libraries use this capability to create specialized code for matching strings (System.Text.RegularExpressions), and to generate code for "serializing" objects to store in a file or send across the network. Capabilities like this were simply infeasible before (you would have to write a compiler!) but thanks to the runtime, are well within reach of many more programming problems.
+
+While reflection capabilities are indeed powerful, that power should be used with care. Reflection is usually significantly slower than its statically compiled counterparts. More importantly, self-referential systems are inherently harder to understand. This means that powerful features such as Reflection or Reflection.Emit should only be used when the value is clear and substantial.
+
+# Other Features
+
+The last grouping of runtime features are those that are not related to the fundamental architecture of the CLR (GC, type safety, high-level specification), but nevertheless fill important needs of any complete runtime system.
+
+## Interoperation with Unmanaged Code
+
+Managed code needs to be able to use functionality implemented in unmanaged code. There are two main "flavors" of interoperation. First is the ability simply to call unmanaged functions (this is called Platform Invoke or PINVOKE). Unmanaged code also has an object-oriented model of interoperation called COM (component object model) which has more structure than ad hoc method calls. Since both COM and the CLR have models for objects and other conventions (how errors are handled, lifetime of objects, etc.), the CLR can do a better job interoperating with COM code if it has special support.
+
+## Ahead-of-Time Compilation
+
+In the CLR model, managed code is distributed as CIL, not native code. Translation to native code occurs at run time. As an optimization, the native code that is generated from the CIL can be saved in a file using a tool called crossgen (similar to .NET Framework NGEN tool). This avoids large amounts of compilation time at run time and is very important because the class library is so large.
+
+## Threading
+
+The CLR fully anticipated the need to support multi-threaded programs in managed code. From the start, the CLR libraries contained the System.Threading.Thread class which is a 1-to-1 wrapper over the operating system notion of a thread of execution. However, because it is just a wrapper over the operating system thread, creating a System.Threading.Thread is relatively expensive (it takes milliseconds to start). While this is fine for many operations, one style of programming creates very small work items (taking only tens of milliseconds). This is very common in server code (e.g., each task is serving just one web page) or in code that tries to take advantage of multi-processors (e.g., a multi-core sort algorithm). To support this, the CLR has the notion of a ThreadPool which allows WorkItems to be queued. In this scheme, the CLR is responsible for creating the necessary threads to do the work. While the CLR does expose the ThreadPool directly as the System.Threading.ThreadPool class, the preferred mechanism is to use the [Task Parallel Library](https://msdn.microsoft.com/en-us/library/dd460717(v=vs.110).aspx), which adds additional support for very common forms of concurrency control.
+
+From an implementation perspective, the important innovation of the ThreadPool is that it is responsible for ensuring that the optimal number of threads are used to dispatch the work. The CLR does this using a feedback system where it monitors the throughput rate and the number of threads and adjusts the number of threads to maximize the throughput. This is very nice because now programmers can think mostly in terms of "exposing parallelism" (that is, creating work items), rather than the more subtle question of determining the right amount of parallelism (which depends on the workload and the hardware on which the program is run).
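+
+The feedback idea can be caricatured with a tiny C++ control loop. The CLR's real heuristic (hill climbing) is considerably more sophisticated; this invented sketch only shows the shape of the mechanism — nudge the thread count, watch throughput, and reverse course when it degrades:
+
+```c++
+#include <cassert>
+
+// Toy feedback controller: keep moving the thread count in the current
+// direction while measured throughput improves; reverse when it gets
+// worse. Illustrative only.
+struct PoolController {
+    int    threads        = 1;
+    int    direction      = +1;
+    double lastThroughput = 0.0;
+
+    int adjust(double throughput) {
+        if (throughput < lastThroughput)
+            direction = -direction;   // got worse: reverse course
+        lastThroughput = throughput;
+        threads += direction;
+        if (threads < 1) threads = 1; // never drop below one thread
+        return threads;
+    }
+};
+```
+
+The point of such a loop is exactly the one made above: the programmer exposes parallelism, and the runtime searches for the degree of parallelism that the workload and hardware can actually sustain.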
+
+# Summary and Resources
+
+Phew! The runtime does a lot! It has taken many pages just to describe _some_ of the features of the runtime, without even starting to talk about internal details. The hope is, however, that this introduction will provide a useful framework for a deeper understanding of those internal details. The basic outline of this framework is:
+
+- The Runtime is a complete framework for supporting programming languages
+- The Runtime's goal is to make programming easy.
+- The Fundamental features of the runtime are:
+ - Garbage Collection
+ - Memory and Type Safety
+ - Support for High-Level Language Features
+
+## Useful Links
+
+- [MSDN Entry for the CLR][clr]
+- [Wikipedia Entry for the CLR](http://en.wikipedia.org/wiki/Common_Language_Runtime)
+- [ECMA Standard for the Common Language Infrastructure (CLI)][ecma-spec]
+- [.NET Framework Design Guidelines](http://msdn.microsoft.com/en-us/library/ms229042.aspx)
+- [CoreCLR Repo Documentation](README.md)
+
+[clr]: http://msdn.microsoft.com/library/8bs2ecf4.aspx
+[ecma-spec]: ../project-docs/dotnet-standards.md
+[cil-spec]: http://download.microsoft.com/download/7/3/3/733AD403-90B2-4064-A81E-01035A7FE13C/MS%20Partition%20III.pdf
+[fx-design-guidelines]: http://msdn.microsoft.com/en-us/library/ms229042.aspx
diff --git a/Documentation/botr/method-descriptor.md b/Documentation/botr/method-descriptor.md
new file mode 100644
index 0000000..bce0bff
--- /dev/null
+++ b/Documentation/botr/method-descriptor.md
@@ -0,0 +1,343 @@
+Method Descriptor
+=================
+
+Author: Jan Kotas ([@jkotas](https://github.com/jkotas)) - 2006
+
+Introduction
+============
+
+MethodDesc (method descriptor) is the internal representation of a managed method. It serves several purposes:
+
+- Provides a unique method handle, usable throughout the runtime. For normal methods, the MethodDesc is a unique handle for a `<module, metadata token, instantiation>` triplet.
+- Caches frequently used information that is expensive to compute from metadata (e.g. whether the method is static).
+- Captures the runtime state of the method (e.g. whether the code has been generated for the method already).
+- Owns the entry point of the method.
+
+Design Goals and Non-goals
+--------------------------
+
+### Goals
+
+**Performance:** The design of MethodDesc is heavily optimized for size, since there is one of them for every method. For example, the MethodDesc for a normal non-generic method is 8 bytes in the current design.
+
+### Non-goals
+
+**Richness:** The MethodDesc does not cache all information about the method. It is expected that the underlying metadata has to be accessed for less frequently used information (e.g. method signature).
+
+Design of MethodDesc
+====================
+
+Kinds of MethodDescs
+--------------------
+
+There are multiple kinds of MethodDescs:
+
+**IL**
+
+Used for regular IL methods.
+
+**Instantiated**
+
+Used for less common IL methods that have a generic instantiation or that do not have a preallocated slot in the method table.
+
+**FCall**
+
+Internal methods implemented in unmanaged code. These are [methods marked with the MethodImplAttribute(MethodImplOptions.InternalCall) attribute](mscorlib.md), delegate constructors and tlbimp constructors.
+
+**NDirect**
+
+P/Invoke methods. These are methods marked with the DllImport attribute.
+
+**EEImpl**
+
+Delegate methods whose implementation is provided by the runtime (Invoke, BeginInvoke, EndInvoke). See [ECMA 335 Partition II - Delegates](../project-docs/dotnet-standards.md).
+
+**Array**
+
+Array methods whose implementation is provided by the runtime (Get, Set, Address). See [ECMA Partition II – Arrays](../project-docs/dotnet-standards.md).
+
+**ComInterop**
+
+COM interface methods. Since the non-generic interfaces can be used for COM interop by default, this kind is usually used for all interface methods.
+
+**Dynamic**
+
+Dynamically created methods without underlying metadata. Produced by Stub-as-IL and LCG (lightweight code generation).
+
+Alternative Implementations
+---------------------------
+
+Virtual methods and inheritance would be the natural way to implement the various kinds of MethodDesc in C++. However, virtual methods would add a vtable pointer to each MethodDesc, wasting a lot of precious space. The vtable pointer occupies 4 bytes on x86. Instead, the virtualization is implemented by switching based on the MethodDesc kind, which fits into 3 bits. For example:
+
+```c++
+DWORD MethodDesc::GetAttrs()
+{
+ if (IsArray())
+ return ((ArrayMethodDesc*)this)->GetAttrs();
+
+ if (IsDynamic())
+ return ((DynamicMethodDesc*)this)->GetAttrs();
+
+ return GetMDImport()->GetMethodDefProps(GetMemberDef());
+}
+```
+
+Method Slots
+------------
+
+Each MethodDesc has a slot, which contains the entry point of the method. The slot and entry point must exist for all methods, even the ones that never run like abstract methods. There are multiple places in the runtime that depend on the 1:1 mapping between entry points and MethodDescs, making this relationship an invariant.
+
+The slot is either in MethodTable or in MethodDesc itself. The location of the slot is determined by `mdcHasNonVtableSlot` bit on MethodDesc.
+
+The slot is stored in MethodTable for methods that require efficient lookup via slot index, e.g. virtual methods or methods on generic types. The MethodDesc contains the slot index to allow fast lookup of the entry point in this case.
+
+Otherwise, the slot is part of the MethodDesc itself. This arrangement improves data locality and saves working set. Also, it is not even always possible to preallocate a slot in a MethodTable upfront for dynamically created MethodDescs, such as for methods added by Edit & Continue, instantiations of generic methods or [dynamic methods](https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Reflection/Emit/DynamicMethod.cs).
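+
+The lookup described above can be sketched as follows (a simplified illustration with hypothetical field names, not the actual runtime structures):
+
+```c++
+#include <cassert>
+#include <cstdint>
+
+typedef uintptr_t PCODE;   // an entry point address
+
+struct MethodTable {
+    PCODE slots[8];        // vtable-style slot array (fixed size for this sketch)
+};
+
+struct MethodDesc {
+    MethodTable* pMT;
+    uint16_t slotNumber;   // index into pMT->slots when the slot lives in the MethodTable
+    bool hasNonVtableSlot; // stands in for the mdcHasNonVtableSlot bit
+    PCODE nonVtableSlot;   // stands in for the slot stored with the MethodDesc
+
+    PCODE* GetAddrOfSlot() {
+        if (hasNonVtableSlot)
+            return &nonVtableSlot;       // slot is part of the MethodDesc itself
+        return &pMT->slots[slotNumber];  // slot is in the MethodTable
+    }
+
+    PCODE GetEntryPoint() { return *GetAddrOfSlot(); }
+};
+```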
+
+MethodDesc Chunks
+-----------------
+
+The MethodDescs are allocated in chunks to save space. Multiple MethodDescs tend to have an identical MethodTable and identical upper bits of their metadata token. A MethodDescChunk is formed by hoisting this common information in front of an array of MethodDescs. Each MethodDesc then contains just its index within the array.
+
+![Figure 1](../images/methoddesc-fig1.png)
+
+Figure 1 MethodDescChunk and MethodTable
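+
+The index-based recovery of the shared information can be sketched like this (hypothetical field names and a fixed chunk size, for illustration only):
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+
+struct MethodTable;
+struct MethodDescChunk;
+
+struct MethodDesc {
+    uint8_t chunkIndex;    // index of this MethodDesc within its chunk
+
+    MethodDescChunk* GetMethodDescChunk();
+    MethodTable* GetMethodTable();
+};
+
+struct MethodDescChunk {
+    MethodTable* pMethodTable;  // hoisted: shared by every MethodDesc in the chunk
+    MethodDesc methodDescs[4];  // fixed count for this sketch
+};
+
+// The chunk header sits immediately before the MethodDesc array, so its
+// address can be recomputed from the MethodDesc's own index alone.
+MethodDescChunk* MethodDesc::GetMethodDescChunk() {
+    return (MethodDescChunk*)((uint8_t*)this
+        - chunkIndex * sizeof(MethodDesc)
+        - offsetof(MethodDescChunk, methodDescs));
+}
+
+MethodTable* MethodDesc::GetMethodTable() {
+    return GetMethodDescChunk()->pMethodTable;
+}
+```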
+
+Debugging
+---------
+
+The following SOS commands are useful for debugging MethodDesc:
+
+- **DumpMD** – dump the MethodDesc content:
+
+ !DumpMD 00912fd8
+ Method Name: My.Main()
+ Class: 009111ec
+ MethodTable: 00912fe8
+ Token: 06000001
+ Module: 00912c14
+ IsJitted: yes
+ CodeAddr: 00ca0070
+
+- **IP2MD** – find MethodDesc for given code address:
+
+ !ip2md 00ca007c
+ MethodDesc: 00912fd8
+ Method Name: My.Main()
+ Class: 009111ec
+ MethodTable: 00912fe8
+ Token: 06000001
+ Module: 00912c14
+ IsJitted: yes
+ CodeAddr: 00ca0070
+
+- **Name2EE** – find MethodDesc for given method name:
+
+ !name2ee hello.exe My.Main
+ Module: 00912c14 (hello.exe)
+ Token: 0x06000001
+ MethodDesc: 00912fd8
+ Name: My.Main()
+ JITTED Code Address: 00ca0070
+
+- **Token2EE** – find MethodDesc for given token (useful for finding MethodDesc for methods with weird names):
+
+ !token2ee hello.exe 0x06000001
+ Module: 00912c14 (hello.exe)
+ Token: 0x06000001
+ MethodDesc: 00912fd8
+ Name: My.Main()
+ JITTED Code Address: 00ca0070
+
+- **DumpMT -MD** – dump all MethodDescs in the given MethodTable:
+
+ !DumpMT -MD 0x00912fe8
+ ...
+ MethodDesc Table
+ Entry MethodDesc JIT Name
+ 79354bec 7913bd48 PreJIT System.Object.ToString()
+ 793539c0 7913bd50 PreJIT System.Object.Equals(System.Object)
+ 793539b0 7913bd68 PreJIT System.Object.GetHashCode()
+ 7934a4c0 7913bd70 PreJIT System.Object.Finalize()
+ 00ca0070 00912fd8 JIT My.Main()
+ 0091303c 00912fe0 NONE My..ctor()
+
+A MethodDesc has fields with the name and signature of the method on debug builds. This is useful for debugging when the runtime state is severely corrupted and the SOS extension does not work.
+
+Precode
+=======
+
+The precode is a small fragment of code used to implement temporary entry points and an efficient wrapper for stubs. Precode is a niche code-generator for these two cases, generating the most efficient code possible. In an ideal world, all native code dynamically generated by the runtime would be produced by the JIT. That's not feasible in this case, given the specific requirements of these two scenarios. The basic precode on x86 may look like this:
+
+ mov eax,pMethodDesc // Load MethodDesc into scratch register
+ jmp target // Jump to a target
+
+**Efficient Stub wrappers:** The implementation of certain methods (e.g. P/Invoke, delegate invocation, multi dimensional array setters and getters) is provided by the runtime, typically as hand-written assembly stubs. Precode provides a space-efficient wrapper over stubs, to multiplex them for multiple callers.
+
+The worker code of the stub is wrapped by a precode fragment that can be mapped to the MethodDesc and that jumps to the worker code of the stub. The worker code of the stub can be shared between multiple methods this way. It is an important optimization used to implement P/Invoke marshalling stubs. It also creates a 1:1 mapping between MethodDescs and entry points, which establishes a simple and efficient low-level system.
+
+**Temporary entry points:** Methods must provide entry points before they are jitted so that jitted code has an address to call them. These temporary entry points are provided by precode. They are a specific form of stub wrappers.
+
+This technique is a lazy approach to jitting, which provides a performance optimization in both space and time. Otherwise, the transitive closure of a method's call graph would need to be jitted before it was executed. This would be a waste, since only the dependencies of taken code branches (e.g. if statements) require jitting.
+
+Each temporary entry point is much smaller than a typical method body. They need to be small since there are a lot of them, even at the cost of performance. The temporary entry points are executed just once before the actual code for the method is generated.
+
+The target of the temporary entry point is a PreStub, which is a special kind of stub that triggers jitting of a method. It atomically replaces the temporary entry point with a stable entry point. The stable entry point has to remain constant for the method lifetime. This invariant is required to guarantee thread safety since the method slot is always accessed without any locks taken.
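+
+Since the slot is read without locks, the swap can be sketched with a single compare-exchange (illustrative only, not the actual runtime code):
+
+```c++
+#include <atomic>
+#include <cassert>
+#include <cstdint>
+
+typedef uintptr_t PCODE;
+
+// The slot is read without locks, so the switch from the temporary entry
+// point to the stable entry point must be a single atomic update.
+void SetStableEntryPoint(std::atomic<PCODE>& slot, PCODE temporary, PCODE stable) {
+    PCODE expected = temporary;
+    // If another thread already installed the stable entry point, the
+    // compare-exchange simply fails and the existing value is kept.
+    slot.compare_exchange_strong(expected, stable);
+}
+```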
+
+The **stable entry point** is either the native code or the precode. The **native code** is either jitted code or code saved in NGen image. It is common to talk about jitted code when we actually mean native code.
+
+Temporary entry points are never saved into NGen images. All entry points in NGen images are stable entry points that are never changed. This is an important optimization that reduces private working set.
+
+![Figure 2](../images/methoddesc-fig2.png)
+
+Figure 2 Entry Point State Diagram
+
+A method can have both native code and precode if there is a need to do work before the actual method body is executed. This situation typically happens for NGen image fixups. In this case, the native code address is stored in an optional MethodDesc slot, which makes it possible to look up the native code of the method in a cheap, uniform way.
+
+![Figure 3](../images/methoddesc-fig3.png)
+
+Figure 3 The most complex case of Precode, Stub and Native Code
+
+Single Callable vs. Multi Callable entry points
+-----------------------------------------------
+
+An entry point is needed to call the method. The MethodDesc exposes methods that encapsulate the logic to get the most efficient entry point for a given situation. The key difference is whether the entry point will be used to call the method just once or multiple times.
+
+For example, it may be a bad idea to use the temporary entry point to call the method multiple times since each call would go through the PreStub. On the other hand, using the temporary entry point to call the method just once is fine.
+
+The methods to get callable entry points from MethodDesc are:
+
+- MethodDesc::GetSingleCallableAddrOfCode
+- MethodDesc::GetMultiCallableAddrOfCode
+- MethodDesc::GetSingleCallableAddrOfVirtualizedCode
+- MethodDesc::GetMultiCallableAddrOfVirtualizedCode
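+
+The difference between the two flavors can be sketched as follows (hypothetical helpers and fields, not the actual method bodies):
+
+```c++
+#include <cassert>
+#include <cstdint>
+
+typedef uintptr_t PCODE;
+
+struct MethodDesc {
+    PCODE stableEntryPoint;    // 0 until a stable entry point exists
+    PCODE temporaryEntryPoint;
+
+    // In reality this allocates a precode (or returns the native code); here
+    // it just installs a stand-in stable address.
+    PCODE EnsureStableEntryPoint() {
+        if (stableEntryPoint == 0)
+            stableEntryPoint = temporaryEntryPoint + 0x1000; // stand-in address
+        return stableEntryPoint;
+    }
+};
+
+PCODE GetSingleCallableAddrOfCode(MethodDesc* pMD) {
+    // A temporary entry point is acceptable: the PreStub runs at most once.
+    return pMD->stableEntryPoint ? pMD->stableEntryPoint
+                                 : pMD->temporaryEntryPoint;
+}
+
+PCODE GetMultiCallableAddrOfCode(MethodDesc* pMD) {
+    // The returned address may be called many times, so it must not keep
+    // going through the PreStub on every call.
+    return pMD->EnsureStableEntryPoint();
+}
+```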
+
+Types of precode
+----------------
+
+There are multiple specialized types of precodes.
+
+The type of precode has to be cheaply computable from the instruction sequence. On x86 and x64, the type of precode is computed by fetching a byte at a constant offset. Of course, this imposes limits on the instruction sequences used to implement the various precode types.
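+
+A sketch of such a classification (the marker bytes and the offset here are made up; the real values are defined in the platform-specific precode sources):
+
+```c++
+#include <cassert>
+#include <cstdint>
+
+enum PrecodeType : uint8_t {
+    PRECODE_INVALID = 0,
+    PRECODE_STUB    = 0x2d,  // made-up marker byte for StubPrecode
+    PRECODE_FIXUP   = 0x5f,  // made-up marker byte for FixupPrecode
+};
+
+// The "dummy" instruction planted at a fixed offset in every precode doubles
+// as a type tag, so classification is a single byte fetch.
+const int kTypeByteOffset = 5;   // illustrative constant offset
+
+PrecodeType GetPrecodeType(const uint8_t* pPrecode) {
+    switch (pPrecode[kTypeByteOffset]) {
+    case PRECODE_STUB:  return PRECODE_STUB;
+    case PRECODE_FIXUP: return PRECODE_FIXUP;
+    default:            return PRECODE_INVALID;
+    }
+}
+```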
+
+**StubPrecode**
+
+StubPrecode is the basic precode type. It loads the MethodDesc into a scratch register and then jumps. It must be implemented for precodes to work. It is used as a fallback when no other specialized precode type is available.
+
+All other precode types are optional optimizations that the platform-specific files turn on via HAS\_XXX\_PRECODE defines.
+
+StubPrecode looks like this on x86:
+
+ mov eax,pMethodDesc
+ mov ebp,ebp // dummy instruction that marks the type of the precode
+ jmp target
+
+"target" initially points to the prestub. It is patched to point to the final target. The final target (stub or native code) may or may not use the MethodDesc in eax. Stubs often use it; native code does not.
+
+**FixupPrecode**
+
+FixupPrecode is used when the final target does not require MethodDesc in scratch register<sup>2</sup>. The FixupPrecode saves a few cycles by avoiding loading MethodDesc into the scratch register.
+
+The most common usage of FixupPrecode is for method fixups in NGen images.
+
+The initial state of the FixupPrecode on x86:
+
+ call PrecodeFixupThunk // This call never returns. It pops the return address
+ // and uses it to fetch the pMethodDesc below to find
+ // which method needs to be jitted
+ pop esi // dummy instruction that marks the type of the precode
+ dword pMethodDesc
+
+Once it has been patched to point to final target:
+
+ jmp target
+ pop edi
+ dword pMethodDesc
+
+<sup>2</sup> Passing MethodDesc in scratch register is sometimes referred to as **MethodDesc Calling Convention**.
+
+**FixupPrecode chunks**
+
+A FixupPrecode chunk is a space-efficient representation of multiple FixupPrecodes. It mirrors the idea of MethodDescChunk by hoisting the MethodDesc pointers from multiple FixupPrecodes into a shared area.
+
+The FixupPrecode chunk saves space and improves code density of the precodes. The code density improvement from FixupPrecode chunks resulted in 1% - 2% gain in big server scenarios on x64.
+
+The FixupPrecode chunk looks like this on x86:
+
+ jmp Target2
+ pop edi // dummy instruction that marks the type of the precode
+ db MethodDescChunkIndex
+ db 2 (PrecodeChunkIndex)
+
+ jmp Target1
+ pop edi
+ db MethodDescChunkIndex
+ db 1 (PrecodeChunkIndex)
+
+ jmp Target0
+ pop edi
+ db MethodDescChunkIndex
+ db 0 (PrecodeChunkIndex)
+
+ dw pMethodDescBase
+
+One FixupPrecode chunk corresponds to one MethodDescChunk. There is no 1:1 mapping between the FixupPrecodes in the chunk and the MethodDescs in the MethodDescChunk though. Each FixupPrecode stores the index of the method it belongs to, which allows allocating a FixupPrecode in the chunk only for the methods that need one.
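+
+The recovery of the MethodDesc from a FixupPrecode can be sketched like this (field layout and sizes are illustrative):
+
+```c++
+#include <cassert>
+#include <cstdint>
+#include <cstring>
+
+const int kMethodDescSpacing = 8;  // illustrative spacing between MethodDescs
+
+// One FixupPrecode entry, modeled as just its two trailing index bytes (the
+// jmp/pop instructions from the listing above are omitted from the sketch).
+struct FixupPrecode {
+    uint8_t methodDescChunkIndex; // selects the MethodDesc within the MethodDescChunk
+    uint8_t precodeChunkIndex;    // distance (in entries) to the end of the precode chunk
+
+    // pMethodDescBase is stored once, after the last precode in the chunk.
+    uintptr_t GetMethodDesc() const {
+        const FixupPrecode* pEnd = this + precodeChunkIndex + 1;
+        uintptr_t base;
+        std::memcpy(&base, pEnd, sizeof(base));  // unaligned-safe read
+        return base + methodDescChunkIndex * kMethodDescSpacing;
+    }
+};
+```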
+
+**Compact entry points**
+
+Compact entry point is a space efficient implementation of temporary entry points.
+
+Temporary entry points implemented using StubPrecode or FixupPrecode can be patched to point to the actual code, so jitted code can call them directly. Such temporary entry points can serve as multicallable entry points.
+
+Compact entry points cannot be patched to point to the actual code. Jitted code cannot call them directly. They are trading off speed for size. Calls to these entry points are indirected via slots in a table (FuncPtrStubs) that are patched to point to the actual entry point eventually. A request for a multicallable entry point allocates a StubPrecode or FixupPrecode on demand in this case.
+
+The raw speed difference is the cost of an indirect call for a compact entry point vs. the cost of one direct call and one direct jump on the given platform. The latter used to be faster by a few percent in large server scenarios since it could be predicted by the hardware better (2005). That is not always the case on current (2015) hardware.
+
+The compact entry points have been historically implemented on x86 only. Their additional complexity, space vs. speed trade-off and hardware advancements made them unjustified on other platforms.
+
+The compact entry point on x86 looks like this:
+
+ entrypoint0:
+ mov al,0
+ jmp short Dispatch
+
+ entrypoint1:
+ mov al,1
+ jmp short Dispatch
+
+ entrypoint2:
+ mov al,2
+ jmp short Dispatch
+
+ Dispatch:
+ movzx eax,al
+ shl eax, 3
+ add eax, pBaseMD
+ jmp PreStub
+
+The allocation of temporary entry points always tries to pick the smallest temporary entry point from the available choices. For example, a single compact entry point is bigger than a single StubPrecode on x86. The StubPrecode will be preferred over the compact entry point in this case. The allocation of the precode for a stable entry point will try to reuse an allocated temporary entry point precode if one exists of the matching type.
+
+**ThisPtrRetBufPrecode**
+
+ThisPtrRetBufPrecode is used to swap the return buffer and the this pointer for open instance delegates returning valuetypes. It converts the calling convention of MyValueType Bar(Foo x) into the calling convention of MyValueType Foo::Bar().
+
+This precode is always allocated on demand as a wrapper of the actual method entry point and stored in a table (FuncPtrStubs).
+
+ThisPtrRetBufPrecode looks like this:
+
+ mov eax,ecx
+ mov ecx,edx
+ mov edx,eax
+ nop
+ jmp entrypoint
+ dw pMethodDesc
+
+**NDirectImportPrecode**
+
+NDirectImportPrecode is used for lazy binding of unmanaged P/Invoke targets. This precode exists for convenience and to reduce the amount of platform-specific plumbing.
+
+Each NDirectMethodDesc has an NDirectImportPrecode in addition to the regular precode.
+
+NDirectImportPrecode looks like this on x86:
+
+ mov eax,pMethodDesc
+ mov eax,eax // dummy instruction that marks the type of the precode
+ jmp NDirectImportThunk // loads P/Invoke target for pMethodDesc lazily
diff --git a/Documentation/botr/mscorlib.md b/Documentation/botr/mscorlib.md
new file mode 100644
index 0000000..5b5046e
--- /dev/null
+++ b/Documentation/botr/mscorlib.md
@@ -0,0 +1,357 @@
+Mscorlib and Calling Into the Runtime
+===
+
+Author: Brian Grunkemeyer ([@briangru](https://github.com/briangru)) - 2006
+
+# Introduction
+
+Mscorlib is the assembly that defines the core parts of the type system and a good portion of the Base Class Library. Base data types live in this assembly, and it has a tight coupling with the CLR. Here you will learn exactly how and why mscorlib.dll is special, the basics of calling into the CLR from managed code via QCall and FCall methods, and how the CLR calls into managed code.
+
+## Dependencies
+
+Since mscorlib defines base data types like Object, Int32, and String, mscorlib cannot depend on other managed assemblies. However, there is a strong dependency between mscorlib and the CLR. Many of the types in mscorlib need to be accessed from native code, so the layout of many managed types is defined both in managed code and in native code inside the CLR. Additionally, some fields may be defined only in debug or checked builds, so typically mscorlib must be compiled separately for checked vs. retail builds.
+
+For 64 bit platforms, some constants are also defined at compile time. So a 64 bit mscorlib.dll is slightly different from a 32 bit mscorlib.dll. Due to these constants, such as IntPtr.Size, most libraries above mscorlib should not need to build separately for 32 bit vs. 64 bit.
+
+## What Makes Mscorlib Special?
+
+Mscorlib has several unique properties, many of which are due to its tight coupling to the CLR.
+
+- Mscorlib defines the core types necessary to implement the CLR's Virtual Object System, such as the base data types (Object, Int32, String, etc).
+- The CLR must load mscorlib on startup to load certain system types.
+- Only one mscorlib can be loaded in the process at a time, due to layout issues. Loading multiple mscorlibs would require formalizing a contract of behavior, FCall methods, and datatype layout between the CLR & mscorlib, and keeping that contract relatively stable across versions.
+- Mscorlib's types will be used heavily for native interop, and managed exceptions should map correctly to native error codes/formats.
+- The CLR's multiple JIT compilers may special case a small group of certain methods in mscorlib for performance reasons, both in terms of optimizing away the method (such as Math.Cos(double)), or calling a method in peculiar ways (such as Array.Length, or some implementation details on StringBuilder for getting the current thread).
+- Mscorlib will need to call into native code, via P/Invoke where appropriate, primarily into the underlying operating system or occasionally a platform adaptation layer.
+- Mscorlib will require calling into the CLR to expose some CLR-specific functionality, such as triggering a garbage collection, to load classes, or to interact with the type system in a non-trivial way. This requires a bridge between managed code and native, "manually managed" code within the CLR.
+- The CLR will need to call into managed code to call managed methods, and to get at certain functionality that is only implemented in managed code.
+
+# Interface between managed & CLR code
+
+To reiterate, the needs of managed code in mscorlib include:
+
+- The ability to access fields of some managed data structures in both managed code and "manually managed" code within the CLR
+- Managed code must be able to call into the CLR
+- The CLR must be able to call managed code.
+
+To implement these, we need a way for the CLR to specify and optionally verify the layout of a managed object in native code, a managed mechanism for calling into native code, and a native mechanism for calling into managed code.
+
+The managed mechanism for calling into native code must also support the special managed calling convention used by String's constructors, where the constructor allocates the memory used by the object (instead of the typical convention where the constructor is called after the GC allocates memory).
+
+The CLR provides a [mscorlib binder](https://github.com/dotnet/coreclr/blob/master/src/vm/binder.cpp) internally, providing a mapping from unmanaged types and fields to managed types & fields. The binder looks up & loads classes and allows you to call managed methods. It also does some simple verification to ensure the correctness of any layout information specified in both managed & native code. The binder ensures that the managed class you're attempting to use exists in mscorlib, has been loaded, and that the field offsets are correct. It also needs the ability to differentiate between method overloads with different signatures.
+
+# Calling from managed to native code
+
+We have two techniques for calling into the CLR from managed code. FCall allows you to call directly into the CLR code, and provides a lot of flexibility in terms of manipulating objects, though it is easy to cause GC holes by not tracking object references correctly. QCall allows you to call into the CLR via P/Invoke, and is much harder to accidentally misuse than FCall. FCalls are identified in managed code as extern methods with the MethodImplOptions.InternalCall bit set. QCalls are _static_ extern methods that look like regular P/Invokes, but to a library called "QCall".
+
+There is a small variant of FCall called HCall (for Helper call) for implementing JIT helpers, for doing things like accessing multi-dimensional array elements, range checks, etc. The only difference between HCall and FCall is that HCall methods won't show up in an exception stack trace.
+
+### Choosing between FCall, QCall, P/Invoke, and writing in managed code
+
+First, remember that you should be writing as much as possible in managed code. You avoid a raft of potential GC hole issues, you get a good debugging experience, and the code is often simpler. It also is preparation for ongoing refactoring of mscorlib into smaller layered fully managed libraries in [corefx](https://github.com/dotnet/corefx/).
+
+Reasons to write FCalls in the past generally fell into three camps: missing language features, better performance, or implementing unique interactions with the runtime. C# now has almost every useful language feature that you could get from C++, including unsafe code & stack-allocated buffers, and this eliminates the first two reasons for FCalls. We have ported some parts of the CLR that were heavily reliant on FCalls to managed code in the past (such as Reflection and some Encoding & String operations), and we want to continue this momentum. We may port our number formatting & String comparison code to managed in the future.
+
+If the only reason you're defining a FCall method is to call a native Win32 method, you should be using P/Invoke to call Win32 directly. P/Invoke is the public native method interface, and should be doing everything you need in a correct manner.
+
+If you still need to implement a feature inside the runtime, now consider if there is a way to reduce the frequency of transitioning to native code. Can you write the common case in managed, and only call into native for some rare corner cases? You're usually best off keeping as much as possible in managed code.
+
+QCalls are the preferred mechanism going forward. You should only use FCalls when you are "forced" to. This happens when there is a common "short path" through the code that is important to optimize. This short path should not be more than a few hundred instructions, cannot allocate GC memory, and cannot take locks or throw exceptions (GC_NOTRIGGER, NOTHROWS). In all other circumstances (and especially when you enter an FCall and then simply erect a HelperMethodFrame), you should be using QCall.
+
+FCalls were specifically designed for short paths of code that must be optimized. They allow you to take explicit control over when a frame is erected. However, this is error prone and is not worth it for many APIs. QCalls are essentially P/Invokes into the CLR.
+
+As a result, QCalls give you some advantageous marshaling for SafeHandles automatically: your native method just takes a HANDLE type, and can use it without worrying whether someone will free the handle while you are in that method body. The equivalent FCall method would need to use a SafeHandleHolder, and may need to protect the SafeHandle, etc. Leveraging the P/Invoke marshaler avoids this additional plumbing code.
+
+## QCall Functional Behavior
+
+QCalls are very much like a normal P/Invoke from mscorlib.dll to the CLR. Unlike FCalls, QCalls marshal all arguments as unmanaged types, like a normal P/Invoke. QCalls also switch to preemptive GC mode, like a normal P/Invoke. These two features should make QCalls easier to write reliably than FCalls. QCalls are not prone to the GC holes and GC starvation bugs that are common with FCalls.
+
+QCalls perform better than FCalls that erect a HelperMethodFrame. The overhead of a QCall is about 1.4x lower than the overhead of an FCall with a HelperMethodFrame on x86 and x64.
+
+The preferred types for QCall arguments are primitive types that are efficiently handled by the P/Invoke marshaler (INT32, LPCWSTR, BOOL). Notice that BOOL is the correct boolean flavor for QCall arguments. On the other hand, CLR_BOOL is the correct boolean flavor for FCall arguments.
+
+The pointers to common unmanaged EE structures should be wrapped into handle types. This is to make the managed implementation type safe and avoid falling into unsafe C# everywhere. See AssemblyHandle in [vm\qcall.h][qcall] for an example.
+
+[qcall]: https://github.com/dotnet/coreclr/blob/master/src/vm/qcall.h
+
+There is a way to pass raw object references in and out of QCalls. It is done by wrapping a pointer to a local variable in a handle. It is intentionally cumbersome and should be avoided if reasonably possible. See the StringHandleOnStack in the example below. Returning objects, especially strings, from QCalls is the only common pattern where passing raw objects is widely acceptable. (For reasoning on why this set of restrictions helps make QCalls less prone to GC holes, read the "GC Holes, FCall, and QCall" section below.)
+
+### QCall Example - Managed Part
+
+Do not replicate the comments into your actual QCall implementation. This is for illustrative purposes.
+
+ class Foo
+ {
+ // All QCalls should have the following DllImport and
+ // SuppressUnmanagedCodeSecurity attributes
+ [DllImport(JitHelpers.QCall, CharSet = CharSet.Unicode)]
+ [SuppressUnmanagedCodeSecurity]
+ // QCalls should always be static extern.
+ private static extern bool Bar(int flags, string inString, StringHandleOnStack retString);
+
+ // Many QCalls have a thin managed wrapper around them to expose them to
+ // the world in more meaningful way.
+ public string Bar(int flags)
+ {
+ string retString = null;
+
+ // The strings are returned from QCalls by taking address
+ // of a local variable using JitHelpers.GetStringHandle method
+ if (!Bar(flags, this.Id, JitHelpers.GetStringHandle(ref retString)))
+ FatalError();
+
+ return retString;
+ }
+ }
+
+### QCall Example - Unmanaged Part
+
+Do not replicate the comments into your actual QCall implementation.
+
+The QCall entrypoint has to be registered in tables in [vm\ecalllist.h][ecalllist] using QCFuncEntry macro. See "Registering your QCall or FCall Method" below.
+
+[ecalllist]: https://github.com/dotnet/coreclr/blob/master/src/vm/ecalllist.h
+
+ class FooNative
+ {
+ public:
+ // All QCalls should be static and should be tagged with QCALLTYPE
+ static
+ BOOL QCALLTYPE Bar(int flags, LPCWSTR wszString, QCall::StringHandleOnStack retString);
+ };
+
+ BOOL QCALLTYPE FooNative::Bar(int flags, LPCWSTR wszString, QCall::StringHandleOnStack retString)
+ {
+ // All QCalls should have QCALL_CONTRACT.
+ // It is alias for THROWS; GC_TRIGGERS; MODE_PREEMPTIVE; SO_TOLERANT.
+ QCALL_CONTRACT;
+
+ // Optionally, use QCALL_CHECK instead and the expanded form of the contract
+ // if you want to specify preconditions:
+ // CONTRACTL {
+ // QCALL_CHECK;
+ // PRECONDITION(wszString != NULL);
+ // } CONTRACTL_END;
+
+ // The only line between QCALL_CONTRACT and BEGIN_QCALL
+ // should be the return value declaration if there is one.
+ BOOL retVal = FALSE;
+
+ // The body has to be enclosed in BEGIN_QCALL/END_QCALL macro. It is necessary
+ // to make the exception handling work.
+ BEGIN_QCALL;
+
+ // Validate arguments if necessary and throw exceptions.
+ // There is no convention currently on whether the argument validation should be
+ // done in managed or unmanaged code.
+ if (flags != 0)
+ COMPlusThrow(kArgumentException, L"InvalidFlags");
+
+ // No need to worry about GC moving strings passed into QCall.
+ // Marshalling pins them for us.
+ printf("%S", wszString);
+
+ // This is the most efficient way to return strings back
+ // to managed code. No need to use StringBuilder.
+ retString.Set(L"Hello");
+
+ // You cannot return from inside of BEGIN_QCALL/END_QCALL.
+ // The return value has to be passed out in a helper variable.
+ retVal = TRUE;
+
+ END_QCALL;
+
+ return retVal;
+ }
+
+## FCall Functional Behavior
+
+FCalls allow more flexibility in terms of passing object references around, with a higher code complexity and more opportunities to hang yourself. Additionally, FCall methods must either erect a helper method frame along their common code paths, or for any FCall of non-trivial length, explicitly poll for whether a garbage collection must occur. Failing to do so will lead to starvation issues if managed code repeatedly calls the FCall method in a tight loop, because FCalls execute while the thread only allows the GC to run in a cooperative manner.
+
+FCalls require a lot of glue, too much to describe here. Look at [fcall.h][fcall] for details.
+
+[fcall]: https://github.com/dotnet/coreclr/blob/master/src/vm/fcall.h
+
+### GC Holes, FCall, and QCall
+
+A much more complete discussion on GC holes can be found in the [CLR Code Guide](../coding-guidelines/clr-code-guide.md). Look for ["Is your code GC-safe?"](../coding-guidelines/clr-code-guide.md#is-your-code-gc-safe). This tailored discussion motivates some of the reasons why FCall and QCall have some of their strange conventions.
+
+Object references passed as parameters to FCall methods are not GC-protected, meaning that if a GC occurs, those references will point to the old location in memory of an object, not the new location. For this reason, FCalls usually follow the discipline of accepting something like "StringObject*" as their parameter type, then explicitly converting that to a STRINGREF before doing operations that may trigger a GC. You must GC protect object references before triggering a GC, if you expect to be able to use that object reference later.
+
+All GC heap allocations within an FCall method must happen within a helper method frame. If you allocate memory on the GC's heap, the GC may collect dead objects & move objects around in unpredictable ways, with some low probability. For this reason, you must manually report any object references in your method to the GC, so that if a garbage collection occurs, your object reference will be updated to refer to the new location in memory. Any pointers into managed objects (like arrays or Strings) within your code will not be updated automatically, and must be re-fetched after any operation that may allocate memory and before your first usage. Reporting a reference can be done via the GCPROTECT macros, or as parameters when you erect a helper method frame.
+
+Failing to properly report an OBJECTREF or to update an interior pointer is commonly referred to as a "GC hole", because the OBJECTREF class will do some validation that it points to a valid object every time you dereference it in checked builds. When an OBJECTREF pointing to an invalid object is dereferenced, you'll get an assert saying something like "Detected an invalid object reference. Possible GC hole?". This assert is unfortunately easy to hit when writing "manually managed" code.
+
+Note that QCall's programming model is restrictive to sidestep GC holes most of the time, by forcing you to pass in the address of an object reference on the stack. This guarantees that the object reference is GC protected by the JIT's reporting logic, and that the actual object reference will not move because it is not allocated in the GC heap. QCall is our recommended approach, precisely because it makes GC holes harder to write.
+
+### FCall Epilogue Walker for x86
+
+The managed stack walker needs to be able to find its way out of FCalls. This is relatively easy on newer platforms that define stack unwinding conventions as part of the ABI. The x86 ABI does not define stack unwinding conventions, so the runtime works around this by implementing an epilog walker. The epilog walker computes the FCall return address and callee-saved registers by simulating the FCall execution. This imposes limits on what constructs are allowed in the FCall implementation.
+
+Complex constructs like stack allocated objects with destructors or exception handling in the FCall implementation may confuse the epilog walker. It leads to GC holes or crashes during stack walking. There is no exact list of what constructs should be avoided to prevent this class of bugs. An FCall implementation that is fine one day may break with the next C++ compiler update. We depend on stress runs & code coverage to find bugs in this area.
+
+Setting a breakpoint inside an FCall implementation may confuse the epilog walker. It leads to an "Invalid breakpoint in a helpermethod frame epilog" assert inside [vm\i386\gmsx86.cpp](https://github.com/dotnet/coreclr/blob/master/src/vm/i386/gmsx86.cpp).
+
+### FCall Example – Managed Part
+
+Here's a real-world example from the String class:
+
+ public sealed partial class String
+ {
+ // Replaces all instances of oldChar with newChar.
+ [MethodImplAttribute(MethodImplOptions.InternalCall)]
+ public extern String Replace(char oldChar, char newChar);
+ }
+
+### FCall Example – Native Part
+
+The FCall entrypoint has to be registered in tables in [vm\ecalllist.h][ecalllist] using FCFuncEntry macro. See "Registering your QCall or FCall Method".
+
+Notice how oldBuffer and newBuffer (interior pointers into String instances) are re-fetched after allocating memory. Also, this method is an instance method in managed code, with the "this" parameter passed as the first argument. We use StringObject* as the argument type, then copy it into a STRINGREF so we get some error checking when we use it.
+
+ FCIMPL3(LPVOID, COMString::Replace, StringObject* thisRefUNSAFE, CLR_CHAR oldChar, CLR_CHAR newChar)
+ {
+ FCALL_CONTRACT;
+
+ int length = 0;
+ int firstFoundIndex = -1;
+ WCHAR *oldBuffer = NULL;
+ WCHAR *newBuffer;
+
+ STRINGREF newString = NULL;
+ STRINGREF thisRef = (STRINGREF)thisRefUNSAFE;
+
+ if (thisRef==NULL) {
+ FCThrowRes(kNullReferenceException, L"NullReference_This");
+ }
+
+ [... Removed some uninteresting code here for illustrative purposes...]
+
+ HELPER_METHOD_FRAME_BEGIN_RET_ATTRIB_2(Frame::FRAME_ATTR_RETURNOBJ, newString, thisRef);
+
+ //Get the length and allocate a new String
+ //We will definitely do an allocation here.
+ newString = NewString(length);
+
+ //After allocation, thisRef may have moved
+ oldBuffer = thisRef->GetBuffer();
+
+ //Get the buffers in both of the Strings.
+ newBuffer = newString->GetBuffer();
+
+ //Copy the characters, doing the replacement as we go.
+ for (int i=0; i<firstFoundIndex; i++) {
+ newBuffer[i]=oldBuffer[i];
+ }
+ for (int i=firstFoundIndex; i<length; i++) {
+ newBuffer[i]=(oldBuffer[i]==((WCHAR)oldChar))?
+ ((WCHAR)newChar):oldBuffer[i];
+ }
+
+ HELPER_METHOD_FRAME_END();
+
+ return OBJECTREFToObject(newString);
+ }
+ FCIMPLEND
+
+
+## Registering your QCall or FCall Method
+
+The CLR must know the name of your QCall and FCall methods, both in terms of the managed class & method names, as well as which native methods to call. That is done in [ecalllist.h][ecalllist], with two arrays. The first array maps namespace & class names to an array of function elements. That array of function elements then maps individual method names & signatures to function pointers.
+
+Say we defined an FCall method for String.Replace(char, char), in the example above. First, we need to ensure that we have an array of function elements for the String class.
+
+ // Note these have to remain sorted by name:namespace pair (Assert will wack you if you
+ ...
+ FCClassElement("String", "System", gStringFuncs)
+ ...
+
+Second, we must then ensure that gStringFuncs contains a proper entry for Replace. Note that if a method name has multiple overloads (such as String.Replace(String, String)), then we can specify a signature:
+
+ FCFuncStart(gStringFuncs)
+ ...
+ FCFuncElement("IndexOf", COMString::IndexOfChar)
+ FCFuncElementSig("Replace", &gsig_IM_Char_Char_RetStr, COMString::Replace)
+ FCFuncElementSig("Replace", &gsig_IM_Str_Str_RetStr, COMString::ReplaceString)
+ ...
+ FCFuncEnd()
+
+There is a parallel QCFuncElement macro.
+
+## Naming convention
+
+Try to use a normal name (e.g. no "_", "n" or "native" prefix) for all FCalls and QCalls. It is not a good idea to encode the fact that the function is implemented in the VM into its name, for the following reasons:
+
+- There are directly exposed public FCalls. These FCalls have to follow the naming convention for public APIs.
+- The implementation of a function may move between the CLR and mscorlib.dll. It is painful to change the name of the function at all call sites when this happens.
+
+When necessary, you can use an "Internal" prefix to disambiguate the name of the FCall or QCall from the public entry point (e.g. the public entry point does error checking and then calls a shared worker function with exactly the same signature). This is no different from how you would deal with this situation in pure managed code in the BCL.
+
+# Types with a Managed/Unmanaged Duality
+
+Certain managed types must have a representation available in both managed & native code. You could ask whether the canonical definition of a type is in managed code or native code within the CLR, but the answer doesn't matter – the key thing is they must both be identical. This will allow the CLR's native code to access fields within a managed object in a very fast, easy to use manner. There is a more complex way of using essentially the CLR's equivalent of Reflection over MethodTables & FieldDescs to retrieve field values, but this probably doesn't perform as well as you'd like, and it isn't very usable. For commonly used types, it makes sense to declare a data structure in native code & attempt to keep the two in sync.
+
+The CLR provides a binder for this purpose. After you define your managed & native classes, you should provide some clues to the binder to help ensure that the field offsets remain the same, to quickly spot when someone accidentally adds a field to only one definition of a type.
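+
+The core idea the binder enforces, that field offsets agree between the two definitions of a type, can be sketched in plain C++ with `offsetof` checks. The structs and field names below are hypothetical illustrations, not the actual CLR definitions:
+
+```cpp
+#include <cstddef>
+#include <cstdint>
+#include <cstdio>
+
+// Hypothetical "managed-side" layout, as the field layout engine would produce it.
+struct ManagedSafeHandleLayout
+{
+    intptr_t handle;            // IntPtr handle
+    int32_t  state;             // int _state
+    bool     ownsHandle;        // bool _ownsHandle
+    bool     fullyInitialized;  // bool _fullyInitialized
+};
+
+// Hypothetical native mirror that VM code would use to access the object directly.
+struct NativeSafeHandle
+{
+    intptr_t m_handle;
+    int32_t  m_state;
+    bool     m_ownsHandle;
+    bool     m_fullyInitialized;
+};
+
+// The binder's job, reduced to its essence: verify the two layouts agree,
+// so native field accesses hit the right offsets inside the managed object.
+static_assert(offsetof(ManagedSafeHandleLayout, handle) == offsetof(NativeSafeHandle, m_handle), "handle offset mismatch");
+static_assert(offsetof(ManagedSafeHandleLayout, state) == offsetof(NativeSafeHandle, m_state), "state offset mismatch");
+static_assert(offsetof(ManagedSafeHandleLayout, ownsHandle) == offsetof(NativeSafeHandle, m_ownsHandle), "ownsHandle offset mismatch");
+static_assert(offsetof(ManagedSafeHandleLayout, fullyInitialized) == offsetof(NativeSafeHandle, m_fullyInitialized), "fullyInitialized offset mismatch");
+
+int main()
+{
+    printf("field offsets match\n");
+    return 0;
+}
+```
+
+If someone adds a field to only one of the two structs, the corresponding `static_assert` fires at compile time, which is exactly the kind of drift the binder's clues are meant to catch.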
+
+In [mscorlib.h][mscorlib.h], you can use macros ending in "_U" to describe a type, the name of fields in managed code, and the name of fields in a corresponding native data structure. Additionally, you can specify a list of methods, and reference them by name when you attempt to call them later.
+
+[mscorlib.h]: https://github.com/dotnet/coreclr/blob/master/src/vm/mscorlib.h
+
+ DEFINE_CLASS_U(SAFE_HANDLE, Interop, SafeHandle, SafeHandle)
+ DEFINE_FIELD(SAFE_HANDLE, HANDLE, handle)
+ DEFINE_FIELD_U(SAFE_HANDLE, STATE, _state, SafeHandle, m_state)
+ DEFINE_FIELD_U(SAFE_HANDLE, OWNS_HANDLE, _ownsHandle, SafeHandle, m_ownsHandle)
+ DEFINE_FIELD_U(SAFE_HANDLE, INITIALIZED, _fullyInitialized, SafeHandle, m_fullyInitialized)
+ DEFINE_METHOD(SAFE_HANDLE, GET_IS_INVALID, get_IsInvalid, IM_RetBool)
+ DEFINE_METHOD(SAFE_HANDLE, RELEASE_HANDLE, ReleaseHandle, IM_RetBool)
+ DEFINE_METHOD(SAFE_HANDLE, DISPOSE, Dispose, IM_RetVoid)
+ DEFINE_METHOD(SAFE_HANDLE, DISPOSE_BOOL, Dispose, IM_Bool_RetVoid)
+
+
+Then, you can use the REF<T> template to create a type name like SAFEHANDLEREF. All the error checking from OBJECTREF is built into the REF<T> macro, and you can freely dereference this SAFEHANDLEREF & use fields off of it in native code. You still must GC protect these references.
+
+# Calling Into Managed Code From Native
+
+Clearly there are places where the CLR must call into managed code from native. For this purpose, we have added a MethodDescCallSite class to handle a lot of plumbing for you. Conceptually, all you need to do is find the MethodDesc\* for the method you want to call, find a managed object for the "this" pointer (if you're calling an instance method), pass in an array of arguments, and deal with the return value. Internally, you may need to toggle your thread's state to allow the GC to run in preemptive mode, etc.
+
+Here's a simplified example. Note how this instance uses the binder described in the previous section to call SafeHandle's virtual ReleaseHandle method.
+
+ void SafeHandle::RunReleaseMethod(SafeHandle* psh)
+ {
+ CONTRACTL {
+ THROWS;
+ GC_TRIGGERS;
+ MODE_COOPERATIVE;
+ } CONTRACTL_END;
+
+ SAFEHANDLEREF sh(psh);
+
+ GCPROTECT_BEGIN(sh);
+
+ MethodDescCallSite releaseHandle(s_pReleaseHandleMethod, METHOD__SAFE_HANDLE__RELEASE_HANDLE, (OBJECTREF*)&sh, TypeHandle(), TRUE);
+
+ ARG_SLOT releaseArgs[] = { ObjToArgSlot(sh) };
+ if (!(BOOL)releaseHandle.Call_RetBool(releaseArgs)) {
+ MDA_TRIGGER_ASSISTANT(ReleaseHandleFailed, ReportViolation)(sh->GetTypeHandle(), sh->m_handle);
+ }
+
+ GCPROTECT_END();
+ }
+
+# Interactions with Other Subsystems
+
+## Debugger
+
+One limitation of FCalls today is that you cannot easily debug both managed code and FCalls in Visual Studio's Interop (or mixed-mode) debugging. Setting a breakpoint in an FCall and debugging with Interop debugging simply doesn't work. This most likely won't be fixed.
+
+# Physical Architecture
+
+When the CLR starts up, mscorlib is loaded by a method called LoadBaseSystemClasses. Here, the base data types & other similar classes (like Exception) are loaded, and appropriate global pointers are set up to refer to mscorlib's types.
+
+For FCalls, look in [fcall.h][fcall] for infrastructure, and [ecalllist.h][ecalllist] to properly inform the runtime about your FCall method.
+
+For QCalls, look in [qcall.h][qcall] for associated infrastructure, and [ecalllist.h][ecalllist] to properly inform the runtime about your QCall method.
+
+More general infrastructure and some native type definitions can be found in [object.h][object.h]. The binder uses mscorlib.h to associate managed & native classes.
+
+[object.h]: https://github.com/dotnet/coreclr/blob/master/src/vm/object.h
diff --git a/Documentation/botr/porting-ryujit.md b/Documentation/botr/porting-ryujit.md
new file mode 100644
index 0000000..8eb0b0d
--- /dev/null
+++ b/Documentation/botr/porting-ryujit.md
@@ -0,0 +1,112 @@
+# RyuJIT: Porting to different platforms
+
+## What is a Platform?
+* Target instruction set and pointer size
+* Target calling convention
+* Runtime data structures (not really covered here)
+* GC encoding
+ * So far only JIT32_GCENCODER and everything else
+* Debug info (so far mostly the same for all targets?)
+* EH info (not really covered here)
+
+One advantage of the CLR is that the VM (mostly) hides the (non-ABI) OS differences.
+
+## The Very High Level View
+* 32 vs. 64 bits
+ * This work is not yet complete in the backend, but should be sharable
+* Instruction set architecture:
+ * instrsXXX.h, emitXXX.cpp and targetXXX.cpp
+ * lowerXXX.cpp
+ * codeGenXXX.cpp and simdcodegenXXX.cpp
+ * unwindXXX.cpp
+* Calling Convention: all over the place
+
+## Front-end changes
+* Calling Convention
+ * Struct args and returns seem to be the most complex differences
+ * Importer and morph are highly aware of these
+ * E.g. fgMorphArgs(), fgFixupStructReturn(), fgMorphCall(), fgPromoteStructs() and the various struct assignment morphing methods
+ * HFAs on ARM
+* Tail calls are target-dependent, but probably should be less so
+* Intrinsics: each platform recognizes different methods as intrinsics (e.g. Sin only for x86, Round everywhere BUT amd64)
+* Target-specific morphs such as for mul, mod and div
+
+## Backend Changes
+* Lowering: fully expose control flow and register requirements
+* Code Generation: traverse blocks in layout order, generating code (InstrDescs) based on register assignments on nodes
+ * Then, generate prolog & epilog, as well as GC, EH and scope tables
+* ABI changes:
+ * Calling convention register requirements
+ * Lowering of calls and returns
+ * Code sequences for prologs & epilogs
+ * Allocation & layout of frame
+
+## Target ISA "Configuration"
+* Conditional compilation (set in jit.h, based on incoming define, e.g. #ifdef X86)
+```C++
+_TARGET_64BIT_ (32 bit target is just !_TARGET_64BIT_)
+_TARGET_XARCH_, _TARGET_ARMARCH_
+_TARGET_AMD64_, _TARGET_X86_, _TARGET_ARM64_, _TARGET_ARM_
+```
+* Target.h
+* InstrsXXX.h
+
+## Instruction Encoding
+* The instrDesc is the data structure used for encoding
+ * It is initialized with the opcode bits, and has fields for immediates and register numbers.
+ * instrDescs are collected into groups
+ * A label may only occur at the beginning of a group
+* The emitter is called to:
+ * Create new instructions (instrDescs), during CodeGen
+ * Emit the bits from the instrDescs after CodeGen is complete
+ * Update Gcinfo (live GC vars & safe points)
+
+## Adding Encodings
+* The instruction encodings are captured in instrsXXX.h. These are the opcode bits for each instruction
+* The structure of each instruction's encoding is target-dependent
+* An "instruction" is just the representation of the opcode
+* An instance of "instrDesc" represents the instruction to be emitted
+* For each "type" of instruction, emit methods need to be implemented. These follow a pattern but a target may have unique ones, e.g.
+```C++
+emitter::emitInsMov(instruction ins, emitAttr attr, GenTree* node)
+emitter::emitIns_R_I(instruction ins, emitAttr attr, regNumber reg, ssize_t val)
+emitter::emitInsTernary(instruction ins, emitAttr attr, GenTree* dst, GenTree* src1, GenTree* src2) (currently Arm64 only)
+```
+
+## Lowering
+* Lowering ensures that all register requirements are exposed for the register allocator
+ * Use count, def count, "internal" reg count, and any special register requirements
+ * Does half the work of code generation, since all computation is made explicit
+ * But it is NOT necessarily a 1:1 mapping from lowered tree nodes to target instructions
+ * Its first pass does a tree walk, transforming the instructions. Some of this is target-independent. Notable exceptions:
+ * Calls and arguments
+ * Switch lowering
+ * LEA transformation
+ * Its second pass walks the nodes in execution order
+ * Sets register requirements
+    * sometimes changes the register requirements of children (which have already been traversed)
+ * Sets the block order and node locations for LSRA
+ * LinearScan:: startBlockSequence() and LinearScan::moveToNextBlock()
+
+## Register Allocation
+* Register allocation is largely target-independent
+ * The second phase of Lowering does nearly all the target-dependent work
+* Register candidates are determined in the front-end
+ * Local variables or temps, or fields of local variables or temps
+ * Not address-taken, plus a few other restrictions
+ * Sorted by lvaSortByRefCount(), and marked "lvTracked"
+
+## Addressing Modes
+* The code to find and capture addressing modes is particularly poorly abstracted
+* genCreateAddrMode(), in CodeGenCommon.cpp, traverses the tree looking for an addressing mode, then captures its constituent elements (base, index, scale & offset) in "out parameters"
+ * It optionally generates code
+ * For RyuJIT, it NEVER generates code, and is only used by gtSetEvalOrder, and by lowering
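+
+A stripped-down sketch of this kind of decomposition, peeling a constant offset and an index*scale product off an expression tree, might look as follows. This is an illustrative toy, not the actual genCreateAddrMode() logic or data structures:
+
+```cpp
+#include <cassert>
+#include <cstdio>
+
+// A toy expression tree: ADD(ADD(base, MUL(index, scale)), offset).
+enum OpKind { OP_LEAF, OP_ADD, OP_MUL, OP_CONST };
+
+struct Node
+{
+    OpKind kind;
+    Node*  op1;
+    Node*  op2;
+    long   value;      // for OP_CONST
+    const char* name;  // for OP_LEAF (a register-like temp)
+};
+
+// Captured addressing mode: [base + index*scale + offset].
+struct AddrMode
+{
+    const char* base;
+    const char* index;
+    long scale;
+    long offset;
+};
+
+// Hypothetical decomposition: peel a constant offset, then an index*scale
+// product, and treat whatever remains as the base.
+static bool CreateAddrMode(Node* tree, AddrMode* out)
+{
+    out->base = nullptr; out->index = nullptr; out->scale = 1; out->offset = 0;
+
+    // Peel a constant offset: ADD(x, CONST)
+    if (tree->kind == OP_ADD && tree->op2->kind == OP_CONST)
+    {
+        out->offset = tree->op2->value;
+        tree = tree->op1;
+    }
+    // Peel an index*scale product: ADD(x, MUL(index, CONST))
+    if (tree->kind == OP_ADD && tree->op2->kind == OP_MUL &&
+        tree->op2->op2->kind == OP_CONST)
+    {
+        out->index = tree->op2->op1->name;
+        out->scale = tree->op2->op2->value;
+        tree = tree->op1;
+    }
+    if (tree->kind != OP_LEAF)
+        return false;  // remainder is not a simple base; no addressing mode
+    out->base = tree->name;
+    return true;
+}
+
+int main()
+{
+    Node baseReg = { OP_LEAF,  nullptr,  nullptr, 0,  "rbx" };
+    Node idxReg  = { OP_LEAF,  nullptr,  nullptr, 0,  "rsi" };
+    Node scale   = { OP_CONST, nullptr,  nullptr, 8,  nullptr };
+    Node ofs     = { OP_CONST, nullptr,  nullptr, 16, nullptr };
+    Node mul     = { OP_MUL,   &idxReg,  &scale,  0,  nullptr };
+    Node add1    = { OP_ADD,   &baseReg, &mul,    0,  nullptr };
+    Node add2    = { OP_ADD,   &add1,    &ofs,    0,  nullptr };
+
+    AddrMode am;
+    bool ok = CreateAddrMode(&add2, &am);
+    assert(ok);
+    printf("[%s + %s*%ld + %ld]\n", am.base, am.index, am.scale, am.offset);
+    return 0;
+}
+```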
+
+## Code Generation
+* For the most part, the code generation method structure is the same for all architectures
+ * Most code generation methods start with "gen"
+* Theoretically, CodeGenCommon.cpp contains code "mostly" common to all targets (this factoring is imperfect)
+ * Method prolog, epilog,
+* genCodeForBBList
+ * walks the trees in execution order, calling genCodeForTreeNode, which needs to handle all nodes that are not "contained"
+ * generates control flow code (branches, EH) for the block
diff --git a/Documentation/botr/profilability.md b/Documentation/botr/profilability.md
new file mode 100644
index 0000000..528c3f1
--- /dev/null
+++ b/Documentation/botr/profilability.md
@@ -0,0 +1,240 @@
+Implementing Profilability
+==========================
+
+This document describes technical details of adding profilability to a CLR feature. This is targeted toward devs who are modifying the profiling API so their feature can be profilable.
+
+Philosophy
+==========
+
+Contracts
+---------
+
+Before delving into the details on which contracts should be used in the profiling API, it's useful to understand the overall philosophy.
+
+A philosophy behind the default contracts movement throughout the CLR (outside of the profiling API) is to encourage the majority of the CLR to be prepared to deal with "aggressive behavior" like throwing or triggering. Below you'll see that this goes hand-in-hand with the recommendations for the callback (ICorProfilerCallback) contracts, which generally prefer the more permissive ("aggressive") of the contract choices. This gives the profiler the most flexibility in what it can do during its callback (in terms of which CLR calls it can make via ICorProfilerInfo).
+
+However, the Info functions (ICorProfilerInfo) below are just the opposite: they're preferred to be restrictive rather than permissive. Why? Because we want these to be safe for the profiler to call from as many places as possible, even from those callbacks that are more restrictive than we might like (e.g., callbacks that for some reason must be GC\_NOTRIGGER).
+
+Also, the preference for more restrictive contracts in ICorProfilerInfo doesn't contradict the overall CLR default contract philosophy, because it is expected that there will be a small minority of CLR functions that need to be restrictive. ICorProfilerInfo is the root of call paths that fall into this category. Since the profiler may be calling into the CLR at delicate times, we want these calls to be as unobtrusive as possible. These are not considered mainstream functions in the CLR, but are a small minority of special call paths that need to be careful.
+
+So the general guidance is to use default contracts throughout the CLR where possible. But when you need to blaze a path of calls originating from a profiler (i.e., from ICorProfilerInfo), that path will need to have its contracts explicitly specified, and be more restrictive than the default.
+
+Performance or ease of use?
+---------------------------
+
+Both would be nice. But if you need to make a trade-off, favor performance. The profiling API is meant to be a light-weight, thin, in-process layer between the CLR and a profiling DLL. Profiler writers are few and far between, and are mostly quite sophisticated developers. Simple validation of inputs by the CLR is expected, but we only go so far. For example, consider all the profiler IDs. They're just casted pointers of C++ EE object instances that are called into directly (AppDomain\*, MethodTable\*, etc.). A profiler provides a bogus ID? The CLR AVs! This is expected. The CLR does not hash IDs in order to validate lookups. Profilers are assumed to know what they are doing.
+
+That said, I'll repeat: simple validation of inputs by the CLR is expected. Things like checking for NULL pointers, verifying that classes requested for inspection have been initialized, checking that "parallel parameters" are consistent (e.g., an array pointer parameter must be non-null if its size parameter is nonzero), etc.
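+
+The ID scheme can be reduced to the following sketch. The `Module` struct and function names here are hypothetical; the point is that an ID is a casted pointer that gets dereferenced directly, with no validating lookup in between:
+
+```cpp
+#include <cstdint>
+#include <cstdio>
+#include <cstring>
+
+// Illustrative stand-in for an EE object such as a Module.
+struct Module
+{
+    char name[32];
+};
+
+// Profiler-visible IDs are just the object's address, cast to an integer type.
+typedef uintptr_t ModuleID;
+
+static ModuleID GetModuleID(Module* pModule)
+{
+    return reinterpret_cast<ModuleID>(pModule);
+}
+
+// An "Info"-style function: no hashing, no lookup table. The ID is simply
+// cast back to a pointer and dereferenced.
+static const char* GetModuleName(ModuleID id)
+{
+    Module* pModule = reinterpret_cast<Module*>(id);
+    return pModule->name;
+}
+
+int main()
+{
+    Module m;
+    strcpy(m.name, "mscorlib.dll");
+    ModuleID id = GetModuleID(&m);
+    printf("%s\n", GetModuleName(id));
+    return 0;
+}
+```
+
+Passing a bogus ID to `GetModuleName` here would dereference garbage, which mirrors why the CLR simply AVs on bad IDs rather than failing a lookup.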
+
+ICorProfilerCallback
+====================
+
+This interface comprises the callbacks made by the CLR into the profiler to notify the profiler of interesting events. Each callback is wrapped in a thin method in the EE that handles locating the profiler's implementation of ICorProfilerCallback(2), and calling its corresponding method.
+
+Profilers subscribe to events by specifying the corresponding flag in a call to ICorProfilerInfo::SetEventMask(). The profiling API stores these choices and exposes them to the CLR through specialized inline functions (CORProfiler\*) that mask against the bit corresponding to the flag. Then, sprinkled throughout the CLR, you'll see code that calls the ICorProfilerCallback wrapper to notify the profiler of events as they happen, but this call is conditional on the flag being set (determined by calling the specialized inline function):
+
+ {
+ //check if profiler set flag, pin profiler
+ BEGIN_PIN_PROFILER(CORProfilerTrackModuleLoads());
+
+ //call the wrapper around the profiler's callback implementation
+ g_profControlBlock.pProfInterface->ModuleLoadStarted((ModuleID) this);
+
+ //unpin profiler
+ END_PIN_PROFILER();
+ }
+
+To be clear, the code above is what you'll see sprinkled throughout the code base. The function it calls (in this case ModuleLoadStarted()) is our wrapper around the profiler's callback implementation (in this case ICorProfilerCallback::ModuleLoadStarted()). All of our wrappers appear in a single file (vm\EEToProfInterfaceImpl.cpp), and the guidance provided in the sections below relate to those wrappers; not to the above sample code that calls the wrappers.
+
+The macro BEGIN\_PIN\_PROFILER evaluates the expression passed as its argument. If the expression is TRUE, then the profiler is pinned into memory (meaning the profiler will not be able to detach from the process) and the code between the BEGIN\_PIN\_PROFILER and END\_PIN\_PROFILER macros is executed. If the expression is FALSE, all code between the BEGIN\_PIN\_PROFILER and END\_PIN\_PROFILER macros is skipped. For more information about the BEGIN\_PIN\_PROFILER and END\_PIN\_PROFILER macros, find their definition in the code base and read the comments there.
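+
+The flag-storage and inline-masking pattern can be sketched outside the CLR roughly as below. The enum values and function names are made up for illustration, loosely mirroring the COR_PRF_MONITOR_* flags and the CORProfiler* inline checks:
+
+```cpp
+#include <cstdio>
+
+// Illustrative event-mask flags, one bit per subscribable event category.
+enum ProfilerEventMask
+{
+    MONITOR_MODULE_LOADS = 0x1,
+    MONITOR_CLASS_LOADS  = 0x2,
+};
+
+static unsigned g_eventMask = 0;
+
+// Stand-in for ICorProfilerInfo::SetEventMask: store the profiler's choices.
+static void SetEventMask(unsigned mask) { g_eventMask = mask; }
+
+// Stand-ins for the specialized inline checks: mask against one bit.
+static bool CORProfilerTrackModuleLoads() { return (g_eventMask & MONITOR_MODULE_LOADS) != 0; }
+static bool CORProfilerTrackClassLoads()  { return (g_eventMask & MONITOR_CLASS_LOADS) != 0; }
+
+int main()
+{
+    SetEventMask(MONITOR_MODULE_LOADS);
+
+    // Callback sites test the bit before paying for the call into the profiler.
+    if (CORProfilerTrackModuleLoads())
+        printf("ModuleLoadStarted dispatched\n");
+    if (CORProfilerTrackClassLoads())
+        printf("ClassLoadStarted dispatched\n");
+
+    return 0;
+}
+```
+
+Only the subscribed event is dispatched; unsubscribed callbacks cost a single masked test and branch.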
+
+Contracts
+---------
+
+Each and every callback wrapper must have some common gunk at the top. Here's an example:
+
+ CONTRACTL
+ {
+ // Yay!
+ NOTHROW;
+
+ // Yay!
+ GC_TRIGGERS;
+
+ // Yay!
+ MODE_PREEMPTIVE;
+
+ // Yay!
+ CAN_TAKE_LOCK;
+
+ // Yay!
+ ASSERT_NO_EE_LOCKS_HELD();
+ SO_NOT_MAINLINE;
+ }
+ CONTRACTL_END;
+ CLR_TO_PROFILER_ENTRYPOINT((LF_CORPROF,
+ LL_INFO10,
+ "**PROF: useful logging text here.\n"));
+
+Important points:
+
+- You must explicitly specify a value for the throws, triggers, mode, take\_lock, and ASSERT\_NO\_EE\_LOCKS\_HELD() (latter required on callbacks only). This allows us to keep our documentation for profiler-writers accurate.
+- Each contract must have its own comment (see below for specific details on contracts)
+
+There's a "preferred" value for each contract type. If possible, use that and comment it with "Yay!" so that others who copy / paste your code elsewhere will know what's best. If it's not possible to use the preferred value, comment why.
+
+Here are the preferred values for callbacks.
+
+| Preferred | Why | Details |
+| --------- | --- | ------- |
+| NOTHROW | Allows callback to be issued from any CLR context. Since Infos should be NOTHROW as well, this shouldn't be a hardship for the profiler. | Note that you will get throws violations if the profiler calls a THROWS Info function from here, even though the profiler encloses the call in a try/catch (because our contract system can't see the profiler's try/catch). So you'll need to insert a CONTRACT\_VIOLATION(ThrowsViolation) scoped just before the call into the profiler. |
+| GC\_TRIGGERS | Gives profiler the most flexibility in the Infos it can call. | If the callback is made at a delicate time where protecting all the object refs would be error-prone or significantly degrade performance, use GC\_NOTRIGGER (and comment of course!). |
+| MODE\_PREEMPTIVE if possible, otherwise MODE\_COOPERATIVE | MODE\_PREEMPTIVE gives profiler the most flexibility in the Infos it can call (except when coop is necessary due to ObjectIDs). Also, MODE\_PREEMPTIVE is a preferred "default" contract throughout the EE, and forcing callbacks to be in preemptive encourages use of preemptive elsewhere in the EE. | MODE\_COOPERATIVE is fair if you're passing ObjectID parameters to the profiler. Otherwise, specify MODE\_PREEMPTIVE. The caller of the callback should hopefully already be in preemptive mode anyway. If not, rethink why not and potentially change the caller to be in preemptive. Otherwise, you will need to use a GCX\_PREEMP() macro before calling the callback. |
+| CAN\_TAKE\_LOCK | Gives profiler the most flexibility in the Infos it can call | Nothing further, your honor. |
+| ASSERT\_NO\_EE\_LOCKS\_HELD() | Gives profiler even more flexibility on Infos it can call, as it ensures no Info could try to retake a lock or take an out-of-order lock (since no lock is taken to "retake" or destroy ordering) | This isn't actually a contract, though the contract block is a convenient place to put this, so you don't forget. As with the contracts, if this cannot be specified, comment why. |
+
+Note: EE\_THREAD\_NOT\_REQUIRED / EE\_THREAD\_REQUIRED need **not** be specified for callbacks. GC callbacks cannot specify "REQUIRED" anyway (no EE Thread might be present), and it is only interesting to consider these on the Info functions (profiler &#8594; CLR).
+
+Entrypoint macros
+-----------------
+
+As in the example above, after the contracts there should be an entrypoint macro. This takes care of logging, marking on the EE Thread object that we're in a callback, removing stack guard, and doing some asserts. There are a few variants of the macro you can use:
+
+ CLR_TO_PROFILER_ENTRYPOINT
+
+This is the preferred and typically-used macro.
+
+Other macro choices may be used **but you must comment** why the above (preferred) macro cannot be used.
+
+ *_FOR_THREAD_*
+
+These macros are used for ICorProfilerCallback methods that specify a ThreadID parameter whose value may not always be the _current_ ThreadID. You must specify the ThreadID as the first parameter to these macros. The macro will then use your ThreadID rather than GetThread(), to assert that the callback is currently allowed for that ThreadID (i.e., that we have not yet issued a ThreadDestroyed() for that ThreadID).
+
+ICorProfilerInfo
+================
+
+This interface comprises the entrypoints used by the profiler to call into the CLR.
+
+Synchronous / Asynchronous
+--------------------------
+
+Each Info call is classified as either synchronous or asynchronous. Synchronous functions must be called from within a callback, whereas asynchronous functions are safe to be called at any time.
+
+### Synchronous
+
+The vast majority of Info calls are synchronous: They can only be called by a profiler while it is executing inside a Callback. In other words, an ICorProfilerCallback must be on the stack for it to be legal to call a synchronous Info function. This is tracked by a bit on the EE Thread object. When a Callback is made, we set the bit. When the callback returns, we reset the bit. When a synchronous Info function is called, we test the bit—if it's not set, disallow the call.
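+
+The bit-tracking scheme can be sketched as follows, using a thread-local flag and an RAII guard in place of the bit on the EE Thread object. All names are hypothetical:
+
+```cpp
+#include <cstdio>
+
+// Illustrative per-thread flag standing in for the bit on the EE Thread object.
+static thread_local bool t_inProfilerCallback = false;
+
+// RAII guard: set the bit for the duration of a callback into the profiler,
+// reset it when the callback returns.
+struct CallbackScope
+{
+    CallbackScope()  { t_inProfilerCallback = true; }
+    ~CallbackScope() { t_inProfilerCallback = false; }
+};
+
+// A synchronous Info-style entrypoint: reject calls made outside a callback.
+static bool SynchronousInfoCall()
+{
+    if (!t_inProfilerCallback)
+    {
+        printf("sync Info call rejected\n");
+        return false;
+    }
+    printf("sync Info call allowed\n");
+    return true;
+}
+
+// The profiler's callback body, which may legally call sync Info functions.
+static void ProfilerModuleLoadStarted()
+{
+    SynchronousInfoCall();
+}
+
+int main()
+{
+    {
+        CallbackScope scope;          // CLR issues a callback: bit set
+        ProfilerModuleLoadStarted();
+    }                                 // callback returns: bit reset
+    SynchronousInfoCall();            // outside any callback: disallowed
+    return 0;
+}
+```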
+
+#### Threads without an EE Thread
+
+Because the above bit is tracked using the EE Thread object, only Info calls made on threads containing an EE Thread object have their "synchronous-ness" enforced. Any Info call made on a non-EE Thread thread is immediately considered legal. This is generally fine, as it's mainly the EE Thread threads that build up complex contexts that would be problematic to reenter. Also, it's ultimately the profiler's responsibility to ensure correctness. As described above, for performance reasons, the profiling API historically keeps its correctness checks to a bare minimum, so as not to add overhead. Typically, Info calls made by a profiler on a non-EE Thread fall into these categories:
+
+- An Info call made during a GC callback on a thread doing a server GC.
+- An Info call made on a thread of the profiler's creation, such as a sampling thread (which therefore would have no CLR code on the stack).
+
+#### Enter / leave hooks
+
+If a profiler requests enter / leave hooks and uses the fast path (i.e., direct function calls from the jitted code to the profiler with no intervening profiling API code), then any call to an Info function from within its enter / leave hooks will be considered asynchronous. Again, this is for pragmatic reasons. If profiling API code doesn't get a chance to run (for performance), then we have no opportunity to set the EE Thread bit stating that we're executing inside a callback. This means a profiler is restricted to calling only asynchronous-safe Info functions from within its enter / leave hook. This is typically acceptable, as a profiler concerned enough with perf that it requires direct function calls for enter / leave will probably not be calling any Info functions from within its enter / leave hooks anyway.
+
+The alternative is for the profiler to set a flag specifying that it wants argument or return value information, which forces an intervening profiling API C function to be called to prepare the information for the profiler's Enter / Leave hooks. When such a flag is set, the profiling API sets the EE Thread bit from inside this C function that prepares the argument / return value information from the profiler. This enables the profiler to call synchronous Info functions from within its Enter / Leave hook.
+
+### Asynchronous
+
+Asynchronous Info functions are those that are safe to be called anytime (from a callback or not). There are relatively few asynchronous Info functions. They are what a hijacking sampling profiler (e.g., Visual Studio profiler) might want to call from within one of its samples. It is critical that an Info function labeled as asynchronous be able to execute from any possible call stack. A thread could be interrupted while holding any number of locks (spin locks, thread store lock, OS heap lock, etc.), and then forced by the profiler to reenter the runtime via an asynchronous Info function. This can easily cause deadlock or data corruption. There are two ways an asynchronous Info function can ensure its own safety:
+
+- Be very, very simple. Don't take locks, don't trigger a GC, don't access data that could be inconsistent, etc. OR
+- If you need to be more complex than that, have sufficient checks at the top to ensure locks, data structures, etc., are in a safe state before proceeding.
+ - Often, this includes asking whether the current thread is currently inside a forbid suspend thread region, and bailing with an error if it is, though this is not a sufficient check in all cases.
+ - DoStackSnapshot is an example of a complex asynchronous function. It uses a combination of checks (including asking whether the current thread is currently inside a forbid suspend thread region) to determine whether to proceed or bail.
+
+Contracts
+---------
+
+Each and every Info function must have some common gunk at the top. Here's an example:
+
+ CONTRACTL
+ {
+ // Yay!
+ NOTHROW;
+
+ // Yay!
+ GC_NOTRIGGER;
+
+ // Yay!
+ MODE_ANY;
+
+ // Yay!
+ EE_THREAD_NOT_REQUIRED;
+
+ // Yay!
+ CANNOT_TAKE_LOCK;
+ SO_NOT_MAINLINE;
+ }
+ CONTRACTL_END;
+ PROFILER_TO_CLR_ENTRYPOINT_SYNC((LF_CORPROF,
+ LL_INFO1000,
+ "**PROF: EnumModuleFrozenObjects 0x%p.\n",
+ moduleID));
+
+Here are the "preferred" values for each contract type. Note these are mostly different from the preferred values for Callbacks! If that confuses you, reread the Contracts discussion under Philosophy above.
+
+| Preferred | Why | Details |
+| --------- | --- | ------- |
+| NOTHROW | Makes it easier for profiler to call; profiler doesn't need its own try / catch. | If your callees are NOTHROW then use NOTHROW. Otherwise, it's actually better to mark yourself as THROWS than to set up your own try / catch. The profiler can probably do this more efficiently by sharing a try block among multiple Info calls. |
+| GC\_NOTRIGGER | Safer for profiler to call from more situations | Go out of your way not to trigger. If an Info function _might_ trigger (e.g., loading a type if it's not already loaded), ensure there's a way, if possible, for the profiler to specify _not_ to take the trigger path (e.g., fAllowLoad parameter that can be set to FALSE), and contract that conditionally. |
+| MODE\_ANY | Safer for profiler to call from more situations | MODE\_COOPERATIVE is fair if your parameters or returns are ObjectIDs. Otherwise, MODE\_ANY is strongly preferred. |
+| CANNOT\_TAKE\_LOCK | Safer for profiler to call from more situations | Ensure your callees don't lock. If they must, comment exactly what locks are taken. |
+| Optional:EE\_THREAD\_NOT\_REQUIRED | Allows profiler to use this Info fcn from GC callbacks and from profiler-spun threads (e.g., sampling thread). | These contracts are not yet enforced, so it's fine to just leave it blank. If you're pretty sure your Info function doesn't need (or call anyone who needs) a current EE Thread, you can specify EE\_THREAD\_NOT\_REQUIRED as a hint for later when the thread contracts are enforced. |
+
+Here's an example of commented contracts in a function that's not as "yay" as the one above:
+
+ CONTRACTL
+ {
+ // ModuleILHeap::CreateNew throws
+ THROWS;
+
+ // AppDomainIterator::Next calls AppDomain::Release which can destroy AppDomain, and
+ // ~AppDomain triggers, according to its contract.
+ GC_TRIGGERS;
+
+ // Need cooperative mode, otherwise objectId can become invalid
+ if (GetThreadNULLOk() != NULL) { MODE_COOPERATIVE; }
+
+ // Yay!
+ EE_THREAD_NOT_REQUIRED;
+
+ // Generics::GetExactInstantiationsFromCallInformation eventually
+ // reads metadata which causes us to take a reader lock.
+ CAN_TAKE_LOCK;
+ }
+ CONTRACTL_END;
+
+Entrypoint macros
+-----------------
+
+After the contracts, there should be an entrypoint macro. This takes care of logging and, in the case of a synchronous function, consulting the callback state flags to enforce that it is really being called synchronously. Use one of these, depending on whether the Info function is synchronous, asynchronous, or callable only from within the Initialize callback:
+
+- PROFILER\_TO\_CLR\_ENTRYPOINT\_**SYNC** _(typical choice)_
+- PROFILER\_TO\_CLR\_ENTRYPOINT\_**ASYNC**
+- PROFILER\_TO\_CLR\_ENTRYPOINT\_CALLABLE\_ON\_INIT\_ONLY
+
+As described above, asynchronous Info methods are rare and carry a higher burden. The preferred contracts above are even "more preferred" if the method is asynchronous, and these two are outright required: GC\_NOTRIGGER and MODE\_ANY. CANNOT\_TAKE\_LOCK, while even more preferred in an async than a sync function, is not always possible. See the _Asynchronous_ section above for what to do.
+
+Files You'll Modify
+===================
+
+Adding or modifying methods is pretty straightforward, and code inspection is all you'll need to figure out the details. Here are the places you'll need to go.
+
+corprof.idl
+-----------
+
+All profiling API interfaces and types are defined in [src\inc\corprof.idl](https://github.com/dotnet/coreclr/blob/master/src/inc/corprof.idl). Go here first to define your types and methods.
+
+EEToProfInterfaceImpl.\*
+-----------------------
+
+The wrapper around the profiler's implementation of ICorProfilerCallback is located at [src\vm\EEToProfInterfaceImpl.\*](https://github.com/dotnet/coreclr/tree/master/src/vm).
+
+ProfToEEInterfaceImpl.\*
+-----------------------
+
+The implementation of ICorProfilerInfo is located at [src\vm\ProfToEEInterfaceImpl.\*](https://github.com/dotnet/coreclr/tree/master/src/vm).
diff --git a/Documentation/botr/profiling.md b/Documentation/botr/profiling.md
new file mode 100644
index 0000000..b83f78b
--- /dev/null
+++ b/Documentation/botr/profiling.md
@@ -0,0 +1,513 @@
+Profiling
+=========
+
+Profiling, in this document, means monitoring the execution of a program which is executing on the Common Language Runtime (CLR). This document details the interfaces, provided by the Runtime, to access such information.
+
+Although it is called the Profiling API, the functionality provided by it is suitable for use by more than just traditional profiling tools. Traditional profiling tools focus on measuring the execution of the program—time spent in each function, or memory usage of the program over time. However, the profiling API is really targeted at a broader class of diagnostic tools, such as code-coverage utilities or even advanced debugging aids.
+
+The common thread among all of these uses is that they are all diagnostic in nature — the tool is written to monitor the execution of a program. The Profiling API should never be used by the program itself, and the correctness of the program's execution should not depend on (or be affected by) having a profiler active against it.
+
+Profiling a CLR program requires more support than profiling conventionally compiled machine code. This is because the CLR has concepts such as application domains, garbage collection, managed exception handling and JIT compilation of code (converting Intermediate Language into native machine code) that conventional profiling mechanisms cannot identify or report on usefully. The Profiling API provides this missing information efficiently, with minimal impact on the performance of the CLR and the profiled program.
+
+Note that JIT compilation at runtime provides good opportunities: the API allows a profiler to change the in-memory IL code stream for a routine and then request that it be JIT-compiled anew. In this way, the profiler can dynamically add instrumentation code to particular routines that need deeper investigation. Although this approach is possible in conventional scenarios, it's much easier to do this for the CLR.
+
+Goals for the Profiling API
+===========================
+
+- Expose information that existing profilers will require for a user to determine and analyze performance of a program run on the CLR. Specifically:
+
+ - Common Language Runtime startup and shutdown events
+ - Application domain creation and shutdown events
+ - Assembly loading and unloading events
+ - Module load/unload events
+ - COM VTable creation and destruction events
+ - JIT compilations and code-pitching events
+ - Class load/unload events
+ - Thread birth/death/synchronization
+ - Function entry/exit events
+ - Exceptions
+ - Transitions between managed and unmanaged execution
+ - Transitions between different Runtime _contexts_
+ - Information about Runtime suspensions
+ - Information about the Runtime memory heap and garbage collection activity
+
+- Callable from any (non-managed) COM-compatible language
+- Efficient, in terms of CPU and memory consumption - the act of profiling should not change the program being profiled so much that the results are misleading
+- Useful to both _sampling_ and _non-sampling_ profilers. [A _sampling_ profiler inspects the profilee at regular clock ticks - maybe 5 milliseconds apart, say. A _non-sampling_ profiler is informed of events, synchronously with the thread that causes them]
+
+Non-goals for the Profiling API
+===============================
+
+- The Profiling API does **not** support profiling unmanaged code. Existing mechanisms must instead be used to profile unmanaged code. The CLR profiling API works only for managed code. However, the profiling API provides managed/unmanaged transition events so that a profiler can determine the boundaries between managed and unmanaged code.
+- The Profiling API does **not** support writing applications that will modify their own code, for purposes such as aspect-oriented programming.
+- The Profiling API does **not** provide information needed to check bounds. The CLR provides intrinsic support for bounds checking of all managed code.
+
+The CLR code profiler interfaces do not support remote profiling due to the following reasons:
+
+- It is necessary to minimize execution time using these interfaces so that profiling results will not be unduly affected. This is especially true where execution performance is being monitored. However, it is not a limitation when the interfaces are used to monitor memory usage or to obtain Runtime information on stack frames, objects, etc.
+- The code profiler needs to register one or more callback interfaces with the Runtime on the local machine on which the application being profiled runs. This limits the ability to create a remote code profiler.
+
+Profiling API – Overview
+========================
+
+The profiling API within CLR allows the user to monitor the execution and memory usage of a running application. Typically, this API will be used to write a code profiler package. In the sections that follow, we will talk about a profiler as a package built to monitor execution of _any_ managed application.
+
+The profiling API is used by a profiler DLL, loaded into the same process as the program being profiled. The profiler DLL implements a callback interface (ICorProfilerCallback2). The runtime calls methods on that interface to notify the profiler of events in the profiled process. The profiler can call back into the runtime with methods on ICorProfilerInfo to get information about the state of the profiled application.
+
+Note that only the data-gathering part of the profiler solution should be running in-process with the profiled application—UI and data analysis should be done in a separate process.
+
+![Profiling Process Overview](images/profiling-overview.png)
+
+The _ICorProfilerCallback_ and _ICorProfilerCallback2_ interfaces consist of methods with names like ClassLoadStarted, ClassLoadFinished, and JITCompilationStarted. Each time the CLR loads or unloads a class, compiles a function, etc., it calls the corresponding method in the profiler's _ICorProfilerCallback/ICorProfilerCallback2_ interface. (And similarly for all of the other notifications; see later for details.)
+
+So, for example, a profiler could measure code performance via the two notifications FunctionEnter and FunctionLeave. It simply timestamps each notification, accumulates results, then outputs a list indicating which functions consumed the most CPU time or wall-clock time during execution of the application.
+
+The _ICorProfilerCallback/ICorProfilerCallback2_ interface can be considered to be the "notifications API".
+
+The other interface involved in profiling is _ICorProfilerInfo_. The profiler calls this, as required, to obtain more information to help its analysis. For example, whenever the CLR calls FunctionEnter it supplies a value for the FunctionId. The profiler can discover more information about that FunctionId by calling _ICorProfilerInfo::GetFunctionInfo_ to discover the function's parent class, its name, and so on.
+
+The picture so far describes what happens once the application and profiler are running. But how are the two connected together when an application is started? The CLR makes the connection during its initialization in each process. It decides whether to connect to a profiler, and which profiler that should be, depending upon the value for two environment variables, checked one after the other:
+
+- Cor\_Enable\_Profiling - only connect with a profiler if this environment variable exists and is set to a non-zero value.
+- Cor\_Profiler - connect with the profiler with this CLSID or ProgID (which must have been stored previously in the Registry). The Cor\_Profiler environment variable is defined as a string:
+ - set Cor\_Profiler={32E2F4DA-1BEA-47ea-88F9-C5DAF691C94A}, or
+ - set Cor\_Profiler="MyProfiler"
+- The profiler class is the one that implements _ICorProfilerCallback/ICorProfilerCallback2_. It is required that a profiler implement ICorProfilerCallback2; if it does not, it will not be loaded.
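+
+The two-variable check can be sketched as plain C++. This is a simplified, illustrative stand-in for logic that actually lives inside the CLR; `ShouldConnectProfiler` is a hypothetical name, and the real check also resolves a ProgID through the Registry:
+
+```cpp
+#include <cstdio>
+#include <cstdlib>
+
+// Simplified sketch of the startup decision described above. The real logic
+// lives inside the CLR; this stand-in only models the two-variable check.
+bool ShouldConnectProfiler(const char* enable, const char* profiler)
+{
+    // Cor_Enable_Profiling must exist and be set to a non-zero value...
+    if (enable == nullptr || atoi(enable) == 0)
+        return false;
+
+    // ...and Cor_Profiler must name a CLSID or ProgID to instantiate.
+    return profiler != nullptr && profiler[0] != '\0';
+}
+
+int main()
+{
+    const char* clsid = "{32E2F4DA-1BEA-47ea-88F9-C5DAF691C94A}";
+    printf("%d %d %d\n",
+           ShouldConnectProfiler(nullptr, clsid) ? 1 : 0,  // variable unset
+           ShouldConnectProfiler("0", clsid) ? 1 : 0,      // set to zero
+           ShouldConnectProfiler("1", clsid) ? 1 : 0);     // profiling enabled
+    return 0;
+}
+```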
+
+When both checks above pass, the CLR creates an instance of the profiler in a manner similar to _CoCreateInstance_. The profiler is not loaded through a direct call to _CoCreateInstance_, so that a call to _CoInitialize_ (which requires choosing a threading model) may be avoided. The CLR then calls the _ICorProfilerCallback::Initialize_ method in the profiler. The signature of this method is:
+
+    HRESULT Initialize(IUnknown *pICorProfilerInfoUnk)
+
+The profiler must QueryInterface pICorProfilerInfoUnk for an _ICorProfilerInfo_ interface pointer and save it so that it can call for more info during later profiling. It then calls ICorProfilerInfo::SetEventMask to say which categories of notifications it is interested in. For example:
+
+    ICorProfilerInfo* pInfo;
+
+    pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo, (void**)&pInfo);
+
+    pInfo->SetEventMask(COR_PRF_MONITOR_ENTERLEAVE | COR_PRF_MONITOR_GC);
+
+This mask would be used for a profiler interested only in function enter/leave notifications and garbage collection notifications. The profiler then simply returns, and is off and running!
+
+By setting the notifications mask in this way, the profiler can limit which notifications it receives. This obviously helps the user build a simpler, or special-purpose profiler; it also reduces wasted cpu time in sending notifications that the profiler would simply 'drop on the floor' (see later for details).
+
+TODO: This text is a bit confusing. It seems to be conflating the fact that you need to create a different 'environment' (as in environment variables) to specify a different profiler and the fact that only one profiler can attach to a process at once. It may also be conflating launch vs. attach scenarios. Is that right??
+
+Note that only one profiler can be profiling a process at one time in a given environment. In different environments it is possible to have two different profilers registered in each environment, each profiling separate processes.
+
+Certain profiler events are IMMUTABLE, which means that once they are set in the _ICorProfilerCallback::Initialize_ callback they cannot be turned off using ICorProfilerInfo::SetEventMask(). Trying to change an immutable event will result in SetEventMask returning a failed HRESULT.
+
+The profiler must be implemented as an inproc COM server – a DLL, which is mapped into the same address space as the process being profiled. Any other type of COM server is not supported; if a profiler, for example, wants to monitor applications from a remote computer, it must implement 'collector agents' on each machine, which batch results and communicate them to the central data collection machine.
+
+Profiling API – Recurring Concepts
+==================================
+
+This brief section explains a few concepts that apply throughout the profiling API, rather than repeat them with the description of each method.
+
+IDs
+---
+
+Runtime notifications supply an ID for reported classes, threads, AppDomains, etc. These IDs can be used to query the Runtime for more info. These IDs are simply the address of a block in memory that describes the item; however, they should be treated as opaque handles by any profiler. If an invalid ID is used in a call to any Profiling API function, the results are undefined. Most likely, the result will be an access violation. The user must ensure that the IDs used are valid; the profiling API does not perform any validation, since that would add overhead and slow down execution considerably.
+
+### Uniqueness
+
+A ProcessID is unique system-wide for the lifetime of the process. All other IDs are unique process-wide for the lifetime of the ID.
+
+### Hierarchy & Containment
+
+IDs are arranged in a hierarchy, mirroring the hierarchy in the process. Processes contain AppDomains, which contain Assemblies, which contain Modules, which contain Classes, which contain Functions. Threads are contained within Processes, and may move from AppDomain to AppDomain. Objects are mostly contained within AppDomains (a very few objects may be members of more than one AppDomain at a time). Contexts are contained within Processes.
+
+### Lifetime & Stability
+
+When a given ID dies, all IDs contained within it die.
+
+ProcessID – Alive and stable from the call to Initialize until the return from Shutdown.
+
+AppDomainID – Alive and stable from the call to AppDomainCreationFinished until the return from AppDomainShutdownStarted.
+
+AssemblyID, ModuleID, ClassID – Alive and stable from the call to LoadFinished for the ID until the return from UnloadStarted for the ID.
+
+FunctionID – Alive and stable from the call to JITCompilationFinished or JITCachedFunctionSearchFinished until the death of the containing ClassID.
+
+ThreadID – Alive and stable from the call to ThreadCreated until the return from ThreadDestroyed.
+
+ObjectID – Alive beginning with the call to ObjectAllocated. Eligible to change or die with each garbage collection.
+
+GCHandleID – Alive from the call to HandleCreated until the return from HandleDestroyed.
+
+In addition, any ID returned from a profiling API function will be alive at the time it is returned.
+
+### App-Domain Affinity
+
+There is an AppDomainID for each user-created app-domain in the process, plus the "default" domain, plus a special pseudo-domain used for holding domain-neutral assemblies.
+
+Assembly, Module, Class, Function, and GCHandleIDs have app-domain affinity, meaning that if an assembly is loaded into multiple app domains, it (and all of the modules, classes, and functions contained within it) will have a different ID in each, and operations upon each ID will take effect only in the associated app domain. Domain-neutral assemblies will appear in the special pseudo-domain mentioned above.
+
+### Special Notes
+
+All IDs except ObjectID should be treated as opaque values. Most IDs are fairly self-explanatory. A few are worth explaining in more detail:
+
+**ClassIDs** represent classes. In the case of generic classes, they represent fully-instantiated types. List<int>, List<char>, List<object>, and List<string> each have their own ClassID. List<T> is an uninstantiated type, and has no ClassID. Dictionary<string,V> is a partially-instantiated type, and has no ClassID.
+
+**FunctionIDs** represent native code for a function. In the case of generic functions (or functions on generic classes), there may be multiple native code instantiations for a given function, and thus multiple FunctionIDs. Native code instantiations may be shared between different types — for example List<string> and List<object> share all code—so a FunctionID may "belong" to more than one ClassID.
+
+**ObjectIDs** represent garbage-collected objects. An ObjectID is the current address of the object at the time the ObjectID is received by the profiler, and may change with each garbage collection. Thus, an ObjectID value is only valid between the time it is received and when the next garbage collection begins. The CLR also supplies notifications that allow a profiler to update its internal maps that track objects, so that a profiler may maintain a valid ObjectID across garbage collections.
+
+**GCHandleIDs** represent entries in the GC's handle table. GCHandleIDs, unlike ObjectIDs, are opaque values. GC handles are created by the runtime itself in some situations, or can be created by user code using the System.Runtime.InteropServices.GCHandle structure. (Note that the GCHandle structure merely represents the handle; the handle does not "live" within the GCHandle struct.)
+
+**ThreadIDs** represent managed threads. If a host supports execution in fiber mode, a managed thread may exist on different OS threads, depending on when it is examined. ( **NOTE:** Profiling of fiber-mode applications is not supported.)
+
+Callback Return Values
+----------------------
+
+A profiler returns a status, as an HRESULT, for each notification triggered by the CLR. That status may have the value S\_OK or E\_FAIL. Currently the Runtime ignores this status value in every callback except ObjectReferences.
+
+Caller-Allocated Buffers
+------------------------
+
+ICorProfilerInfo functions that take caller-allocated buffers typically conform to the following signature:
+
+    HRESULT GetBuffer( [in] /* Some query information */,
+                       [in] ULONG32 cBuffer,
+                       [out] ULONG32 *pcBuffer,
+                       [out, size_is(cBuffer), length_is(*pcBuffer)] /* TYPE */ buffer[] );
+
+These functions will always behave as follows:
+
+- cBuffer is the number of elements allocated in the buffer.
+- \*pcBuffer will be set to the total number of elements available.
+- buffer will be filled with as many elements as possible
+
+If any elements are returned, the return value will be S\_OK. It is the caller's responsibility to check if the buffer was large enough.
+
+If buffer is NULL, cBuffer must be 0. The function will return S\_OK and set \*pcBuffer to the total number of elements available.
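+
+This two-call pattern can be sketched with a mock in C++. `GetBuffer` and its data array below are illustrative stand-ins, not a real ICorProfilerInfo method:
+
+```cpp
+#include <cassert>
+#include <cstdio>
+#include <vector>
+
+// Mock of an Info-style function that follows the buffer convention above.
+static const int kAvailable[] = { 10, 20, 30, 40, 50 };
+
+long GetBuffer(unsigned cBuffer, unsigned* pcBuffer, int buffer[])
+{
+    unsigned total = sizeof(kAvailable) / sizeof(kAvailable[0]);
+    if (pcBuffer)
+        *pcBuffer = total;                       // total elements available
+    unsigned toCopy = cBuffer < total ? cBuffer : total;
+    for (unsigned i = 0; i < toCopy && buffer; ++i)
+        buffer[i] = kAvailable[i];               // fill as many as possible
+    return 0;                                    // S_OK
+}
+
+int main()
+{
+    // First call: NULL buffer and cBuffer == 0, just to query the size.
+    unsigned needed = 0;
+    GetBuffer(0, &needed, nullptr);
+
+    // Second call: allocate exactly 'needed' elements and fetch them all.
+    std::vector<int> buf(needed);
+    unsigned total = 0;
+    GetBuffer(needed, &total, buf.data());
+    assert(total == needed);     // caller checks the buffer was large enough
+    printf("%u %d %d\n", needed, buf.front(), buf.back());
+    return 0;
+}
+```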
+
+Optional Out Parameters
+-----------------------
+
+All [out] parameters on the API are optional, unless a function has only one [out] parameter. A profiler simply passes NULL for any [out] parameters it is not interested in. The profiler must also pass consistent values for any associated [in] parameters—e.g., if the NULL [out] parameter is a buffer to be filled with data, the [in] parameter specifying its size must be 0.
+
+Notification Thread
+-------------------
+
+In most cases, notifications are executed by the same thread that generated the event. Such notifications (for example, FunctionEnter and FunctionLeave) don't need to supply an explicit ThreadID. Also, the profiler might choose to use thread-local storage to store and update its analysis blocks, rather than indexing into global storage based on the ThreadID of the affected thread.
+
+Each notification documents which thread makes the call – either the thread that generated the event or some utility thread (e.g., the garbage collector) within the Runtime. For any callback that might be invoked by a different thread, the profiler can call _ICorProfilerInfo::GetCurrentThreadID_ to discover the thread that generated the event.
+
+Note that these callbacks are not serialized. The profiler developer must write defensive code: create thread-safe data structures, and lock the profiler code where necessary to prevent parallel access from multiple threads. In certain cases it is therefore possible to receive an unusual sequence of callbacks. For example, assume a managed application spawns two threads that execute identical code. It is then possible to receive a JITCompilationStarted event for some function from one thread and, before the corresponding JITCompilationFinished callback arrives, a FunctionEnter callback from the other thread. The profiler thus receives a FunctionEnter callback for a function that appears not to be fully JIT-compiled yet!
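+
+A minimal sketch of that defensive pattern, assuming the profiler keeps a per-FunctionID table (`FunctionTable` and its method names are illustrative, not part of the API):
+
+```cpp
+#include <cassert>
+#include <cstdio>
+#include <map>
+#include <mutex>
+
+typedef unsigned long long FunctionID_T;   // stand-in for FunctionID
+
+// Profiler-side table guarded by a lock, because callbacks are not serialized
+// and may arrive concurrently on different threads.
+class FunctionTable
+{
+    std::mutex m_lock;
+    std::map<FunctionID_T, bool> m_jitFinished;
+
+public:
+    void OnJITCompilationStarted(FunctionID_T id)
+    {
+        std::lock_guard<std::mutex> hold(m_lock);
+        m_jitFinished[id] = false;
+    }
+
+    void OnJITCompilationFinished(FunctionID_T id)
+    {
+        std::lock_guard<std::mutex> hold(m_lock);
+        m_jitFinished[id] = true;
+    }
+
+    // FunctionEnter may arrive on another thread before JITCompilationFinished,
+    // so lookups must tolerate "unknown" and "not yet finished" states.
+    bool IsKnownFinished(FunctionID_T id)
+    {
+        std::lock_guard<std::mutex> hold(m_lock);
+        auto it = m_jitFinished.find(id);
+        return it != m_jitFinished.end() && it->second;
+    }
+};
+
+int main()
+{
+    FunctionTable table;
+    table.OnJITCompilationStarted(42);
+    assert(!table.IsKnownFinished(42));   // enter seen before JIT finished
+    table.OnJITCompilationFinished(42);
+    assert(table.IsKnownFinished(42));
+    printf("ok\n");
+    return 0;
+}
+```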
+
+GC-Safe Callouts
+----------------
+
+When the CLR calls certain functions in the _ICorProfilerCallback_, the Runtime cannot perform a garbage collection until the Profiler returns control from that call. This is because profiling services cannot always construct the stack into a state that is safe for a garbage collection; instead garbage collection is disabled around that callback. For these cases, the Profiler should take care to return control as soon as possible. The callbacks where this applies are:
+
+- FunctionEnter, FunctionLeave, FunctionTailCall
+- ExceptionOSHandlerEnter, ExceptionOSHandlerLeave
+- ExceptionUnwindFunctionEnter, ExceptionUnwindFunctionLeave
+- ExceptionUnwindFinallyEnter, ExceptionUnwindFinallyLeave
+- ExceptionCatcherEnter, ExceptionCatcherLeave
+- ExceptionCLRCatcherFound, ExceptionCLRCatcherExecute
+- COMClassicVTableCreated, COMClassicVTableDestroyed
+
+In addition, the following callbacks may or may not allow the Profiler to block. This is indicated, call by call, via the fIsSafeToBlock argument. This set includes:
+
+- JITCompilationStarted, JITCompilationFinished
+
+Note that if the Profiler _does_ block, it will delay garbage collection. This is harmless, as long as the Profiler code itself does not attempt to allocate space in the managed heap, which could induce deadlock.
+
+Using COM
+---------
+
+Though the profiling API interfaces are defined as COM interfaces, the runtime does not actually initialize COM in order to use them. This is in order to avoid having to set the threading model via CoInitialize before the managed application has had a chance to specify its desired threading model. Similarly, the profiler itself should not call CoInitialize, since it may pick a threading model that is incompatible with the application being profiled and therefore break the app.
+
+Callbacks and Stack Depth
+-------------------------
+
+Profiler callbacks may be issued in extremely stack-constrained circumstances, and a stack overflow within a profiler callback will lead to immediate process exit. A profiler should be careful to use as little stack as possible in response to callbacks. If the profiler is intended for use against processes that are robust against stack overflow, the profiler itself should also avoid triggering stack overflow.
+
+How to profile an NT Service
+---------------------------
+
+Profiling is enabled through environment variables, and since NT Services are started when the Operating System boots, those environment variables must be present and set to the required value at that time. Thus, to profile an NT Service, the appropriate environment variables must be set in advance, system-wide, via:
+
+MyComputer -> Properties -> Advanced -> EnvironmentVariables -> System Variables
+
+Both **Cor\_Enable\_Profiling** and **Cor\_Profiler** must be set, and the user must ensure that the Profiler DLL is registered. Then the target machine should be rebooted so that the NT Services pick up those changes. Note that this enables profiling on a system-wide basis, so to prevent every managed application that runs subsequently from being profiled, the user should delete those system environment variables after the reboot.
+
+Profiling API – High-Level Description
+======================================
+
+Loader Callbacks
+----------------
+
+The loader callbacks are those issued for app domain, assembly, module, and class loading.
+
+One might expect that the CLR would notify an assembly load, followed by one or more module loads for that assembly. However, what actually happens depends on any number of factors within the implementation of the loader. The profiler may depend on the following:
+
+- A Started callback will be delivered before the Finished callback for the same ID.
+- Started and Finished callbacks will be delivered on the same thread.
+
+Though the loader callbacks are arranged in Started/Finished pairs, they cannot be used to accurately attribute time to operations within the loader.
+
+Call stacks
+-----------
+
+The profiling API provides two ways of obtaining call stacks—a snapshot method, suitable for sparse gathering of callstacks, and a shadow-stack method, suitable for tracking the callstack at every instant.
+
+### Stack Snapshot
+
+A stack snapshot is a trace of the stack of a thread at an instant in time. The profiling API provides support for tracing the managed functions on the stack, but leaves the tracing of unmanaged functions to the profiler's own stack walker.
+
+### Shadow Stack
+
+Using the above snapshot method too frequently can quickly become a performance issue. When stack traces need to be taken often, profilers should instead build a "shadow stack" using the FunctionEnter, FunctionLeave, FunctionTailCall, and Exception\* callbacks. The shadow stack is always current and can be quickly copied to storage whenever a stack snapshot is needed.
+
+A shadow stack can capture function arguments, return values, and information about generic instantiations. This information is only available through the shadow stack, since it's readily available at function-enter time but may be optimized away later in the run of the function.
+
+Garbage Collection
+------------------
+
+When the profiler specifies the COR\_PRF\_MONITOR\_GC flag, all GC events are triggered in the profiler except the _ICorProfilerCallback::ObjectAllocated_ events. Those are explicitly controlled by another flag (see next section), for performance reasons. Note that when COR\_PRF\_MONITOR\_GC is enabled, concurrent garbage collection is turned off.
+
+A profiler may use the GarbageCollectionStarted/Finished callbacks to identify that a GC is taking place, and which generations are covered.
+
+### Tracking Moved Objects
+
+Garbage collection reclaims the memory occupied by 'dead' objects and compacts that freed space. As a result, live objects are moved within the heap. The effect is that _ObjectIDs_ handed out by previous notifications change their value (the internal state of the object itself does not change, apart from its references to other objects; only its location in memory, and therefore its _ObjectID_, changes). The _MovedReferences_ notification lets a profiler update its internal tables that track info by _ObjectID_. Its name is somewhat misleading, as it is issued even for objects that were not moved.
+
+The number of objects in the heap can run into the thousands or millions. With such large numbers, it's impractical to report their movement by providing a before-and-after ID for each object. However, the garbage collector tends to move contiguous runs of live objects as a 'bunch' – they end up at new locations in the heap, but they are still contiguous. This notification reports the "before" and "after" _ObjectID_ of these contiguous runs of objects. (See the example below.)
+
+In other words, if an _ObjectID_ value lies within the range:
+
+    oldObjectIDRangeStart[i] <= ObjectID < oldObjectIDRangeStart[i] + cObjectIDRangeLength[i]
+
+for 0 <= i < cMovedObjectIDRanges, then the _ObjectID_ value has changed to:
+
+    ObjectID - oldObjectIDRangeStart[i] + newObjectIDRangeStart[i]
+
+All of these callbacks are made while the Runtime is suspended, so none of the _ObjectID_ values can change until the Runtime resumes and another GC occurs.
+
+**Example:** The diagram below shows 10 objects, before garbage collection. They lie at start addresses (equivalent to _ObjectIDs_) of 08, 09, 10, 12, 13, 15, 16, 17, 18 and 19. _ObjectIDs_ 09, 13 and 19 are dead (shown shaded); their space will be reclaimed during garbage collection.
+
+![Garbage Collection](profiling-gc.png)
+
+The "After" picture shows how the space occupied by dead objects has been reclaimed to hold live objects. The live objects have been moved in the heap to the new locations shown. As a result, their _ObjectIDs_ all change. The simplistic way to describe these changes is with a table of before-and-after _ObjectIDs_, like this:
+
+| | oldObjectIDRangeStart[] | newObjectIDRangeStart[] |
+|:--:|:-----------------------:|:-----------------------:|
+| 0 | 08 | 07 |
+| 1 | 09 | |
+| 2 | 10 | 08 |
+| 3 | 12 | 10 |
+| 4 | 13 | |
+| 5 | 15 | 11 |
+| 6 | 16 | 12 |
+| 7 | 17 | 13 |
+| 8 | 18 | 14 |
+| 9 | 19 | |
+
+This works, but clearly, we can compact the information by specifying starts and sizes of contiguous runs, like this:
+
+| | oldObjectIDRangeStart[] | newObjectIDRangeStart[] | cObjectIDRangeLength[] |
+|:--:|:-----------------------:|:-----------------------:|:----------------------:|
+| 0 | 08 | 07 | 1 |
+| 1 | 10 | 08 | 3 |
+| 2 | 15 | 11 | 4 |
+
+This corresponds to exactly how _MovedReferences_ reports the information. Note that _MovedReferences_ reports the new layout of the objects BEFORE they actually get relocated in the heap, so the old _ObjectIDs_ are still valid for calls to the _ICorProfilerInfo_ interface (and the new _ObjectIDs_ are not).
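+
+The remapping formula can be sketched directly in C++, using the compacted three-range table above as test data (`RemapObjectID` is an illustrative helper, not an API function):
+
+```cpp
+#include <cstdio>
+
+typedef unsigned long long ObjectID_T;   // stand-in for ObjectID
+
+// Remap an ObjectID using the MovedReferences range arrays, per the formula
+// in the text. IDs outside every range are returned unchanged here; in a real
+// profiler they correspond to objects that did not survive the compacting GC.
+ObjectID_T RemapObjectID(ObjectID_T id,
+                         unsigned cRanges,
+                         const ObjectID_T oldStart[],
+                         const ObjectID_T newStart[],
+                         const unsigned len[])
+{
+    for (unsigned i = 0; i < cRanges; ++i)
+        if (oldStart[i] <= id && id < oldStart[i] + len[i])
+            return id - oldStart[i] + newStart[i];
+    return id;
+}
+
+int main()
+{
+    // The three compacted ranges from the example table.
+    const ObjectID_T oldStart[] = { 8, 10, 15 };
+    const ObjectID_T newStart[] = { 7,  8, 11 };
+    const unsigned   len[]      = { 1,  3,  4 };
+
+    printf("%llu %llu %llu\n",
+           RemapObjectID(8,  3, oldStart, newStart, len),   // moves to 7
+           RemapObjectID(12, 3, oldStart, newStart, len),   // moves to 10
+           RemapObjectID(18, 3, oldStart, newStart, len));  // moves to 14
+    return 0;
+}
+```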
+
+#### Detecting All Deleted Objects
+
+MovedReferences will report all objects that survive a compacting GC, regardless of whether they move; anything not reported did not survive. However, not all GCs are compacting.
+
+The profiler may call ICorProfilerInfo2::GetGenerationBounds to get the boundaries of the GC heap segments. The rangeLength field in the resulting COR\_PRF\_GC\_GENERATION\_RANGE structs can be used to figure out the extent of live objects in a compacted generation.
+
+The GarbageCollectionStarted callback indicates which generations are being collected by the current GC. All objects that are in a generation that is not being collected will survive the GC.
+
+For a non-compacting GC (a GC in which no objects get moved at all), the SurvivingReferences callback is delivered to indicate which objects survived the GC.
+
+Note that a single GC may be compacting for one generation and non-compacting for another. Any given generation will receive either SurvivingReferences callbacks or MovedReferences callbacks for a given GC, but not both.
+
+#### Remarks
+
+The application is halted following a garbage collection until the Runtime is done passing information about the heap to the code profiler. The method _ICorProfilerInfo::GetClassFromObject_ can be used to obtain the _ClassID_ of the class of which the object is an instance. The method _ICorProfilerInfo::GetTokenFromClass_ can be used to obtain metadata information about the class.
+
+RootReferences2 allows the profiler to identify objects held via special handles. The generation bounds information supplied by GetGenerationBounds combined with the collected-generation information supplied by GarbageCollectionStarted enable the profiler to identify objects that live in generations that were not collected.
+
+Object Inspection
+-----------------
+
+The FunctionEnter2/Leave2 callbacks provide information about the arguments and return value of a function, as regions of memory. The arguments are stored left-to-right in the given memory regions. A profiler can use the metadata signature of the function to interpret the arguments, as follows:
+
+| **ELEMENT\_TYPE** | **Representation** |
+| -------------------------------------- | -------------------------- |
+| Primitives (ELEMENT\_TYPE <= R8, I, U) | Primitive values |
+| Value types (VALUETYPE) | Depends on type |
+| Reference types (CLASS, STRING, OBJECT, ARRAY, GENERICINST, SZARRAY) | ObjectID (pointer into GC heap) |
+| BYREF | Managed pointer (NOT an ObjectID, but may be pointing to stack or GC heap) |
+| PTR | Unmanaged pointer (not movable by GC) |
+| FNPTR | Pointer-sized opaque value |
+| TYPEDBYREF | Managed pointer, followed by a pointer-sized opaque value |
+
+The differences between an ObjectID and a managed pointer are:
+
+- ObjectIDs point only into the GC heap or frozen object heap, while managed pointers may also point to the stack.
+- ObjectIDs always point to the beginning of an object, while managed pointers may point to one of its fields.
+- Managed pointers cannot be passed to functions that expect an ObjectID.
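As a rough illustration of walking those regions, here is a small Python model. This is not the real profiler API; it assumes for simplicity that every argument occupies whole pointer-sized slots, which is not true for small primitives or for value types (whose size depends on the type):

```python
# Conceptual model of interpreting the argument regions reported by
# FunctionEnter2, using element types from the metadata signature.
# All offsets are illustrative.

POINTER_SIZE = 8  # x64

# Slots each kind of value occupies, per the table above
# (TYPEDBYREF is a managed pointer plus a pointer-sized opaque value).
SLOTS = {
    "PRIMITIVE": 1,
    "BYREF": 1,        # managed pointer
    "CLASS": 1,        # ObjectID
    "PTR": 1,          # unmanaged pointer
    "FNPTR": 1,        # pointer-sized opaque value
    "TYPEDBYREF": 2,   # managed pointer + opaque value
}

def interpret_arguments(signature, region_base):
    """Map each argument in the signature to the address of its slot,
    walking left to right through the memory region."""
    layout, offset = [], 0
    for elem in signature:
        layout.append((elem, region_base + offset))
        offset += SLOTS[elem] * POINTER_SIZE
    return layout

layout = interpret_arguments(["PRIMITIVE", "CLASS", "TYPEDBYREF", "BYREF"], 0x1000)
```

A real profiler would drive this from the parsed metadata signature and then read the actual memory at each computed address.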
+
+### Inspecting Complex Types
+
+Inspecting reference types or non-primitive value types requires some advanced techniques.
+
+For value types and reference types other than strings or arrays, GetClassLayout provides the offset for each field. The profiler can then use the metadata to determine the type of the field and recursively evaluate it. (Note that GetClassLayout returns only the fields defined by the class itself; fields defined by the parent class are not included.)
+
+For boxed value types, GetBoxClassLayout provides the offset of the value type within the box. The layout of the value type itself does not change, so once the profiler has found the value type within the box, it can use GetClassLayout to understand its layout.
+
+For strings, GetStringClassLayout provides the offsets of interesting pieces of data in the string object.
+
+Arrays are somewhat special, in that to understand arrays a function must be called for every array object, rather than just for the type. (This is because there are too many formats of arrays to describe using offsets.) GetArrayObjectInfo is provided to do the interpretation.
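The recursive evaluation described above can be sketched as follows. This is a hedged model: the class and field tables are invented stand-ins for what GetClassLayout and the metadata would return, not the real API, and the offsets are illustrative.

```python
# Simplified model of inspecting a complex type with
# GetClassLayout-style data.

CLASS_LAYOUTS = {
    # class -> (parent, [(field_name, offset, field_type)])
    "BaseClass": (None, [("iField", 8, "int")]),
    "SubClass": ("BaseClass", [("subField", 16, "int")]),
}

def all_fields(class_name):
    """GetClassLayout returns only the fields the class itself defines,
    so a profiler must walk the parent chain to see inherited fields."""
    fields = []
    while class_name is not None:
        parent, own = CLASS_LAYOUTS[class_name]
        fields = own + fields        # parent fields precede child fields
        class_name = parent
    return fields
```

For each field found this way, the profiler would use the field's metadata type to decide whether to read it directly (primitive), recurse (value type), or treat it as an ObjectID (reference type).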
+
+@TODO: Callbacks from which inspection is safe
+
+@TODO: Functions that are legal to call when threads are hard-suspended
+
+### Inspecting Static Fields
+
+GetThreadStaticAddress, GetAppDomainStaticAddress, GetContextStaticAddress, and GetRVAStaticAddress provide information about the location of static fields. Looking at the memory at that location, you interpret it as follows:
+
+- Reference types: ObjectID
+- Value types: ObjectID of box containing the actual value
+- Primitive types: Primitive value
+
+There are four types of statics. The following table describes what they are and how to identify them.
+
+| **Static Type** | **Definition** | **Identifying in Metadata** |
+| --------------- | -------------- | --------------------------- |
+| AppDomain | Your basic static field—has a different value in each app domain. | Static field with no attached custom attributes |
+| Thread | Managed TLS—a static field with a unique value for each thread and each app domain. | Static field with System.ThreadStaticAttribute |
+| RVA | Process-scoped static field with a home in the module's data section | Static field with hasRVA flag |
+| Context | Static field with a different value in each COM+ Context | Static field with System.ContextStaticAttribute |
+
+Exceptions
+----------
+
+Notifications of exceptions are the most difficult of all notifications to describe and to understand, because of the inherent complexity of exception processing. The set of exception notifications described below was designed to provide all the information required for a sophisticated profiler, so that at every instant it can keep track of which pass (first or second), which frame, which filter, and which finally block is being executed, for every thread in the profilee process. Note that the exception notifications do not provide any _ThreadID_'s, but a profiler can always call _ICorProfilerInfo::GetCurrentThreadID_ to discover which managed thread threw the exception.
+
+![Exception callback sequence](profiling-exception-callback-sequence.png)
+
+The figure above displays how the code profiler receives the various callbacks, when monitoring exception events. Each thread starts out in "Normal Execution." When the thread is in a state within the big gray box, the exception system has control of the thread—any non-exception-related callbacks (e.g. ObjectAllocated) that occur while the thread is in one of these states may be attributed to the exception system itself. When the thread is in a state outside of the big gray box, it is running arbitrary managed code.
+
+### Nested Exceptions
+
+Threads that have transitioned into managed code in the midst of processing an exception could throw another exception, which would result in a whole new pass of exception handling (the "New EH Pass" boxes above). If such a "nested" exception escapes the filter/finally/catch from the original exception, it can affect the original exception:
+
+- If the nested exception occurred within a filter, and escapes the filter, the filter will be considered to return "false" and the first pass will continue.
+- If the nested exception occurred within a finally, and escapes the finally, the original exception's processing will never resume.
+- If the nested exception occurred within a catch, and escapes the catch, the original exception's processing will never resume.
+
+### Unmanaged Handlers
+
+An exception might be handled in unmanaged code. In this case, the profiler will see the unwind phase, but no notification of any catch handlers. Execution will simply resume normally in the unmanaged code. An unmanaged-aware profiler will be able to detect this, but a managed-only profiler may see any number of things, including but not limited to:
+
+- An UnmanagedToManagedTransition callback as the unmanaged code calls or returns to managed code.
+- Thread termination (if the unmanaged code was at the root of the thread).
+- App termination (if the unmanaged code terminates the app).
+
+### CLR Handlers
+
+An exception might be handled by the CLR itself. In this case, the profiler will see the unwind phase, but no notification of any catch handlers. It may see execution resume normally in managed or unmanaged code.
+
+### Unhandled Exceptions
+
+By default, an unhandled exception leads to process termination. If an application has opted back into the legacy exception policy, an unhandled exception on certain kinds of threads may lead only to thread termination.
+
+Code Generation
+---------------
+
+### Getting from IL to Native Code
+
+The IL in a .NET assembly may get compiled to native code in one of two ways: it may get JIT-compiled at run time, or it may be compiled into a "native image" by a tool called NGEN.exe (or CrossGen.exe for CoreCLR). Both the JIT-compiler and NGEN have a number of flags that control code generation.
+
+At the time an assembly is loaded, the CLR first looks for a native image for the assembly. If no native image is found with the right set of code-generation flags, the CLR will JIT-compile the functions in the assembly as they are needed during the run. Even when a native image is found and loaded, the CLR may end up JIT-compiling some of the functions in the assembly.
+
+### Profiler Control over Code-Generation
+
+The profiler has control over code generation, as described below:
+
+| **Flag** | **Effect** |
+| ------------------------------ | --- |
+| COR\_PRF\_USE\_PROFILE\_IMAGES | Causes the native image search to look for profiler-enhanced images (ngen /profile). Has no effect on JITted code. |
+| COR\_PRF\_DISABLE\_INLINING | Has no effect on the native image search. If JITting, disables inlining. All other optimizations remain in effect. |
+| COR\_PRF\_DISABLE\_OPTIMIZATIONS | Has no effect on the native image search. If JITting, disables all optimizations, including inlining. |
+| COR\_PRF\_MONITOR\_ENTERLEAVE | Causes the native image search to look for profiler-enhanced images (ngen /profile). If JITting, inserts enter/leave hooks into the generated code. |
+| COR\_PRF\_MONITOR\_CODE\_TRANSITIONS | Causes the native image search to look for profiler-enhanced images (ngen /profile). If JITting, inserts hooks at managed/unmanaged transition points. |
+
+### Profilers and Native Images
+
+When NGEN.exe creates a native image, it does much of the work that the CLR would have done at run-time—for example, class loading and method compilation. As a result, in cases where work was done at NGEN time, certain profiler callbacks will not be received at run-time:
+
+- JITCompilation\*
+- ClassLoad\*, ClassUnload\*
+
+To deal with this situation, profilers that do not wish to perturb the process by requesting profiler-enhanced native images should be prepared to lazily gather any data required about FunctionIDs or ClassIDs as they are encountered.
+
+### Profiler-Enhanced Native Images
+
+Creating a native image with NGEN /profile turns on a set of code-generation flags that make the image easier to profile:
+
+- Enter/leave hooks are inserted into the code.
+- Managed/unmanaged transition hooks are inserted into the code.
+- JITCachedFunctionSearch notifications are given as each function in the native image is invoked for the first time.
+- ClassLoad notifications are given as each class in the native image is used for the first time.
+
+Because profiler-enhanced native images differ significantly from regular ones, profilers should only use them when the extra perturbation is acceptable.
+
+TODO: Instrumentation
+
+TODO: Remoting
+
+Security Issues in Profiling
+============================
+
+A profiler DLL is an unmanaged DLL that is effectively running as part of the CLR's execution engine itself. As a result, the code in the profiler DLL is not subject to the restrictions of managed code-access security, and the only limitations on it are those imposed by the OS on the user running the profiled application.
+
+Combining Managed and Unmanaged Code in a Code Profiler
+=======================================================
+
+A close review of the CLR Profiling API creates the impression that you could write a profiler with managed and unmanaged components that call each other through COM Interop or P/Invoke (N/Direct) calls.
+
+Although this is possible from a design perspective, the CLR Profiling API does not support it. A CLR profiler must be purely unmanaged. Attempts to combine managed and unmanaged code in a CLR profiler can cause crashes, hangs, and deadlocks: the managed parts of the profiler will "fire" events back to its unmanaged component, which would in turn call back into the managed part of the profiler, and so on, with no safe way to break the cycle.
+
+The only place where a CLR profiler can safely invoke managed code is through replacement of the MSIL body of a method: before JIT-compilation of a function completes, the profiler inserts managed calls into the MSIL body of the method and then lets the JIT compile it. This technique can be used for selective instrumentation of managed code, or to gather statistics and timings about the JIT.
+
+Alternatively, a code profiler can insert into the MSIL body of every managed function native "hooks" that call into unmanaged code. That technique can be used for instrumentation and coverage. For example, a code profiler could insert instrumentation hooks after every MSIL block to verify that the block has been executed. Modifying the MSIL body of a method is a very delicate operation, and there are many factors that must be taken into consideration.
+
+Profiling Unmanaged Code
+========================
+
+There is minimal support in the Runtime profiling interfaces for profiling unmanaged code. The following functionality is provided:
+
+- Enumeration of stack chains, which allows a code profiler to determine the boundary between managed code and unmanaged code.
+- Determining whether a stack chain corresponds to managed or native code.
+
+These methods are available through the in-process subset of the CLR debugging API. They are defined in CorDebug.idl and explained in DebugRef.doc; please refer to both for more details.
+
+Sampling Profilers
+==================
+
+Hijacking
+---------
+
+Some sampling profilers operate by hijacking the thread at sample time and forcing it to do the work of the sample. This is a very tricky practice that we do not recommend. The rest of this section is mostly to discourage you from going this way.
+
+### Timing of Hijacks
+
+A hijacking profiler must track the runtime suspension events (COR\_PRF\_MONITOR\_SUSPENDS). The profiler should assume that when it returns from a RuntimeThreadSuspended callback, the runtime will hijack that thread. The profiler must avoid having its hijack conflict with the runtime's hijack. To do so, the profiler must ensure that:
+
+1. The profiler does not attempt to hijack a thread between RuntimeThreadSuspended and RuntimeThreadResumed.
+1. If the profiler has begun hijacking before the RuntimeThreadSuspended callback was issued, the callback does not return before the hijack completes.
+
+This can be accomplished by some simple synchronization.
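One possible shape of that synchronization, sketched in Python with a single lock standing in for whatever primitive an unmanaged profiler would actually use. The class and method names are illustrative, not part of the profiling API:

```python
import threading

class HijackGate:
    """The profiler's hijack and the runtime's suspend window exclude
    each other: holding the lock means either a hijack is in progress
    or the runtime suspend window is open."""

    def __init__(self):
        self._lock = threading.Lock()

    # Called from the RuntimeThreadSuspended callback. Blocks until any
    # in-progress hijack finishes (rule 2), then holds the lock so no
    # new hijack can start inside the suspend window (rule 1).
    def on_runtime_thread_suspended(self):
        self._lock.acquire()

    # Called from the RuntimeThreadResumed callback: closes the window.
    def on_runtime_thread_resumed(self):
        self._lock.release()

    # Called from the sampling thread. Fails fast rather than hijacking
    # while the runtime has the thread suspended.
    def try_begin_hijack(self):
        return self._lock.acquire(blocking=False)

    def end_hijack(self):
        self._lock.release()
```

A sampling thread that fails `try_begin_hijack` simply skips that sample rather than waiting, which keeps it from deadlocking against the runtime's own suspension.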
+
+#### Initializing the Runtime
+
+If the profiler has its own thread on which it will be calling ICorProfilerInfo functions, it needs to ensure that it calls one such function before doing any thread suspensions. This is because the runtime has per-thread state that needs to be initialized with all other threads running to avoid possible deadlocks.
diff --git a/Documentation/botr/readytorun-overview.md b/Documentation/botr/readytorun-overview.md
new file mode 100644
index 0000000..9e9f334
--- /dev/null
+++ b/Documentation/botr/readytorun-overview.md
@@ -0,0 +1,335 @@
+Managed Executables with Native Code
+===
+
+# Motivation
+
+Since shipping the .NET Runtime over 10 years ago, there has only been one file format which can be used to distribute and deploy managed code components: the CLI file format. This format expresses all execution as machine independent intermediate language (IL) which must either be interpreted or compiled to native code sometime before the code is run. This lack of an efficient, directly executable file format is a very significant difference between unmanaged and managed code, and has become more and more problematic over time. Problems include:
+
+- Native code generation takes a relatively long time and consumes power.
+- For security / tamper-resistance, there is a very strong desire to validate any native code that gets run (e.g. code is signed).
+- Existing native codegen strategies produce brittle code such that when the runtime or low level framework is updated, all native code is invalidated, which forces the need for recompilation of all that code.
+
+All of these problems and complexity are things that unmanaged code simply avoids. They are avoided because unmanaged code has a format with the following characteristics:
+
+- The executable format can be efficiently executed directly. Very little needs to be updated at runtime (binding _some_ external references) to prepare for execution. What does need to be updated can be done lazily.
+- As long as a set of known versioning rules are followed, version compatible changes in one executable do not affect any other executable (you can update your executables independently of one another).
+- The format is clearly defined, which allows a variety of compilers to produce it.
+
+In this proposal we attack this discrepancy between managed and unmanaged code head on: by giving managed code a file format that has the characteristics of unmanaged code listed above. Having such a format brings managed code up to at least parity with unmanaged code with respect to deployment characteristics. This is a huge win!
+
+
+## Problem Constraints
+
+The .NET Runtime has had a native code story (NGEN) for a long time. However, what is being proposed here is architecturally different from NGEN. NGEN is fundamentally a cache (it is optional and affects only the performance of the app), and thus the fragility of the images was simply not a concern: if anything changes, the NGEN image is discarded and regenerated. On the other hand:
+
+**A native file format carries a strong guarantee that the file will continue to run despite updates and improvements to the runtime or framework.**
+
+Most of this proposal is the details of achieving this guarantee while giving up as little performance as possible.
+
+This compatibility guarantee means that, unlike NGEN, anything you place in the file is a _liability_ because you will have to support it in all future runtimes. This drives a desire to be 'minimalist' and only place things into the format that really need to be there. For everything we place into the format we have to believe either:
+
+1. It is very unlikely to change (in particular we have not changed it over the current life of CLR)
+2. We have a scheme in which we can create future runtimes that could support both old and new format efficiently (both in terms of runtime efficiency and engineering complexity).
+
+Each feature of the file format needs to have an answer to the question of how it versions, and we will be trying to be as 'minimalist' as possible.
+
+
+## Solution Outline
+
+As mentioned, while NGEN is a native file format, it is not an appropriate starting point for this proposal because it is too fragile.
+
+Looking carefully at the CLI file format shows that it is really 'not that bad' as a starting point. At its heart CLI is a set of database-like tables (one for types, methods, fields, etc.), which have entries that point at variable-length things (e.g. method names, signatures, method bodies). Thus CLI is 'pay for play' and since it is already public and version resilient, there is very little downside to including it in the format. By including it we also get the following useful properties:
+
+- Immediate support for _all_ features of the runtime (at least for files that include complete CLI within them)
+- The option to only add the 'most important' data required to support fast, direct execution. Everything else can be left in CLI format and use the CLI code paths. This is quite valuable given our desire to be minimalist in augmenting the format.
+
+Moreover, there is an 'obvious' way of extending the CLI file to include the additional data we need. A CLI file has a well-defined header structure, and that header already has a field that can point to 'additional information'. This is used today in NGEN images. We would use this same technique to allow the existing CLI format to include a new 'Native Header' that would then point at any additional information needed to support fast, direct execution.
+
+The most important parts of this extra information include:
+
+1. Native code for the methods (as well as a way of referencing things outside the module)
+2. Garbage Collection (GC) information for each method that allows you to know what values in registers and on the stack are pointers to the GC heap wherever a GC is allowed.
+3. Exception handling (EH) tables that allow an exception handler to be found when an exception is thrown.
+4. A table that allows the GC and EH to be found given just the current instruction pointer (IP) within the code. (IP map).
+5. A table that links the information in the metadata to the corresponding native structure.
+
+That is, we need something to link the world of metadata to the world of native code. We cannot eliminate metadata completely because we want to support existing functionality. In particular, we need to be able to support having other CLI images refer to types, methods and fields in this image. They will do so by referencing the information in the metadata, but once they find the target in the metadata, we need to find the actual native code or type information corresponding to that metadata entry. This is the purpose of the additional table; effectively, this table is the 'export' mechanism for managed references.
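A toy model may make the IP map and the metadata-to-native linking table concrete. All tokens, RVAs, and structure names below are invented for illustration; the real layout is defined by the native header format, not by this sketch:

```python
# Hypothetical native image with two of the tables described above.
native_image = {
    "native_header": {
        # IP map: (code_start_rva, code_end_rva, method_token) so that
        # GC info and EH tables can be found from an instruction pointer.
        "ip_map": [
            (0x2000, 0x2040, 0x06000001),
            (0x2040, 0x20A0, 0x06000002),
        ],
        # Metadata token -> native code RVA: the 'export' mechanism that
        # lets other images resolve a metadata reference to native code.
        "method_entrypoints": {0x06000001: 0x2000, 0x06000002: 0x2040},
    },
}

def find_method_from_ip(image, ip):
    """Find which method (by metadata token) owns an instruction
    pointer, e.g. during stack walking or exception dispatch."""
    for start, end, token in image["native_header"]["ip_map"]:
        if start <= ip < end:
            return token
    return None
```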
+
+Some of this information can be omitted or stored in more efficient form, e.g.:
+
+- The garbage collection information can be omitted for environments with conservative garbage collection, such as IL2CPP.
+- The full metadata information is not strictly required for 'private' methods or types so it is possible to strip it from the CLI image.
+- The metadata can be stored in more efficient form, such as the .NET Native metadata format.
+- The platform native executable format (ELF, Mach-O) can be used as an envelope instead of PE to take advantage of the platform OS loader.
+
+
+## Definition of Version Compatibility for Native Code
+
+Even for IL or unmanaged native code, there are limits to the changes that can be made compatibly. For example, deleting a public method is sure to be an incompatible change for any external code using that method.
+
+Since CIL already has a set of [compatibility rules](https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/breaking-changes.md), ideally the native format would have the same set of compatibility rules as CIL. Unfortunately, that is difficult to do efficiently in all cases. In those cases we have multiple choices:
+
+1. Change the compatibility rules to disallow some changes
+2. Never generate native structures for the problematic cases (fall back to CIL techniques)
+3. Generate native structures for the problematic cases, but use them only if there was no incompatible change made
+4. Generate less efficient native code that is resilient
+
+Generally the hardest versioning issues revolve around:
+
+- Value types (structs)
+- Generic methods over value types (structs)
+
+These are problematic because value classes are valuable precisely _because_ they have less overhead than classes. They achieve this value by being 'inlined' where they are used. This makes the code generated for value classes very fragile with respect to any changes to the value class's layout, which is bad for resilience. Generics over structs have a similar issue.
+
+Thus this proposal does _not_ suggest that we try to solve the problem of having version resilience in the presence of layout changes to value types. Instead we suggest creating a new compatibility rule:
+
+**It is a breaking change to change the number or type of any (including private) fields of a public value type (struct). However if the struct is non-public (that is internal), and not reachable from any nesting of value type fields in any public value type, then the restriction does not apply.**
+
+This is a compatibility restriction that is not present for CIL. All other changes allowed by CIL can be allowed by native code without prohibitive penalty. In particular the following changes are allowed:
+
+1. Adding instance and static fields to reference classes
+2. Adding static fields to a value class.
+3. Adding virtual, instance or static methods to a reference or value class
+4. Changing existing methods (assuming the semantics are compatible).
+5. Adding new classes.
+
+
+## Version Bubbles
+
+When changes to managed code are made, we have to make sure that all the artifacts in a native code image _only_ depend on information in other modules that _cannot_ _change_ without breaking the compatibility rules. What is interesting about this problem is that the constraints only come into play when you _cross_ module boundaries.
+As an example, consider inlining of method bodies. If module A inlined a method from module B, that would break our desired versioning property: if that method in module B changed, code in module A would need to be updated (which we do not wish to do). Thus inlining is illegal across modules. Inlining _within_ a module, however, is still perfectly fine.
+
+Thus in general the performance impact of versioning decreases as module size increases because there are fewer cross-module references. We can take advantage of this observation by defining something called a version bubble. **A version bubble is a set of DLLs that we are willing to update as a set.** From a versioning perspective, this set of DLLs is a single module. Inlining and other cross-module optimizations are allowed within a version bubble.
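The cross-bubble inlining restriction can be stated as a one-line predicate. The module-to-bubble assignment below is purely illustrative:

```python
# Which version bubble each module belongs to (illustrative).
# Modules in the same bubble are updated together as a set.
VERSION_BUBBLES = {
    "App.dll": "app",
    "AppUtil.dll": "app",            # shipped and updated with App.dll
    "System.Private.CoreLib.dll": "framework",
}

def may_inline(caller_module, callee_module):
    """Inlining (and other cross-module optimizations) are allowed only
    when caller and callee are in the same version bubble."""
    return VERSION_BUBBLES[caller_module] == VERSION_BUBBLES[callee_module]
```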
+
+It is worth reiterating the general principle covered in this section
+
+**Code of methods and types that do NOT span version bubbles does NOT pay a performance penalty.**
+
+This principle is important because it means that only a fraction (for most apps a small fraction) of all code will pay any performance penalties we discuss in the sections that follow.
+
+The extreme case is where the entire application is a single version bubble. This configuration does not need to pay any performance penalty for respecting versioning rules. It still benefits from a clearly defined file format and runtime contract that are the essential part of this proposal.
+
+## Runtime Versioning
+
+Runtime versioning is solved using different techniques, because the runtime is responsible for interpretation of the binary format.
+
+To allow changes in the runtime, we simply require that the new runtime handle all old formats as well as the new format. The 'main defense' in the design of the file format is having version numbers on important structures so that the runtime has the option of supporting a new version of that structure as well as the old version unambiguously by checking the version number. Fundamentally, we are forcing the developers of the runtime to be aware of this constraint and code and test accordingly.
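A sketch of the version-number scheme: a reader that accepts two hypothetical layouts of a structure and rejects anything newer. The field layouts here are invented for illustration and are not the actual file format:

```python
import struct

def read_header(blob):
    """Check the structure's version field first, then parse the layout
    that version implies. A newer runtime supports old and new layouts
    side by side; an unknown version is rejected unambiguously."""
    (version,) = struct.unpack_from("<H", blob, 0)
    if version == 1:
        major, minor = struct.unpack_from("<HH", blob, 2)
        flags = 0                      # hypothetical v1 had no flags field
    elif version == 2:
        major, minor, flags = struct.unpack_from("<HHI", blob, 2)
    else:
        raise ValueError("native image format too new for this runtime")
    return {"version": version, "major": major, "minor": minor, "flags": flags}

v1 = struct.pack("<HHH", 1, 4, 0)          # old-format header
v2 = struct.pack("<HHHI", 2, 5, 1, 0x10)   # new-format header with flags
```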
+
+### Restrictions on Runtime Evolution
+
+As mentioned previously, when designing for version compatibility we have the choice of either simply disallowing a change (by changing the breaking change rules), or ensuring that the format is sufficiently flexible to allow evolution. For example, for managed code we have opted to disallow changes to value type (struct) layout so that codegen for structs can be efficient. In addition, the design also includes a small number of restrictions that affect the flexibility of evolving the runtime itself. They are:
+
+- The field layout of `System.Object` cannot change. (First, there is a pointer sized field for type information and then the other fields.)
+- The field layout of arrays cannot change. (First, there is a pointer sized field for type information, and then a pointer sized field for the length. After these fields is the array data, packed using existing alignment rules.)
+- The field layout of `System.String` cannot change. (First, there is a pointer-sized field for type information, and then an int32-sized field for the length. After these fields is the zero-terminated string data in UTF-16 encoding.)
+
+These restrictions were made because the likelihood of ever wanting to change them is low, and the performance cost of _not_ having these assumptions is high. If we did not assume that the field layout of `System.Object` never changes, then _every_ field fetch on an object defined outside the framework itself would span a version bubble and pay a penalty. Similarly, if we did not assume the field layout for arrays or strings, every access would pay a versioning penalty.
+
+## Selective use of the JIT
+
+One final point that is worth making is that selective use of the JIT compiler is another tool that can be used to avoid code quality penalties associated with version resilience, in environments where JITing is permitted. For example, assume that there is a hot user method that calls across a version bubble to a method that would be a good candidate for inlining, but is not inlined because of versioning constraints. For such cases, we could have an attribute that indicates that a particular method should be compiled at runtime. Since the JIT compiler is free to generate fragile code, it can perform this inlining and thus the program's steady-state performance improves. It is true that a startup-time cost has been paid, but if the number of such 'hot' methods is small, the amount of JIT compilation (and thus its penalty) is not great. The point is that application developers can make this determination on a case-by-case basis. It is very easy for the runtime to support this capability.
+
+
+# Version Resilient Native Code Generation
+
+Because our new native format starts with the current CLI format, we have the option of falling back to it whenever we wish to. Thus we can choose to add new parts to the format in chunks. In this section we talk about the 'native code' chunk. Here we discuss the parts of the format needed to emit native code for the bodies of 'ordinary' methods. Native images that have this additional information will not need to call the JIT compiler, but will still need to call the type loader to create types.
+
+It is useful to break down the problem of generating version-resilient native code by CIL instruction. Many CIL instructions (e.g. `ADD`, `MUL`, `LDLOC`, ...) naturally translate to native code in a version-resilient way. However, CIL instructions that deal with the object model (e.g. `NEWOBJ`, `LDFLD`, etc.) need special care, as explained below. The descriptions below are roughly ordered by performance priority in typical applications. Typically, each section first describes what code generation looks like when all information is within the version bubble, and then when the information crosses version bubbles. We use x64 as our native instruction set; applying the same strategy to other processor architectures is straightforward. We use the following trivial example to demonstrate the concepts:
+
+ interface Intf
+ {
+ void intfMethod();
+ }
+
+ class BaseClass
+ {
+ static int sField;
+ int iField;
+
+ public void iMethod()
+ {
+ }
+
+ public virtual void vMethod(BaseClass aC)
+ {
+ }
+ }
+
+ class SubClass : BaseClass, Intf
+ {
+ int subField;
+
+ public override void vMethod(BaseClass aC)
+ {
+ }
+
+        public virtual void intfMethod()
+ {
+ }
+ }
+
+## Instance Field access - LDFLD / STFLD
+
+The CLR stores fields in the 'standard' way, so if RCX holds a BaseClass then
+
+ MOV RAX, [RCX + iField_Offset]
+
+will fetch `iField` from this object. `iField_Offset` is a constant known at native code generation time. This is known at compile time only because we mandated that the field layout of `System.Object` is fixed, and thus the entire inheritance chain of `BaseClass` is in the version bubble. It's also true even when fields in `BaseClass` contain structs (even from outside the version bubble), because we have made it a breaking change to modify the field layout of any public value type. Thus for types whose inheritance hierarchy does not span a version bubble, field fetch is as it always was.
+
+To consider the inter-bubble case, assume that `SubClass` is defined in a different version bubble than BaseClass and we are fetching `subField`. The normal layout rules for classes require `subField` to come after all the fields of `BaseClass`. However `BaseClass` could change over time, so we can't wire in a literal constant anymore. Instead we require the following code
+
+ MOV TMP, [SIZE_OF_BASECLASS]
+ MOV EAX, [RCX + TMP + subfield_OffsetInSubClass]
+
+ .data // In the data section
+ SIZE_OF_BASECLASS: UINT32 // One per EXTERN CLASS that is subclassed
+
+This code simply assumes that a uint32-sized location has been reserved in the module and will be filled in with the size of `BaseClass` before this code is executed. A field fetch now costs one extra instruction, which fetches this size, and that dynamic value is used to compute the field address. This sequence is a great candidate for CSE (common sub-expression elimination) optimization when multiple fields of the same class are accessed by a single method.
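The same indirection can be modeled in a few lines of Python: the slot plays the role of `SIZE_OF_BASECLASS`, filled in at module load, and the address computation mirrors the two-instruction sequence above. The sizes are illustrative:

```python
# Slot reserved in the module's data section, filled in before the
# code that uses it runs (one slot per external class that is subclassed).
SIZE_OF_BASECLASS_SLOT = {"value": None}

def load_module(actual_base_size):
    SIZE_OF_BASECLASS_SLOT["value"] = actual_base_size

def subfield_address(object_addr, offset_in_subclass):
    # MOV TMP, [SIZE_OF_BASECLASS]
    # MOV EAX, [RCX + TMP + subfield_OffsetInSubClass]
    return object_addr + SIZE_OF_BASECLASS_SLOT["value"] + offset_in_subclass

load_module(16)                 # BaseClass as originally compiled
addr_v1 = subfield_address(0x1000, 0)
load_module(24)                 # BaseClass grew a field in an update
addr_v2 = subfield_address(0x1000, 0)
```

Because the base-class size is read at run time, the same SubClass code works unchanged before and after BaseClass grows.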
+
+Special attention needs to be given to the alignment requirements of `SubClass`.
+
+### GC Write Barrier
+
+The .NET GC is generational, which means that most GCs do not collect the whole heap, and instead only collect the 'new' part (which is much more likely to contain garbage). To do this it needs to know the set of roots that point into this 'new' part. This is what the GC write barrier does. Every time an object reference that lives in the GC heap is updated, bookkeeping code needs to be called to log that fact. Any fields whose values were updated are used as potential roots on these partial GCs. The important part here is that any field update of a GC reference must do this extra bookkeeping.
+
+The write barrier is implemented as a set of helper functions in the runtime. These functions have special calling conventions (they do not trash any registers). Thus these helpers act more like instructions than calls. The write barrier logic does not need to be changed to support versioning (it works fine the way it is).
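For intuition, here is a toy card-table write barrier in Python, one common way such bookkeeping is implemented (heavily simplified; the real barrier is a few machine instructions with a special calling convention, and the card size is illustrative):

```python
CARD_SIZE = 256          # bytes of heap covered by one card (illustrative)
heap = {}                # toy heap: address -> stored reference
card_table = {}          # card index -> dirty flag

def write_barrier(field_address, new_reference):
    """Store a reference into the heap and log the updated 'card' so a
    partial GC can later scan only dirty cards for potential roots."""
    heap[field_address] = new_reference
    card_table[field_address // CARD_SIZE] = True

write_barrier(0x12345, "objA")
```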
+
+
+### Initializing the field size information
+
+A key observation is that you only need this overhead for each distinct class that inherits across a version bubble. Thus there are unlikely to be many slots like `SIZE_OF_BASECLASS`. Because there are likely to be few of them, the compiler can choose to simply initialize them at module load.
+
+Note that if you access an instance field of a class that was defined in another module, it is not the size that you need but the offset of a particular field. The code generated will be the same (in fact it will be simpler, as no displacement is needed in the second instruction). Our coding guidelines strongly discourage public instance fields, so this scenario is not particularly likely in practice (it will end up being a property call), but we can handle it in a natural way. Note also that even complex inheritance hierarchies that span multiple version bubbles are not a problem. In the end all you need is the final size of the base type. It might take a bit longer to compute during one-time initialization, but that is the extent of the extra cost.
+
+### Performance Impact
+
+Clearly we have added an instruction and thus made the code bigger and more expensive to run. However what is also true is that the additional cost is small. The 'worst' case would be if this field fetch was in a tight loop. To measure this we created a linked list element which inherited across a version bubble. The list was long (1K) but small enough to fit in the L1 cache. Even for this extreme example (which by the way is contrived, linked list nodes do not normally inherit in such a way), the extra cost was small (< 1%).
+
+### Null checks
+
+The managed runtime requires any field access on a null instance pointer to generate a null reference exception. To avoid inserting explicit null checks, the code generator assumes that a memory access at an address smaller than a certain threshold (64k on Windows NT) will generate a null reference exception. If we allowed unlimited growth of the base class for cross-version bubble inheritance hierarchies, this optimization would no longer be possible.
+
+To make this optimization possible, we will limit growth of the base class size for cross-module inheritance hierarchies. It is a new versioning restriction that does not exist in IL today.
+
+
+## Non-Virtual Method Calls - CALL
+
+### Intra-module call
+
+If RCX holds a `BaseClass` and the caller of `iMethod` is in the same module as `BaseClass`, then a method call is a simple machine call instruction:
+
+ CALL ENTRY_IMETHOD
+
+### Inter-module call
+
+However, if the caller is outside the module of `BaseClass` (even if it is in the same version bubble), we need to call it using an indirection:
+
+ CALL [PTR_IMETHOD]
+
+ .data // In the data section
+ PTR_IMETHOD: PTR = RUNTIME_ENTRY_FIXUP_METHOD // One per call TARGET.
+
+Just like the field case, the pointer sized data slot `PTR_IMETHOD` must be fixed up to point at the entry point of `BaseClass.iMethod`. However unlike the field case, because we are fixing up a call (and not a MOV), we can have the call fix itself up lazily via standard delay loading mechanism.
+The delay loading mechanism often uses low-level tricks for maximum efficiency. Any low-level implementation of delay loading can be used as long as the resolution of the call target is left to the runtime.
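The delay-loading pattern can be sketched as follows. The resolver and slot names are hypothetical, and a real implementation patches data slots or machine code rather than Python lists:

```python
# Sketch of lazy fixup through an indirection slot: the slot initially
# points at a resolver that looks up the real target, patches the slot,
# and forwards the call. Subsequent calls go straight to the target.
resolutions = []                     # records how often resolution ran

def imethod(x):                      # stand-in for BaseClass.iMethod
    return x + 1

def runtime_entry_fixup_method(x):
    resolutions.append(1)            # the (slow) lookup happens here, once
    PTR_IMETHOD[0] = imethod         # patch the slot to the real entry point
    return imethod(x)                # forward this first call

PTR_IMETHOD = [runtime_entry_fixup_method]   # the data-section slot

def call_site(x):
    return PTR_IMETHOD[0](x)         # CALL [PTR_IMETHOD]

assert call_site(1) == 2             # first call resolves and forwards
assert call_site(2) == 3             # later calls skip the resolver
assert len(resolutions) == 1         # fixup ran exactly once
```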
+
+### Retained Flexibility for runtime innovation
+
+Note that it might seem that we have forever removed the possibility of innovating in the way we do SLOT fixup, since we 'burn' these details into the code generation and runtime helpers. However this is not true. What we have done is require that we support the _current_ mechanism for doing such fixup. Thus we must always support a `RUNTIME_ENTRY_FIXUP_METHOD` helper. However we could devise a completely different scheme. All that would be required is that you use a _new_ helper and _keep_ the old one. Thus you can have a mix of old and new native code in the same process without issue.
+
+### Calling Convention
+
+The examples above did not have arguments and the issue of calling convention was not obvious. However it is certainly true that the native code at the call site does depend heavily on the calling convention and that convention must be agreed to between the caller and the callee at least for any particular caller-callee pair.
+
+The issue of calling convention is not specific to managed code and thus hardware manufacturers typically define a calling convention that tends to be used by all languages on the system (thus allowing interoperability). In fact for all platforms except x86, CLR attempts to follow the platform calling convention.
+
+Our understanding of the most appropriate managed convention evolved over time. Our experience tells us that it is worthwhile for implementation simplicity to always pass the managed `this` pointer in a fixed register, even if the platform standard calling convention says otherwise.
+
+#### Managed Code Specific Conventions
+
+In addition to the normal conventions for passing parameters, as well as the normal convention of having a hidden byref parameter for returning value types, the CLR has a few managed-code-specific argument conventions:
+
+1. Shared generic code has a hidden parameter that represents the type parameters in some cases for methods on generic types and for generic methods.
+2. GC interactions with the hidden return buffer: the convention for whether the hidden return buffer can be allocated in the GC heap, and thus needs to be written to using a write barrier.
+
+These conventions would be codified as well.
+
+### Performance Impact
+
+Because it was already the case that methods outside the current module had to use an indirect call, versionability does not introduce more overhead for non-virtual method calls if inlining was not done. Thus the main cost of making the native code version resilient is the requirement that no cross version bubble inlining can happen.
+
+The best solution to this problem is to avoid 'chatty' library designs (unfortunately, `IEnumerable` is such a chatty design, where each iteration does a `MoveNext` call and a `Current` property fetch). Another mitigation is the one mentioned previously: to allow clients of the library to selectively JIT compile some methods that make these chatty calls. Finally, you can also use the new custom `NonVersionableAttribute` attribute, which effectively changes the versioning contract to indicate that the library supplier has given up the right to change that method's body, and thus it would be legal to inline.
+
+The proposal is to disallow cross-version bubble inlining by default, and selectively allow inlining for critical methods (by giving up the right to change the method).
+
+Experiments with cross-module inlining disabled and inlining selectively enabled for critical methods showed no visible regression in ASP.NET throughput.
+
+## Non-Virtual calls as the baseline solution to all other versioning issues
+
+It is important to observe that once you have a mechanism for doing non-virtual function calls in a version resilient way (by having an indirect CALL through a slot that can be fixed up lazily at runtime), all other versioning problems _can_ be solved in that way by calling back to the 'definer' module, and having the operation occur there instead. Issues associated with this technique:
+
+1. You will pay the cost of a true indirection function call and return, as well as any argument setup cost. This cost may be visible in constructs that do not contain a call naturally, like fetching string literals or other constants. You may be able to get better performance from another technique (for example, we did so with instance field access).
+2. It introduces a lot of indirect calls. It is not friendly to systems that disallow on-the-fly code generation. A small helper stub has to be created at runtime in the most straightforward implementation, or there has to be a scheme for pre-creating or recycling the stubs.
+3. It requires that the defining assembly 'know' the operations that it is responsible for defining. In general this could be fixed by JIT compiling whatever is needed at runtime (where the needed operations are known), but JIT compiling is the kind of expensive operation that we are trying to avoid at runtime.
+
+So while there are limitations to the technique, it works very well on a broad class of issues, and is conceptually simple. Moreover, it has very nice simplicity on the caller side (a single indirect call). It is hard to get simpler than this. This simplicity means that you have wired very few assumptions into the caller which maximizes the versioning flexibility, which is another very nice attribute. Finally, this technique also allows generation of optimal code once the indirect call was made. This makes for a very flexible technique that we will use again and again.
+
+The runtime currently supports two mechanisms for virtual dispatch. One mechanism is called virtual stub dispatch (VSD). It is used when calling interface methods. The other is a variation on traditional vtable-based dispatch and it is used when a non-interface virtual is called. We first discuss the VSD approach.
+
+Assume that RCX holds an `Intf`; then the call to `intfMethod()` would look like:
+
+ CALL [PTR_CALLSITE]
+ .data // in the data section
+ PTR_CALLSITE: INT_PTR = RUNTIME_ENTRY_FIXUP_METHOD // One per call SITE.
+
+This looks the same as the cross-module, non-virtual case, but there are important differences. Like the non-virtual case, there is an indirect call through a pointer that lives in the module. However, unlike the non-virtual case, there is one such slot per call site (not per target). What is in this slot is always guaranteed to get to the target (in this case to `Intf.intfMethod()`), but it is expected to change over time. It starts out pointing to a 'dumb' stub which simply calls a runtime helper that does the lookup (in a likely slow way). However, that helper can update the `PTR_CALLSITE` slot to a stub that efficiently dispatches to the interface implementation for the type that actually occurred (the remaining details of stub-based interface dispatch are not relevant to versioning).
+
+The above description is accurate for the current CLR implementation of interface dispatch. What's more, nothing needs to be changed about the code generation to make it version resilient. It 'just works' today. Thus interface dispatch is version resilient with no performance penalty.
+
+What's more, we can see that VSD is really just a modification of the basic 'indirect call through an updatable slot' technique that was used for non-virtual method dispatch. The main difference is that because the target depends on values that are not known until runtime (the type of the 'this' pointer), the 'fixup' function can never remove itself completely but must always check this runtime value and react accordingly (which might include fixing up the slot again). To make it as likely as possible that the value in the fixup slot stabilizes, we create a fixup slot per call site (rather than per target).
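A sketch of the per-call-site slot behavior follows. This is a simplification of stubbed interface dispatch, and the names are illustrative:

```python
# Per-call-site VSD slot sketch: the slot starts at a slow lookup stub;
# once a call observes a concrete type, it installs a stub that checks
# for that type and dispatches directly, falling back otherwise.
vtable = {"List": lambda: "List.intfMethod",   # per-type implementations
          "Set": lambda: "Set.intfMethod"}
slow_lookups = []

def slow_stub(this_type):
    slow_lookups.append(this_type)            # slow, generic resolution
    target = vtable[this_type]
    def fast_stub(t):                         # monomorphic fast path
        return target() if t == this_type else slow_stub(t)
    PTR_CALLSITE[0] = fast_stub               # patch the call-site slot
    return target()

PTR_CALLSITE = [slow_stub]                    # one slot per call SITE

def call_site(this_type):
    return PTR_CALLSITE[0](this_type)         # CALL [PTR_CALLSITE]

assert call_site("List") == "List.intfMethod"   # slow path, installs stub
assert call_site("List") == "List.intfMethod"   # fast path now
assert len(slow_lookups) == 1
assert call_site("Set") == "Set.intfMethod"     # miss falls back to slow path
assert len(slow_lookups) == 2
```

A monomorphic call site settles on its fast stub after one slow lookup; a polymorphic one keeps bouncing back to the resolver, which is why the real implementation escalates to shared lookup stubs.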
+
+### Vtable Dispatch
+
+The CLR currently also supports doing virtual dispatch through function tables (vtables). Unfortunately, vtables have the same version resilience problem as fields. This problem can be fixed in a similar way; however, unlike fields, the likelihood of having many cross-bubble fixups is higher for methods than for instance fields. Further, unlike fields, we already have a version resilient mechanism that works (VSD), so vtables would have to do better than that to be worth investing in. Vtable dispatch is only better than VSD for polymorphic call sites (where VSD needs to resort to a hash lookup). If we find we need to improve dispatch for this case, we have some possible mitigations to try:
+
+1. If the polymorphism is limited, simply trying more cases before falling back to the hash table has been prototyped and seems to be a useful optimization.
+2. For high polymorphism case, we can explore the idea of dynamic vtable slots (where over time the virtual method a particular vtable slot holds can change). Before falling back to the hash table a virtual method could claim a vtable slot and now the dispatch of that method for _any_ type will be fast.
+
+In short, because of the flexibility and natural version resilience of VSD, we propose determining if VSD can be 'fixed' before investing in making vtables version resilient and use VSD for all cross version bubble interface dispatch. This does not preclude using vtables within a version bubble, nor adding support for vtable based dispatch in the future if we determine that VSD dispatch can't be fixed.
+
+
+## Object Creation - NEWOBJ / NEWARR
+
+Object allocation is always done by a helper call that allocates the uninitialized object memory (but does initialize the type information `MethodTable` pointer), followed by calling the class constructor. There are a number of different helpers depending on the characteristics of the type (does it have a finalizer, is it smaller than a certain size, ...).
+
+We will defer the choice of which helper to use to allocate the object to the runtime. For example, to create an instance of `SubClass`, the code would be:
+
+ CALL [NEWOBJ_SUBCLASS]
+    MOV RCX, RAX // RAX holds the new object
+ // If the constructor had parameters, set them
+ CALL SUBCLASS_CONSTRUCTOR
+
+ .data // In the data section
+ NEWOBJ_SUBCLASS: RUNTIME_ENTRY_FIXUP // One per type
+
+where the `NEWOBJ_SUBCLASS` would be fixed up using the standard lazy technique.
+
+The same technique works for creating new arrays (NEWARR instruction).
+
+
+## Type Casting - ISINST / CASTCLASS
+
+The proposal is to use the same technique as for object creation. Note that type casting could easily be a case where VSD techniques would be helpful (as any particular call might be monomorphic), and thus caching the result of the last type cast would be a performance win. However this optimization is not necessary for version resilience.
+
+
+## GC Information for Types
+
+To do its job the garbage collector must be able to take an arbitrary object in the GC heap and find all the GC references in that object. It is also necessary for the GC to 'scan' the GC heap from start to end, which means it needs to know the size of every object. Fast access to these two pieces of information is what is needed.
+From a versioning perspective, the fundamental problem with GC information is that (like field offsets) it incorporates information from the entire inheritance hierarchy in the general case. This means that the information is not version resilient.
+
+While it is possible to make the GC information resilient and have the GC use this resilient data, GC happens frequently and type loading happens infrequently, so arguably you should trade type loading speed for GC speed if given the choice. Moreover the size of the GC information is typically quite small (e.g. 12-32 bytes) and will only occur for those types that cross version bubbles. Thus forming the GC information on the fly (from a version resilient form) is a reasonable starting point.
+
+Another important observation is that `MethodTable` contains other very frequently accessed data, like flags indicating whether the `MethodTable` represents an array, or pointer to parent type. This data tends to change a lot with the evolution of the runtime. Thus, generating method tables at runtime will solve a number of other versioning issues in addition to the GC information versioning.
+
+# Current State
+
+The design and implementation is a work in progress under code name ReadyToRun (`FEATURE_READYTORUN`). RyuJIT is used as the code generator to produce the ReadyToRun images currently.
diff --git a/Documentation/botr/ryujit-overview.md b/Documentation/botr/ryujit-overview.md
new file mode 100644
index 0000000..ee84a9a
--- /dev/null
+++ b/Documentation/botr/ryujit-overview.md
@@ -0,0 +1,558 @@
+JIT Compiler Structure
+===
+
+# Introduction
+
+RyuJIT is the code name for the next generation Just in Time Compiler (aka “JIT”) for the .NET runtime. Its first implementation is for the AMD64 architecture. It is derived from a code base that is still in use for the other targets of .NET.
+
+The primary design considerations for RyuJIT are to:
+
+* Maintain a high compatibility bar with previous JITs, especially those for x86 (jit32) and x64 (jit64).
+* Support and enable good runtime performance through code optimizations, register allocation, and code generation.
+* Ensure good throughput via largely linear-order optimizations and transformations, along with limitations on tracked variables for analyses (such as dataflow) that are inherently super-linear.
+* Ensure that the JIT architecture is designed to support a range of targets and scenarios.
+
+The first objective was the primary motivation for evolving the existing code base, rather than starting from scratch or departing more drastically from the existing IR and architecture.
+
+# Execution Environment and External Interface
+
+RyuJIT provides the just in time compilation service for the .NET runtime. The runtime itself is variously called the EE (execution engine), the VM (virtual machine) or simply the CLR (common language runtime). Depending upon the configuration, the EE and JIT may reside in the same or different executable files. RyuJIT implements the JIT side of the JIT/EE interfaces:
+
+* `ICorJitCompiler` – this is the interface that the JIT compiler implements. This interface is defined in [src/inc/corjit.h](https://github.com/dotnet/coreclr/blob/master/src/inc/corjit.h) and its implementation is in [src/jit/ee_il_dll.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/ee_il_dll.cpp). The following are the key methods on this interface:
+ * `compileMethod` is the main entry point for the JIT. The EE passes it an `ICorJitInfo` object, and the “info” containing the IL, the method header, and various other useful tidbits. It returns a pointer to the code, its size, and additional GC, EH and (optionally) debug info.
+ * `getVersionIdentifier` is the mechanism by which the JIT/EE interface is versioned. There is a single GUID (manually generated) which the JIT and EE must agree on.
+ * `getMaxIntrinsicSIMDVectorLength` communicates to the EE the largest SIMD vector length that the JIT can support.
+* `ICorJitInfo` – this is the interface that the EE implements. It has many methods defined on it that allow the JIT to look up metadata tokens, traverse type signatures, compute field and vtable offsets, find method entry points, construct string literals, etc. The bulk of this interface is inherited from `ICorDynamicInfo` which is defined in [src/inc/corinfo.h](https://github.com/dotnet/coreclr/blob/master/src/inc/corinfo.h). The implementation is defined in [src/vm/jitinterface.cpp](https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp).
+
+# Internal Representation (IR)
+
+## Overview of the IR
+
+The RyuJIT IR can be described at a high level as follows:
+
+* The Compiler object is the primary data structure of the JIT. Each method is represented as a doubly-linked list of `BasicBlock` objects. The Compiler object points to the head of this list with the `fgFirstBB` link, as well as having additional pointers to the end of the list, and other distinguished locations.
+ * `ICorJitCompiler::CompileMethod()` is invoked for each method, and creates a new Compiler object. Thus, the JIT need not worry about thread synchronization while accessing Compiler state. The EE has the necessary synchronization to ensure there is a single JIT’d copy of a method when two or more threads try to trigger JIT compilation of the same method.
+* `BasicBlock` nodes contain a doubly-linked list of statements with no internal control flow (there is an exception for the case of the qmark/colon operator).
+ * The `BasicBlock` also contains the dataflow information, when available.
+* `GenTree` nodes represent the operations and statements of the method being compiled.
+ * It includes the type of the node, as well as value number, assertions, and register assignments when available.
+* `LclVarDsc` represents a local variable, argument or JIT-created temp. It has a `gtLclNum` which is the identifier usually associated with the variable in the JIT and its dumps. The `LclVarDsc` contains the type, use count, weighted use count, frame or register assignment etc. These are often referred to simply as “lclVars”. They can be tracked (`lvTracked`), in which case they participate in dataflow analysis, and have a different index (`lvVarIndex`) to allow for the use of dense bit vectors.
+
+![RyuJIT IR Overview](../images/ryujit-ir-overview.png)
+
+The IR has two modes:
+
+* In tree-order mode, non-statement nodes (often described as expression nodes, though they are not always strictly expressions) are linked only via parent-child links (unidirectional). That is, the consuming node has pointers to the nodes that produce its input operands.
+* In linear-order mode, non-statement nodes have both parent-child links as well as execution order links (`gtPrev` and `gtNext`).
+ * In the interest of maintaining functionality that depends upon the validity of the tree ordering, the linear mode of the `GenTree` IR has an unusual constraint that the execution order must represent a valid traversal of the parent-child links.
+
+A separate representation, `insGroup` and `instrDesc`, is used during the actual instruction encoding.
+
+### Statement Order
+
+During the “front end” of the JIT compiler (prior to Rationalization), the execution order of the `GenTree` nodes on a statement is fully described by the “tree” order – that is, the links from the top node of a statement (the `gtStmtExpr`) to its children. The order is determined by a depth-first, left-to-right traversal of the tree, with the exception of nodes marked `GTF_REVERSE_OPS` on binary nodes, whose second operand is traversed before its first.
+
+After rationalization, the execution order can no longer be deduced from the tree order alone. At this point, the dominant ordering becomes “linear order”. This is because at this point any `GT_COMMA` nodes have been replaced by embedded statements, whose position in the execution order can only be determined by the `gtNext` and `gtPrev` links on the tree nodes.
+
+This modality is captured in the `fgOrder` flag on the Compiler object – it is either `FGOrderTree` or `FGOrderLinear`.
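The tree-order rule can be sketched as a traversal. The node shape is simplified, and the `reverse` field stands in for the `GTF_REVERSE_OPS` flag:

```python
# Sketch of the pre-rationalization "tree order": depth-first,
# left-to-right, except that binary nodes marked GTF_REVERSE_OPS
# evaluate their second operand before their first.
class GenTree:
    def __init__(self, op, ops=(), reverse=False):
        self.op, self.ops, self.reverse = op, ops, reverse

def execution_order(node, out=None):
    out = [] if out is None else out
    children = node.ops
    if node.reverse and len(children) == 2:
        children = (children[1], children[0])   # second operand first
    for c in children:
        execution_order(c, out)
    out.append(node.op)                         # a node follows its operands
    return out

a, b = GenTree("lclVar a"), GenTree("lclVar b")
add = GenTree("+", (a, b), reverse=True)        # marked GTF_REVERSE_OPS
assert execution_order(add) == ["lclVar b", "lclVar a", "+"]
```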
+
+## GenTree Nodes
+
+Each operation is represented as a GenTree node, with an opcode (GT_xxx), zero or more child `GenTree` nodes, and additional fields as needed to represent the semantics of that node.
+
+The `GenTree` nodes are doubly-linked in execution order, but the links are not necessarily valid during all phases of the JIT.
+
+The statement nodes utilize the same `GenTree` base type as the operation nodes, though they are not truly related.
+
+* The statement nodes are doubly-linked. The first statement node in a block points to the last node in the block via its `gtPrev` link. Note that the last statement node does *not* point to the first; that is, the list is not fully circular.
+* Each statement node contains two `GenTree` links – `gtStmtExpr` points to the top-level node in the statement (i.e. the root of the tree that represents the statement), while `gtStmtList` points to the first node in execution order (again, this link is not always valid).
+
+### Example of Post-Import IR
+
+For this snippet of code (extracted from [tests/src/JIT/CodeGenBringUpTests/DblRoots.cs](https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/CodeGenBringUpTests/DblRoots.cs)):
+
+ r1 = (-b + Math.Sqrt(b*b - 4*a*c))/(2*a);
+
+A stripped-down dump of the `GenTree` nodes just after they are imported looks like this:
+
+ ▌ stmtExpr void (top level) (IL 0x000...0x026)
+ │ ┌──▌ lclVar double V00 arg0
+ │ ┌──▌ * double
+ │ │ └──▌ dconst double 2.00
+ │ ┌──▌ / double
+ │ │ │ ┌──▌ mathFN double sqrt
+ │ │ │ │ │ ┌──▌ lclVar double V02 arg2
+ │ │ │ │ │ ┌──▌ * double
+ │ │ │ │ │ │ │ ┌──▌ lclVar double V00 arg0
+ │ │ │ │ │ │ └──▌ * double
+ │ │ │ │ │ │ └──▌ dconst double 4.00
+ │ │ │ │ └──▌ - double
+ │ │ │ │ │ lclVar double V01 arg1
+ │ │ │ │ └──▌ * double
+ │ │ │ │ └──▌ lclVar double V01 arg1
+ │ │ └──▌ + double
+ │ │ └──▌ unary - double
+ │ │ └──▌ lclVar double V01 arg1
+ └──▌ = double
+ └──▌ indir double
+ └──▌ lclVar byref V03 arg3
+
+## Types
+
+The JIT is primarily concerned with “primitive” types, i.e. integers, reference types, pointers, and floating point types. It must also be concerned with the format of user-defined value types (i.e. struct types derived from `System.ValueType`) – specifically, their size and the offset of any GC references they contain, so that they can be correctly initialized and copied. The primitive types are represented in the JIT by the `var_types` enum, and any additional information required for struct types is obtained from the JIT/EE interface by the use of an opaque `CORINFO_CLASS_HANDLE`.
+
+## Dataflow Information
+
+In order to limit throughput impact, the JIT limits the number of lclVars for which liveness information is computed. These are the tracked lclVars (`lvTracked` is true), and they are the only candidates for register allocation.
+
+The liveness analysis determines the set of defs, as well as the uses that are upward exposed, for each block. It then propagates the liveness information. The result of the analysis is captured in the following:
+
+* The live-in and live-out sets are captured in the `bbLiveIn` and `bbLiveOut` fields of the `BasicBlock`.
+* The `GTF_VAR_DEF` flag is set on a lclVar `GenTree` node that is a definition.
+* The `GTF_VAR_USEASG` flag is set (in addition to the `GTF_VAR_DEF` flag) for the target of an update (e.g. +=).
+* The `GTF_VAR_USEDEF` is set on the target of an assignment of a binary operator with the same lclVar as an operand.
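The liveness computation described above is a standard backward dataflow problem. A minimal sketch over a made-up three-block flowgraph:

```python
# Backward liveness sketch: bbLiveOut is the union of successors'
# bbLiveIn; bbLiveIn is the upward-exposed uses plus (bbLiveOut minus
# defs). Iterate until a fixed point is reached.
blocks = {                  # block -> (upward-exposed uses, defs, successors)
    "B1": ({"a"}, {"b"}, ["B2"]),
    "B2": ({"b"}, {"c"}, ["B3", "B1"]),   # loops back to B1
    "B3": ({"c"}, set(), []),
}
live_in = {b: set() for b in blocks}
live_out = {b: set() for b in blocks}

changed = True
while changed:
    changed = False
    for b, (use, defs, succs) in blocks.items():
        out = set().union(*(live_in[s] for s in succs)) if succs else set()
        inn = use | (out - defs)
        if out != live_out[b] or inn != live_in[b]:
            live_out[b], live_in[b] = out, inn
            changed = True

assert live_in["B1"] == {"a"}           # 'a' is live on entry
assert live_out["B1"] == {"a", "b"}     # loop keeps 'a' live across B1
assert live_in["B2"] == {"a", "b"}
assert live_in["B3"] == {"c"}
```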
+
+## SSA
+
+Static single assignment (SSA) form is constructed in a traditional manner [[1]](#[1]). The SSA names are recorded on the lclVar references. While SSA form usually retains a pointer or link to the defining reference, RyuJIT currently retains only the `BasicBlock` in which the definition of each SSA name resides.
+
+## Value Numbering
+
+Value numbering utilizes SSA for lclVar values, but also performs value numbering of expression trees. It takes advantage of type safety by not invalidating the value number for field references with a heap write, unless the write is to the same field. The IR nodes are annotated with the value numbers, which are indexes into a type-specific value number store. Value numbering traverses the trees, performing symbolic evaluation of many operations.
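A hash-consing sketch of the core idea (greatly simplified; it ignores SSA, heap state, and the type-specific stores mentioned above):

```python
# Value-numbering sketch: structurally identical pure expressions get
# the same value number, so redundancy (e.g. for CSE) can be detected
# by comparing small integers rather than whole trees.
vn_table = {}

def value_number(node):
    # node is a constant, a variable name, or a tuple (op, operand...).
    if isinstance(node, tuple):
        key = (node[0],) + tuple(value_number(c) for c in node[1:])
    else:
        key = node
    if key not in vn_table:
        vn_table[key] = len(vn_table) + 1     # fresh value number
    return vn_table[key]

e1 = ("*", "b", "b")            # b * b
e2 = ("*", "b", "b")            # the same expression elsewhere in the tree
e3 = ("*", "b", "c")
assert value_number(e1) == value_number(e2)   # same number: CSE candidate
assert value_number(e1) != value_number(e3)
```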
+
+# Phases of RyuJIT
+
+The top-level function of interest is `Compiler::compCompile`. It invokes the following phases in order.
+
+| **Phase** | **IR Transformations** |
+| --- | --- |
+|[Pre-import](#pre-import)|`Compiler->lvaTable` created and filled in for each user argument and variable. BasicBlock list initialized.|
+|[Importation](#importation)|`GenTree` nodes created and linked in to Statements, and Statements into BasicBlocks. Inlining candidates identified.|
+|[Inlining](#inlining)|The IR for inlined methods is incorporated into the flowgraph.|
+|[Struct Promotion](#struct-promotion)|New lclVars are created for each field of a promoted struct.|
+|[Mark Address-Exposed Locals](#mark-addr-exposed)|lclVars with references occurring in an address-taken context are marked. This must be kept up-to-date.|
+|[Morph Blocks](#morph-blocks)|Performs localized transformations, including mandatory normalization as well as simple optimizations.|
+|[Eliminate Qmarks](#eliminate-qmarks)|All `GT_QMARK` nodes are eliminated, other than simple ones that do not require control flow.|
+|[Flowgraph Analysis](#flowgraph-analysis)|`BasicBlock` predecessors are computed, and must be kept valid. Loops are identified, and normalized, cloned and/or unrolled.|
+|[Normalize IR for Optimization](#normalize-ir)|lclVar reference counts are set, and must be kept valid. Evaluation order of `GenTree` nodes (`gtNext`/`gtPrev`) is determined, and must be kept valid.|
+|[SSA and Value Numbering Optimizations](#ssa-vn)|Computes liveness (`bbLiveIn` and `bbLiveOut` on `BasicBlocks`), and dominators. Builds SSA for tracked lclVars. Computes value numbers.|
+|[Loop Invariant Code Hoisting](#licm)|Hoists expressions out of loops.|
+|[Copy Propagation](#copy-propagation)|Copy propagation based on value numbers.|
+|[Common Subexpression Elimination (CSE)](#cse)|Elimination of redundant subexpressions based on value numbers.|
+|[Assertion Propagation](#assertion-propagation)|Utilizes value numbers to propagate and transform based on properties such as non-nullness.|
+|[Range analysis](#range-analysis)|Eliminates array index range checks based on value numbers and assertions.|
+|[Rationalization](#rationalization)|Flowgraph order changes from `FGOrderTree` to `FGOrderLinear`. All `GT_COMMA`, `GT_ASG` and `GT_ADDR` nodes are transformed.|
+|[Lowering](#lowering)|Register requirements are fully specified (`gtLsraInfo`). All control flow is explicit.|
+|[Register allocation](#reg-alloc)|Registers are assigned (`gtRegNum` and/or `gtRsvdRegs`), and the number of spill temps calculated.|
+|[Code Generation](#code-generation)|Determines frame layout. Generates code for each `BasicBlock`. Generates prolog & epilog code for the method. Emit EH, GC and Debug info.|
+
+## <a name="pre-import"/>Pre-import
+
+Prior to reading in the IL for the method, the JIT initializes the local variable table, and scans the IL to find branch targets and form BasicBlocks.
+
+## <a name="importation"/>Importation
+
+Importation is the phase that creates the IR for the method, reading in one IL instruction at a time, and building up the statements. During this process, it may need to generate IR with multiple, nested expressions. This is the purpose of the non-expression-like IR nodes:
+
+* It may need to evaluate part of the expression into a temp, in which case it will use a comma (`GT_COMMA`) node to ensure that the temp is evaluated in the proper execution order – i.e. `GT_COMMA(GT_ASG(temp, exp), temp)` is inserted into the tree where “exp” would go.
+* It may need to create conditional expressions, but adding control flow at this point would be quite messy. In this case it generates question mark/colon (?: or `GT_QMARK`/`GT_COLON`) trees that may be nested within an expression.
+
+During importation, tail call candidates (either explicitly marked or opportunistically identified) are identified and flagged. They are further validated, and possibly unmarked, during morphing.
+
+## Morphing
+
+The `fgMorph` phase includes a number of transformations:
+
+### <a name="inlining"/>Inlining
+
+The `fgInline` phase determines whether each call site is a candidate for inlining. The initial determination is made via a state machine that runs over the candidate method’s IL. It estimates the native code size corresponding to the inline method, and uses a set of heuristics (including the estimated size of the current method) to determine if inlining would be profitable. If so, a separate Compiler object is created, and the importation phase is called to create the tree for the candidate inline method. Inlining may be aborted prior to completion, if any conditions are encountered that indicate that it may be unprofitable (or otherwise incorrect). If inlining is successful, the inlinee compiler’s trees are incorporated into the inliner compiler (the “parent”), with args and returns appropriately transformed.
+
+### <a name="struct-promotion"/>Struct Promotion
+
+Struct promotion (`fgPromoteStructs()`) analyzes the local variables and temps, and determines if their fields are candidates for tracking (and possibly enregistering) separately. It first determines whether it is possible to promote, which takes into account whether the layout may have holes or overlapping fields, whether its fields (flattening any contained structs) will fit in registers, etc.
+
+Next, it determines whether it is likely to be profitable, based on the number of fields, and whether the fields are individually referenced.
+
+When a lvlVar is promoted, there are now N+1 lvlVars for the struct, where N is the number of fields. The original struct lvlVar is not considered to be tracked, but its fields may be.
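+
+The legality side of the promotion decision can be sketched as follows. This is a hedged Python model with an invented field representation and illustrative thresholds; the real checks live in `fgPromoteStructs()` and are considerably more involved:
+
+```python
+# Sketch of a struct-promotion legality check. Fields are
+# (offset, size) pairs; thresholds are illustrative only.
+
+MAX_PROMOTED_FIELDS = 4
+REGISTER_SIZE = 8
+
+def can_promote(fields, struct_size):
+    if len(fields) > MAX_PROMOTED_FIELDS:
+        return False
+    # Reject overlapping fields.
+    spans = sorted(fields)
+    for (off1, sz1), (off2, _) in zip(spans, spans[1:]):
+        if off1 + sz1 > off2:
+            return False
+    # Reject layouts with holes (padding the fields don't cover).
+    if sum(sz for _, sz in fields) != struct_size:
+        return False
+    # Every field must fit in a register.
+    return all(sz <= REGISTER_SIZE for _, sz in fields)
+
+assert can_promote([(0, 4), (4, 4)], 8)          # two ints: promotable
+assert not can_promote([(0, 4), (2, 4)], 8)      # overlap: rejected
+assert not can_promote([(0, 4), (8, 4)], 16)     # hole at offset 4: rejected
+```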
+
+### <a name="mark-addr-exposed"/>Mark Address-Exposed Locals
+
+This phase traverses the expression trees, propagating the context (e.g. taking the address, indirecting) to determine which lvlVars have their address taken, and which therefore will not be register candidates. If a struct lvlVar has been promoted, and is then found to be address-taken, it will be considered “dependently promoted”, which is an odd way of saying that the fields will still be separately tracked, but they will not be register candidates.
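+
+The context-propagating walk can be sketched like this. It is an illustrative Python model (invented node shapes and flag names, not the JIT's actual traversal):
+
+```python
+# Sketch of marking address-exposed locals: walk each tree, and when a
+# GT_ADDR context reaches a local variable, mark it. Promoted structs
+# that are address-taken become "dependently promoted".
+
+def mark_address_exposed(node, locals_info, in_addr=False):
+    oper = node[0]
+    if oper == "GT_LCL_VAR":
+        if in_addr:
+            locals_info[node[1]]["addr_exposed"] = True
+        return
+    # Taking the address puts the child in an address context.
+    child_in_addr = (oper == "GT_ADDR")
+    for child in node[1:]:
+        if isinstance(child, tuple):
+            mark_address_exposed(child, locals_info, child_in_addr)
+
+locals_info = {"V00": {"addr_exposed": False, "promoted": True},
+               "V01": {"addr_exposed": False, "promoted": False}}
+tree = ("GT_ADD", ("GT_ADDR", ("GT_LCL_VAR", "V00")), ("GT_LCL_VAR", "V01"))
+mark_address_exposed(tree, locals_info)
+assert locals_info["V00"]["addr_exposed"]      # address-taken
+assert not locals_info["V01"]["addr_exposed"]  # plain use
+# V00 is promoted and address-exposed: dependently promoted.
+dependently_promoted = [v for v, i in locals_info.items()
+                        if i["promoted"] and i["addr_exposed"]]
+assert dependently_promoted == ["V00"]
+```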
+
+### <a name="morph-blocks"/>Morph Blocks
+
+What is often thought of as “morph” involves localized transformations to the trees. In addition to performing simple optimizing transformations, it performs some normalization that is required, such as converting field and array accesses into pointer arithmetic. It can (and must) be called by subsequent phases on newly added or modified trees. During the main Morph phase, the boolean `fgGlobalMorph` is set on the Compiler argument, which governs which transformations are permissible.
+
+### <a name="eliminate-qmarks"/>Eliminate Qmarks
+
+This expands most `GT_QMARK`/`GT_COLON` trees into blocks, except for the case that is instantiating a condition.
+
+## <a name="flowgraph-analysis"/>Flowgraph Analysis
+
+At this point, a number of analyses and transformations are done on the flowgraph:
+
+* Computing the predecessors of each block
+* Computing edge weights, if profile information is available
+* Computing reachability and dominators
+* Identifying and normalizing loops (transforming while loops to “do while”)
+* Cloning and unrolling of loops
+
+## <a name="normalize-ir"/>Normalize IR for Optimization
+
+At this point, a number of properties are computed on the IR, and must remain valid for the remaining phases. We will call this “normalization”:
+
+* `lvaMarkLocalVars` – set the reference counts (raw and weighted) for lvlVars, sort them, and determine which will be tracked (currently up to 128). Note that after this point any transformation that adds or removes lvlVar references must update the reference counts.
+* `optOptimizeBools` – this optimizes Boolean expressions, and may change the flowgraph (why is it not done prior to reachability and dominators?)
+* Link the trees in evaluation order (setting the `gtNext` and `gtPrev` fields): `fgFindOperOrder()` and `fgSetBlockOrder()`.
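+
+The linking step can be sketched with a simple post-order walk: children are threaded before their parent, so the resulting `gtNext`/`gtPrev` chain reflects evaluation order. This is an illustrative Python model that ignores refinements such as `GTF_REVERSE_OPS`:
+
+```python
+# Sketch of threading a statement's nodes in evaluation order,
+# roughly what fgSetBlockOrder does when it sets gtNext/gtPrev.
+
+class Node:
+    def __init__(self, oper, *kids):
+        self.oper, self.kids = oper, kids
+        self.prev = self.next = None
+
+def thread_in_order(root):
+    order = []
+    def walk(n):
+        for k in n.kids:
+            walk(k)
+        order.append(n)          # post-order: children first
+    walk(root)
+    for a, b in zip(order, order[1:]):
+        a.next, b.prev = b, a
+    return order
+
+a = Node("lclVar")
+b = Node("const")
+add = Node("add", a, b)
+store = Node("st.lclVar", add)
+order = thread_in_order(store)
+assert [n.oper for n in order] == ["lclVar", "const", "add", "st.lclVar"]
+assert order[0].prev is None and order[-1].next is None
+assert a.next is b and b.prev is a
+```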
+
+## <a name="ssa-vn"/>SSA and Value Numbering Optimizations
+
+The next set of optimizations are built on top of SSA and value numbering. First, the SSA representation is built (during which dataflow analysis, aka liveness, is computed on the lclVars), then value numbering is done using SSA.
+
+### <a name="licm"/>Loop Invariant Code Hoisting
+
+This phase traverses all the loop nests, in outer-to-inner order (thus hoisting expressions outside the largest loop in which they are invariant). It traverses all of the statements in the blocks in the loop that are always executed. A statement’s expression is hoisted if it:
+
+* Is a valid CSE candidate
+* Has no side-effects
+* Does not raise an exception, OR occurs in the loop prior to any side-effects
+* Has a valid value number, and either is a lvlVar defined outside the loop, or its children (the value numbers from which it was computed) are invariant
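+
+The recursive invariance test in the last bullet can be sketched as follows. This is an illustrative Python model with invented data shapes (not the JIT's actual VN tables), and it ignores complications such as phi cycles that a real implementation must handle:
+
+```python
+# Sketch of the loop-invariance test: a value number is invariant if it
+# was defined outside the loop, or all the VNs it was computed from are
+# themselves invariant.
+
+def is_invariant(vn, defined_in_loop, operands_of):
+    if vn not in defined_in_loop:
+        return True                       # defined outside the loop
+    ops = operands_of.get(vn)
+    if ops is None:
+        return False                      # a loop-varying leaf
+    return all(is_invariant(op, defined_in_loop, operands_of) for op in ops)
+
+# $80, $81 are defined outside the loop; $90 = $80 * $81 inside the
+# loop; $91 involves a loop-varying induction variable $85.
+operands_of = {"$90": ["$80", "$81"], "$91": ["$80", "$85"]}
+defined_in_loop = {"$85", "$90", "$91"}
+assert is_invariant("$90", defined_in_loop, operands_of)      # hoistable
+assert not is_invariant("$91", defined_in_loop, operands_of)  # not hoistable
+```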
+
+### <a name="copy-propagation"/>Copy Propagation
+
+This phase walks each block in the graph (in dominator-first order, maintaining context between dominator and child) keeping track of every live definition. When it encounters a variable that shares the VN with a live definition, it is replaced with the variable in the live definition.
+
+The JIT currently requires that the IR be maintained in conventional SSA form, as there is no “out of SSA” translation (see the comments on `optVnCopyProp()` for more information).
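+
+The core replacement rule can be sketched in a few lines. This is an illustrative Python model of one block's worth of work (invented data shapes; the real `optVnCopyProp()` maintains the live-definition map across the dominator tree walk):
+
+```python
+# Sketch of VN-based copy propagation: when a use's value number
+# matches a live definition of another variable, rewrite the use to
+# refer to that variable.
+
+def copy_propagate(block_uses, live_defs):
+    """live_defs maps VN -> defining variable; block_uses is a list of
+    (var, vn) uses. Returns the (possibly rewritten) uses."""
+    out = []
+    for var, vn in block_uses:
+        if vn in live_defs and live_defs[vn] != var:
+            out.append((live_defs[vn], vn))   # replace with copy source
+        else:
+            out.append((var, vn))
+    return out
+
+live_defs = {"$100": "V01"}                   # V01 = ... (VN $100)
+uses = [("V02", "$100"), ("V03", "$200")]     # V02 is a copy of V01
+assert copy_propagate(uses, live_defs) == [("V01", "$100"), ("V03", "$200")]
+```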
+
+### <a name="cse"/>Common Subexpression Elimination (CSE)
+
+Utilizes value numbers to identify redundant computations, which are then evaluated to a new temp lvlVar, and then reused.
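+
+The mechanics can be sketched as: the first occurrence of a repeated value number is stored to a new temp, and later occurrences reload that temp. This is an illustrative Python model (invented node shapes and temp naming, not the JIT's actual CSE heuristics or cost model):
+
+```python
+# Sketch of VN-based CSE over a list of (vn, tree) pairs in
+# execution order.
+
+def apply_cse(exprs):
+    counts = {}
+    for vn, _ in exprs:
+        counts[vn] = counts.get(vn, 0) + 1
+    first_use, out, next_temp = {}, [], 0
+    for vn, tree in exprs:
+        if counts[vn] > 1 and vn not in first_use:
+            temp = "cse%d" % next_temp
+            next_temp += 1
+            first_use[vn] = temp
+            out.append(("st.lclVar", temp, tree))   # define the temp
+        elif vn in first_use:
+            out.append(("lclVar", first_use[vn]))   # reuse the temp
+        else:
+            out.append(tree)                        # not a CSE
+    return out
+
+exprs = [("$185", ("mul", "2", "x")), ("$90", ("add", "a", "b")),
+         ("$185", ("mul", "2", "x"))]
+result = apply_cse(exprs)
+assert result[0] == ("st.lclVar", "cse0", ("mul", "2", "x"))
+assert result[1] == ("add", "a", "b")
+assert result[2] == ("lclVar", "cse0")
+```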
+
+### <a name="assertion-propagation"/>Assertion Propagation
+
+Utilizes value numbers to propagate and transform based on properties such as non-nullness.
+
+### <a name="range-analysis"/>Range analysis
+
+Optimize array index range checks based on value numbers and assertions.
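+
+A minimal sketch of the legality question, assuming an invented fact representation (proven VN relations, e.g. from a dominating compare or known loop bounds); the JIT's actual range analysis is far richer:
+
+```python
+# Sketch of range-check elimination: a check `i < a.Length` is
+# removable when known facts bound the index VN below the length VN.
+
+def can_remove_bounds_check(index_vn, length_vn, facts):
+    """facts is a set of (vn, op, vn) relations already proven."""
+    if (index_vn, "<", length_vn) in facts:
+        return True
+    # One step of transitivity: i < k and k <= len  =>  i < len.
+    for (a, op, b) in facts:
+        if a == index_vn and op == "<" and (b, "<=", length_vn) in facts:
+            return True
+    return False
+
+facts = {("$i", "<", "$n"), ("$n", "<=", "$len")}
+assert can_remove_bounds_check("$i", "$len", facts)      # check removable
+assert not can_remove_bounds_check("$j", "$len", facts)  # nothing known
+```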
+
+## <a name="rationalization"/>Rationalization
+
+As the JIT has evolved, changes have been made to improve the ability to reason over the tree in both “tree order” and “linear order”. These changes have been termed the “rationalization” of the IR. In the spirit of reuse and evolution, some of the changes have been made only in the later (“backend”) components of the JIT. The corresponding transformations are made to the IR by a “Rationalizer” component. It is expected that over time some of these changes will migrate to an earlier place in the JIT phase order:
+
+* Elimination of assignment nodes (`GT_ASG`). The assignment node was problematic because the semantics of its destination (left hand side of the assignment) could not be determined without context. For example, a `GT_LCL_VAR` on the left-hand side of an assignment is a definition of the local variable, but on the right-hand side it is a use. Furthermore, since the execution order requires that the children be executed before the parent, it is unnatural that the left-hand side of the assignment appears in execution order before the assignment operator.
+ * During rationalization, all assignments are replaced by stores, which either represent their destination on the store node itself (e.g. `GT_LCL_VAR`), or by the use of a child address node (e.g. `GT_STORE_IND`).
+* Elimination of address nodes (`GT_ADDR`). These are problematic because of the need for parent context to analyze the child.
+* Elimination of “comma” nodes (`GT_COMMA`). These nodes are introduced for convenience during importation, during which a single tree is constructed at a time, and not incorporated into the statement list until it is completed. When it is necessary, for example, to store a partially-constructed tree into a temporary variable, a `GT_COMMA` node is used to link it into the tree. However, in later phases, these comma nodes are an impediment to analysis, and thus are split into separate statements.
+ * In some cases, it is not possible to fully extract the tree into a separate statement, due to execution order dependencies. In these cases, an “embedded” statement is created. While these are conceptually very similar to the `GT_COMMA` nodes, they do not masquerade as expressions.
+* Elimination of “QMark” (`GT_QMARK`/`GT_COLON`) nodes is actually done at the end of morphing, long before the current rationalization phase. The presence of these nodes made analyses (especially dataflow) overly complex.
+
+For our earlier example (Example of Post-Import IR), here is what the simplified dump looks like just prior to Rationalization (the $ annotations are value numbers). Note that some common subexpressions have been computed into new temporary lvlVars, and that computation has been inserted as a `GT_COMMA` (comma) node in the IR:
+
+ ▌ stmtExpr void (top level) (IL 0x000...0x026)
+ │ ┌──▌ lclVar double V07 cse1 $185
+ │ ┌──▌ comma double $185
+ │ │ │ ┌──▌ dconst double 2.00 $143
+ │ │ │ ┌──▌ * double $185
+ │ │ │ │ └──▌ lclVar double V00 arg0 u:2 $80
+ │ │ └──▌ = double $VN.Void
+ │ │ └──▌ lclVar double V07 cse1 $185
+ │ ┌──▌ / double $186
+ │ │ │ ┌──▌ unary - double $84
+ │ │ │ │ └──▌ lclVar double V01 arg1 u:2 $81
+ │ │ └──▌ + double $184
+ │ │ │ ┌──▌ lclVar double V06 cse0 $83
+ │ │ └──▌ comma double $83
+ │ │ │ ┌──▌ mathFN double sqrt $83
+ │ │ │ │ │ ┌──▌ lclVar double V02 arg2 u:2 $82
+ │ │ │ │ │ ┌──▌ * double $182
+ │ │ │ │ │ │ │ ┌──▌ dconst double 4.00 $141
+ │ │ │ │ │ │ └──▌ * double $181
+ │ │ │ │ │ │ └──▌ lclVar double V00 arg0 u:2 $80
+ │ │ │ │ └──▌ - double $183
+ │ │ │ │ │ ┌──▌ lclVar double V01 arg1 u:2 $81
+ │ │ │ │ └──▌ * double $180
+ │ │ │ │ └──▌ lclVar double V01 arg1 u:2 $81
+ │ │ └──▌ = double $VN.Void
+ │ │ └──▌ lclVar double V06 cse0 $83
+ └──▌ = double $VN.Void
+ └──▌ indir double $186
+ └──▌ lclVar byref V03 arg3 u:2 (last use) $c0
+
+After rationalization, the nodes are presented in execution order, and the `GT_COMMA` (comma) and `GT_ASG` (=) nodes have been eliminated:
+
+ ▌ stmtExpr void (top level) (IL 0x000... ???)
+ │ ┌──▌ lclVar double V01 arg1
+ │ ├──▌ lclVar double V01 arg1
+ │ ┌──▌ * double
+ │ │ ┌──▌ lclVar double V00 arg0
+ │ │ ├──▌ dconst double 4.00
+ │ │ ┌──▌ * double
+ │ │ ├──▌ lclVar double V02 arg2
+ │ ├──▌ * double
+ │ ┌──▌ - double
+ │ ┌──▌ mathFN double sqrt
+ └──▌ st.lclVar double V06
+
+ ▌ stmtExpr void (top level) (IL 0x000...0x026)
+ │ ┌──▌ lclVar double V06
+ │ │ ┌──▌ lclVar double V01 arg1
+ │ ├──▌ unary - double
+ │ ┌──▌ + double
+ │ │ { ▌ stmtExpr void (embedded) (IL 0x000... ???)
+ │ │ { │ ┌──▌ lclVar double V00 arg0
+ │ │ { │ ├──▌ dconst double 2.00
+ │ │ { │ ┌──▌ * double
+ │ │ { └──▌ st.lclVar double V07
+ │ ├──▌ lclVar double V07
+ │ ┌──▌ / double
+ │ ├──▌ lclVar byref V03 arg3
+ └──▌ storeIndir double
+
+
+Note that the first operand of the first comma has been extracted into a separate statement, but the second comma causes an embedded statement to be created, in order to preserve execution order.
+
+## <a name="lowering"/>Lowering
+
+Lowering is responsible for transforming the IR in such a way that the control flow, and any register requirements, are fully exposed.
+
+It accomplishes this in two passes.
+
+The first pass is a post-order traversal that performs context-dependent transformations such as expanding switch statements (using a switch table or a series of conditional branches), constructing addressing modes, etc. For example, this:
+
+ ┌──▌ lclVar ref V00 arg0
+ │ ┌──▌ lclVar int V03 loc1
+ │ ┌──▌ cast long <- int
+ │ ├──▌ const long 2
+ ├──▌ << long
+ ┌──▌ + byref
+ ├──▌ const long 16
+ ┌──▌ + byref
+ ┌──▌ indir int
+
+Is transformed into this, in which the addressing mode is explicit:
+
+ ┌──▌ lclVar ref V00 arg0
+ │ ┌──▌ lclVar int V03 loc1
+ ├──▌ cast long <- int
+ ┌──▌ lea(b+(i*4)+16) byref
+ ┌──▌ indir int
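+
+This kind of pattern match can be sketched as a small fold over the tree. It is an illustrative Python model (invented tuple node shapes; the JIT's actual matcher handles many more shapes and target constraints):
+
+```python
+# Sketch of addressing-mode matching: fold add/shift trees into
+# (base + index*scale + offset) when the scale fits the lea form.
+
+def try_fold_lea(tree):
+    """Matches ('add', ('add', base, ('lsh', index, ('const', s))),
+    ('const', offset)) and returns ('lea', base, index, scale, offset),
+    or None if the tree doesn't fit."""
+    if tree[0] != "add" or tree[2][0] != "const":
+        return None
+    offset = tree[2][1]
+    inner = tree[1]
+    if inner[0] != "add" or inner[2][0] != "lsh":
+        return None
+    base, (_, index, (_, shift)) = inner[1], inner[2]
+    scale = 1 << shift
+    if scale not in (1, 2, 4, 8):          # x64 lea scale constraint
+        return None
+    return ("lea", base, index, scale, offset)
+
+# Mirrors the example above: arg0 + (loc1 << 2) + 16  =>  b+(i*4)+16.
+tree = ("add",
+        ("add", ("lclVar", "V00"), ("lsh", ("lclVar", "V03"), ("const", 2))),
+        ("const", 16))
+assert try_fold_lea(tree) == ("lea", ("lclVar", "V00"), ("lclVar", "V03"), 4, 16)
+# A shift of 4 would mean scale 16, which lea cannot encode.
+bad = ("add",
+       ("add", ("lclVar", "V00"), ("lsh", ("lclVar", "V03"), ("const", 4))),
+       ("const", 0))
+assert try_fold_lea(bad) is None
+```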
+
+The next pass annotates the nodes with register requirements. This is done in an execution-order traversal (effectively post-order), to ensure that the children are visited prior to the parent. It may also do some transformations that do not require the parent context, such as determining the code generation strategy for block assignments (e.g. `GT_COPYBLK`), which may become helper calls, unrolled loops, or an instruction like `rep stos`.
+
+The register requirements are expressed in the `TreeNodeInfo` (`gtLsraInfo`) for each node. For example, for the `copyBlk` node in this snippet:
+
+ Source │ ┌──▌ const(h) long 0xCA4000 static
+ Destination │ ├──▌ &lclVar byref V04 loc4
+ │ ├──▌ const int 34
+ └──▌ copyBlk void
+
+The `TreeNodeInfo` would be as follows:
+
+    <TreeNodeInfo @ 15 0=1 1i 1f
+ src=[allInt]
+ int=[rax rcx rdx rbx rbp rsi rdi r8-r15 mm0-mm5]
+ dst=[allInt] I>
+
+The “@ 15” is the location number of the node. The “0=1” indicates that there are zero destination registers (because this defines only memory), and 1 source register (the address of lclVar V04). The “1i” indicates that it requires 1 internal integer register (for copying the remainder after copying 16-byte sized chunks), the “1f” indicates that it requires 1 internal floating point register (for copying the two 16-byte chunks). The src, int and dst fields are encoded masks that indicate the register constraints for the source, internal and destination registers, respectively.
+
+## <a name="reg-alloc"/>Register allocation
+
+The RyuJIT register allocator uses a Linear Scan algorithm, with an approach similar to [[2]](#[2]). In brief, it operates on two main data structures:
+
+* `Intervals` (representing live ranges of variables or tree expressions) and `RegRecords` (representing physical registers), both of which derive from `Referent`.
+* `RefPositions`, which represent uses or defs (or variants thereof, such as ExposedUses) of either `Intervals` or physical registers.
+
+Pre-conditions:
+
+* The `NodeInfo` is initialized for each tree node to indicate:
+ * Number of registers consumed and produced by the node.
+ * Number and type (int versus float) of internal registers required.
+
+Allocation proceeds in 4 phases:
+
+* Determine the order in which the `BasicBlocks` will be allocated, and which predecessor of each block will be used to determine the starting location for variables live-in to the `BasicBlock`.
+* Construct Intervals for each tracked lvlVar, then walk the `BasicBlocks` in the determined order building `RefPositions` for each register use, def, or kill.
+* Allocate the registers by traversing the `RefPositions`.
+* Write back the register assignments, and perform any necessary moves at block boundaries where the allocations don’t match.
+
+Post-conditions:
+
+* The `gtRegNum` property of all `GenTree` nodes that require a register has been set to a valid register number.
+* The `gtRsvdRegs` field (a set/mask of registers) has the requested number of registers specified for internal use.
+* All spilled values (lvlVar or expression) are marked with `GTF_SPILL` at their definition. For lvlVars, they are also marked with `GTF_SPILLED` at any use at which the value must be reloaded.
+* For all lvlVars that are register candidates:
+ * `lvRegNum` = initial register location (or `REG_STK`)
+ * `lvRegister` flag set if it always lives in the same register
+ * `lvSpilled` flag is set if it is ever spilled
+* The maximum number of simultaneously-live spill locations of each type (used for spilling expression trees) has been communicated via calls to `compiler->tmpPreAllocateTemps(type)`.
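+
+The allocation phase can be caricatured with a classic linear-scan sketch. This is a highly simplified Python model for intuition only: real LSRA works over `RefPositions`, can split intervals, and picks a spill victim by furthest next use rather than simply spilling the incoming interval:
+
+```python
+# Simplified linear scan: walk intervals by start location, assign a
+# free register at each definition, and free it once the interval ends.
+
+def linear_scan(intervals, num_regs):
+    """intervals: {name: (start, end)}. Returns {name: reg-or-'spill'}."""
+    events = sorted(intervals.items(), key=lambda kv: kv[1][0])
+    active, assignment = [], {}            # active: (end, name, reg)
+    free = list(range(num_regs))
+    for name, (start, end) in events:
+        # Expire intervals that ended before this one starts.
+        for (aend, aname, areg) in list(active):
+            if aend < start:
+                active.remove((aend, aname, areg))
+                free.append(areg)
+        if free:
+            reg = free.pop()
+            active.append((end, name, reg))
+            assignment[name] = reg
+        else:
+            assignment[name] = "spill"     # no splitting in this sketch
+    return assignment
+
+asn = linear_scan({"V00": (0, 10), "V01": (2, 4), "V02": (5, 9)}, 1)
+assert asn["V00"] != "spill"
+assert asn["V01"] == "spill"               # V00 still live at 2
+assert asn["V02"] == "spill"               # V00 still live at 5
+```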
+
+## <a name="code-generation"/>Code Generation
+
+The process of code generation is relatively straightforward, as Lowering has done some of the work already. Code generation proceeds roughly as follows:
+
+* Determine the frame layout – allocating space on the frame for any lvlVars that are not fully enregistered, as well as any spill temps required for spilling non-lvlVar expressions.
+* For each `BasicBlock`, in layout order, and each `GenTree` node in the block, in execution order:
+ * If the node is “contained” (i.e. its operation is subsumed by a parent node), do nothing.
+ * Otherwise, “consume” all the register operands of the node.
+ * This updates the liveness information (i.e. marking a lvlVar as dead if this is the last use), and performs any needed copies.
+ * This must be done in correct execution order, obeying any reverse flags (GTF_REVERSE_OPS) on the operands, so that register conflicts are handled properly.
+ * Track the live variables in registers, as well as the live stack variables that contain GC refs.
+ * Produce the `instrDesc(s)` for the operation, with the current live GC references.
+ * Update the scope information (debug info) at block boundaries.
+* Generate the prolog and epilog code.
+* Write the final instruction bytes. It does this by invoking the emitter, which holds all the `instrDescs`.
+
+# Phase-dependent Properties and Invariants of the IR
+
+There are several properties of the IR that are valid only during (or after) specific phases of the JIT. This section describes the phase transitions, and how the IR properties are affected.
+
+## Phase Transitions
+
+* Flowgraph analysis
+ * Sets the predecessors of each block, which must be kept valid after this phase.
+ * Computes reachability and dominators. These may be invalidated by changes to the flowgraph.
+ * Computes edge weights, if profile information is available.
+ * Identifies and normalizes loops. These may be invalidated, but must be marked as such.
+* Normalization
+ * The lvlVar reference counts are set by `lvaMarkLocalVars()`.
+ * Statement ordering is determined by `fgSetBlockOrder()`. Execution order is a depth-first preorder traversal of the nodes, with the operands usually executed in order. The exceptions are:
+ * Commutative operators, which can have the `GTF_REVERSE_OPS` flag set to indicate that op2 should be evaluated before op1.
+ * Assignments, which can also have the `GTF_REVERSE_OPS` flag set to indicate that the rhs (op2) should be evaluated before the target address (if any) on the lhs (op1) is evaluated. This can only be done if there are no side-effects in the expression for the lhs.
+* Rationalization
+ * All `GT_COMMA` nodes are split into separate statements, which may be embedded in other statements in execution order.
+ * All `GT_ASG` trees are transformed into `GT_STORE` variants (e.g. `GT_STORE_LCL_VAR`).
+ * All `GT_ADDR` nodes are eliminated (e.g. with `GT_LCL_VAR_ADDR`).
+* Lowering
+ * `GenTree` nodes are split or transformed as needed to expose all of their register requirements and any necessary `flowgraph` changes (e.g., for switch statements).
+
+## GenTree phase-dependent properties
+
+Ordering:
+
+* For `GenTreeStmt` nodes, the `gtNext` and `gtPrev` fields must always be consistent. The last statement in the `BasicBlock` must have `gtNext` equal to null. By convention, the `gtPrev` of the first statement in the `BasicBlock` must be the last statement of the `BasicBlock`.
+ * In all phases, `gtStmtExpr` points to the top-level node of the expression.
+* For non-statement nodes, the `gtNext` and `gtPrev` fields are either null, prior to ordering, or they are consistent (i.e. `A->gtPrev->gtNext == A`, and `A->gtNext->gtPrev == A`, if they are non-null).
+* After normalization the `gtStmtList` of the containing statement points to the first node to be executed.
+* Prior to normalization, the `gtNext` and `gtPrev` pointers on the expression (non-statement) `GenTree` nodes are invalid. The expression nodes are only traversed via the links from parent to child (e.g. `node->gtGetOp1()`, or `node->gtOp.gtOp1`). The `gtNext/gtPrev` links are set by `fgSetBlockOrder()`.
+ * After normalization, and prior to rationalization, the parent/child links remain the primary traversal mechanism. The evaluation order of any nested expression-statements (usually assignments) is enforced by the `GT_COMMA` in which they are contained.
+* After rationalization, all `GT_COMMA` nodes are eliminated, and the primary traversal mechanism becomes the `gtNext/gtPrev` links. Statements may be embedded within other statements, but the nodes of each statement preserve the valid traversal order.
+* In tree ordering:
+ * The `gtPrev` of the first node (`gtStmtList`) is always null.
+ * The `gtNext` of the last node (`gtStmtExpr`) is always null.
+* In linear ordering:
+ * The nodes of each statement are ordered such that `gtStmtList` is encountered first, and `gtStmtExpr` is encountered last.
+ * The nodes of an embedded statement S2 (starting with `S2->gtStmtList`) appear in the ordering after a node from the “containing” statement S1, and no other node from S1 will appear in the list prior to the `gtStmtExpr` of S2. However, there may be multiple levels of nesting of embedded statements.
+
+TreeNodeInfo:
+
+* The `TreeNodeInfo` (`gtLsraInfo`) is set during the Lowering phase, and communicates the register requirements of the node, including the number and types of registers used as sources, destinations and internal registers. Currently only a single destination per node is supported.
+
+## LclVar phase-dependent properties
+
+Prior to normalization, the reference counts (`lvRefCnt` and `lvRefCntWtd`) are not valid. After normalization they must be updated when lvlVar references are added or removed.
+
+# Supporting technologies and components
+
+## Instruction encoding
+
+Instruction encoding is performed by the emitter ([emit.h](https://github.com/dotnet/coreclr/blob/master/src/jit/emit.h)), using the `insGroup`/`instrDesc` representation. The code generator calls methods on the emitter to construct `instrDescs`. The encoding information is captured in the following:
+
+* The “instruction” enumeration itemizes the different instructions available on each target, and is used as an index into the various encoding tables (e.g. `instInfo[]`, `emitInsModeFmtTab[]`) generated from the `instrs{tgt}.h` (e.g., [instrsxarch.h](https://github.com/dotnet/coreclr/blob/master/src/jit/instrsxarch.h)).
+* The skeleton encodings are contained in the tables, and then there are methods on the emitter that handle the special encoding constraints for the various instructions, addressing modes, register types, etc.
+
+## GC Info
+
+Reporting of live GC references is done in two ways:
+
+* For stack locations that are not tracked (these could be spill locations or lvlVars – local variables or temps – that are not register candidates), they are initialized to null in the prolog, and reported as live for the entire method.
+* For lvlVars with tracked lifetimes, or for expressions involving GC references, we report the range over which the reference is live. This is done by the emitter, which adds this information to the instruction group, and which terminates instruction groups when the GC info changes.
+
+The tracking of GC reference lifetimes is done via the `GCInfo` class in the JIT. It is declared in [src/jit/jitgcinfo.h](https://github.com/dotnet/coreclr/blob/master/src/jit/jitgcinfo.h) (to differentiate it from [src/inc/gcinfo.h](https://github.com/dotnet/coreclr/blob/master/src/inc/gcinfo.h)), and implemented in [src/jit/gcinfo.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/gcinfo.cpp).
+
+In a JitDump, the generated GC info can be seen following the “In gcInfoBlockHdrSave()” line.
+
+## Debugger info
+
+Debug info consists primarily of two types of information in the JIT:
+
+* Mapping of IL offsets to native code offsets. This is accomplished via:
+ * the `gtStmtILoffsx` on the statement nodes (`GenTreeStmt`)
+ * the `gtLclILoffs` on lvlVar references (`GenTreeLclVar`)
+ * The IL offsets are captured during CodeGen by calling `CodeGen::genIPmappingAdd()`, and then written to debug tables by `CodeGen::genIPmappingGen()`.
+* Mapping of user locals to location (register or stack). This is accomplished via:
+ * Struct `siVarLoc` (in [compiler.h](https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h)) captures the location
+ * `VarScopeDsc` ([compiler.h](https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h)) captures the live range of a local variable in a given location.
+
+## Exception handling
+
+Exception handling information is captured in an `EHblkDsc` for each exception handling region. Each region includes the first and last blocks of the try and handler regions, exception type, enclosing region, among other things. Look at [jiteh.h](https://github.com/dotnet/coreclr/blob/master/src/jit/jiteh.h) and [jiteh.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/jiteh.cpp), especially, for details. Look at `Compiler::fgVerifyHandlerTab()` to see how the exception table constraints are verified.
+
+# Reading a JitDump
+
+One of the best ways of learning about the JIT compiler is examining a compilation dump in detail. The dump shows you all the really important details of the basic data structures without all the implementation detail of the code. Debugging a JIT bug almost always begins with a JitDump. Only after the problem is isolated by the dump does it make sense to start debugging the JIT code itself.
+
+Dumps are also useful because they give you good places to place breakpoints. If you want to see what is happening at some point in the dump, simply search for the dump text in the source code. This gives you a great place to put a conditional breakpoint.
+
+There is not a strong convention about what or how the information is dumped, but generally you can find phase-specific information by searching for the phase name. Some useful points follow.
+
+## How to create a JitDump
+
+You can enable dumps by setting the `COMPlus_JitDump` environment variable to a space-separated list of the method(s) you want to dump. For example:
+
+```cmd
+:: Print out lots of useful info when
+:: compiling methods named Main/GetEnumerator
+set "COMPlus_JitDump=Main GetEnumerator"
+```
+
+See [Setting configuration variables](../building/viewing-jit-dumps.md#setting-configuration-variables) for more details on this.
+
+Full instructions for dumping the compilation of some managed code can be found here: [viewing-jit-dumps.md](../building/viewing-jit-dumps.md)
+
+## Reading expression trees
+
+It takes some time to learn to “read” the expression trees, which are printed with the children indented from the parent, and, for binary operators, with the first operand below the parent and the second operand above.
+
+Here is an example dump:
+
+ [000027] ------------ ▌ stmtExpr void (top level) (IL 0x010... ???)
+ [000026] --C-G------- └──▌ return double
+ [000024] --C-G------- └──▌ call double BringUpTest.DblSqrt
+ [000021] ------------ │ ┌──▌ lclVar double V02 arg2
+ [000022] ------------ │ ┌──▌ - double
+ [000020] ------------ │ │ └──▌ lclVar double V03 loc0
+ [000023] ------------ arg0 └──▌ * double
+ [000017] ------------ │ ┌──▌ lclVar double V01 arg1
+ [000018] ------------ │ ┌──▌ - double
+ [000016] ------------ │ │ └──▌ lclVar double V03 loc0
+ [000019] ------------ └──▌ * double
+ [000013] ------------ │ ┌──▌ lclVar double V00 arg0
+ [000014] ------------ │ ┌──▌ - double
+ [000012] ------------ │ │ └──▌ lclVar double V03 loc0
+ [000015] ------------ └──▌ * double
+ [000011] ------------ └──▌ lclVar double V03 loc0
+
+The tree nodes are indented to represent the parent-child relationship. Binary operators print first the right hand side, then the operator node itself, then the left hand side. This scheme makes sense if you look at the dump “sideways” (lean your head to the left). Oriented this way, the left hand side operator is actually on the left side, and the right hand operator is on the right side so you can almost visualize the tree if you look at it sideways. The indentation level is also there as a backup.
+
+Tree nodes are identified by their `gtTreeID`. This field only exists in DEBUG builds, but is quite useful for debugging, since all tree nodes are created from the routine `gtNewNode` (in [src/jit/gentree.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/gentree.cpp)). If you find a bad tree and wish to understand how it got corrupted, you can place a conditional breakpoint at the end of `gtNewNode` to see when it is created, and then a data breakpoint on the field that you believe is corrupted.
+
+The trees are connected by line characters (either in ASCII, by default, or in slightly more readable Unicode when `COMPlus_JitDumpAscii=0` is specified), to make it a bit easier to read.
+
+ N037 ( 0, 0) [000391] ----------L- arg0 SETUP │ ┌──▌ argPlace ref REG NA $1c1
+ N041 ( 2, 8) [000389] ------------ │ │ ┌──▌ const(h) long 0xB410A098 REG rcx $240
+ N043 ( 4, 10) [000390] ----G------- │ │ ┌──▌ indir ref REG rcx $1c1
+ N045 ( 4, 10) [000488] ----G------- arg0 in rcx │ ├──▌ putarg_reg ref REG rcx
+ N049 ( 18, 16) [000269] --C-G------- └──▌ call void System.Diagnostics.TraceInternal.Fail $VN.Void
+
+## Variable naming
+
+The dump uses the index into the local variable table as its name. The arguments to the function come first, then the local variables, then any compiler generated temps. Thus in a function with 2 parameters (remember “this” is also a parameter), and one local variable, the first argument would be variable 0, the second argument variable 1, and the local variable would be variable 2. As described earlier, tracked variables are given a tracked variable index which identifies the bit for that variable in the dataflow bit vectors. This can lead to confusion as to whether the variable number is its index into the local variable table, or its tracked index. In the dumps when we refer to a variable by its local variable table index we use the ‘V’ prefix, and when we print the tracked index we prefix it by a ‘T’.
+
+## References
+
+<a name="[1]"/>
+[1] P. Briggs, K. D. Cooper, T. J. Harvey, and L. T. Simpson, "Practical improvements to the construction and destruction of static single assignment form," Software --- Practice and Experience, vol. 28, no. 8, pp. 859---881, Jul. 1998.
+
+<a name="[2]"/>
+[2] Wimmer, C. and Mössenböck, H. "Optimized Interval Splitting in a Linear Scan Register Allocator," ACM VEE 2005, pp. 132-141. [http://portal.acm.org/citation.cfm?id=1064998](http://portal.acm.org/citation.cfm?id=1064998)
diff --git a/Documentation/botr/stackwalking.md b/Documentation/botr/stackwalking.md
new file mode 100644
index 0000000..a976aa8
--- /dev/null
+++ b/Documentation/botr/stackwalking.md
@@ -0,0 +1,85 @@
+Stackwalking in the CLR
+===
+
+Author: Rudi Martin ([@Rudi-Martin](https://github.com/Rudi-Martin)) - 2008
+
+The CLR makes heavy use of a technique known as stack walking (or stack crawling). This involves iterating the sequence of call frames for a particular thread, from the most recent (the thread's current function) back down to the base of the stack.
+
+The runtime uses stack walks for a number of purposes:
+
+- The runtime walks the stacks of all threads during garbage collection, looking for managed roots (local variables holding object references in the frames of managed methods that need to be reported to the GC to keep the objects alive and possibly track their movement if the GC decides to compact the heap).
+- On some platforms the stack walker is used during the processing of exceptions (looking for handlers in the first pass and unwinding the stack in the second).
+- The debugger uses the functionality when generating managed stack traces.
+- Various miscellaneous methods, usually those close to some public managed API, perform a stack walk to pick up information about their caller (such as the method, class or assembly of that caller).
+
+# The Stack Model
+
+Here we define some common terms and describe the typical layout of a thread's stack.
+
+Logically, a stack is divided up into some number of _frames_. Each frame represents some function (managed or unmanaged) that is either currently executing or has called into some other function and is waiting for it to return. A frame contains state required by the specific invocation of its associated function. Typically this includes space for local variables, pushed arguments for a call to another function, saved caller registers etc.
+
+The exact definition of a frame varies from platform to platform and on many platforms there isn't a hard definition of a frame format that all functions adhere to (x86 is an example of this). Instead the compiler is often free to optimize the exact format of frames. On such systems it is not possible to guarantee that a stackwalk will return 100% correct or complete results (for debugging purposes, debug symbols such as PDBs are used to fill in the gaps so that debuggers can generate more accurate stack traces).
+
+This is not a problem for the CLR, however, since we do not require a fully generalized stack walk. Instead we are only interested in those frames that are managed (i.e. represent a managed method) or, to some extent, frames coming from unmanaged code used to implement part of the runtime itself. In particular there is no guarantee about fidelity of 3rd party unmanaged frames other than to note where such frames transition into or out of the runtime itself (i.e. one of the frame types we do care about).
+
+Because we control the format of the frames we're interested in (we'll delve into the details of this later) we can ensure that those frames are crawlable with 100% fidelity. The only additional requirement is a mechanism to link disjoint groups of runtime frames together such that we can skip over any intervening unmanaged (and otherwise uncrawlable) frames.
+
+The following diagram illustrates a stack containing all the frames types (note that this document uses a convention where stacks grow towards the top of the page):
+
+![image](../images/stack.png)
+
+# Making Frames Crawlable
+
+## Managed Frames
+
+Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).
+
+Since methods are typically called more times than their frames are crawled (stack crawls are relatively rare in the runtime, at least with respect to the rate at which methods are typically called) it makes sense to trade method call performance for some additional crawl time processing. As a result the JIT generates additional metadata for each method it compiles that includes sufficient information for the stack crawler to decode a stack frame belonging to that method.
+
+This metadata can be found via a hash-table lookup with an instruction pointer somewhere within the method as the key. The JIT utilizes compression techniques in order to minimize the impact of this additional per-method metadata.
+
+Given initial values for a few important registers (e.g. EIP, ESP and EBP on x86 based systems) the stack crawler can locate a managed method and its associated JIT metadata and use this information to roll back the register values to those current in the method's caller. In this fashion a sequence of managed method frames can be traversed from the most recent to the oldest caller. This operation is sometimes referred to as a _virtual unwind_ (virtual because we're not actually updating the real values of ESP etc., leaving the stack intact).
+
+## Runtime Unmanaged Frames
+
+The runtime is partially implemented in unmanaged code (e.g. coreclr.dll). Most of this code is special in that it operates as _manually managed_ code. That is, it obeys many of the rules and protocols of managed code but in an explicitly controlled fashion. For instance such code can explicitly enable or disable GC pre-emptive mode and needs to manage its use of object references accordingly.
+
+Another area where this careful interaction with managed code comes into play is during stackwalks. Since the majority of the runtime's unmanaged code is written in C++ we don't have the same control over method frame format as managed code. At the same time there are many instances where runtime unmanaged frames contain information that is important during a stack walk. These include cases where unmanaged functions hold object references in local variables (which must be reported during garbage collections) and exception processing.
+
+Rather than attempt to make each unmanaged frame crawlable, unmanaged functions with interesting data to report to stack crawls bundle up the information into a data structure called a Frame. The choice of name is unfortunate as it can lead to ambiguity in stack related discussions. This document will always refer to the data structure variant as a capitalized Frame.
+
+Frame is actually the abstract base class of an entire hierarchy of Frame types. Frame is sub-typed in order to express different types of information that might be interesting to a stack walk.
+
+But how does the stack walker find these Frames and how do they relate to the frames utilized by managed methods?
+
+Each Frame is part of a singly linked list, having a next pointer to the next oldest Frame on this thread's stack (or null if the Frame is the oldest). The CLR Thread structure holds a pointer to the newest Frame. Unmanaged runtime code can push or pop Frames as needed by manipulating the Thread structure and Frame list.
+
+In this fashion the stack walker can iterate unmanaged Frames in newest to oldest order (the same order in which managed frames are iterated). But managed and unmanaged methods can be interleaved, and it would be wrong to process all managed frames followed by unmanaged Frames or vice versa since that would not accurately represent the real calling sequence.
+
+To solve this problem Frames are further restricted in that they must be allocated on the stack in the frame of the method that pushes them onto the Frame list. Since the stack walker knows the stack bounds of each managed frame it can perform simple pointer comparisons to determine whether a given Frame is older or newer than a given managed frame.
+
+Essentially the stack walker, having decoded the current frame, always has two possible choices for the next (older) frame: the next managed frame determined via a virtual unwind of the register set or the next oldest Frame on the Thread's Frame list. It can decide which is appropriate by determining which occupies stack space nearer the stack top. The actual calculation involved is platform dependent but usually devolves to one or two pointer comparisons.
+
+When managed code calls into the unmanaged runtime one of several forms of transition Frame is often pushed by the unmanaged target method. This is needed both to record the register state of the calling managed method (so that the stack walker can resume virtual unwinding of managed frames once it has finished enumerating the unmanaged Frames) and in many cases because managed object references are passed as arguments to the unmanaged method and must be reported to the GC in the event of a garbage collection.
+
+A full description of the available Frame types and their uses is beyond the scope of the document. Further details can be found in the [frames.h](https://github.com/dotnet/coreclr/blob/master/src/vm/frames.h) header file.
+
+# Stackwalker Interface
+
+The full stack walk interface is exposed to runtime unmanaged code only (a simplified subset is available to managed code via the System.Diagnostics.StackTrace class). The typical entrypoint is via the StackWalkFramesEx() method on the runtime Thread class.
+
+The caller of this method provides three main inputs:
+
+1. Some context indicating the starting point of the walk. This is either an initial register set (for instance if you've suspended the target thread and can call GetThreadContext() on it) or an initial Frame (in cases where you know the code in question is in runtime unmanaged code). Although most stack walks are made from the top of the stack it's possible to start lower down if you can determine the correct starting context.
+2. A function pointer and associated context. The function provided is called by the stack walker for each interesting frame (in order from the newest to the oldest). The context value provided is passed to each invocation of the callback so that it can record or build up state during the walk.
+3. Flags indicating what sort of frames should trigger a callback. This allows the caller to specify that only pure managed method frames should be reported for instance. For a full list see [threads.h](https://github.com/dotnet/coreclr/blob/master/src/vm/threads.h) (just above the declaration of StackWalkFramesEx()).
+
+StackWalkFramesEx() returns an enum value that indicates whether the walk terminated normally (got to the stack base and ran out of methods to report), was aborted by one of the callbacks (the callbacks return an enum of the same type to the stack walk to control this) or suffered some other miscellaneous error.
+
+Aside from the context value passed to StackWalkFramesEx(), stack callback functions are passed one other piece of context: the CrawlFrame. This class is defined in [stackwalk.h](https://github.com/dotnet/coreclr/blob/master/src/vm/stackwalk.h) and contains all sorts of context gathered as the stack walk proceeds.
+
+For instance the CrawlFrame indicates the MethodDesc* for managed frames and the Frame* for unmanaged Frames. It also provides the current register set inferred by virtually unwinding frames up to that point.
+
+# Stackwalk Implementation Details
+
+Further low-level details of the stack walk implementation are currently outside the scope of this document. If you have knowledge of these and would care to share that knowledge please feel free to update this document.
diff --git a/Documentation/botr/threading.md b/Documentation/botr/threading.md
new file mode 100644
index 0000000..2e13d52
--- /dev/null
+++ b/Documentation/botr/threading.md
@@ -0,0 +1,210 @@
+CLR Threading Overview
+======================
+
+Managed vs. Native Threads
+==========================
+
+Managed code executes on "managed threads," which are distinct from the native threads provided by the operating system. A native thread is a thread of execution of native code on a physical machine; a managed thread is a virtual thread of execution on the CLR's virtual machine.
+
+Just as the JIT compiler maps "virtual" IL instructions into native instructions that execute on the physical machine, the CLR's threading infrastructure maps "virtual" managed threads onto the native threads provided by the operating system.
+
+At any given time, a managed thread may or may not be assigned to a native thread for execution. For example, a managed thread that has been created (via "new System.Threading.Thread") but not yet started (via System.Threading.Thread.Start) is a managed thread that has not yet been assigned to a native thread. Similarly, a managed thread may, in principle, move between multiple native threads over the course of its execution, though in practice the CLR does not currently support this.
+
+The public Thread interface available to managed code intentionally hides the details of the underlying native threads, because:
+
+- Managed threads are not necessarily mapped to a single native thread (and may not be mapped to a native thread at all).
+- Different operating systems expose different abstractions for native threads.
+- In principle, managed threads are "virtualized".
+
+The CLR provides equivalent abstractions for managed threads, implemented by the CLR itself. For example, it does not expose the operating system's thread-local storage (TLS) mechanism, but instead provides managed "thread-static" variables. Similarly, it does not expose the native thread's "thread ID," but instead provides a "managed thread ID" which is generated independently of the OS. However, for diagnostic purposes, some details of the underlying native thread may be obtained via types in the System.Diagnostics namespace.
+
+Managed threads require additional functionality typically not needed by native threads. First, managed threads hold GC references on their stacks, so the CLR must be able to enumerate (and possibly modify) these references every time a GC occurs. To do this, the CLR must "suspend" each managed thread (stop it at a point where all of its GC references can be found). Second, when an AppDomain is unloaded, the CLR must ensure that no thread is executing code in that AppDomain. This requires the ability to force a thread to unwind out of that AppDomain. The CLR does this by injecting a ThreadAbortException into such threads.
+
+Data Structures
+===============
+
+Every managed thread has an associated Thread object, defined in [threads.h][threads.h]. This object tracks everything the VM needs to know about the managed thread. This includes things that are _necessary_, such as the thread's current GC mode and Frame chain, as well as many things that are allocated per-thread simply for performance reasons (such as some fast arena-style allocators).
+
+All Thread objects are stored in the ThreadStore (also defined in [threads.h][threads.h]), which is a simple list of all known Thread objects. To enumerate all managed threads, one must first acquire the ThreadStoreLock, then use ThreadStore::GetAllThreadList to enumerate all Thread objects. This list may include managed threads which are not currently assigned to native threads (for example, they may not yet be started, or the native thread may already have exited).
+
+[threads.h]: https://github.com/dotnet/coreclr/blob/master/src/vm/threads.h
+
+Each managed thread that is currently assigned to a native thread is reachable via a native thread-local storage (TLS) slot on that native thread. This allows code that is executing on that native thread to get the corresponding Thread object, via GetThread().
+
+Additionally, many managed threads have a _managed_ Thread object (System.Threading.Thread) which is distinct from the native Thread object. The managed Thread object provides methods for managed code to interact with the thread, and is mostly a wrapper around functionality offered by the native Thread object. The current managed Thread object is reachable (from managed code) via Thread.CurrentThread.
+
+In a debugger, the SOS extension command "!Threads" can be used to enumerate all Thread objects in the ThreadStore.
+
+Thread Lifetimes
+================
+
+A managed thread is created in the following situations:
+
+1. Managed code explicitly asks the CLR to create a new thread via System.Threading.Thread.
+2. The CLR creates the managed thread directly (see "special threads" below).
+3. Native code calls managed code on a native thread which is not yet associated with a managed thread (via "reverse p/invoke" or COM interop).
+4. A managed process starts (invoking its Main method on the process' Main thread).
+
+In cases #1 and #2, the CLR is responsible for creating a native thread to back the managed thread. This is not done until the thread is actually _started_. In such cases, the native thread is "owned" by the CLR; the CLR is responsible for the native thread's lifetime. In these cases, the CLR is aware of the existence of the thread by virtue of the fact that the CLR created it in the first place.
+
+In cases #3 and #4, the native thread already existed prior to the creation of the managed thread, and is owned by code external to the CLR. The CLR is not responsible for the native thread's lifetime. The CLR becomes aware of these threads the first time they attempt to call managed code.
+
+When a native thread dies, the CLR is notified via its DllMain function. This happens inside the OS "loader lock," so there is little that can be done (safely) while processing this notification. So rather than destroying the data structures associated with the managed thread, the thread is simply marked as "dead" and the finalizer thread is signaled to run. The finalizer thread then sweeps through the threads in the ThreadStore and destroys any that are both dead _and_ unreachable via managed code.
+
+Suspension
+==========
+
+The CLR must be able to find all references to managed objects in order to perform a GC. Managed code is constantly accessing the GC heap, and manipulating references stored on the stack and in registers. The CLR must ensure that all managed threads are stopped (so they aren't modifying the heap) to safely and reliably find all managed objects. It stops threads only at _safe points_, where registers and stack locations can be inspected for live references.
+
+Another way of putting this is that the GC heap, and every thread's stack and register state, is "shared state," accessed by multiple threads. As with most shared state, some sort of "lock" is required to protect it. Managed code must hold this lock while accessing the heap, and can only release the lock at safe points.
+
+The CLR refers to this "lock" as the thread's "GC mode." A thread which is in "cooperative mode" holds its lock; it must "cooperate" with the GC (by releasing the lock) in order for a GC to proceed. A thread which is in "preemptive" mode does not hold its lock – the GC may proceed "preemptively" because the thread is known to not be accessing the GC heap.
+
+A GC may only proceed when all managed threads are in "preemptive" mode (not holding the lock). The process of moving all managed threads to preemptive mode is known as "GC suspension" or "suspending the Execution Engine (EE)."
+
+A naïve implementation of this "lock" would be for each managed thread to actually acquire and release a real lock around each access to the GC heap. Then the GC would simply attempt to acquire the lock on each thread; once it had acquired all threads' locks, it would be safe to perform the GC.
+
+However, this naïve approach is unsatisfactory for two reasons. First, it would require managed code to spend a lot of time acquiring and releasing the lock (or at least checking whether the GC was attempting to acquire the lock – known as "GC polling.") Second, it would require the JIT to emit "GC info" describing the layout of the stack and registers for every point in JIT'd code; this information would consume large amounts of memory.
+
+We refined this naïve approach by separating JIT'd managed code into "partially interruptible" and "fully interruptible" code. In partially interruptible code, the only safe points are calls to other methods, and explicit "GC poll" locations where the JIT emits code to check whether a GC is pending. GC info need only be emitted for these locations. In fully interruptible code, every instruction is a safe point, and the JIT emits GC info for every instruction – but it does not emit GC polls. Instead, fully interruptible code may be "interrupted" by hijacking the thread (a process which is discussed later in this document). The JIT chooses whether to emit fully- or partially-interruptible code based on heuristics to find the best tradeoff between code quality, size of the GC info, and GC suspension latency.
+
+Given the above, there are three fundamental operations to define: entering cooperative mode, leaving cooperative mode, and suspending the EE.
+
+Entering Cooperative Mode
+-------------------------
+
+A thread enters cooperative mode by calling Thread::DisablePreemptiveGC. This acquires the "lock" for the current thread, as follows:
+
+1. If a GC is in progress (the GC holds the lock) then block until the GC is complete.
+2. Mark the thread as being in cooperative mode. No GC may proceed until the thread reenters preemptive mode.
+
+These two steps proceed as if they were atomic.
+
+Entering Preemptive Mode
+------------------------
+
+A thread enters preemptive mode (releases the lock) by calling Thread::EnablePreemptiveGC. This simply marks the thread as no longer being in cooperative mode, and informs the GC thread that it may be able to proceed.
+
+Suspending the EE
+-----------------
+
+When a GC needs to occur, the first step is to suspend the EE. This is done by GCHeap::SuspendEE, which proceeds as follows:
+
+1. Set a global flag (g\_fTrapReturningThreads) to indicate that a GC is in progress. Any threads that attempt to enter cooperative mode will block until the GC is complete.
+2. Find all threads currently executing in cooperative mode. For each such thread, attempt to hijack the thread and force it to leave cooperative mode.
+3. Repeat until no threads are running in cooperative mode.
+
+Hijacking
+---------
+
+Hijacking for GC suspension is done by Thread::SysSuspendForGC. This method attempts to force any managed thread that is currently running in cooperative mode to leave cooperative mode at a "safe point." It does this by enumerating all managed threads (walking the ThreadStore) and, for each managed thread currently running in cooperative mode, performing the following steps:
+
+1. Suspend the underlying native thread. This is done with the Win32 SuspendThread API. This API forcibly stops the thread from running, at some random point in its execution (not necessarily a safe point).
+2. Get the current CONTEXT for the thread, via GetThreadContext. This is an OS concept; CONTEXT represents the current register state of the thread. This allows us to inspect its instruction pointer, and thus determine what type of code it is currently executing.
+3. Check again if the thread is in cooperative mode, as it may have already left cooperative mode before it could be suspended. If so, the thread is in dangerous territory: the thread may be executing arbitrary native code, and must be resumed immediately to avoid deadlocks.
+4. Check if the thread is running managed code. It is possible that it is executing native VM code in cooperative mode (see Synchronization, below), in which case the thread must be immediately resumed as in the previous step.
+5. Now the thread is suspended in managed code. Depending on whether that code is fully or partially interruptible, one of the following is performed:
+    * If fully interruptible, it is safe to perform a GC at any point, since the thread is, by definition, at a safe point. It would be reasonable to leave the thread suspended at this point (because it's safe), but various historical OS bugs prevent this from working because the CONTEXT retrieved earlier may be corrupt. Instead, the thread's instruction pointer is overwritten, redirecting it to a stub that will capture a more complete CONTEXT, leave cooperative mode, wait for the GC to complete, reenter cooperative mode, and restore the thread to its previous state.
+    * If partially interruptible, the thread is, by definition, not at a safe point. However, the caller will be at a safe point (method transition). Using that knowledge, the CLR "hijacks" the top-most stack frame's return address (physically overwriting that location on the stack) with a stub similar to the one used for fully interruptible code. When the method returns, it will no longer return to its actual caller, but rather to the stub (the method may also perform a GC poll, inserted by the JIT, before that point, which will cause it to leave cooperative mode and undo the hijack).
+
+ThreadAbort / AppDomain-Unload
+==============================
+
+In order to unload an AppDomain, the CLR must ensure that no thread is running in that AppDomain. To accomplish this, the CLR enumerates all managed threads and "aborts" any thread which has stack frames belonging to the AppDomain being unloaded. A ThreadAbortException is "injected" into the running thread, which causes the thread to unwind (executing backout code along the way) until it is no longer executing in the AppDomain, at which point the ThreadAbortException is translated into an AppDomainUnloaded exception.
+
+ThreadAbortException is a special type of exception. It can be caught by user code, but the CLR ensures that the exception will be rethrown after the user's exception handler is executed. Thus ThreadAbortException is sometimes referred to as "uncatchable," though this is not strictly true.
+
+A ThreadAbortException is typically 'thrown' by simply setting a bit on the managed thread marking it as "aborting." This bit is checked by various parts of the CLR (most notably, every return from a p/invoke) and often times setting this bit is all that is needed to get the thread aborted in a timely manner.
+
+However, if the thread is, for example, executing a long-running managed loop, it may never check this bit. To get such a thread to abort faster, the thread is "hijacked" and forced to raise a ThreadAbortException. This hijacking is done in the same way as GC suspension, except that the stubs that the thread is redirected to will cause a ThreadAbortException to be raised, rather than waiting for a GC to complete.
+
+This hijacking means that a ThreadAbortException can be raised at essentially any arbitrary point in managed code. This makes it extremely difficult for managed code to deal successfully with a ThreadAbortException. It is therefore unwise to use this mechanism for any purpose other than AppDomain-Unload, which ensures that any state corrupted by the ThreadAbort will be cleaned up along with the AppDomain.
+
+Synchronization: Managed
+========================
+
+Managed code has access to many synchronization primitives, collected within the System.Threading namespace. These include wrappers for native OS primitives like Mutex, Event, and Semaphore objects, as well as some abstractions such as Barriers and SpinLocks. However, the primary synchronization mechanism used by most managed code is System.Threading.Monitor, which provides a high-performance locking facility on _any managed object_, and additionally provides "condition variable" semantics for signaling changes in the state protected by a lock.
+
+Monitor is implemented as a "hybrid lock;" it has features of both a spin-lock and a kernel-based lock like a Mutex. The idea is that most locks are held only briefly, so it takes less time to simply spin-wait for the lock to be released, than it would to make a call into the kernel to block the thread. It is important not to waste CPU cycles spinning, so if the lock has not been acquired after a brief period of spinning, the implementation falls back to blocking in the kernel.
+
+Because any object may potentially be used as a lock/condition variable, every object must have a location in which to store the lock information. This is done with "object headers" and "sync blocks."
+
+The object header is a machine-word-sized field that precedes every managed object. It is used for many purposes, such as storing the object's hash code. One such purpose is holding the object's lock state. If more per-object data is needed than will fit in the object header, we "inflate" the object by creating a "sync block."
+
+Sync blocks are stored in the Sync Block Table, and are addressed by sync block indexes. Each object with an associated sync block stores the index of that sync block in its object header.
+
+The details of object headers and sync blocks are defined in [syncblk.h][syncblk.h]/[.cpp][syncblk.cpp].
+
+[syncblk.h]: https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.h
+[syncblk.cpp]: https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.cpp
+
+If there is room on the object header, Monitor stores the managed thread ID of the thread that currently holds the lock on the object (or zero (0) if no thread holds the lock). Acquiring the lock in this case is a simple matter of spin-waiting until the object header's thread ID is zero, and then atomically setting it to the current thread's managed thread ID.
+
+If the lock cannot be acquired in this manner after some number of spins, or the object header is already being used for other purposes, a sync block must be created for the object. This has additional data, including an event that can be used to block the current thread, allowing us to stop spinning and efficiently wait for the lock to be released.
+
+An object that is used as a condition variable (via Monitor.Wait and Monitor.Pulse) must always be inflated, as there is not enough room in the object header to hold the required state.
+
+Synchronization: Native
+=======================
+
+The native portion of the CLR must also be aware of threading, as it will be invoked by managed code on multiple threads. This requires native synchronization mechanisms, such as locks, events, etc.
+
+The ITaskHost API allows a host to override many aspects of managed threading, including thread creation, destruction, and synchronization. The ability of a host to override native synchronization means that VM code can generally not use native synchronization primitives (Critical Sections, Mutexes, Events, etc.) directly, but rather must use the VM's wrappers over these.
+
+Additionally, as described above, GC suspension is a special kind of "lock" that affects nearly every aspect of the CLR. Native code in the VM may enter "cooperative" mode if it must manipulate GC heap objects, and thus the "GC suspension lock" becomes one of the most important synchronization mechanisms in native VM code, as well as managed.
+
+The major synchronization mechanisms used in native VM code are the GC mode, and Crst.
+
+GC Mode
+-------
+
+As discussed above, all managed code runs in cooperative mode, because it may manipulate the GC heap. Generally, native code does not touch managed objects, and thus runs in preemptive mode. But some native code in the VM must access the GC heap, and thus must run in cooperative mode.
+
+Native code generally does not manipulate the GC mode directly, but rather uses two macros: GCX\_COOP and GCX\_PREEMP. These enter the desired mode, and erect "holders" to cause the thread to revert to the previous mode when the scope is exited.
+
+It is important to understand that GCX\_COOP effectively acquires a lock on the GC heap. No GC may proceed while the thread is in cooperative mode. And native code cannot be "hijacked" as is done for managed code, so the thread will remain in cooperative mode until it explicitly switches back to preemptive mode.
+
+Thus entering cooperative mode in native code is discouraged. In cases where cooperative mode must be entered, it should be kept to as short a time as possible. The thread should not be blocked in this mode, and in particular cannot generally acquire locks safely.
+
+Similarly, GCX\_PREEMP potentially _releases_ a lock that had been held by the thread. Great care must be taken to ensure that all GC references are properly protected before entering preemptive mode.
+
+The [Rules of the Code](../coding-guidelines/clr-code-guide.md) document describes the disciplines needed to ensure safety around GC mode switches.
+
+Crst
+----
+
+Just as Monitor is the preferred locking mechanism for managed code, Crst is the preferred mechanism for VM code. Like Monitor, Crst is a hybrid lock that is aware of hosts and GC modes. Crst also implements deadlock avoidance via "lock leveling," described in the [Crst Leveling chapter of the BotR](../coding-guidelines/clr-code-guide.md#264-entering-and-leaving-crsts).
+
+It is generally illegal to acquire a Crst while in cooperative mode, though exceptions are made where absolutely necessary.
+
+Special Threads
+===============
+
+In addition to managing threads created by managed code, the CLR creates several "special" threads for its own use.
+
+Finalizer Thread
+----------------
+
+This thread is created in every process that runs managed code. When the GC determines that a finalizable object is no longer reachable, it places that object on a finalization queue. At the end of a GC, the finalizer thread is signaled to process all finalizers currently in this queue. Each object is then dequeued, one by one, and its finalizer is executed.
+
+This thread is also used to perform various CLR-internal housekeeping tasks, and to wait for notifications of some external events (such as a low-memory condition, which signals the GC to collect more aggressively). See GCHeap::FinalizerThreadStart for the details.
+
+GC Threads
+----------
+
+When running in "concurrent" or "server" modes, the GC creates one or more background threads to perform various stages of garbage collection in parallel. These threads are wholly owned and managed by the GC, and never run managed code.
+
+Debugger Thread
+---------------
+
+The CLR maintains a single native thread in each managed process, which performs various tasks on behalf of attached managed debuggers.
+
+AppDomain-Unload Thread
+-----------------------
+
+This thread is responsible for unloading AppDomains. This is done on a separate, CLR-internal thread, rather than the thread that requests the AD-unload, to a) provide guaranteed stack space for the unload logic, and b) allow the thread that requested the unload to be unwound out of the AD, if needed.
+
+ThreadPool Threads
+------------------
+
+The CLR's ThreadPool maintains a collection of managed threads for executing user "work items." These managed threads are bound to native threads owned by the ThreadPool. The ThreadPool also maintains a small number of native threads to handle functions like "thread injection," timers, and "registered waits."
diff --git a/Documentation/botr/type-loader.md b/Documentation/botr/type-loader.md
new file mode 100644
index 0000000..60a13cd
--- /dev/null
+++ b/Documentation/botr/type-loader.md
@@ -0,0 +1,317 @@
+Type Loader Design
+===
+
+Author: Ladi Prosek - 2007
+
+# Introduction
+
+In a class-based object oriented system, types are templates
+describing the data that individual instances will contain, and the
+functionality that they will provide. It is not possible to create an
+object without first defining its type<sup>1</sup>. Two objects are said to
+be of the same type if they are instances of the same type. The fact
+that they define the exact same set of members does not make them
+related in any way.
+
+The previous paragraph could just as well describe a typical C++
+system. One additional feature essential to the CLR is the availability
+of full runtime type information. In order to "manage" the managed code
+and provide a type-safe environment, the runtime must know the type of
+any object at any time. Such type information must be readily
+available without extensive computation, because type identity
+queries are expected to be rather frequent (e.g. any type-cast
+involves querying the type identity of the object to verify that the
+cast is safe and can be done).
+
+This performance requirement rules out any dictionary-lookup
+approach and leaves us with the following high-level architecture.
+
+![Figure 1](../images/typeloader-fig1.png)
+
+Figure 1 The abstract high-level object design
+
+Apart from the actual instance data, each object contains a type id,
+which is simply a pointer to the structure that represents the
+type. This concept is similar to C++ v-table pointers, but the
+structure, which we will call TYPE for now and define more precisely
+later, contains more than just a v-table. For instance, it has to
+contain information about the type hierarchy so that "is-a" subsumption
+questions can be answered.
+
+<sup>1</sup> The C# 3.0 feature called "anonymous types" lets you define an
+object without explicit reference to a type - simply by directly
+listing its fields. Don't let this fool you, there is in fact a type
+created behind the scenes for you by the compiler.
+
+## 1.1 Related Reading
+
+[1] Martin Abadi, Luca Cardelli, A Theory of Objects, ISBN
+978-0387947754
+
+[2] Andrew Kennedy ([@andrewjkennedy](https://github.com/andrewjkennedy)), Don Syme ([@dsyme](https://github.com/dsyme)), [Design and Implementation of Generics
+for the .NET Common Language
+Runtime][generics-design]
+
+[generics-design]: http://research.microsoft.com/apps/pubs/default.aspx?id=64031
+
+[3] [ECMA Standard for the Common Language Infrastructure (CLI)](http://www.ecma-international.org/publications/standards/Ecma-335.htm)
+
+## 1.2 Design Goals
+
+The ultimate purpose of the type loader (sometimes referred to as the
+class loader, which is strictly speaking not correct, because classes
+constitute just a subset of types - namely reference types - and the
+loader loads value types as well) is to build data structures
+representing the type which it is asked to load. These are the
+properties that the loader should have:
+
+- Fast type lookup ([module, token] => handle and [assembly, name] => handle).
+- Optimized memory layout to achieve good working set size, cache hit rate, and JITted code performance.
+- Type safety - malformed types are not loaded and a TypeLoadException is thrown.
+- Concurrency - scales well in multi-threaded environments.
+
+# 2 Type Loader Architecture
+
+There is a relatively small number of entry-points to the loader. Although the signature of each individual entry-point is slightly different, they all have similar semantics. They take a type/member designation in the form of a metadata **token** or a **name** string, a scope for the token (a **module** or an **assembly**), and some additional information like flags. They return the loaded entity in the form of a **handle**.
+
+There are usually many calls to the type loader during JITting. Consider:
+
+ object CreateClass()
+ {
+ return new MyClass();
+ }
+
+In the IL, MyClass is referred to using a metadata token. In order to generate a call to the **JIT\_New** helper, which takes care of the actual instantiation, the JIT will ask the type loader to load the type and return a handle to it. This handle will then be directly embedded in the JITted code as an immediate value. The fact that types and members are usually resolved and loaded at JIT time, and not at run time, also explains the sometimes confusing behavior easily hit with code like this:
+
+ object CreateClass()
+ {
+ try {
+ return new MyClass();
+ } catch (TypeLoadException) {
+ return null;
+ }
+ }
+
+If **MyClass** fails to load - for example because it's supposed to be defined in another assembly and it was accidentally removed in the newest build - then this code will still throw **TypeLoadException**. The reason the catch block did not catch it is that it never ran! The exception occurred during JITting and is only catchable in the method that called **CreateClass** and thereby caused it to be JITted. In addition, it may not always be obvious at which point JITting is triggered, due to inlining, so users should not rely on this behavior being deterministic.
+
+## Key Data Structures
+
+The most universal type designation in the CLR is the **TypeHandle**. It's an abstract entity which encapsulates a pointer to either a **MethodTable** (representing "ordinary" types like **System.Object** or **List<string>** ) or a **TypeDesc** (representing byrefs, pointers, function pointers, arrays, and generic variables). It constitutes the identity of a type in that two handles are equal if and only if they represent the same type. To save space, the fact that a **TypeHandle** contains a **TypeDesc** is indicated by setting the second lowest bit of the pointer to 1 (i.e. (ptr | 2)) instead of using additional flags<sup>2</sup>. **TypeDesc** is "abstract" and has the following inheritance hierarchy.
+
+![Figure 2](../images/typeloader-fig2.png)
+
+Figure 2 The TypeDesc hierarchy
+
+**TypeDesc**
+
+Abstract type descriptor. The concrete descriptor type is determined by flags.
+
+**TypeVarTypeDesc**
+
+Represents a type variable, i.e. the **T** in **List<T>** or in **Array.Sort<T>** (see the part about generics below). Type variables are never shared between multiple types or methods so each variable has its one and only owner.
+
+**FnPtrTypeDesc**
+
+Represents a function pointer, essentially a variable-length list of type handles referring to the return type and parameters. It's not that common to see this descriptor because function pointers are not supported by C#. However, managed C++ uses them.
+
+**ParamTypeDesc**
+
+This descriptor represents byref and pointer types. Byrefs are the result of the **ref** and **out** C# keywords applied to method parameters<sup>3</sup>, whereas pointer types are unmanaged pointers to data, used in unsafe C# and managed C++.
+
+**ArrayTypeDesc**
+
+Represents array types. It is derived from **ParamTypeDesc** because arrays are also parameterized by a single parameter (the type of their element). This is opposed to generic instantiations whose number of parameters is variable.
+
+**MethodTable**
+
+This is by far the central data structure of the runtime. It represents any type which does not fall into one of the categories above (this includes primitive types, and generic types, both "open" and "closed"). It contains everything about the type that needs to be looked up quickly, such as its parent type, implemented interfaces, and the v-table.
+
+**EEClass**
+
+**MethodTable** data are split into "hot" and "cold" structures to improve working set and cache utilization. **MethodTable** itself is meant to store only "hot" data that are needed in program steady state. **EEClass** stores "cold" data that are typically needed only for type loading, JITting, or reflection. Each **MethodTable** points to one **EEClass**.
+
+Moreover, **EEClass**es are shared between generic types. Multiple generic type **MethodTable**s can point to a single **EEClass**. This sharing adds additional constraints on the data that can be stored in an **EEClass**.
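The hot/cold split can be sketched as follows; the fields shown here are illustrative stand-ins, not the real CLR layout:

```cpp
#include <cstdint>

// Illustrative sketch (not the actual CLR layout): hot data lives in
// MethodTable, cold data is factored out into EEClass, and several
// generic instantiations over reference types can share one EEClass.
struct EEClass {
    // "Cold" data: needed for type loading, JITting, reflection.
    uint32_t m_attrClass;          // metadata attributes
    uint16_t m_numInstanceFields;
    uint16_t m_numMethods;
};

struct MethodTable {
    // "Hot" data: needed in program steady state.
    uint32_t     m_baseSize;       // instance size for the allocator
    MethodTable* m_pParent;        // parent type, used e.g. for casting
    EEClass*     m_pEEClass;       // pointer to the shared cold data
    // v-table slots would follow inline...
};
```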
+
+**MethodDesc**
+
+It is no surprise that this structure describes a method. It actually comes in a few flavors which have their corresponding **MethodDesc** subtypes but most of them really are out of the scope of this document. Suffice it to say that there is one subtype called **InstantiatedMethodDesc** which plays an important role for generics. For more information please see [**Method Descriptor Design**](method-descriptor.md).
+
+**FieldDesc**
+
+Analogous to **MethodDesc**, this structure describes a field. Except for certain COM interop scenarios, the EE does not care about properties and events at all, because they boil down to methods and fields at the end of the day; it's just compilers and reflection that generate and understand them in order to provide that syntactic sugar.
+
+<sup>2</sup> This is useful for debugging. If the value of a **TypeHandle**
+ends with 2, 6, A, or E, then it's not a **MethodTable** and the extra
+bit has to be cleared in order to successfully inspect the
+**TypeDesc**.
+
+<sup>3</sup> Note that the difference between **ref** and **out** is just in a
+parameter attribute. As far as the type system is concerned, they are
+both the same type.
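The pointer-tagging trick described above (and in footnote 2) can be sketched as a tagged pointer; this is a simplified model with assumed names, not the CLR's actual **TypeHandle** implementation:

```cpp
#include <cassert>
#include <cstdint>

struct MethodTable;  // opaque here
struct TypeDesc;     // opaque here

// Sketch: the second-lowest bit of the stored pointer distinguishes a
// TypeDesc from a MethodTable, exploiting the fact that both structures
// are aligned so that bit is normally zero.
class TypeHandle {
public:
    explicit TypeHandle(MethodTable* pMT)
        : m_value(reinterpret_cast<uintptr_t>(pMT)) {}
    explicit TypeHandle(TypeDesc* pTD)
        : m_value(reinterpret_cast<uintptr_t>(pTD) | 2) {}

    bool IsTypeDesc() const { return (m_value & 2) != 0; }

    TypeDesc* AsTypeDesc() const {
        assert(IsTypeDesc());
        return reinterpret_cast<TypeDesc*>(m_value & ~uintptr_t(2));
    }
    MethodTable* AsMethodTable() const {
        assert(!IsTypeDesc());
        return reinterpret_cast<MethodTable*>(m_value);
    }
    // Two handles are equal iff they represent the same type.
    bool operator==(const TypeHandle& other) const { return m_value == other.m_value; }

private:
    uintptr_t m_value;
};
```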
+
+## 2.1 Load Levels
+
+When the type loader is asked to load a specified type, identified for example by a typedef/typeref/typespec **token** and a **Module**, it does not do all the work at once. The loading is done in phases instead. The reason for this is that a type usually depends on other types, and requiring it to be fully loaded before it can be referred to by other types would result in infinite recursion and deadlocks. Consider:
+
+    class A<T> : C<B<T>>
+    { }
+
+    class B<T> : C<A<T>>
+    { }
+
+    class C<T>
+    { }
+
+These are valid types, and clearly **A** depends on **B** and **B** depends on **A**.
+
+The loader initially creates the structure(s) representing the type and initializes them with data that can be obtained without loading other types. When this "no-dependencies" work is done, the structure(s) can be referred to from other places, usually by sticking pointers to them into other structures. After that, the loader progresses in incremental steps and fills the structure(s) with more and more information until it finally arrives at a fully loaded type. In the above example, the base types of **A** and **B** will be approximated by something that does not include the other type, and substituted by the real thing later.
+
+The exact half-loaded state is described by the so-called load level, starting with CLASS\_LOAD\_BEGIN, ending with CLASS\_LOADED, and having a couple of intermediate levels in between. There are rich and useful comments about individual load levels in the [classloadlevel.h](https://github.com/dotnet/coreclr/blob/master/src/vm/classloadlevel.h) source file. Notice that although types can be saved in NGEN images, the representing structures cannot be simply mapped or blitted into memory and used without additional work called "restoring". The fact that a type came from an NGEN image and needs to be restored is also captured by its load level.
+
+See [Design and Implementation of Generics
+for the .NET Common Language
+Runtime][generics-design] for more detailed explanation of load levels.
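A rough model of this incremental scheme, using a subset of the real level names (see classloadlevel.h for the authoritative list and comments):

```cpp
// Illustrative subset of the load levels; the real enumeration in
// classloadlevel.h has additional levels (e.g. for NGEN restoration).
enum ClassLoadLevel {
    CLASS_LOAD_BEGIN,          // placeholder created, no dependencies touched
    CLASS_LOAD_APPROXPARENTS,  // parent/interfaces filled in approximately
    CLASS_LOAD_EXACTPARENTS,   // approximations replaced by exact types
    CLASS_DEPENDENCIES_LOADED, // dependencies pushed far enough
    CLASS_LOADED               // fully loaded, usable everywhere
};

struct TypeStructures { ClassLoadLevel m_level = CLASS_LOAD_BEGIN; };

// Sketch of the incremental push: each pass does only the work needed
// to reach the next level, so mutually recursive types can refer to
// each other's partially loaded structures without deadlocking.
void PushToLevel(TypeStructures& type, ClassLoadLevel target) {
    while (type.m_level < target) {
        // ... do the work required to reach (m_level + 1) ...
        type.m_level = static_cast<ClassLoadLevel>(type.m_level + 1);
    }
}
```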
+
+## 2.2 Generics
+
+In the generics-free world, everything is nice and everyone is happy because every ordinary (not represented by a **TypeDesc**) type has one **MethodTable** pointing to its associated **EEClass** which in turn points back to the **MethodTable**. All instances of the type contain a pointer to the **MethodTable** as their first field at offset 0, i.e. at the address seen as the reference value. To conserve space, **MethodDescs** representing methods declared by the type are organized in a linked list of chunks pointed to by the **EEClass**<sup>4</sup>.
+
+![Figure 3](../images/typeloader-fig3.png)
+
+Figure 3 Non-generic type with non-generic methods
+
+<sup>4</sup> Of course, when managed code runs, it does not call methods by
+looking them up in the chunks. Calling a method is a very "hot"
+operation and normally needs to access only information in the
+**MethodTable**.
+
+### 2.2.1 Terminology
+
+**Generic Parameter**
+
+A placeholder to be substituted by another type; the **T** in the declaration of **List<T>**. Sometimes called formal type parameter. A generic parameter has a name and optional generic constraints.
+
+**Generic Argument**
+
+A type being substituted for a generic parameter; the **int** in **List<int>**. Note that a generic parameter can also be used as an argument. Consider:
+
+ List<T> GetList<T>()
+ {
+ return new List<T>();
+ }
+
+The method has one generic parameter **T** which is used as a generic argument for the generic list class.
+
+**Generic Constraint**
+
+An optional requirement placed by a generic parameter on its potential generic arguments. Types that do not have the required properties may not be substituted for the generic parameter, and this is enforced by the type loader. There are three kinds of generic constraints:
+
+1. Special constraints
+ - Reference type constraint - the generic argument must be a reference type (as opposed to a value type). The `class` keyword is used in C# to express this constraint.
+
+ public class A<T> where T : class
+
+ - Value type constraint - the generic argument must be a value type different from `System.Nullable<T>`. C# uses the `struct` keyword.
+
+ public class A<T> where T : struct
+
+ - Default constructor constraint - the generic argument must have a public parameterless constructor. This is expressed by `new()` in C#.
+
+ public class A<T> where T : new()
+
+2. Base type constraints - the generic argument must be derived from
+(or directly be of) the given non-interface type. It obviously makes
+sense to use at most one reference type as a base type
+constraint.
+
+ public class A<T> where T : EventArgs
+
+3. Implemented interface constraints - the generic argument must
+implement (or directly be of) the given interface type. Zero or more
+interfaces can be given.
+
+ public class A<T> where T : ICloneable, IComparable<T>
+
+The above constraints are combined with an implicit AND, i.e. a
+generic parameter can be constrained to be derived from a given type,
+implement several interfaces, and have the default constructor. All
+generic parameters of the declaring type can be used to express the
+constraints, introducing interdependencies among the parameters. For
+example:
+
+ public class A<S, T, U>
+ where S : T
+ where T : IList<U> {
+ void f<V>(V v) where V : S {}
+ }
+
+**Instantiation**
+
+The list of generic arguments that were substituted for generic
+parameters of a generic type or method. Each loaded generic type and
+method has its instantiation.
+
+**Typical Instantiation**
+
+An instantiation consisting purely of the type's or method's own type
+parameters, in the same order in which the parameters are
+declared. There exists exactly one typical instantiation for each
+generic type and method. Usually when one talks about an open generic
+type, they have the typical instantiation in mind. Example:
+
+ public class A<S, T, U> {}
+
+The C# expression `typeof(A<,,>)` compiles to ``ldtoken A`3``, which
+makes the runtime load ``A`3`` instantiated at **S**, **T**, **U**.
+
+**Canonical Instantiation**
+
+An instantiation where all generic arguments are
+**System.\_\_Canon**. **System.\_\_Canon** is an internal type defined
+in **mscorlib** and its task is just to be well-known and different
+from any other type which may be used as a generic
+argument. Types/methods with canonical instantiation are used as
+representatives of all instantiations and carry information shared by
+all instantiations. Since **System.\_\_Canon** can obviously not
+satisfy any constraints that the respective generic parameter may have
+on it, constraint checking is special-cased with respect to
+**System.\_\_Canon** and ignores these violations.
+
+### 2.2.2 Sharing
+
+With the advent of generics, the number of types loaded by the runtime
+tends to be higher. Although generic types with different
+instantiations (for example **List&lt;string>** and **List&lt;object>**)
+are different types each with its own **MethodTable** , it turns out
+that there is a considerable amount of information that they can
+share. This sharing has a positive impact on the memory footprint and
+consequently also performance.
+
+![Figure 4](../images/typeloader-fig4.png)
+
+Figure 4 Generic type with non-generic methods - shared EEClass
+
+Currently all instantiations containing reference types share the same
+**EEClass** and its **MethodDescs**. This is feasible because all
+references are of the same size - 4 or 8 bytes - and hence the layout
+of all these types is the same. The figure illustrates this for
+**List&lt;object>** and **List&lt;string>**. The canonical **MethodTable**
+was created automatically before the first reference type
+instantiation was loaded, and contains data which are hot but not
+instantiation-specific, such as non-virtual slots or
+**RemotableMethodInfo**. Instantiations containing only value types
+are not shared; every such instantiated type gets its own unshared
+**EEClass**.
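The sharing rule above can be sketched as a canonicalization function; `TypeId` and `CANON_ID` are made-up stand-ins for **TypeHandle** and **System.\_\_Canon**:

```cpp
#include <vector>

// Hypothetical model: every generic argument that is a reference type
// maps to the well-known canonical type, so all reference-type
// instantiations of a generic type share one representative.
struct TypeId { bool isReferenceType; int id; };
constexpr int CANON_ID = -1;  // stand-in for System.__Canon

std::vector<TypeId> Canonicalize(const std::vector<TypeId>& inst) {
    std::vector<TypeId> result;
    for (const TypeId& arg : inst)
        result.push_back(arg.isReferenceType ? TypeId{true, CANON_ID} : arg);
    return result;
}
```

Value-type arguments pass through unchanged, which is why instantiations over value types each get their own unshared structures.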
+
+**MethodTables** representing generic types loaded so far are cached
+in a hash table owned by their loader module<sup>5</sup>. This hash table is
+consulted before a new instantiation is constructed, making sure
+that there will never be two or more **MethodTable** instances
+representing the same type.
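The cache consultation can be sketched like this; `LoaderModule` and `GetOrCreateInstantiation` are assumed names, and a `std::map` stands in for the real hash table:

```cpp
#include <map>
#include <vector>

struct MethodTable {};
using TypeHandle = const MethodTable*;
using Instantiation = std::vector<TypeHandle>;  // the generic arguments

// Sketch of the per-module cache consulted before a new instantiated
// MethodTable is built, so each instantiation is represented exactly once.
class LoaderModule {
public:
    MethodTable* GetOrCreateInstantiation(const Instantiation& inst) {
        auto it = m_instCache.find(inst);
        if (it != m_instCache.end())
            return it->second;                    // already constructed
        MethodTable* pMT = new MethodTable();     // build the real thing (elided)
        m_instCache.emplace(inst, pMT);
        return pMT;
    }
private:
    std::map<Instantiation, MethodTable*> m_instCache;
};
```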
+
+See [Design and Implementation of Generics
+for the .NET Common Language
+Runtime][generics-design] for more information about generic sharing.
+
+<sup>5</sup> Things get a bit more complicated for types loaded from NGEN
+images.
diff --git a/Documentation/botr/type-system.md b/Documentation/botr/type-system.md
new file mode 100644
index 0000000..ca5f234
--- /dev/null
+++ b/Documentation/botr/type-system.md
@@ -0,0 +1,233 @@
+Type System Overview
+====================
+
+Author: David Wrighton ([@davidwrighton](https://github.com/davidwrighton)) - 2010
+
+Introduction
+============
+
+The CLR type system is our representation of the type system described in the ECMA specification, plus extensions.
+
+Overview
+--------
+
+The type system is composed of a series of data structures, some of which are described in other Book of the Runtime chapters, as well as a set of algorithms that operate on and create those data structures. It is NOT the type system exposed through reflection, although that one does depend on this system.
+
+The major data structures maintained by the type system are:
+
+- MethodTable
+- EEClass
+- MethodDesc
+- FieldDesc
+- TypeDesc
+- ClassLoader
+
+The major algorithms contained within the type system are:
+
+- **Type Loader:** Used to load types and create most of the primary data structures of the type system.
+- **CanCastTo and similar:** The functionality of comparing types.
+- **LoadTypeHandle:** Primarily used for finding types.
+- **Signature parsing:** Used to compare and gather information about methods and fields.
+- **GetMethod/FieldDesc:** Used to find/load methods/fields.
+- **Virtual Stub Dispatch:** Used to find the destination of virtual calls to interfaces.
+
+There are significantly more ancillary data structures and algorithms that provide various bits of information to the rest of the CLR, but they are less significant to the overall understanding of the system.
+
+Component Architecture
+----------------------
+
+The type system's data structures are generally used by all of the various algorithms. This document does not describe the type system algorithms (as there are or should be other book of the runtime documents for those), but it does attempt to describe the various major data structures below.
+
+Dependencies
+------------
+
+The type system is generally a service provided to many parts of the CLR, and most core components have some form of dependency on its behavior. This diagram describes the general dataflow that affects the type system. It is not exhaustive, but it calls out the major information flows.
+
+![dependencies](../images/type-system-dependencies.png)
+
+### Component Dependencies
+
+The primary dependencies of the type system are:
+
+- The **loader**, which is needed to get the correct metadata to work with.
+- The **metadata system**, which provides a metadata API to gather information.
+- The **security system**, which informs the type system whether or not certain type system structures are permitted (e.g. inheritance).
+- The **AppDomain**, which provides a LoaderAllocator to handle allocation behavior for the type system data structures.
+
+### Components Dependent on this Component
+
+The type system has three primary components which depend on it.
+
+- The **JIT interface** and the JIT helpers primarily depend on the type, method, and field searching functionality. Once the type system object is found, the data structures returned have been tailored to provide the information needed by the JIT.
+- **Reflection** uses the type system to provide relatively simple access to ECMA standardized concepts which we happen to capture in the CLR type system data structures.
+- **General managed code execution** requires the use of the type system for type comparison logic and virtual stub dispatch.
+
+Design of Type System
+=====================
+
+The core type system data structures are the data structures that represent the actual loaded types (e.g. TypeHandle, MethodTable, MethodDesc, TypeDesc, EEClass) and the data structure that allow types to be found once they are loaded (e.g. ClassLoader, Assembly, Module, RIDMaps).
+
+The data structures and algorithms for loading types are discussed in the [Type Loader](type-loader.md) and [MethodDesc](method-descriptor.md) Book of the Runtime chapters.
+
+Tying those data structures together is a set of functionality that allows the JIT/Reflection/TypeLoader/stackwalker to find existing types and methods. The general idea is that these searches should be easily driven by the metadata tokens/signatures that are specified in the ECMA CLI specification.
+
+And finally, when the appropriate type system data structure is found, we have algorithms to gather information from a type, and/or compare two types. A particularly complicated example of this form of algorithm may be found in the [Virtual Stub Dispatch](virtual-stub-dispatch.md) Book of the Runtime chapter.
+
+Design Goals and Non-goals
+--------------------------
+
+### Goals
+
+- Accessing information needed at runtime from executing (non-reflection) code is very fast.
+- Accessing information needed at compilation time for generating code is straightforward.
+- The garbage collector/stackwalker is able to access necessary information without taking locks, or allocating memory.
+- As few types as possible are loaded at any given time.
+- As little of a given type as possible is loaded at type load time.
+- Type system data structures must be storable in NGEN images.
+
+### Non-Goals
+
+- All information in the metadata is directly reflected in the CLR data structures.
+- All uses of reflection are fast.
+
+Design of a typical algorithm used at runtime during execution of managed code
+------------------------------------------------------------------------------
+
+The casting algorithm is typical of algorithms in the type system that are heavily used during the execution of managed code.
+
+There are at least 4 separate entrypoints into this algorithm. Each entrypoint is chosen to provide a different fast path, in the hopes that the best performance possible will be achieved.
+
+- Can an object be cast to a particular non-type equivalent non-array type?
+- Can an object be cast to an interface type that does not implement generic variance?
+- Can an object be cast to an array type?
+- Can an object of a type be cast to an arbitrary other managed type?
+
+Each of these implementations, with the exception of the last one, is optimized to perform better at the expense of not being fully general.
+
+For instance, "Can a type be cast to a parent type" - a variant of "Can an object be cast to a particular non-type-equivalent non-array type?" - is implemented as a single loop that walks a singly linked list. This is only able to handle a subset of possible casting operations, but it is possible to determine whether that subset applies by examining the type the cast is trying to enforce. This algorithm is implemented in the JIT helper JIT\_ChkCastClass\_Portable.
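The parent-chain walk looks roughly like this; the names are illustrative, and only the overall shape follows JIT\_ChkCastClass\_Portable:

```cpp
// Sketch of the fast "cast to a non-interface parent type" check: walk
// the singly linked parent chain of the object's MethodTable.
struct MethodTable {
    MethodTable* m_pParent;  // nullptr for System.Object
};

bool CanCastToClass(MethodTable* pObjectMT, MethodTable* pTargetMT) {
    for (MethodTable* pMT = pObjectMT; pMT != nullptr; pMT = pMT->m_pParent) {
        if (pMT == pTargetMT)
            return true;     // target found on the parent chain
    }
    return false;            // fall back to the slower, fully general path
}
```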
+
+Assumptions:
+
+- Special purpose implementations of algorithms are a performance improvement in general.
+- Extra versions of algorithms do not provide an insurmountable maintenance problem.
+
+Design of typical search algorithm in the Type System
+-----------------------------------------------------
+
+There are a number of algorithms in the type system which follow this common pattern.
+
+The type system is commonly used to find a type. This may be triggered via any number of inputs such as the JIT, reflection, serialization, remoting, etc.
+
+The basic input to the type system in these cases is
+
+- The context from which the search shall begin (a Module or assembly pointer).
+- An identifier that describes the sought after type in the initial context. This is typically a token, or a string (if an assembly is the search context).
+
+The algorithm must first decode the identifier.
+
+For the search for a type scenario, the token may be either a TypeDef token, a TypeRef token, a TypeSpec token, or a string. Each of these different identifiers will cause a different form of lookup.
+
+- A **typedef token** will cause a lookup in the RidMap of the Module. This is a simple array index.
+- A **typeref token** will cause a lookup to find the assembly which this typeref token refers to, and then the type finding algorithm is begun anew with the found assembly pointer, and a string gathered from the typeref table.
+- A **typespec token** indicates that a signature must be parsed to find the type. The signature is parsed to find the information necessary to load the type. This will recursively trigger more type finding.
+- A **name** is used to bind between assemblies. The TypeDef/ExportedTypes table is searched for matches. Note: This search is optimized by hashtables on the manifest module object.
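The typedef case can be sketched as a direct array index; `Module`, `LookupTypeDef`, and `StoreTypeDef` are assumed names standing in for the real RidMap machinery:

```cpp
#include <vector>

struct MethodTable {};

// Sketch of the RidMap idea: a typedef token's row id (RID) indexes
// directly into a per-module array of loaded types, so the common
// lookup is a bounds check plus one array read.
class Module {
public:
    // Typedef tokens have the form 0x02NNNNNN; RIDs are 1-based.
    MethodTable* LookupTypeDef(unsigned token) const {
        unsigned rid = token & 0x00FFFFFF;
        if (rid == 0 || rid >= m_typeDefRidMap.size())
            return nullptr;              // not a valid or known row
        return m_typeDefRidMap[rid];     // nullptr if not loaded yet
    }
    void StoreTypeDef(unsigned token, MethodTable* pMT) {
        unsigned rid = token & 0x00FFFFFF;
        if (rid >= m_typeDefRidMap.size())
            m_typeDefRidMap.resize(rid + 1, nullptr);
        m_typeDefRidMap[rid] = pMT;
    }
private:
    std::vector<MethodTable*> m_typeDefRidMap;
};
```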
+
+From this design a number of common characteristics of search algorithms in the type system are evident.
+
+- Searches use input that is tightly coupled to metadata. In particular, metadata tokens and string names are commonly passed around. Also, these searches are tied to Modules, which directly map to .dll and .exe files.
+- Use of cached information to improve performance. The RidMap and hash tables are data structures optimized to improve these lookups.
+- The algorithms typically have 3-4 different paths based on their input.
+
+In addition to this general design, there are a number of extra requirements that are layered onto this.
+
+- **ASSUMPTION:** Searching for types that are already loaded is safe to perform while stopped in the GC.
+- **INVARIANT:** A type which has already been loaded will always be found if searched for.
+- **ISSUE:** Search routines rely on metadata reading. This can yield inadequate performance in some scenarios.
+
+This search algorithm is typical of the routines used during JITting. It has a number of common characteristics.
+
+- It uses metadata.
+- It requires looking for data in many places.
+- There is relatively little duplication of data in our data structures.
+- It typically does not recurse deeply, and does not have loops.
+
+This allows us to meet the performance requirements, and characteristics necessary for working with an IL based JIT.
+
+Garbage Collector Requirements on the Type System
+-------------------------------------------------
+
+The garbage collector requires information about instances of types allocated in the GC heap. This is done via a pointer to a type system data structure (MethodTable) at the head of every managed object. Attached to the MethodTable is a data structure that describes the GC layout of instances of the type. There are two forms of this layout (one for normal types and object arrays, and another for arrays of valuetypes).
+
+- **ASSUMPTION:** Type system data structures have a lifetime that exceeds that of managed objects that are of types described in the type system data structure.
+- **REQUIREMENT:** The garbage collector has a requirement to execute the stack walker while the runtime is suspended. This will be discussed next.
+
+Stackwalker requirements on the Type System
+-------------------------------------------
+
+The stack walker / GC stack walker requires type system input in two cases.
+
+- For finding the size of valuetypes on the stack.
+- For finding GC roots to report within valuetypes on the stack.
+
+For various reasons - the desire to delay type loading, and the avoidance of generating multiple versions of code that differ only in their associated GC info - the CLR currently requires walking the signatures of methods that are on the stack. This need is rarely exercised, as it requires the stack walker to execute at very particular moments in time; but in order to meet our reliability goals, the signature walker must be able to function while stackwalking.
+
+The stack walker executes in approximately three modes.
+
+- To walk the stack of the current thread for security or exception processing reasons.
+- To walk the stack of all threads for GC purposes (all threads are suspended by the EE).
+- To walk the stack of a particular thread for a profiler (that specific thread is suspended).
+
+In the GC stack walking case, and in the profiler stack walking case, due to thread suspension, it is not safe to allocate memory or take most locks.
+
+This has led us to develop a path through the type system which may be relied upon to follow the above requirement.
+
+The rules required for the type system to achieve this goal are:
+
+- If a method has been called, then all valuetype parameters of the called method will have been loaded into some appdomain in the process.
+- The assembly reference from the assembly with the signature to the assembly implementing the type must be resolved before a walk of the signature is necessary as part of a stack walk.
+
+This is enforced via an extensive and complicated set of enforcements within the type loader, NGEN image generation process, and JIT.
+
+- **ISSUE:** Stackwalker requirements on the type system are HIGHLY fragile.
+- **ISSUE:** Implementation of stack walker requirements in the type system requires a set of contract violations at every function in the type system that may be touched while searching for types which are loaded.
+- **ISSUE:** The signature walks performed are done with the normal signature walking code. This code is designed to load types as it walks the signature, but in this case the type load functionality is used with the assumption that no type load will actually be triggered.
+- **ISSUE:** Stackwalker requirements require support from not just the type system, but also the assembly loader. The Loader has had a number of issues meeting the needs of the type system here.
+
+Type System and NGEN
+--------------------
+
+The type system data structures are a core part of what is saved into NGEN images. Unfortunately, these data structures logically have pointers within them that point to other NGEN images. In order to handle this situation, the type system data structures implement a concept known as restoration.
+
+In restoration, when a type system data structure is first needed, the data structure is fixed up with correct pointers. This is tied into the type loading levels described in the [Type Loader](type-loader.md) Book of the Runtime chapter.
+
+There also exists the concept of pre-restored data structures. This means that the data structure is sufficiently correct at NGEN image load time (after intra-module pointer fixups and eager load type fixups) that the data structure may be used as is. This optimization requires that the NGEN image be "hard bound" to its dependent assemblies. See the NGEN documentation for further details.
+
+Type System and Domain Neutral Loading
+--------------------------------------
+
+The type system is a core part of the implementation of domain neutral loading. This is exposed to customers through the LoaderOptimization options available at AppDomain creation. Mscorlib is always loaded as domain neutral. The core requirement of this feature is that the type system data structures must not require pointers to domain specific state. Primarily this manifests itself in requirements around static fields and class constructors. In particular, whether or not a class constructor has been run is not a part of the core MethodTable data structure for this reason, and there is a mechanism for storing static data attached to the DomainFile data structure instead of the MethodTable data structure.
+
+Physical Architecture
+=====================
+
+Major parts of the type system are found in:
+
+- Class.cpp/inl/h – EEClass functions, and BuildMethodTable
+- MethodTable.cpp/inl/h – Functions for manipulating MethodTables.
+- TypeDesc.cpp/inl/h – Functions for examining TypeDesc
+- MetaSig.cpp SigParser – Signature code
+- FieldDesc /MethodDesc – Functions for examining these data structures
+- Generics – Generics specific logic.
+- Array – Code for handling the special cases required for array processing
+- VirtualStubDispatch.cpp/h/inl – Code for virtual stub dispatch
+- VirtualCallStubCpu.hpp – Processor specific code for virtual stub dispatch.
+
+Major entry points are BuildMethodTable, LoadTypeHandleThrowing, CanCastTo\*, GetMethodDescFromMemberDefOrRefOrSpecThrowing, GetFieldDescFromMemberRefThrowing, CompareSigs, and VirtualCallStubManager::ResolveWorkerStatic.
+
+Related Reading
+===============
+
+- [ECMA CLI Specification](../project-docs/dotnet-standards.md)
+- [Type Loader](type-loader.md) Book of the Runtime Chapter
+- [Virtual Stub Dispatch](virtual-stub-dispatch.md) Book of the Runtime Chapter
+- [MethodDesc](method-descriptor.md) Book of the Runtime Chapter
diff --git a/Documentation/botr/virtual-stub-dispatch.md b/Documentation/botr/virtual-stub-dispatch.md
new file mode 100644
index 0000000..8d5a52c
--- /dev/null
+++ b/Documentation/botr/virtual-stub-dispatch.md
@@ -0,0 +1,188 @@
+Virtual Stub Dispatch
+=====================
+
+Author: Simon Hall ([@snwbrdwndsrf](https://github.com/snwbrdwndsrf)) - 2006
+
+Introduction
+============
+
+Virtual stub dispatching (VSD) is the technique of using stubs for virtual method invocations instead of the traditional virtual method table. In the past, interface dispatch required that interfaces had process-unique identifiers, and that every loaded interface was added to a global interface virtual table map. This requirement meant that all interfaces and all classes that implemented interfaces had to be restored at runtime in NGEN scenarios, causing significant startup working set increases. The motivation for stub dispatching was to eliminate much of the related working set, as well as distribute the remaining work throughout the lifetime of the process.
+
+Although it is possible for VSD to dispatch both virtual instance and interface method calls, it is currently used only for interface dispatch.
+
+Dependencies
+------------
+
+### Component Dependencies
+
+The stub dispatching code exists relatively independently of the rest of the runtime. It provides an API that allows dependent components to use it, and the dependencies listed below comprise a relatively small surface area.
+
+#### Code Manager
+
+VSD relies on the code manager to provide information about the state of a method, in particular whether or not a given method has transitioned to its final state, so that VSD may decide on details such as stub generation and target caching.
+
+#### Types and Methods
+
+MethodTables hold pointers to the dispatch maps used to determine the target code address for any given VSD call site.
+
+#### Special Types
+
+Calls on COM interop types must be custom dispatched, as they have specialized target resolution.
+
+### Components Dependent on this Component
+
+#### Code Manager
+
+The code manager relies on VSD for providing the JIT compiler with call site targets for interface calls.
+
+#### Class Builder
+
+The class builder uses the API exposed by the dispatch mapping code to create dispatch maps during type building that will be used at dispatch time by the VSD code.
+
+Design Goals and Non-goals
+--------------------------
+
+### Goals
+
+#### Working Set Reduction
+
+Interface dispatch was previously implemented using a large, somewhat sparse vtable lookup map dealing with process-wide interface identifiers. The goal was to reduce the amount of cold working set by generating dispatch stubs as they were required, in theory keeping related call sites and their dispatch stubs close to each other and increasing the working set density.
+
+It is important to note that the initial working set involved with VSD is higher per call site, due to the data structures required to track the various stubs that are created and collected as the system runs; however, as an application reaches steady state, these data structures are not needed for simple dispatching and so get paged out. Unfortunately, for client applications this equated to a slower startup time, which is one of the factors that led to disabling VSD for virtual methods.
+
+#### Throughput Parity
+
+It was important to keep interface and virtual method dispatch at an amortized parity with the previous vtable dispatch mechanism.
+
+While it was immediately obvious that this was achievable with interface dispatch, it turned out to be somewhat slower with virtual method dispatch, one of the factors that led to disabling VSD for virtual methods.
+
+Design of Token Representation and Dispatch Map
+-----------------------------------------------
+
+Dispatch tokens are native word-sized values that are allocated at runtime, consisting internally of a tuple that represents an interface and slot.
+
+The design uses a combination of assigned type identifier values and slot numbers. Dispatch tokens consist of a combination of these two values. To facilitate integration with the runtime, the implementation also assigns slot numbers in the same way as the classic v-table layout. This means that the runtime can still deal with MethodTables, MethodDescs, and slot numbers in exactly the same way, except that the v-table must be accessed via helper methods instead of being directly accessed in order to handle this abstraction.
+
+The term _slot_ will always be used in the context of a slot index value in the classic v-table layout, as created and interpreted by the mapping mechanism. In other words, it is the slot number you would get by picturing the classic method table layout of virtual method slots followed by non-virtual method slots, as previously implemented in the runtime. This distinction is important because within the runtime code, _slot_ has meant both an index into the classic v-table structure and the address of the pointer in the v-table itself. The change is that a slot is now only an index value, and the code pointer addresses are contained in the implementation table (discussed below).
+
+The dynamically assigned type identifier values will be discussed later on.
+
+### Method Table
+
+#### Implementation Table
+
+This is an array that, for each method body introduced by the type, has a pointer to the entrypoint to that method. Its members are arranged in the following order:
+
+- Introduced (newslot) virtual methods.
+- Introduced non-virtual (instance and static) methods.
+- Overriding virtual methods.
+
+The reason for this format is that it provides a natural extension to the classic v-table layout. As a result many entries in the slot map (described below) can be inferred by this order and other details such as the total number of virtuals and non-virtuals for the class.
+
+When stub dispatch for virtual instance methods is disabled (as it is currently), the implementation table is non-existent and is substituted with a true vtable. All mapping results are expressed as slots for the vtable rather than an implementation table. Keep this in mind when implementation tables are mentioned throughout this document.
+
+#### Slot Map
+
+The slot map is a table of zero or more <_type_, [<_slot_, _scope_, (_index | slot_)>]> entries. _type_ is the dynamically assigned identification number mentioned above, and is either a sentinel value to indicate the current class (a call to a virtual instance method), or is an identifier for an interface implemented by the current class (or implicitly by one of its parents). The sub-map (contained in brackets) has one or more entries. Within each entry, the first element always indicates a slot within _type_. The second element, _scope_, specifies whether the third element is an implementation _index_ or a _slot_ number. _scope_ can be a known sentinel value that indicates that the next number is to be interpreted as a virtual slot number, and should be resolved virtually as _this.slot_. _scope_ can also identify a particular class in the inheritance hierarchy of the current class, in which case the third element is an _index_ into the implementation table of the class indicated by _scope_, and is the final method implementation for _type.slot_.
+
+#### Example
+
+The following is a small class structure (modeled in C#), and what the resulting implementation table and slot map would be for each class.
+
+![Figure 1](../images/virtualstubdispatch-fig1.png)
+
+Thus, looking at this map, we see that the first column of each sub-map in the slot maps corresponds to the slot number in the classic virtual table view (remember that System.Object contributes four virtual methods of its own, which are omitted for clarity). Searches for method implementations are always bottom-up. Thus, if I had an object of type _B_ and I wished to invoke _I.Foo_, I would look for a mapping of _I.Foo_ starting at _B_'s slot map. Not finding it there, I would look in _A_'s slot map and find it there. It states that virtual slot 0 of _I_ (corresponding to _I.Foo_) is implemented by virtual slot 0. Then I return to _B_'s slot map and search for an implementation for slot 0, and find that it is implemented by slot 1 in its own implementation table.
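+
+The bottom-up search just described can be modeled in a short C++ sketch. Everything here is hypothetical and simplified for illustration (the names, the use of `std::map`, and the assumption that a non-virtual mapping's index refers to the implementation table of the class in which the mapping is found); the real structures are bit-packed, as noted under Optimizations below:
+
+```cpp
+#include <cassert>
+#include <cstdint>
+#include <map>
+#include <utility>
+#include <vector>
+
+constexpr uint32_t TYPE_THIS = 0; // sentinel: a virtual slot in the current class
+constexpr uint32_t TYPE_I    = 1; // dynamically assigned ID for interface I
+
+// A sub-map entry's payload: either a virtual slot number (resolved as
+// this.slot) or an index into an implementation table.
+struct Mapping { bool isVirtualSlot; uint32_t value; };
+
+struct Class {
+    const Class* parent;
+    std::map<std::pair<uint32_t, uint32_t>, Mapping> slotMap; // (typeID, slot) -> mapping
+    std::vector<int> implTable; // entry points, modeled here as plain ints
+};
+
+// A : I  -- implements I.Foo at virtual slot 0, code at A.implTable[0]
+Class A{nullptr,
+        {{{TYPE_I, 0},    {true, 0}},    // I.Foo -> virtual slot 0
+         {{TYPE_THIS, 0}, {false, 0}}},  // virtual slot 0 -> A.implTable[0]
+        {100}};
+
+// B : A  -- overrides virtual slot 0 with the code at B.implTable[1]
+Class B{&A,
+        {{{TYPE_THIS, 0}, {false, 1}}},  // virtual slot 0 -> B.implTable[1]
+        {200, 201}};
+
+// Walk the class hierarchy bottom-up looking for a mapping of (typeID, slot).
+int ResolveToCode(const Class* cls, uint32_t typeID, uint32_t slot) {
+    for (const Class* c = cls; c != nullptr; c = c->parent) {
+        auto it = c->slotMap.find({typeID, slot});
+        if (it == c->slotMap.end())
+            continue; // not mapped here; keep walking up
+        if (it->second.isVirtualSlot)
+            // Resolve virtually: restart the search at the original type.
+            return ResolveToCode(cls, TYPE_THIS, it->second.value);
+        return c->implTable[it->second.value];
+    }
+    assert(false && "no mapping found");
+    return -1;
+}
+
+int main() {
+    assert(ResolveToCode(&A, TYPE_I, 0) == 100); // A object: A's own Foo
+    assert(ResolveToCode(&B, TYPE_I, 0) == 201); // B object: B's override
+    return 0;
+}
+```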
+
+### Additional Uses
+
+It is important to note that this mapping technique can be used to implement methodimpl re-mapping of virtual slots (i.e., a virtual slot mapping in the map for the current class, similar to how an interface slot is mapped to a virtual slot). Because of the scoping capabilities of the map, non-virtual methods may also be referenced. This may be useful if ever the runtime wants to support the implementation of interfaces with non-virtual methods.
+
+### Optimizations
+
+The slot maps are bit-encoded and take advantage of typical interface implementation patterns using delta values, thus reducing the map size significantly. In addition, new slots (both virtual and non-) can be implied by their order in the implementation table. If the table contains new virtual slots followed by new instance slots, then followed by overrides, then the appropriate slot map entries can be implied by their index in the implementation table combined with the number of virtuals inherited by the parent class. All such implied map entries have been indicated with a (\*). The current layout of data structures uses the following pattern, where the DispatchMap is only present when mappings cannot be fully implied by ordering in the implementation table.
+
+ MethodTable -> [DispatchMap ->] ImplementationTable
+
+Type ID Map
+-----------
+
+This maps types to IDs, which are allocated as monotonically increasing values as each previously unmapped type is encountered. Currently, all such types are interfaces.
+
+Currently, this is implemented using a HashMap, and contains entries for both lookup directions.
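+
+A minimal sketch of the idea (hypothetical names; the actual implementation is the HashMap mentioned above, keyed on type handles rather than strings):
+
+```cpp
+#include <cassert>
+#include <cstdint>
+#include <string>
+#include <unordered_map>
+
+// Assigns monotonically increasing IDs on first encounter and keeps
+// entries for both lookup directions.
+class TypeIDMap {
+    std::unordered_map<std::string, uint32_t> typeToID;
+    std::unordered_map<uint32_t, std::string> idToType;
+    uint32_t nextID = 1;
+public:
+    uint32_t GetOrAssignID(const std::string& type) {
+        auto it = typeToID.find(type);
+        if (it != typeToID.end())
+            return it->second; // already mapped: IDs are stable
+        uint32_t id = nextID++;
+        typeToID.emplace(type, id);
+        idToType.emplace(id, type);
+        return id;
+    }
+    const std::string& LookupType(uint32_t id) const { return idToType.at(id); }
+};
+
+int main() {
+    TypeIDMap m;
+    assert(m.GetOrAssignID("IFoo") == 1);
+    assert(m.GetOrAssignID("IBar") == 2);
+    assert(m.GetOrAssignID("IFoo") == 1); // re-lookup returns the same ID
+    assert(m.LookupType(2) == "IBar");
+    return 0;
+}
+```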
+
+Dispatch Tokens
+---------------
+
+Dispatch tokens will be <_typeID_,_slot_> tuples. For interfaces, the type will be the interface ID assigned to that type. For virtual methods, this will be a constant value to indicate that the slot should just be resolved virtually within the type to be dispatched on (a virtual method call on _this_). This value pair will in most cases fit into the platform's native word size. On x86, this will likely be the lower 16 bits of each value, concatenated. This can be generalized to handle overflow issues similar to how a _TypeHandle_ in the runtime can be either a _MethodTable_ pointer or a <_TypeHandle,TypeHandle_> pair, using a sentinel bit to differentiate the two cases. It has yet to be determined if this is necessary.
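+
+On a 32-bit platform, the 16-bit concatenation just described might look like the following sketch (hypothetical helper names; the sentinel value and any overflow handling are omitted):
+
+```cpp
+#include <cassert>
+#include <cstdint>
+
+using DispatchToken = uint32_t; // one native word on a 32-bit platform
+
+// Pack the <typeID, slot> tuple: lower 16 bits of each value, concatenated.
+DispatchToken MakeToken(uint16_t typeID, uint16_t slot) {
+    return (static_cast<uint32_t>(typeID) << 16) | slot;
+}
+
+uint16_t TokenTypeID(DispatchToken t) { return static_cast<uint16_t>(t >> 16); }
+uint16_t TokenSlot(DispatchToken t)   { return static_cast<uint16_t>(t & 0xFFFF); }
+
+int main() {
+    DispatchToken t = MakeToken(42, 3);
+    assert(TokenTypeID(t) == 42);
+    assert(TokenSlot(t) == 3);
+    return 0;
+}
+```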
+
+Design of Virtual Stub Dispatch
+===============================
+
+Dispatch Token to Implementation Resolution
+-------------------------------------------
+
+Given a token and type, the implementation is found by mapping the token to an implementation table index for the type. The implementation table is reachable from the type's MethodTable. This map is created in BuildMethodTable: it enumerates all interfaces implemented by the type for which it is building a MethodTable and determines every interface method that the type implements or overrides. By keeping track of this information, at interface dispatch time it is possible to determine the target code given the token and the target object (from which the MethodTable and token mapping can be obtained).
+
+Stubs
+-----
+
+Interface dispatch calls go through stubs. These stubs are all generated on demand, and all have the ultimate purpose of matching a token and object with an implementation, and forwarding the call to that implementation.
+
+There are currently three types of stubs. The below diagram shows the general control flow between these stubs, and will be explained below.
+
+![Figure 2](../images/virtualstubdispatch-fig2.png)
+
+### Generic Resolver
+
+This is in fact just a C function that serves as the final failure path for all stubs. It takes a <_token_, _type_> tuple and returns the target. The generic resolver is also responsible for creating dispatch and resolver stubs when they are required, patching indirection cells when better stubs become available, caching results, and all bookkeeping.
+
+### Lookup Stubs
+
+These stubs are the first to be assigned to an interface dispatch call site, and are created when the JIT compiles an interface call site. Since the JIT has no knowledge of the type being used to satisfy a token until the first call is made, this stub passes the token and type as arguments to the generic resolver. If necessary, the generic resolver will also create dispatch and resolve stubs, and will then back patch the call site to the dispatch stub so that the lookup stub is no longer used.
+
+One lookup stub is created for each unique token (i.e., call sites for the same interface slot will use the same lookup stub).
+
+### Dispatch Stubs
+
+These stubs are used when a call site is believed to be monomorphic in behaviour. This means that the objects used at a particular call site are typically the same type (i.e., most of the time the object being invoked is the same as the last object invoked at the same site). A dispatch stub takes the type (MethodTable) of the object being invoked and compares it with its cached type, and upon success jumps to its cached target. On x86, this typically results in a "comparison, conditional failure jump, jump to target" sequence and provides the best performance of any stub. If a stub's type comparison fails, it jumps to its corresponding resolve stub (see below).
+
+One dispatch stub is created for each unique <_token_,_type_> tuple, but only lazily when a call site's lookup stub is invoked.
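+
+In C-like terms, the check a dispatch stub performs amounts to the following sketch (the real stubs are hand-emitted machine code with the expected MethodTable and target baked in as constants; all names here are illustrative):
+
+```cpp
+#include <cassert>
+#include <cstdint>
+
+struct MethodTable; // opaque: only the pointer identity matters here
+using Target = int (*)();
+
+int CachedImpl()  { return 1; } // stands in for the cached target code
+int ResolveStub() { return 2; } // stands in for the fallback resolve stub
+
+// In the real stub, expectedMT and target are constants encoded in the
+// stub's instruction stream, not fields read from memory.
+struct DispatchStub {
+    const MethodTable* expectedMT;
+    Target target;
+
+    int Invoke(const MethodTable* objectMT) const {
+        if (objectMT == expectedMT) // "cmp; jne failure"
+            return target();        // "jmp target"
+        return ResolveStub();       // failure path
+    }
+};
+
+const MethodTable* const g_mtA = reinterpret_cast<const MethodTable*>(0x1000);
+const MethodTable* const g_mtB = reinterpret_cast<const MethodTable*>(0x2000);
+DispatchStub g_stub{g_mtA, &CachedImpl};
+
+int main() {
+    assert(g_stub.Invoke(g_mtA) == 1); // monomorphic hit: cached target runs
+    assert(g_stub.Invoke(g_mtB) == 2); // type mismatch: falls to resolve stub
+    return 0;
+}
+```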
+
+### Resolve Stubs
+
+Polymorphic call sites are handled by resolve stubs. These stubs use the key pair <_token_, _type_> to resolve the target in a global cache, where _token_ is known at JIT time and _type_ is determined at call time. If the global cache does not contain a match, then the final step of the resolve stub is to call the generic resolver and jump to the returned target. Since the generic resolver will insert the <_token_, _type_, _target_> tuple into the cache, a subsequent call with the same <_token_, _type_> tuple will successfully find the target in the cache.
+
+When a dispatch stub fails frequently enough, the call site is deemed to be polymorphic and the resolve stub will back patch the call site to point directly to the resolve stub to avoid the overhead of a consistently failing dispatch stub. At sync points (currently the end of a GC), polymorphic sites will be randomly promoted back to monomorphic call sites under the assumption that the polymorphic attribute of a call site is usually temporary. If this assumption is incorrect for any particular call site, it will quickly trigger a backpatch to demote it to polymorphic again.
+
+One resolve stub is created per token, but they all use a global cache. A stub-per-token allows for a fast, effective hashing algorithm using a pre-calculated hash derived from the unchanging components of the <_token_, _type_> tuple.
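+
+The caching behaviour can be sketched like this (a hypothetical structure; the real cache layout, hashing, and eviction differ, but the token contribution to the hash can likewise be computed once per stub, since the token is the unchanging component):
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <unordered_map>
+
+using Token  = uint32_t;
+using TypeID = uint32_t;
+using Target = uint32_t; // stands in for a code address
+
+struct Key {
+    Token token;
+    TypeID type;
+    bool operator==(const Key& o) const { return token == o.token && type == o.type; }
+};
+
+struct KeyHash {
+    std::size_t operator()(const Key& k) const {
+        // The token part can be pre-calculated per resolve stub, since the
+        // token never changes for a given stub.
+        std::size_t tokenPart = k.token * 0x9E3779B9u;
+        return tokenPart ^ k.type;
+    }
+};
+
+std::unordered_map<Key, Target, KeyHash> g_cache; // shared by all resolve stubs
+
+// Stands in for the generic resolver: computes the target the slow way and
+// inserts it into the cache so the next call with this key is a hit.
+Target GenericResolver(Token token, TypeID type) {
+    Target target = token + type; // placeholder for the full mapping walk
+    g_cache[{token, type}] = target;
+    return target;
+}
+
+Target Resolve(Token token, TypeID type) {
+    auto it = g_cache.find({token, type});
+    if (it != g_cache.end())
+        return it->second;               // cache hit: jump to target
+    return GenericResolver(token, type); // miss: fall back and populate
+}
+
+int main() {
+    assert(g_cache.empty());
+    assert(Resolve(5, 7) == 12); // first call misses and populates the cache
+    assert(g_cache.size() == 1);
+    assert(Resolve(5, 7) == 12); // second call hits the cache
+    return 0;
+}
+```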
+
+### Code Sequences
+
+The former interface virtual table dispatch mechanism results in a code sequence similar to this:
+
+![Figure 3](../images/virtualstubdispatch-fig3.png)
+
+And the typical stub dispatch sequence is:
+
+![Figure 1](../images/virtualstubdispatch-fig4.png)
+
+where expectedMT, failure and target are constants encoded in the stub.
+
+The typical stub sequence has the same number of instructions as the former interface dispatch mechanism, and fewer memory indirections may allow it to execute faster with a smaller working set contribution. It also results in smaller JITed code, since the bulk of the work is in the stub instead of the call site. This is only advantageous if a callsite is rarely invoked. Note that the failure branch is arranged so that x86 branch prediction will follow the success case.
+
+Current State
+=============
+
+Currently, VSD is enabled only for interface method calls but not virtual instance method calls. There were several reasons for this:
+
+- **Startup:** Startup working set and speed were hindered because of the need to generate a great deal of initial stubs.
+- **Throughput:** While interface dispatches are generally faster with VSD, virtual instance method calls suffer an unacceptable speed degradation.
+
+As a result of disabling VSD for virtual instance method calls, every type has a vtable for virtual instance methods and the implementation table described above is disabled. Dispatch maps are still present to enable interface method dispatching.
+
+Physical Architecture
+=====================
+
+For dispatch token and map implementation details, please see [clr/src/vm/contractImpl.h](https://github.com/dotnet/coreclr/blob/master/src/vm/contractimpl.h) and [clr/src/vm/contractImpl.cpp](https://github.com/dotnet/coreclr/blob/master/src/vm/contractimpl.cpp).
+
+For virtual stub dispatch implementation details, please see [clr/src/vm/virtualcallstub.h](https://github.com/dotnet/coreclr/blob/master/src/vm/virtualcallstub.h) and [clr/src/vm/virtualcallstub.cpp](https://github.com/dotnet/coreclr/blob/master/src/vm/virtualcallstub.cpp).
diff --git a/Documentation/building/buildinglldb.md b/Documentation/building/buildinglldb.md
new file mode 100644
index 0000000..053c64f
--- /dev/null
+++ b/Documentation/building/buildinglldb.md
@@ -0,0 +1,87 @@
+Building LLDB
+=============
+
+1. Clone the llvm, clang, and lldb repos like this:
+
+ llvm
+ |
+ `-- tools
+ |
+ +-- clang
+ |
+ `-- lldb
+
+ ```
+ cd $HOME
+ git clone http://llvm.org/git/llvm.git
+ cd $HOME/llvm/tools
+ git clone http://llvm.org/git/clang.git
+ git clone http://llvm.org/git/lldb.git
+ ```
+
+2. Checkout the "release_38" branches in llvm/clang/lldb:
+
+ ```
+ cd $HOME/llvm
+ git checkout release_38
+ cd $HOME/llvm/tools/clang
+ git checkout release_38
+ cd $HOME/llvm/tools/lldb
+ git checkout release_38
+ ```
+
+3. Install the prerequisites:
+
+ For Linux (Debian or Ubuntu):
+ ```
+ sudo apt-get install build-essential subversion swig python2.7-dev libedit-dev libncurses5-dev
+ ```
+
+ For OSX, the latest Xcode needs to be installed and I use Homebrew to install the rest:
+ ```
+ brew install python swig doxygen ocaml
+ ```
+
+   There may be more prerequisites required; when generating the cmake build files, it should let
+   you know if there are any I missed.
+
+ See http://lldb.llvm.org/build.html for more details on these preliminaries.
+
+4. If building on OSX, carefully follow the signing directions (before you build)
+   in $HOME/llvm/tools/lldb/docs/code-signing.txt. Even though those directions
+   say to use Xcode to build lldb, I never got that to work, but cmake/make works.
+
+5. Generate the cmake build files (you can build either debug or release, or both).
+
+ For debug:
+ ```
+ mkdir -p $HOME/build/debug
+ cd $HOME/build/debug
+ cmake -DCMAKE_BUILD_TYPE=debug $HOME/llvm
+ ```
+ For release:
+ ```
+ mkdir -p $HOME/build/release
+ cd $HOME/build/release
+ cmake -DCMAKE_BUILD_TYPE=release $HOME/llvm
+ ```
+6. Build lldb (release was picked in this example, but can be replaced with "debug"):
+ ```
+ cd $HOME/build/release/tools/lldb
+ make -j16
+ ```
+   A parallel build with `-j16` (16 jobs) sometimes fails; if it does, simply start again with plain `make`.
+
+   On OS X, building in a remote ssh shell won't sign properly; use a terminal window on the machine itself.
+
+7. To use the newly built lldb and to build the coreclr SOS plugin for it, set these environment variables in your .profile:
+ ```
+ export LLDB_INCLUDE_DIR=$HOME/llvm/tools/lldb/include
+ export LLDB_LIB_DIR=$HOME/build/release/lib
+ PATH=$HOME/build/release/bin:$PATH
+ ```
+ For OS X also set:
+ ```
+ export LLDB_DEBUGSERVER_PATH=$HOME/build/release/bin/debugserver
+ ```
+   It also seems to be necessary to run lldb as superuser, e.g. `sudo -E $HOME/build/release/bin/lldb` (the -E is necessary so the above debug server environment variable is passed), when using a remote ssh session; this isn't necessary when you run it in a local terminal session.
diff --git a/Documentation/building/cross-building.md b/Documentation/building/cross-building.md
new file mode 100644
index 0000000..ab5897a
--- /dev/null
+++ b/Documentation/building/cross-building.md
@@ -0,0 +1,137 @@
+Cross Compilation for ARM on Linux
+==================================
+
+Through cross compilation, on Linux it is possible to build CoreCLR for arm or arm64.
+
+Requirements
+------------
+
+You need a Debian-based host, and the following packages need to be installed:
+
+ ben@ubuntu ~/git/coreclr/ $ sudo apt-get install qemu qemu-user-static binfmt-support debootstrap
+
+In addition, to cross compile CoreCLR the binutils for the target are required. So for arm you need:
+
+ ben@ubuntu ~/git/coreclr/ $ sudo apt-get install binutils-arm-linux-gnueabihf
+
+and conversely for arm64:
+
+ ben@ubuntu ~/git/coreclr/ $ sudo apt-get install binutils-aarch64-linux-gnu
+
+
+Generating the rootfs
+---------------------
+The `cross/build-rootfs.sh` script can be used to download the files needed for cross compilation. It will generate an Ubuntu 14.04 rootfs, as this is what CoreCLR targets.
+
+ Usage: build-rootfs.sh [BuildArch] [UbuntuCodeName]
+ BuildArch can be: arm, arm-softfp, arm64
+ UbuntuCodeName - optional, Code name for Ubuntu, can be: trusty(default), vivid, wily
+
+The `build-rootfs.sh` script must be run as root, as it has to create some symlinks into the system. By default it generates the rootfs in `cross/rootfs/<BuildArch>`; this can be changed by setting the `ROOTFS_DIR` environment variable.
+
+For example, to generate an arm rootfs:
+
+ ben@ubuntu ~/git/coreclr/ $ sudo ./cross/build-rootfs.sh arm
+
+You can choose an Ubuntu code name to match your target: use `vivid` for `15.04` and `wily` for `15.10`. The default is `trusty`, version `14.04`.
+
+ ben@ubuntu ~/git/coreclr/ $ sudo ./cross/build-rootfs.sh arm wily
+
+and if you wanted to generate the rootfs elsewhere:
+
+ ben@ubuntu ~/git/coreclr/ $ sudo ROOTFS_DIR=/home/ben/coreclr-cross/arm ./cross/build-rootfs.sh arm
+
+
+Cross compiling CoreCLR
+-----------------------
+Once the rootfs has been generated, it will be possible to cross compile CoreCLR. If `ROOTFS_DIR` was set when generating the rootfs, then it must also be set when running `build.sh`.
+
+So, without `ROOTFS_DIR`:
+
+ ben@ubuntu ~/git/coreclr/ $ ./build.sh arm debug verbose cross -rebuild
+
+And with:
+
+ ben@ubuntu ~/git/coreclr/ $ ROOTFS_DIR=/home/ben/coreclr-cross/arm ./build.sh arm debug verbose cross -rebuild
+
+As usual, the resulting binaries will be found in `bin/Product/BuildOS.BuildArch.BuildType/`.
+
+
+Compiling mscorlib for ARM Linux
+================================
+
+It is also possible to use a Windows and a Linux machine to build the managed components of CoreCLR for ARM Linux. This can be useful when the build on the target platform fails, for example due to Mono issues.
+
+Build mscorlib on Windows
+-------------------------
+The following instructions assume you are on a Windows machine with a clone of the CoreCLR repo that has a correctly configured [environment](https://github.com/dotnet/coreclr/wiki/Windows-instructions#environment).
+
+To build mscorlib for Linux, run the following command:
+
+```
+D:\git\coreclr> build.cmd linuxmscorlib arm
+```
+
+The arguments `freebsdmscorlib` and `osxmscorlib` can be used instead to build mscorlib for FreeBSD or OS X.
+
+The output is at `bin\Product\<BuildOS>.arm.Debug\mscorlib.dll`.
+
+
+Build mscorlib on Ubuntu
+-------------------------
+The following instructions assume you are on a Linux machine such as Ubuntu 14.04 x86 64bit.
+
+To build mscorlib for Linux, run the following command:
+
+```
+ lgs@ubuntu ~/git/coreclr/ $ build.sh arm debug verbose -rebuild
+```
+
+The output is at `bin/Product/<BuildOS>.arm.Debug/mscorlib.dll`.
+
+```
+ lgs@ubuntu ~/git/coreclr/ $ file ./bin/Product/Linux.arm.Debug/mscorlib.dll
+ ./bin/Product/Linux.arm.Debug/mscorlib.dll: PE32 executable (DLL)
+ (console) ARMv7 Thumb Mono/.Net assembly, for MS Windows
+```
+
+Building coreclr for Linux ARM Emulator
+=======================================
+
+It is possible to build coreclr binaries (native and mscorlib.dll) and run coreclr unit tests on the Linux ARM Emulator (latest version provided here: [#3805](https://github.com/dotnet/coreclr/issues/3805)).
+The `tests/scripts/arm32_ci_script.sh` script does this.
+
+The following instructions assume that:
+* You have set up the extracted emulator at `/opt/linux-arm-emulator` (such that `/opt/linux-arm-emulator/platform/rootfs-t30.ext4` exists).
+  The emulator rootfs is 4GB in size by default. To enable testing of coreclr binaries on the emulator, you need to resize the rootfs (to at least 7GB) using the instructions given in the `doc/RESIZE-IMAGE.txt` file of the extracted emulator.
+* The mount path for the emulator rootfs is `/opt/linux-arm-emulator-root` (change this path if you already have a working directory at this path).
+
+All the following instructions are for the Release mode. Change the commands and files accordingly for the Debug mode.
+
+To just build libcoreclr and mscorlib for the Linux ARM Emulator, run the following command:
+```
+prajwal@ubuntu ~/coreclr $ ./tests/scripts/arm32_ci_script.sh \
+ --emulatorPath=/opt/linux-arm-emulator \
+ --mountPath=/opt/linux-arm-emulator-root \
+ --buildConfig=Release \
+ --skipTests
+```
+
+The Linux ARM Emulator is based on soft floating point and thus the native binaries in coreclr are built for the arm-softfp architecture. The coreclr binaries generated by the above command (native and mscorlib) can be found at `~/coreclr/bin/Product/Linux.arm-softfp.Release`.
+
+To build libcoreclr and mscorlib, and run selected coreclr unit tests on the emulator, do the following:
+* Download the latest coreclr unit test binaries (or build them on Windows) from here: [Debug](http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/debug_windows_nt_bld/lastSuccessfulBuild/artifact/bin/tests/tests.zip) and [Release](http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/release_windows_nt_bld/lastSuccessfulBuild/artifact/bin/tests/tests.zip).
+  Set up the binaries at `~/coreclr/bin/tests/Windows_NT.x64.Release`.
+* Build corefx binaries for the emulator as described [here](https://github.com/dotnet/corefx/blob/master/Documentation/building/cross-building.md#building-corefx-for-linux-arm-emulator).
+  Set up these binaries at `~/corefx/bin/Linux.arm-softfp.Release`, `~/corefx/bin/Linux.AnyCPU.Release`, `~/corefx/bin/Unix.AnyCPU.Release`, and `~/corefx/bin/AnyOS.AnyCPU.Release`.
+* Run the following command (change value of `--testDirFile` argument to the file containing your selection of tests):
+```
+prajwal@ubuntu ~/coreclr $ ./tests/scripts/arm32_ci_script.sh \
+ --emulatorPath=/opt/linux-arm-emulator \
+ --mountPath=/opt/linux-arm-emulator-root \
+ --buildConfig=Release \
+ --testRootDir=~/coreclr/bin/tests/Windows_NT.x64.Release \
+ --coreFxNativeBinDir=~/corefx/bin/Linux.arm-softfp.Release \
+ --coreFxBinDir="~/corefx/bin/Linux.AnyCPU.Release;~/corefx/bin/Unix.AnyCPU.Release;~/corefx/bin/AnyOS.AnyCPU.Release" \
+ --testDirFile=~/coreclr/tests/testsRunningInsideARM.txt
+```
diff --git a/Documentation/building/crossgen.md b/Documentation/building/crossgen.md
new file mode 100644
index 0000000..c74a5e6
--- /dev/null
+++ b/Documentation/building/crossgen.md
@@ -0,0 +1,66 @@
+Using CrossGen to Create Native Images
+======================================
+
+Introduction
+------------
+
+When you create a .NET assembly using the C# compiler, your assembly contains only MSIL code.
+When the app runs, the JIT compiler translates the MSIL code into native code before the CPU can execute it.
+This execution model has some advantages. For example, your assembly code can be portable across all platforms and architectures that support .NET Core.
+However, this portability comes with a performance penalty. Your app starts up more slowly, because the JIT compiler has to spend time translating the code.
+
+To help make your app start up faster, CoreCLR includes a tool called CrossGen, which can pre-compile the MSIL code into native code.
+
+Getting CrossGen
+----------------
+
+If you build CoreCLR yourself, the CrossGen tool (`crossgen.exe` on Windows, or `crossgen` on other platforms) is created as part of the build, and stored in the same output directory as other CoreCLR binaries.
+If you install CoreCLR using a NuGet package, you can find CrossGen in the `tools` folder of the NuGet package.
+
+Regardless of how you obtain CrossGen, it is very important that it matches the other CoreCLR binaries.
+- If you build CrossGen yourself, you should use it with coreclr and mscorlib generated from the same build. Do not attempt to mix CrossGen from one build with binaries generated from another build.
+- If you install CrossGen from NuGet, make sure you use CrossGen from exactly the same NuGet package as the rest of your CoreCLR binaries. Do not attempt to mix binaries from multiple NuGet packages.
+
+If you do not follow the above rules, you are likely to encounter errors while running CrossGen.
+
+Using CrossGen
+--------------
+
+In most cases, the build script automatically runs CrossGen to create the native image for mscorlib.
+When this happens, you will find both `mscorlib.dll` and `mscorlib.ni.dll` in your output directory.
+`mscorlib.dll` is the MSIL assembly created by the C# compiler, while `mscorlib.ni.dll` is the native image that contains CPU-specific code.
+Once the build is done, you only need `mscorlib.ni.dll` to use CoreCLR.
+As a matter of fact, most CoreCLR NuGet packages contain only `mscorlib.ni.dll`, without `mscorlib.dll`.
+
+If for some reason you did not get `mscorlib.ni.dll` with the rest of your CoreCLR, you can easily create it yourself using CrossGen.
+First, make sure you have `crossgen.exe` (on Windows) or `crossgen` (other platforms) in the same directory as `mscorlib.dll`.
+Then, run one of the following two commands (first command for Windows, second command for other platforms):
+
+ .\crossgen.exe mscorlib.dll
+ ./crossgen mscorlib.dll
+
+To create native images for other assemblies, the command line is slightly more complex:
+
+ .\crossgen.exe /Platform_Assemblies_Paths "path1;path2;..." assemblyName.dll
+ ./crossgen /Platform_Assemblies_Paths "path1:path2:..." assemblyName.dll
+
+The `/Platform_Assemblies_Paths` option specifies the locations of all the dependencies of the input assembly.
+You should use full paths for these locations; relative paths do not always work.
+If there are multiple paths, separate them with semicolons (`;`) on Windows, or colons (`:`) on non-Windows platforms.
+It is generally a good idea to enclose the path list in quotes to protect any special characters from the shell.
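The separator rule can be sketched as a small shell snippet (the dependency directories here are hypothetical; substitute your own full paths):

```sh
# Choose the path-list separator CrossGen expects on this platform:
# ';' on Windows shells, ':' everywhere else.
case "$(uname)" in
  CYGWIN*|MINGW*|MSYS*) SEP=';' ;;
  *)                    SEP=':' ;;
esac

# Hypothetical dependency directories -- replace with your own full paths.
DEPS="/opt/coreclr/bin${SEP}/opt/myapp/lib"
echo "$DEPS"
```

On Linux this prints `/opt/coreclr/bin:/opt/myapp/lib`, ready to pass as `/Platform_Assemblies_Paths "$DEPS"`.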
+
+Using native images
+-------------------
+
+Running CrossGen on an assembly creates a "native image" file, with the extension of `.ni.dll` or `.ni.exe`.
+You should include the native images in your app, at the same location where you normally install the MSIL assemblies.
+Once you have included native images, you do not need to include the original MSIL assemblies in your apps.
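The naming convention can be sketched as follows (the assembly names are illustrative; this only derives names, it does not run CrossGen):

```sh
# Derive the expected native image name for each MSIL assembly.
for asm in mscorlib.dll System.Console.dll HelloWorld.exe; do
  name="${asm%.*}"   # strip the last extension
  ext="${asm##*.}"   # keep the last extension
  echo "$asm -> $name.ni.$ext"
done
```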
+
+Common errors
+-------------
+
+The following are some common errors encountered while creating or using native images:
+- "Could not load file or assembly 'mscorlib.dll' or one of its dependencies. The native image could not be loaded, because it was generated for use by a different version of the runtime. (Exception from HRESULT: 0x80131059)": This error indicates that there is a mismatch between CrossGen and mscorlib.ni.dll. Make sure to use CrossGen and mscorlib.ni.dll from the same build or NuGet package.
+- "Error: Could not load file or assembly '...' or one of its dependencies. The system cannot find the file specified. (Exception from HRESULT: 0x80070002)": CrossGen wasn't able to find a particular dependency that it needs. Verify that you have the assembly specified in the error message, and make sure its location is included in `/Platform_Assemblies_Paths`.
+- CoreCLR unable to initialize: While there are many possible causes of this error, one possibility is a mismatch between mscorlib.ni.dll and coreclr.dll (or libcoreclr.so). Make sure they come from the same build or NuGet package.
+- "Unable to load Jit Compiler": Please get a copy of `clrjit.dll` (or `libclrjit.so` or `libclrjit.dylib`, depending on your platform), and place it in the same directory as CrossGen. You can either build `clrjit.dll` yourself, or get it from `Microsoft.NETCore.Jit` NuGet package. To avoid possible issues, please use `clrjit.dll` from the same build as `crossgen.exe` if possible.
diff --git a/Documentation/building/debugging-instructions.md b/Documentation/building/debugging-instructions.md
new file mode 100644
index 0000000..4c33f9a
--- /dev/null
+++ b/Documentation/building/debugging-instructions.md
@@ -0,0 +1,145 @@
+Debugging CoreCLR
+=================
+
+These instructions will lead you through debugging CoreCLR on Windows and Linux. They will be expanded to support OS X when we have good instructions for that.
+
+Debugging CoreCLR on Windows
+============================
+
+1. Perform a build of the repo.
+2. Open \<repo_root\>\bin\obj\Windows_NT.\<platform\>.\<configuration\>\CoreCLR.sln in VS. \<platform\> and \<configuration\> depend
+   on the type of build you did. By default they are 'x64' and 'Debug'.
+3. Right click the INSTALL project and choose ‘Set as StartUp Project’
+4. Bring up the properties page for the INSTALL project
+5. Select Configuration Properties->Debugging from the left side tree control
+6. Set Command=`$(SolutionDir)..\..\product\Windows_NT.$(Platform).$(Configuration)\corerun.exe`
+ 1. This points to the folder where the built runtime binaries are present.
+7. Set Command Arguments=`<managed app you wish to run>` (e.g. HelloWorld.exe)
+8. Set Working Directory=`$(SolutionDir)..\..\product\Windows_NT.$(Platform).$(Configuration)`
+ 1. This points to the folder containing CoreCLR binaries.
+9. Press F11 to start debugging at wmain in corerun (or set a breakpoint in source and press F5 to run to it)
+ 1. As an example, set a breakpoint for the EEStartup function in ceemain.cpp to break into CoreCLR startup.
+
+Steps 1-8 only need to be done once, and then (9) can be repeated whenever you want to start debugging. The above can be done with Visual Studio 2013.
+
+Debugging CoreCLR on OS X
+==========================
+
+To use lldb on OS X, you first need to build it and the SOS plugin on the machine on which you intend to use them. See the instructions in [building lldb](buildinglldb.md). The rest of the instructions on how to use lldb are the same as for Linux.
+
+Debugging CoreCLR on Linux
+==========================
+
+Only lldb is supported by the SOS plugin. gdb can be used to debug the coreclr code but with no SOS support. Visual Studio 2015 RTM remote debugging isn't currently supported.
+
+1. Perform a build of the coreclr repo.
+2. Install the corefx managed assemblies to the binaries directory.
+3. cd to build's binaries: `cd ~/coreclr/bin/Product/Linux.x64.Debug`
+4. Start lldb (the version the plugin was built with, currently 3.6): `lldb-3.6 corerun HelloWorld.exe linux`
+5. Now at the lldb command prompt, load SOS plugin: `plugin load libsosplugin.so`
+6. Launch program: `process launch -s`
+7. To stop annoying breaks on SIGUSR1/SIGUSR2 signals used by the runtime run: `process handle -s false SIGUSR1 SIGUSR2`
+8. Get to a point where coreclr is initialized by setting a breakpoint (i.e. `breakpoint set -n LoadLibraryExW` and then `process continue`) or stepping into the runtime.
+9. Run a SOS command like `sos ClrStack` or `sos VerifyHeap`. The command name is case sensitive.
+
+You can combine steps 4-8 and pass everything on the lldb command line:
+
+`lldb-3.6 -o "plugin load libsosplugin.so" -o "process launch -s" -o "process handle -s false SIGUSR1 SIGUSR2" -o "breakpoint set -n LoadLibraryExW" corerun HelloWorld.exe linux`
+
+### SOS commands ###
+
+This is the full list of commands currently supported by SOS. Unlike windbg, LLDB is case-sensitive.
+
+ Type "soshelp <functionname>" for detailed info on that function.
+
+ Object Inspection Examining code and stacks
+ ----------------------------- -----------------------------
+ DumpObj (dumpobj) Threads (clrthreads)
+ DumpArray ThreadState
+ DumpStackObjects (dso) IP2MD (ip2md)
+ DumpHeap (dumpheap) u (clru)
+ DumpVC DumpStack (dumpstack)
+ GCRoot (gcroot) EEStack (eestack)
+ PrintException (pe) ClrStack (clrstack)
+ GCInfo
+ EHInfo
+ bpmd (bpmd)
+
+ Examining CLR data structures Diagnostic Utilities
+ ----------------------------- -----------------------------
+ DumpDomain VerifyHeap
+ EEHeap (eeheap) FindAppDomain
+ Name2EE (name2ee) DumpLog (dumplog)
+ DumpMT (dumpmt)
+ DumpClass (dumpclass)
+ DumpMD (dumpmd)
+ Token2EE
+ DumpModule (dumpmodule)
+ DumpAssembly
+ DumpRuntimeTypes
+ DumpIL (dumpil)
+ DumpSig
+ DumpSigElem
+
+ Examining the GC history Other
+ ----------------------------- -----------------------------
+ HistInit (histinit) FAQ
+ HistRoot (histroot) Help (soshelp)
+ HistObj (histobj)
+ HistObjFind (histobjfind)
+ HistClear (histclear)
+
+### Aliases ###
+
+By default you can reach all the SOS commands by using: _sos [command\_name]_
+However the common commands have been aliased so that you don't need the SOS prefix:
+
+ bpmd -> sos bpmd
+ clrstack -> sos ClrStack
+ clrthreads -> sos Threads
+ clru -> sos U
+ dso -> sos DumpStackObjects
+ dumpclass -> sos DumpClass
+ dumpheap -> sos DumpHeap
+ dumpil -> sos DumpIL
+ dumplog -> sos DumpLog
+ dumpmd -> sos DumpMD
+ dumpmodule -> sos DumpModule
+ dumpmt -> sos DumpMT
+ dumpobj -> sos DumpObj
+ dumpstack -> sos DumpStack
+ eeheap -> sos EEHeap
+ eestack -> sos EEStack
+ gcroot -> sos GCRoot
+ histinit -> sos HistInit
+ histroot -> sos HistRoot
+ histobj -> sos HistObj
+ histobjfind -> sos HistObjFind
+ histclear -> sos HistClear
+ ip2md -> sos IP2MD
+ name2ee -> sos Name2EE
+ pe -> sos PrintException
+ soshelp -> sos Help
+
+
+### Problems and limitations of lldb and SOS ###
+
+Many of the SOS commands like clrstack or dso don't work on core dumps because lldb doesn't
+return the actual OS thread id for a native thread. The "setsostid" command can be used to work
+around this lldb bug. Use the "clrthreads" command to find the OS tid and the lldb command "thread list"
+to find the thread index (#1 for example) for the current thread (* in the first column). The first
+setsostid argument is the OS tid and the second is the thread index: "setsostid ecd5 1".
+
+The "gcroot" command either crashes lldb 3.6 or returns invalid results. It works fine with lldb 3.7 and 3.8.
+
+Loading Linux core dumps with lldb 3.7 doesn't work, although lldb 3.7 loads OS X and FreeBSD core
+dumps just fine. lldb 3.8 loads core dumps from all of these platforms without problems.
+
+For more information on SOS commands see: https://msdn.microsoft.com/en-us/library/bb190764(v=vs.110).aspx
+
+Debugging Mscorlib and/or managed application
+=============================================
+
+Stepping into and debugging the managed code of Mscorlib.dll (or of the managed application being executed by the runtime you built) using Visual Studio will be supported with Visual Studio 2015. We are actively working to enable this support.
+
+Until then, you can use [WinDbg](https://msdn.microsoft.com/en-us/library/windows/hardware/ff551063(v=vs.85).aspx) and [SOS](https://msdn.microsoft.com/en-us/library/bb190764(v=vs.110).aspx) (an extension to WinDbg to support managed debugging) to step in and debug the generated managed code. This is what we do on the .NET Runtime team as well :)
diff --git a/Documentation/building/freebsd-instructions.md b/Documentation/building/freebsd-instructions.md
new file mode 100644
index 0000000..c190ce7
--- /dev/null
+++ b/Documentation/building/freebsd-instructions.md
@@ -0,0 +1,282 @@
+Build CoreCLR on FreeBSD
+======================
+
+This guide will walk you through building CoreCLR on FreeBSD and running Hello World. We'll start by showing how to set up your environment from scratch.
+
+Environment
+===========
+
+These instructions are written assuming FreeBSD 10.1-RELEASE, since that's the release the team uses.
+
+These instructions assume you use the binary package tool `pkg` (analogous to `apt-get` or `yum` on Linux) to install the environment. Compiling the dependencies from source using the ports tree might work too, but is untested.
+
+Minimum RAM required to build is 1GB. The build is known to fail on 512 MB VMs ([Issue 536](https://github.com/dotnet/coreclr/issues/536)).
+
+Toolchain Setup
+---------------
+
+Install the following packages for the toolchain:
+
+- bash
+- cmake
+- llvm37 (includes LLVM 3.7, Clang 3.7 and LLDB 3.7)
+- libunwind
+- gettext
+- icu
+
+To install the packages you need:
+
+```sh
+janhenke@freebsd-frankfurt:~ % sudo pkg install bash cmake libunwind gettext llvm37 icu
+```
+
+The command above will install Clang and LLVM 3.7. For information on building CoreCLR with other versions, see the section on [Clang/LLVM versions](#note-on-clangllvm-versions).
+
+Debugging CoreCLR (Optional)
+----------------------------
+
+Note: This step is not required to build CoreCLR itself. If you intend on hacking or debugging the CoreCLR source code, you need to follow these steps. You must follow these steps *before* starting the build itself.
+
+In order to debug CoreCLR you will also need to install [LLDB](http://lldb.llvm.org/), the LLVM debugger.
+
+To build with clang 3.7 from coreclr project root:
+
+```sh
+LLDB_LIB_DIR=/usr/local/llvm37/lib LLDB_INCLUDE_DIR=/usr/local/llvm37/include ./build.sh clang3.7 debug
+```
+
+Run tests:
+
+```sh
+./src/pal/tests/palsuite/runpaltests.sh $PWD/bin/obj/FreeBSD.x64.Debug $PWD/bin/paltestout
+```
+
+Git Setup
+---------
+
+This guide assumes that you've cloned the corefx and coreclr repositories into `~/git/corefx` and `~/git/coreclr` on your FreeBSD machine and the corefx and coreclr repositories into `D:\git\corefx` and `D:\git\coreclr` on Windows. If your setup is different, you'll need to pay careful attention to the commands you run. In this guide, I'll always show what directory I'm in on both the FreeBSD and Windows machine.
+
+Build the Runtime
+=================
+
+To build the runtime on FreeBSD, run build.sh from the root of the coreclr repository:
+
+```sh
+janhenke@freebsd-frankfurt:~/git/coreclr % ./build.sh
+```
+
+Note: the Clang/LLVM shipped with FreeBSD 10.1-RELEASE is 3.4, while the minimum version required to compile the CoreCLR runtime is 3.5. See [Note on Clang/LLVM versions](#note-on-clangllvm-versions).
+
+If the build fails with errors about resolving LLVM-components, the default Clang-version assumed (3.5) may not be appropriate for your system.
+Override it using the following syntax. In this example LLVM 3.6 is used:
+
+```sh
+janhenke@freebsd-frankfurt:~/git/coreclr % ./build.sh clang3.6
+```
+
+
+After the build is completed, there should be some files placed in `bin/Product/FreeBSD.x64.Debug`. The ones we are interested in are:
+
+* `corerun`: The command line host. This program loads and starts the CoreCLR runtime and passes the managed program you want to run to it.
+* `libcoreclr.so`: The CoreCLR runtime itself.
+* `libcoreclrpal.so`: The platform abstraction library for the CoreCLR runtime. This library is temporary, and its functionality will be merged back into `libcoreclr.so`.
+
+In order to keep everything tidy, let's create a new directory for the runtime and copy the runtime and corerun into it.
+
+```sh
+janhenke@freebsd-frankfurt:~/git/coreclr % mkdir -p ~/coreclr-demo/runtime
+janhenke@freebsd-frankfurt:~/git/coreclr % cp bin/Product/FreeBSD.x64.Debug/corerun ~/coreclr-demo/runtime
+janhenke@freebsd-frankfurt:~/git/coreclr % cp bin/Product/FreeBSD.x64.Debug/libcoreclr*.so ~/coreclr-demo/runtime
+```
+
+Build the Framework Native Components
+======================================
+
+```sh
+janhenke@freebsd-frankfurt:~/git/corefx % ./build-native.sh
+janhenke@freebsd-frankfurt:~/git/corefx % cp bin/FreeBSD.x64.Debug/Native/*.so ~/coreclr-demo/runtime
+```
+
+Build the Framework Managed Components
+======================================
+
+We don't _yet_ have support for building managed code on FreeBSD, so you'll need a Windows machine with clones of both the CoreCLR and CoreFX projects.
+
+You will build `mscorlib.dll` out of the coreclr repository and the rest of the framework out of the corefx repository. For mscorlib (from a regular command prompt window) run:
+
+```
+D:\git\coreclr> build.cmd freebsdmscorlib
+```
+
+The output is placed in `bin\Product\FreeBSD.x64.Debug\mscorlib.dll`. You'll want to copy this to the runtime folder on your FreeBSD machine. (e.g. `~/coreclr-demo/runtime`)
+
+For the rest of the framework, you need to pass some special parameters to build.cmd when building out of the CoreFX repository.
+
+```
+D:\git\corefx> build-managed.cmd -os=Linux -target-os=Linux -SkipTests
+```
+
+Note: We are using the Linux build currently, as CoreFX does not yet know about FreeBSD.
+
+It's also possible to add `/t:rebuild` to the build.cmd to force it to delete the previously built assemblies.
+
+For the purposes of Hello World, you need to copy over both `bin\Linux.AnyCPU.Debug\System.Console\System.Console.dll` and `bin\Linux.AnyCPU.Debug\System.Diagnostics.Debug\System.Diagnostics.Debug.dll` into the runtime folder on FreeBSD. (e.g `~/coreclr-demo/runtime`).
+
+After you've done these steps, the runtime directory on FreeBSD should look like this:
+
+```
+janhenke@freebsd-frankfurt:~/git/coreclr % ls ~/coreclr-demo/runtime/
+System.Console.dll System.Diagnostics.Debug.dll corerun libcoreclr.so libcoreclrpal.so mscorlib.dll
+```
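If you want to double-check that layout, a small script like the following reports anything missing (a sketch; adjust `runtime` if your directory differs):

```sh
# Report which of the expected demo files are present in the runtime directory.
runtime="$HOME/coreclr-demo/runtime"
for f in corerun libcoreclr.so libcoreclrpal.so mscorlib.dll \
         System.Console.dll System.Diagnostics.Debug.dll; do
  if [ -e "$runtime/$f" ]; then
    echo "ok: $f"
  else
    echo "missing: $f"
  fi
done
```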
+
+Download Dependencies
+=====================
+
+The rest of the assemblies you need to run are presently just facades that point to mscorlib. We can pull these dependencies down via NuGet (which currently requires Mono).
+
+Create a folder for the packages:
+
+```sh
+janhenke@freebsd-frankfurt:~/git/coreclr % mkdir ~/coreclr-demo/packages
+janhenke@freebsd-frankfurt:~/git/coreclr % cd ~/coreclr-demo/packages
+```
+
+Install Mono
+------------
+
+If you don't already have Mono installed on your system, use the pkg tool again:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/packages % sudo pkg install mono
+```
+
+Download the NuGet Client
+-------------------------
+
+Grab NuGet (if you don't have it already)
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/packages % curl -L -O https://nuget.org/nuget.exe
+```
+Download NuGet Packages
+-----------------------
+
+With Mono and NuGet in hand, you can use NuGet to get the required dependencies.
+
+Make a `packages.config` file with the following text. These are the required dependencies of this particular app. Different apps will have different dependencies and require a different `packages.config` - see [Issue #480](https://github.com/dotnet/coreclr/issues/480).
+
+```xml
+<?xml version="1.0" encoding="utf-8"?>
+<packages>
+ <package id="System.Console" version="4.0.0-beta-22703" />
+ <package id="System.Diagnostics.Contracts" version="4.0.0-beta-22703" />
+ <package id="System.Diagnostics.Debug" version="4.0.10-beta-22703" />
+ <package id="System.Diagnostics.Tools" version="4.0.0-beta-22703" />
+ <package id="System.Globalization" version="4.0.10-beta-22703" />
+ <package id="System.IO" version="4.0.10-beta-22703" />
+ <package id="System.IO.FileSystem.Primitives" version="4.0.0-beta-22703" />
+ <package id="System.Reflection" version="4.0.10-beta-22703" />
+ <package id="System.Resources.ResourceManager" version="4.0.0-beta-22703" />
+ <package id="System.Runtime" version="4.0.20-beta-22703" />
+ <package id="System.Runtime.Extensions" version="4.0.10-beta-22703" />
+ <package id="System.Runtime.Handles" version="4.0.0-beta-22703" />
+ <package id="System.Runtime.InteropServices" version="4.0.20-beta-22703" />
+ <package id="System.Text.Encoding" version="4.0.10-beta-22703" />
+ <package id="System.Text.Encoding.Extensions" version="4.0.10-beta-22703" />
+ <package id="System.Threading" version="4.0.10-beta-22703" />
+ <package id="System.Threading.Tasks" version="4.0.10-beta-22703" />
+</packages>
+
+```
+
+And restore your packages.config file:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/packages % mono nuget.exe restore -Source https://www.myget.org/F/dotnet-corefx/ -PackagesDirectory .
+```
+
+NOTE: This assumes you already installed the default CA certs. If you have problems downloading the packages please see [Issue #602](https://github.com/dotnet/coreclr/issues/602#issuecomment-88203778). The command for FreeBSD is:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/packages % mozroots --import --sync
+```
+
+Finally, you need to copy the assemblies over to the runtime folder. However, you don't want to copy System.Console.dll or System.Diagnostics.Debug.dll, since the versions from NuGet are the Windows versions. The easiest way to do this is with a little find magic:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/packages % find . -wholename '*/aspnetcore50/*.dll' -exec cp -n {} ~/coreclr-demo/runtime \;
+```
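The `-n` (no-clobber) flag in the `cp` above is what protects the FreeBSD-built `System.Console.dll` already in the runtime directory from being overwritten by the Windows copy from NuGet. A small sketch of that behavior, using throwaway temporary files:

```sh
# Demonstrate cp -n: an existing destination file is never overwritten.
tmp=$(mktemp -d)
mkdir "$tmp/runtime" "$tmp/packages"
echo freebsd-build > "$tmp/runtime/System.Console.dll"
echo windows-build > "$tmp/packages/System.Console.dll"

# Newer GNU coreutils make cp -n exit non-zero when it skips, hence `|| true`.
cp -n "$tmp/packages/System.Console.dll" "$tmp/runtime/" || true

result=$(cat "$tmp/runtime/System.Console.dll")
echo "$result"    # prints freebsd-build: the existing copy survives
rm -rf "$tmp"
```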
+
+Compile an App
+==============
+
+Now you need a Hello World application to run. You can write your own, if you'd like. Personally, I'm partial to the one on corefxlab which will draw Tux for us.
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/packages % cd ~/coreclr-demo/runtime
+janhenke@freebsd-frankfurt:~/coreclr-demo/runtime % curl -O https://raw.githubusercontent.com/dotnet/corefxlab/master/demos/CoreClrConsoleApplications/HelloWorld/HelloWorld.cs
+```
+
+Then you just need to build it with `mcs`, the Mono C# compiler. (The Roslyn C# compiler will soon be available on FreeBSD.) Because you need to compile the app against the .NET Core surface area, you need to pass references to the contract assemblies you restored using NuGet:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/runtime % mcs /nostdlib /noconfig /r:../packages/System.Console.4.0.0-beta-22703/lib/contract/System.Console.dll /r:../packages/System.Runtime.4.0.20-beta-22703/lib/contract/System.Runtime.dll HelloWorld.cs
+```
+
+Run your App
+============
+
+You're ready to run Hello World! To do that, run corerun, passing the path to the managed exe, plus any arguments. The HelloWorld from corefxlab will print a daemon if you pass "freebsd" as an argument, so:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr-demo/runtime % ./corerun HelloWorld.exe freebsd
+```
+
+If all works, you should be greeted by a friendly daemon you know well.
+
+Over time, this process will get easier. We will remove the dependency on having to compile managed code on Windows. For example, we are working to get our NuGet packages to include both the Windows and FreeBSD versions of an assembly, so you can simply nuget restore the dependencies.
+
+Pull Requests to enable building CoreFX and mscorlib on FreeBSD via Mono would be very welcome. A sample that builds Hello World on FreeBSD using the correct references but via XBuild or MonoDevelop would also be great! Some of our processes (e.g. the mscorlib build) rely on Windows specific tools, but we want to figure out how to solve these problems for FreeBSD as well. There's still a lot of work ahead, so if you're interested in helping, we're ready for you!
+
+
+Run the test suite
+==================
+
+If you've made changes to the CoreCLR PAL code, you might want to run the PAL tests directly to validate your changes.
+This can be done after a clean build, without any other dependencies.
+
+From the coreclr project directory:
+
+```sh
+janhenke@freebsd-frankfurt:~/coreclr % ./src/pal/tests/palsuite/runpaltests.sh ~/coreclr/bin/obj/FreeBSD.x64.Debug ~/coreclr/bin/paltestout
+```
+
+This should run all the tests associated with the PAL.
+
+Note on Clang/LLVM versions
+===========================
+
+The minimum Clang version required to build CoreCLR is 3.5.
+
+FreeBSD 10.X releases ship with Clang 3.4.
+
+If you intend on building CoreCLR with LLDB debug support, pick llvm37 or llvm-devel.
+
+To install clang 3.5: `sudo pkg install clang35`
+
+To install clang 3.6: `sudo pkg install clang36`
+
+To install clang 3.7: `sudo pkg install llvm37`
+
+To install clang development snapshot: `sudo pkg install llvm-devel`
+
+clang35 and clang36 pull in the llvm35 and llvm36 packages as dependencies.
+
+llvm37 and llvm-devel include clang and lldb. Since clang is included with llvm 3.7 and onward, there is no clang37 package.
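The version-to-package mapping above can be summarized as a small helper function (a sketch; the package names are those listed in these notes):

```sh
# Map a desired Clang version to its FreeBSD package name.
clang_pkg() {
  case "$1" in
    3.5) echo clang35 ;;
    3.6) echo clang36 ;;
    3.7) echo llvm37 ;;      # clang ships inside llvm37; there is no clang37
    dev) echo llvm-devel ;;
    *)   echo unknown ;;
  esac
}

clang_pkg 3.7    # prints llvm37
```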
+
+After you have installed your desired version of LLVM, you will need to specify the version to the build.sh script.
+
+For example, if you chose to install llvm37, you would add `clang3.7` to your build command as follows.
+```sh
+janhenke@freebsd-frankfurt:~/git/coreclr % ./build.sh clang3.7
+```
diff --git a/Documentation/building/linux-instructions.md b/Documentation/building/linux-instructions.md
new file mode 100644
index 0000000..c948ecd
--- /dev/null
+++ b/Documentation/building/linux-instructions.md
@@ -0,0 +1,220 @@
+Build CoreCLR on Linux
+======================
+
+This guide will walk you through building CoreCLR on Linux. We'll start by showing how to set up your environment from scratch.
+
+Environment
+===========
+
+These instructions are written assuming Ubuntu 14.04 LTS, since that's the distro the team uses. Pull Requests are welcome to address other environments as long as they don't break the ability to use Ubuntu 14.04 LTS.
+
+There have been reports of issues when using other distros or versions of Ubuntu (e.g. [Issue 95](https://github.com/dotnet/coreclr/issues/95)). If you're on another distribution, consider using docker's `ubuntu:14.04` image.
+
+Minimum RAM required to build is 1GB. The build is known to fail on 512 MB VMs ([Issue 536](https://github.com/dotnet/coreclr/issues/536)).
+
+Toolchain Setup
+---------------
+
+Install the following packages for the toolchain:
+
+- cmake
+- llvm-3.5
+- clang-3.5
+- lldb-3.6
+- lldb-3.6-dev
+- libunwind8
+- libunwind8-dev
+- gettext
+- libicu-dev
+- liblttng-ust-dev
+- libcurl4-openssl-dev
+- libssl-dev
+- uuid-dev
+
+In order to get lldb-3.6 on Ubuntu 14.04, we need to add an additional package source:
+
+```
+ellismg@linux:~$ echo "deb http://llvm.org/apt/trusty/ llvm-toolchain-trusty-3.6 main" | sudo tee /etc/apt/sources.list.d/llvm.list
+ellismg@linux:~$ wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
+ellismg@linux:~$ sudo apt-get update
+```
+
+Then install the packages you need:
+
+```
+ellismg@linux:~$ sudo apt-get install cmake llvm-3.5 clang-3.5 lldb-3.6 lldb-3.6-dev libunwind8 libunwind8-dev gettext libicu-dev liblttng-ust-dev libcurl4-openssl-dev libssl-dev uuid-dev
+```
+
+You now have all the required components.
+
+Git Setup
+---------
+
+This guide assumes that you've cloned the corefx and coreclr repositories into `~/git/corefx` and `~/git/coreclr` on your Linux machine. If your setup is different, you'll need to pay careful attention to the commands you run. In this guide, I'll always show what directory I'm in.
+
+Set the maximum number of file-handles
+--------------------------------------
+
+To ensure that your system can allocate enough file handles for the corefx build, run `sysctl fs.file-max`. If the value is less than 100000, add `fs.file-max = 100000` to `/etc/sysctl.conf`, and then run `sudo sysctl -p`.
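The check can be scripted as follows (read-only: it only reports what to do, using the 100000 threshold from these instructions):

```sh
# Report whether fs.file-max meets the recommended minimum for the corefx build.
limit=100000
current=$(cat /proc/sys/fs/file-max)
if [ "$current" -lt "$limit" ]; then
  echo "fs.file-max is $current; add 'fs.file-max = $limit' to /etc/sysctl.conf and run: sudo sysctl -p"
else
  echo "fs.file-max is $current (>= $limit); nothing to do"
fi
```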
+
+Build the Runtime and Microsoft Core Library
+=============================================
+
+To build the runtime on Linux, run build.sh from the root of the coreclr repository:
+
+```
+ellismg@linux:~/git/coreclr$ ./build.sh
+```
+
+After the build is completed, there should be some files placed in `bin/Product/Linux.x64.Debug`. The ones we are interested in are:
+
+* `corerun`: The command line host. This program loads and starts the CoreCLR runtime and passes the managed program you want to run to it.
+* `libcoreclr.so`: The CoreCLR runtime itself.
+* `mscorlib.dll`: Microsoft Core Library.
+
+Build the Framework
+===================
+
+```
+ellismg@linux:~/git/corefx$ ./build.sh
+```
+
+After the build is complete you will be able to find the output in the `bin` folder.
+
+Build for ARM/Linux
+===================
+
+Libunwind-arm requires fixes that are not yet included in Ubuntu 14.04. The fix prevents libunwind-arm from breaking when it is asked to access inaccessible memory locations.
+
+First, import the patch from the libunwind upstream: http://git.savannah.gnu.org/gitweb/?p=libunwind.git;a=commit;h=770152268807e460184b4152e23aba9c86601090
+
+Then, expand the coverage of the upstream patch by:
+
+```
+diff --git a/src/arm/Ginit.c b/src/arm/Ginit.c
+index 1ed3dbf..c643032 100644
+--- a/src/arm/Ginit.c
++++ b/src/arm/Ginit.c
+@@ -128,6 +128,11 @@ access_mem (unw_addr_space_t as, unw_word_t addr, unw_word_t *val, int write,
+ {
+ if (write)
+ {
++ /* validate address */
++ const struct cursor *c = (const struct cursor *) arg;
++ if (c && validate_mem(addr))
++ return -1;
++
+ Debug (16, "mem[%x] <- %x\n", addr, *val);
+ *(unw_word_t *) addr = *val;
+ }
+```
+
+How to enable -O3 optimization level for ARM/Linux
+==================================================
+
+Currently, coreclr can be built for Linux/ARM in release mode with clang's -O1 flag without any bugfix to llvm-3.6. These instructions explain how to enable clang's -O3 optimization level on Linux/ARM by fixing a bug in llvm.
+
+First, download latest version from the clang-3.6/llvm-3.6 upstream:
+```
+lgs@ubuntu cd /work/dotnet/
+lgs@ubuntu wget http://llvm.org/releases/3.6.2/llvm-3.6.2.src.tar.xz
+lgs@ubuntu tar xJf llvm-3.6.2.src.tar.xz
+lgs@ubuntu cd ./llvm-3.6.2.src/tools/
+lgs@ubuntu wget http://llvm.org/releases/3.6.2/cfe-3.6.2.src.tar.xz
+lgs@ubuntu tar xJf cfe-3.6.2.src.tar.xz
+lgs@ubuntu mv cfe-3.6.2.src clang
+```
+
+Second, expand the coverage of the upstream patch as described in:
+https://bugs.launchpad.net/ubuntu/+source/llvm-defaults/+bug/1584089
+
+Third, build the clang-3.6/llvm-3.6 source as follows:
+```
+lgs@ubuntu cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="all" -DCMAKE_INSTALL_PREFIX=~/llvm-3.6.2 \
+-DLLVM_BUILD_LLVM_DYLIB=1 -DLLDB_DISABLE_LIBEDIT=1 -DLLDB_DISABLE_CURSES=1 -DLLDB_DISABLE_PYTHON=1 \
+-DLLVM_ENABLE_DOXYGEN=0 -DLLVM_ENABLE_TERMINFO=0 -DLLVM_INCLUDE_EXAMPLES=0 -DLLVM_BUILD_RUNTIME=0 \
+-DLLVM_INCLUDE_TESTS=0 -DPYTHON_INCLUDE_DIR=/usr/include/python2.7 /work/dotnet/llvm-3.6.2.src
+lgs@ubuntu
+lgs@ubuntu sudo ln -sf /usr/bin/ld /usr/bin/ld.gold
+lgs@ubuntu time make -j8
+lgs@ubuntu time make -j8 install
+lgs@ubuntu
+lgs@ubuntu sudo apt-get remove clang-3.6 llvm-3.6
+lgs@ubuntu vi ~/.bashrc (or /etc/profile)
+# Setting new clang/llvm version
+export PATH=$HOME/llvm-3.6.2/bin/:$PATH
+export LD_LIBRARY_PATH=$HOME/llvm-3.6.2/lib:$LD_LIBRARY_PATH
+```
+
+Ubuntu 14.04 x64 users can easily install the fixed clang/llvm 3.6 packages with `apt-get` from the `ppa:leemgs/dotnet` Ubuntu repository, without needing to execute the three steps above.
+```
+lgs@ubuntu sudo add-apt-repository ppa:leemgs/dotnet
+lgs@ubuntu sudo apt-get update
+lgs@ubuntu sudo apt-get install clang-3.6 llvm-3.6 lldb-3.6
+```
+
+Finally, let's build coreclr with the updated clang/llvm. If you encounter an lldb-related error message at build time, try building coreclr with the "skipgenerateversion" option.
+```
+lgs@ubuntu time ROOTFS_DIR=/work/dotnet/rootfs-coreclr/arm ./build.sh arm release clean cross
+```
+
+Additional optimization levels for ARM/Linux: -Oz and -Ofast
+============================================================
+
+These instructions explain how to enable additional optimization levels such as -Oz and -Ofast on ARM/Linux. The table below shows what has to be enabled to optimize the CoreCLR runtime for either size or speed on embedded devices.
+
+| **Content** | **Build Mode** | **Clang/LLVM (Linux)** |
+| --- | --- | --- |
+| -O0 | Debug | Disable optimization to generate the most debuggable code |
+| -O1 | - | Optimize for code size and execution time |
+| -O2 | Checked | Optimize more for code size and execution time |
+| -O3 | Release | Optimize more for code size and execution time to make program run faster |
+| -Oz | - | Optimize more to reduce code size further |
+| -Ofast | - | Enable all the optimizations from O3 along with other aggressive optimizations |
+
+If you want to focus on size reduction for low-end devices, you have to modify clang-compiler-override.txt to enable the -Oz flag in the release build as follows:
+
+```
+--- a/src/pal/tools/clang-compiler-override.txt
++++ b/src/pal/tools/clang-compiler-override.txt
+@@ -3,13 +3,13 @@ SET (CMAKE_C_FLAGS_DEBUG_INIT "-g -O0")
+ SET (CLR_C_FLAGS_CHECKED_INIT "-g -O2")
+ # Refer to the below instruction to support __thread with -O2/-O3 on Linux/ARM
+ # https://github.com/dotnet/coreclr/blob/master/Documentation/building/linux-instructions.md
+-SET (CMAKE_C_FLAGS_RELEASE_INIT "-g -O3")
++SET (CMAKE_C_FLAGS_RELEASE_INIT "-g -Oz")
+ SET (CMAKE_C_FLAGS_RELWITHDEBINFO_INIT "-g -O2")
+
+ SET (CMAKE_CXX_FLAGS_INIT "-Wall -Wno-null-conversion -std=c++11")
+ SET (CMAKE_CXX_FLAGS_DEBUG_INIT "-g -O0")
+ SET (CLR_CXX_FLAGS_CHECKED_INIT "-g -O2")
+-SET (CMAKE_CXX_FLAGS_RELEASE_INIT "-g -O3")
++SET (CMAKE_CXX_FLAGS_RELEASE_INIT "-g -Oz")
+ SET (CMAKE_CXX_FLAGS_RELWITHDEBINFO_INIT "-g -O2")
+
+ SET (CLR_DEFINES_DEBUG_INIT DEBUG _DEBUG _DBG URTBLDENV_FRIENDLY=Checked BUILDENV_
+```
+
+
+If you want to focus on speed optimization for high-end devices, you have to modify clang-compiler-override.txt to enable the -Ofast flag in the release build as follows:
+```
+--- a/src/pal/tools/clang-compiler-override.txt
++++ b/src/pal/tools/clang-compiler-override.txt
+@@ -3,13 +3,13 @@ SET (CMAKE_C_FLAGS_DEBUG_INIT "-g -O0")
+ SET (CLR_C_FLAGS_CHECKED_INIT "-g -O2")
+ # Refer to the below instruction to support __thread with -O2/-O3 on Linux/ARM
+ # https://github.com/dotnet/coreclr/blob/master/Documentation/building/linux-instructions.md
+-SET (CMAKE_C_FLAGS_RELEASE_INIT "-g -O3")
++SET (CMAKE_C_FLAGS_RELEASE_INIT "-g -Ofast")
+ SET (CMAKE_C_FLAGS_RELWITHDEBINFO_INIT "-g -O2")
+
+ SET (CMAKE_CXX_FLAGS_INIT "-Wall -Wno-null-conversion -std=c++11")
+ SET (CMAKE_CXX_FLAGS_DEBUG_INIT "-g -O0")
+ SET (CLR_CXX_FLAGS_CHECKED_INIT "-g -O2")
+-SET (CMAKE_CXX_FLAGS_RELEASE_INIT "-g -O3")
++SET (CMAKE_CXX_FLAGS_RELEASE_INIT "-g -Ofast")
+ SET (CMAKE_CXX_FLAGS_RELWITHDEBINFO_INIT "-g -O2")
+
+ SET (CLR_DEFINES_DEBUG_INIT DEBUG _DEBUG _DBG URTBLDENV_FRIENDLY=Checked BUILDENV_
+```
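
The same edit can be made mechanically with `sed`. The snippet below is only a sketch: it rewrites a single representative line in a temporary file, so it is safe to run anywhere; in a real checkout you would target `src/pal/tools/clang-compiler-override.txt` instead.

```shell
# Copy one representative line into a scratch file (a stand-in for
# clang-compiler-override.txt), then switch the release optimization level.
f="$(mktemp)"
printf 'SET (CMAKE_C_FLAGS_RELEASE_INIT "-g -O3")\n' > "$f"
sed -i 's/-O3/-Oz/' "$f"
result="$(cat "$f")"
echo "$result"
```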
+
diff --git a/Documentation/building/netbsd-instructions.md b/Documentation/building/netbsd-instructions.md
new file mode 100644
index 0000000..f678286
--- /dev/null
+++ b/Documentation/building/netbsd-instructions.md
@@ -0,0 +1,129 @@
+Build CoreCLR on NetBSD
+=======================
+
+This guide will walk you through building CoreCLR on NetBSD. We'll start by showing how to set up your environment from scratch.
+
+Environment
+===========
+
+These instructions were validated on NetBSD 7.x on the amd64 platform, since that's the release the team uses.
+
+Older releases aren't supported because building CoreCLR requires the modern LLVM stack (Clang, libunwind, and LLDB) that is developed against the NetBSD-7.x branch.
+
+Pull Requests are welcome to address other ports (like i386 or evbarm) as long as they don't break the ability to use NetBSD/amd64.
+
+Minimum RAM required to build is 1GB.
+
+The pkgsrc framework is required to build .NET projects on NetBSD. The minimum pkgsrc version required is 2016Q1.
+
+pkgsrc setup
+------------
+
+Fetch pkgsrc and extract it on the system. By default this is done in the /usr directory as root:
+
+```
+ftp -o- ftp://ftp.netbsd.org/pub/pkgsrc/stable/pkgsrc.tar.gz | tar -zxpf- -C /usr
+```
+
+The .NET projects are tracked in pkgsrc-wip.
+
+In order to use pkgsrc-wip, git must be installed:
+
+
+```
+cd /usr/pkgsrc/devel/git-base && make install
+```
+
+To access resources over SSL, mozilla-rootcerts must be installed:
+
+```
+cd /usr/pkgsrc/security/mozilla-rootcerts && make install
+```
+
+Then follow the instructions in the package's MESSAGE output to finish the installation.
+
+
+Installing pkgsrc-wip
+---------------------
+
+Run the following commands to fetch the pkgsrc-wip sources:
+
+
+```
+cd /usr/pkgsrc
+git clone --depth 1 git://wip.pkgsrc.org/pkgsrc-wip.git wip
+```
+
+Then install the CoreCLR package you need:
+
+```
+cd /usr/pkgsrc/wip/coreclr-git
+make install
+```
+
+CoreCLR is installed in the `/usr/pkg/CoreCLR` directory by default.
+
+
+PAL tests
+=========
+
+To run the PAL tests on NetBSD, use the `make test` target of the coreclr-git package from pkgsrc-wip:
+
+```
+cd /usr/pkgsrc/wip/coreclr-git
+make test
+```
+
+Build CoreFX
+============
+
+The CoreFX package is located in pkgsrc-wip as corefx-git. To build it, run:
+
+```
+cd /usr/pkgsrc/wip/corefx-git
+make
+```
+
+At the moment there is no install or test target in the pkgsrc framework.
+
+CoreFX tests
+============
+
+The steps to run CoreFX managed code tests:
+
+Build CoreCLR Debug x64 on NetBSD (with pkgsrc-wip/coreclr-git) and install the product directory to /usr/pkg/CoreCLR:
+
+```
+cd /usr/pkgsrc/wip/coreclr-git && make install
+```
+
+Build CoreFX native x64 Debug; the work (build) directory is /usr/pkgsrc/wip/corefx-git/work/corefx:
+
+```
+cd /usr/pkgsrc/wip/corefx-git && make
+```
+
+Build CoreCLR Debug x64 on Linux and copy ./bin/Product/Linux.x64.Debug/mscorlib.dll to the NetBSD machine under /usr/pkg/CoreCLR:
+
+```
+./build.sh mscorlib Debug
+```
+
+Build CoreFX Debug x64 on Linux and copy bin/ to the NetBSD machine under /public/bin:
+
+```
+./build-native.sh -os=NetBSD
+./build-managed.sh NetBSD -SkipTests
+```
+
+Run ./run-test.sh:
+
+```
+$ pwd
+/usr/pkgsrc/wip/corefx-git/work/corefx
+$ ./run-test.sh \
+--coreclr-bins /usr/pkg/CoreCLR/ \
+--mscorlib-bins /usr/pkg/CoreCLR/ \
+--corefx-tests /public/bin/tests/NetBSD.AnyCPU.Debug/ \
+--corefx-native-bins ./bin/NetBSD.x64.Debug/Native/
+```
diff --git a/Documentation/building/osx-instructions.md b/Documentation/building/osx-instructions.md
new file mode 100644
index 0000000..6d3469b
--- /dev/null
+++ b/Documentation/building/osx-instructions.md
@@ -0,0 +1,80 @@
+Build CoreCLR on OS X
+=====================
+
+This guide will walk you through building CoreCLR on OS X. We'll start by showing how to set up your environment from scratch.
+
+Environment
+===========
+
+These instructions were validated on OS X Yosemite, although they probably work on earlier versions. Pull Requests are welcome to address other environments.
+
+If your machine has Command Line Tools for Xcode 6.3 installed, you'll need to update to version 6.3.1 or higher in order to build successfully. There was an issue with the headers that shipped with version 6.3 that was subsequently fixed in 6.3.1.
+
+Git Setup
+---------
+
+Clone the CoreCLR and CoreFX repositories (either upstream or a fork).
+
+```sh
+dotnet-mbp:git richlander$ git clone https://github.com/dotnet/coreclr
+# Cloning into 'coreclr'...
+
+dotnet-mbp:git richlander$ git clone https://github.com/dotnet/corefx
+# Cloning into 'corefx'...
+```
+
+This guide assumes that you've cloned the coreclr and corefx repositories into `~/git/coreclr` and `~/git/corefx` on your OS X machine. If your setup is different, you'll need to pay careful attention to the commands you run. In this guide, I'll always show what directory I'm in.
+
+CMake
+-----
+
+CoreCLR has a dependency on CMake for the build. You can download it from [CMake downloads](http://www.cmake.org/download/).
+
+Alternatively, you can install CMake from [Homebrew](http://brew.sh/).
+
+```sh
+dotnet-mbp:~ richlander$ brew install cmake
+```
+
+ICU
+---
+ICU (International Components for Unicode) is also required to build and run. It can be obtained via [Homebrew](http://brew.sh/).
+
+```sh
+brew install icu4c
+brew link --force icu4c
+```
+
+OpenSSL
+-------
+The CoreFX cryptography libraries are built on OpenSSL. The version of OpenSSL included on OS X (0.9.8) has gone out of support, and a newer version is required. A supported version can be obtained via [Homebrew](http://brew.sh).
+
+```sh
+brew install openssl
+brew link --force openssl
+```
+
+Build the Runtime and Microsoft Core Library
+============================================
+
+To build CoreCLR, run build.sh from the root of the coreclr repo.
+
+```sh
+dotnet-mbp:~ richlander$ cd ~/git/coreclr
+dotnet-mbp:coreclr richlander$ ./build.sh
+```
+
+After the build is completed, there should be some files placed in `bin/Product/OSX.x64.Debug`. The ones we are interested in are:
+
+- `corerun`: The command line host. This program loads and starts the CoreCLR runtime and passes the managed program you want to run to it.
+- `libcoreclr.dylib`: The CoreCLR runtime itself.
+- `mscorlib.dll`: Microsoft Core Library.
+
+Build the Framework
+===================
+
+```sh
+dotnet-mbp:corefx richlander$ ./build.sh
+```
+
+After the build is complete you will be able to find the output in the `bin` folder.
diff --git a/Documentation/building/test-configuration.md b/Documentation/building/test-configuration.md
new file mode 100644
index 0000000..931a540
--- /dev/null
+++ b/Documentation/building/test-configuration.md
@@ -0,0 +1,41 @@
+## General Test Infrastructure Notes ##
+
+### Kinds of Build Properties ###
+* Build Only
+> `<CLRTestKind>BuildOnly</CLRTestKind>`
+
+ * Builds an executable.
+ * Will not execute it.
+
+* Run Only
+> `<CLRTestKind>RunOnly</CLRTestKind>`
+
+ * Can use the output of a build project and run it with different command line arguments.
+* Build and Run
+> `<CLRTestKind>BuildAndRun</CLRTestKind>`
+
+ * Builds an executable.
+ * Will execute said executable.
+* Shared Libraries
+> `<CLRTestKind>SharedLibrary</CLRTestKind>`
+
+ * For building libraries common to one or more tests.
+
+
+By default (i.e. if not specified explicitly) a project file is BuildAndRun.
+
+### Priority ###
+Testcases are categorized by priority level. The most important subset should be (and is) the smallest one; it is called priority 0.
+ * By default, a testcase is priority 0. You must elect to de-prioritize a test.
+ * To de-prioritize a test, add a property _CLRTestPriority_ to the test's project file.
+> `<CLRTestPriority>2</CLRTestPriority>`
+ * Tests with lower priority values are always run together with higher-priority-value runs. I.e., if a developer elects to do a priority 2 test run, then all priority 0, 1, and 2 tests are run.
+
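The selection rule can be sketched as a small filter; the test names and priority values below are invented for illustration:

```shell
# A run requested at priority N executes every test whose CLRTestPriority
# is <= N. The names and priorities here are hypothetical.
requested=1
selected="$(printf '%s\n' 'test_a 0' 'test_b 1' 'test_c 2' |
  while read -r name pri; do
    if [ "$pri" -le "$requested" ]; then echo "$name"; fi
  done)"
echo "$selected"
```

So a priority 1 run selects test_a and test_b, but not test_c.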
+### Adding Tests ###
+#### Converting an existing C# project ####
+ * Remove AssemblyName
+ * Swap in dir.props
+ * Swap in dir.targets
+ * Assign a CLRTestKind
+ * (optional) Assign a priority value
+
diff --git a/Documentation/building/testing-with-corefx.md b/Documentation/building/testing-with-corefx.md
new file mode 100644
index 0000000..6609a0a
--- /dev/null
+++ b/Documentation/building/testing-with-corefx.md
@@ -0,0 +1,16 @@
+Testing with CoreFX
+===================
+
+It may be valuable to use CoreFX tests to validate your changes to CoreCLR or mscorlib.
+
+**Windows**
+
+As part of building tests, CoreFX restores a copy of the runtime from MyGet. To update the runtime that is deployed, a special build property, `BUILDTOOLS_OVERRIDE_RUNTIME`, can be used. If this is set, the CoreFX testing targets will copy all the files in the folder it points to into the test folder, overwriting any files that exist.
+
+To run tests, follow the procedure for [running tests in CoreFX](https://github.com/dotnet/corefx/blob/master/Documentation/building/windows-instructions.md). You can pass `/p:BUILDTOOLS_OVERRIDE_RUNTIME=<path-to-coreclr>\bin\Product\Windows_NT.x64.Release` to build.cmd to set this property.
+
+**FreeBSD, Linux, NetBSD, OS X**
+
+Refer to the procedure for [running tests in CoreFX](https://github.com/dotnet/corefx/blob/master/Documentation/building/cross-platform-testing.md).
+- Note the --coreclr-bins and --mscorlib-bins arguments to [run-test.sh](https://github.com/dotnet/corefx/blob/master/run-test.sh)
+- Pass in paths to your private build of CoreCLR
diff --git a/Documentation/building/unix-test-instructions.md b/Documentation/building/unix-test-instructions.md
new file mode 100644
index 0000000..9cf7507
--- /dev/null
+++ b/Documentation/building/unix-test-instructions.md
@@ -0,0 +1,81 @@
+Building and running tests on Linux, OS X, and FreeBSD
+======================================================
+
+CoreCLR tests
+-------------
+
+**Building**
+
+Build CoreCLR and CoreFX. Refer to building instructions in the respective repository.
+
+To build only the tests, on the Windows machine:
+
+> `C:\coreclr>build-test.cmd -rebuild`
+
+**Running tests**
+
+The following instructions assume that on the Unix machine:
+- The CoreCLR repo is cloned at `~/coreclr`
+- The CoreFX repo is cloned at `~/corefx`
+- The Windows clone of the CoreCLR repo is mounted at `/media/coreclr`
+- The Windows clone of the CoreFX repo is mounted at `/media/corefx`
+
+Tests currently need to be built on Windows and copied over to the Unix machine for testing. Copy the test build over to the Unix machine:
+
+> `cp --recursive /media/coreclr/bin/tests/Windows_NT.x64.Debug ~/test/`
+
+See runtest.sh usage information:
+
+> `~/coreclr$ tests/runtest.sh --help`
+
+Run tests (`Debug` may be replaced with `Release` or `Checked`, depending on which Configuration you've built):
+
+> ```bash
+> ~/coreclr$ tests/runtest.sh \
+> --testRootDir=~/test/Windows_NT.x64.Debug \
+> --testNativeBinDir=~/coreclr/bin/obj/Linux.x64.Debug/tests \
+> --coreClrBinDir=~/coreclr/bin/Product/Linux.x64.Debug \
+> --mscorlibDir=/media/coreclr/bin/Product/Linux.x64.Debug \
+> --coreFxBinDir="~/corefx/bin/Linux.AnyCPU.Debug;~/corefx/bin/Unix.AnyCPU.Debug;~/corefx/bin/AnyOS.AnyCPU.Debug" \
+> --coreFxNativeBinDir=~/corefx/bin/Linux.x64.Debug
+> ```
+
+The method above will copy dependencies from the set of directories provided to create an 'overlay' directory.
+If you already have an overlay directory prepared with the dependencies you need, you can specify `--coreOverlayDir`
+instead of `--coreClrBinDir`, `--mscorlibDir`, `--coreFxBinDir`, and `--coreFxNativeBinDir`. It would look something like:
+
+
+> ```bash
+> ~/coreclr$ tests/runtest.sh \
+> --testRootDir=~/test/Windows_NT.x64.Debug \
+> --testNativeBinDir=~/coreclr/bin/obj/Linux.x64.Debug/tests \
+> --coreOverlayDir=/path/to/directory/containing/overlay
+> ```
+
+
+Test results will go into:
+
+> `~/test/Windows_NT.x64.Debug/coreclrtests.xml`
+
+**Unsupported and temporarily disabled tests**
+
+These tests are skipped by default:
+- Tests that are not supported outside Windows are listed in:
+>> `~/coreclr/tests/testsUnsupportedOutsideWindows.txt`
+- Tests that are temporarily disabled outside Windows due to unexpected failures (pending investigation) are listed in:
+>> `~/coreclr/tests/testsFailingOutsideWindows.txt`
+
+To run only the set of temporarily disabled tests, pass in the `--runFailingTestsOnly` argument to `runtest.sh`.
+
+PAL tests
+---------
+
+Build CoreCLR on the Unix machine.
+
+Run tests:
+
+> `~/coreclr$ src/pal/tests/palsuite/runpaltests.sh ~/coreclr/bin/obj/Linux.x64.Debug`
+
+Test results will go into:
+
+> `/tmp/PalTestOutput/default/pal_tests.xml`
diff --git a/Documentation/building/viewing-jit-dumps.md b/Documentation/building/viewing-jit-dumps.md
new file mode 100644
index 0000000..5303b47
--- /dev/null
+++ b/Documentation/building/viewing-jit-dumps.md
@@ -0,0 +1,173 @@
+# Viewing JIT Dumps
+
+This document is intended for people interested in seeing the disassembly, GC info, or other details the JIT generates for a managed program.
+
+To make sense of the results, it is recommended you also read the [Reading a JitDump](../botr/ryujit-overview.md#reading-a-jitdump) section of the RyuJIT Overview.
+
+## Setting up our environment
+
+The first thing we want to do is set up the .NET Core app we want to dump. Here are the steps to do this, if you don't have one ready:
+
+* Perform a debug build of the CoreCLR repo
+* Install the [.NET CLI](http://microsoft.com/net/core), which we'll use to compile/publish our app
+* `cd` to where you want your app to be placed, and run `dotnet new`
+* Modify your `project.json` file so that it contains a RID (runtime ID) corresponding to the OS you're using in the `runtimes` section. For example, I have a Windows 10 x64 machine, so here's my project file:
+
+```json
+{
+ "buildOptions": {
+ "emitEntryPoint": true
+ },
+ "dependencies": {
+ "Microsoft.NETCore.App": "1.0.0-*"
+ },
+ "frameworks": {
+ "netcoreapp1.0": {
+ "imports": [
+ "dnxcore50",
+ "portable-net45+win8"
+ ]
+ }
+ },
+ "runtimes": {
+ "win10-x64": {}
+ }
+}
+```
+
+You can find a list of RIDs and their corresponding OSes [here](http://dotnet.github.io/docs/core-concepts/rid-catalog.html).
+
+* Edit `Program.cs`, and call the method(s) you want to dump in there. Make sure they are, directly or indirectly, called from `Main`. In this example, we'll be looking at the disassembly of our custom function `InefficientJoin`:
+
+```cs
+using System;
+using System.Collections.Generic;
+using System.Runtime.CompilerServices;
+
+namespace ConsoleApplication
+{
+ public class Program
+ {
+ public static void Main(string[] args)
+ {
+ Console.WriteLine(InefficientJoin(args));
+ }
+
+ // Add NoInlining to prevent this from getting
+ // mixed up with the rest of the code in Main
+ [MethodImpl(MethodImplOptions.NoInlining)]
+ private static string InefficientJoin(IEnumerable<string> args)
+ {
+ var result = string.Empty;
+ foreach (var arg in args) result += (arg + ' ');
+ return result.Substring(0, result.Length - 1);
+ }
+ }
+}
+```
+
+* After you've finished editing the code, run `dotnet publish -c Release`. This should drop all of the binaries needed to run your app in `bin/Release/netcoreapp1.0/<rid>/publish`.
+* Overwrite the CLR dlls with the ones you've built locally. If you're a fan of the command line, here are some shell commands for doing this:
+
+```shell
+# Windows
+robocopy /e <coreclr path>\bin\Product\Windows_NT.<arch>.Debug <app root>\bin\Release\netcoreapp1.0\<rid>\publish > NUL
+
+# Unix
+cp -rT <coreclr path>/bin/Product/<OS>.<arch>.Debug <app root>/bin/Release/netcoreapp1.0/<rid>/publish
+```
+
+* Set the configuration knobs you need (see below) and run your published app. The info you want should be dumped to stdout.
+
+Here's some sample output on my machine showing the disassembly for `InefficientJoin`:
+
+```asm
+G_M2530_IG01:
+ 55 push rbp
+ 4883EC40 sub rsp, 64
+ 488D6C2440 lea rbp, [rsp+40H]
+ 33C0 xor rax, rax
+ 488945F8 mov qword ptr [rbp-08H], rax
+ 488965E0 mov qword ptr [rbp-20H], rsp
+
+G_M2530_IG02:
+ 49BB60306927E5010000 mov r11, 0x1E527693060
+ 4D8B1B mov r11, gword ptr [r11]
+ 4C895DF8 mov gword ptr [rbp-08H], r11
+ 49BB200058F7FD7F0000 mov r11, 0x7FFDF7580020
+ 3909 cmp dword ptr [rcx], ecx
+ 41FF13 call gword ptr [r11]System.Collections.Generic.IEnumerable`1[__Canon][System.__Canon]:GetEnumerator():ref:this
+ 488945F0 mov gword ptr [rbp-10H], rax
+
+; ...
+```
+
+## Setting configuration variables
+
+The behavior of the JIT can be controlled via a number of configuration variables. These are declared in [inc/clrconfigvalues.h](https://github.com/dotnet/coreclr/blob/master/src/inc/clrconfigvalues.h). When used as an environment variable, the string name generally has “COMPlus_” prepended. When used as a registry value name, the configuration name is used directly.
+
+These can be set in one of three ways:
+
+* Setting the environment variable `COMPlus_<flagname>`. For example, the following will set the `JitDump` flag so that the compilation of all methods named ‘Main’ will be dumped:
+
+```shell
+# Windows
+set COMPlus_JitDump=Main
+
+# Unix
+export COMPlus_JitDump=Main
+```
+
+* *Windows-only:* Setting the registry key `HKCU\Software\Microsoft\.NETFramework`, Value `<flagName>`, type `REG_SZ` or `REG_DWORD` (depending on the flag).
+* *Windows-only:* Setting the registry key `HKLM\Software\Microsoft\.NETFramework`, Value `<flagName>`, type `REG_SZ` or `REG_DWORD` (depending on the flag).
+
+## Specifying method names
+
+The complete syntax for specifying a single method name (for a flag that takes a method name, such as `COMPlus_JitDump`) is:
+
+```
+[[<Namespace>.]<ClassName>::]<MethodName>[(<types>)]
+```
+
+For example
+
+```
+System.Object::ToString(System.Object)
+```
+
+The namespace, class name, and argument types are optional, and if they are not present, default to a wildcard. Thus stating:
+
+```
+Main
+```
+
+will match all methods named Main from any class and any number of arguments.
+
+`<types>` is a comma-separated list of type names. Note that presently only the number of arguments, not the types themselves, is used to distinguish methods. Thus, `Main(Foo, Bar)` and `Main(int, int)` will both match any Main method with two arguments.
+
+The wildcard character `*` can be used for `<ClassName>` and `<MethodName>`. In particular, `*` by itself indicates every method.
+
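As a rough illustration (this is not the CLR's actual matcher), the defaulting rules mean that a bare `Main` behaves like `*::Main` with any argument count; the candidate method names below are invented:

```shell
# A bare method name leaves the class as a wildcard, so Main in any class
# matches, while a different method name does not.
matches=""
for method in 'ConsoleApp.Program::Main' 'Other.Tool::Main' 'Other.Tool::Run'; do
  case "$method" in
    *::Main) matches="$matches $method" ;;
  esac
done
echo "$matches"
```
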
+## Useful COMPlus variables
+
+Below are some of the most useful `COMPlus` variables. Where {method-list} is specified in the list below, you can supply a space-separated list of either fully-qualified or simple method names (the former is useful when running something that has many methods of the same name), or you can specify `*` to mean all methods.
+
+* `COMPlus_JitDump`={method-list} – dump lots of useful information about what the JIT is doing. See [Reading a JitDump](../botr/ryujit-overview.md#reading-a-jitdump) for more on how to analyze this data.
+* `COMPlus_JitDisasm`={method-list} – dump a disassembly listing of each method.
+* `COMPlus_JitDiffableDasm` – set to 1 to tell the JIT to avoid printing things like pointer values that can change from one invocation to the next, so that the disassembly can be more easily compared.
+* `COMPlus_JitGCDump`={method-list} – dump the GC information.
+* `COMPlus_JitUnwindDump`={method-list} – dump the unwind tables.
+* `COMPlus_JitEHDump`={method-list} – dump the exception handling tables.
+* `COMPlus_JitTimeLogFile`={file name} – this specifies a log file to which timing information is written.
+* `COMPlus_JitTimeLogCsv`={file name} – this specifies a log file to which summary timing information can be written, in CSV form.
+
+See also: [CLR Configuration Knobs](../project-docs/clr-configuration-knobs.md)
+
+## Dumping native images
+
+If you followed the tutorial above and ran the sample app, you may be wondering why the disassembly for methods like `Substring` didn't show up in the output. This is because `Substring` lives in mscorlib, which (by default) is compiled ahead-of-time to a native image via [crossgen](../building/crossgen.md). Telling crossgen to dump the info works slightly differently.
+
+* First, perform a debug build of the native parts of the repo: `build skipmscorlib skiptests`.
+ * This should produce the binaries for crossgen in `bin/Product/<OS>.<arch>.Debug`.
+* Next, set the appropriate configuration knob for the info you want to dump. Usually, this is just the same as the corresponding JIT knob, except prefixed with `Ngen`; for example, to show the disassembly listing of a particular method you would `set COMPlus_NgenDisasm=Foo`.
+* Run crossgen on the assembly you want to dump: `crossgen MyLibrary.dll`
+ * If you want to see the output of crossgen specifically for mscorlib, invoke `build skipnative skiptests` from the repo root. The dumps should be written to a file in `bin/Logs` that you can just view.
diff --git a/Documentation/building/windows-instructions.md b/Documentation/building/windows-instructions.md
new file mode 100644
index 0000000..f68ed2c
--- /dev/null
+++ b/Documentation/building/windows-instructions.md
@@ -0,0 +1,188 @@
+Build CoreCLR on Windows
+========================
+
+These instructions will lead you through building CoreCLR and running a "Hello World" demo on Windows.
+
+Environment
+===========
+
+You must install several components to build the CoreCLR and CoreFX repos. These instructions were tested on Windows 7+.
+
+Visual Studio
+-------------
+
+Visual Studio must be installed. Supported versions:
+
+- [Visual Studio 2015](https://www.visualstudio.com/downloads/visual-studio-2015-downloads-vs) (Community, Professional, Enterprise)
+
+To debug managed code, ensure you have installed at least [Visual Studio 2015 Update 3](https://www.visualstudio.com/en-us/news/releasenotes/vs2015-update3-vs).
+
+Make sure that you install "VC++ Tools". By default, they will not be installed.
+
+To build for Arm32, you need to have [Windows SDK for Windows 10](https://developer.microsoft.com/en-us/windows/downloads) installed.
+
+Visual Studio Express is not supported.
+
+CMake
+-----
+
+The CoreCLR repo build has been validated using CMake 3.5.2.
+
+- Install [CMake](http://www.cmake.org/download) for Windows.
+- Add it to the PATH environment variable.
+
+Python
+---------
+Python is used in the build system. We are currently using python 2.7.9, although
+any recent (2.4+) version of Python should work, including Python 3.
+- Install [Python](https://www.python.org/downloads/) for Windows.
+- Add it to the PATH environment variable.
+
+PowerShell
+----------
+PowerShell is used in the build system. Ensure that it is accessible via the PATH environment variable. Typically this is %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\.
+
+The PowerShell version must be 3.0 or higher. This should be the case for Windows 8 and later builds.
+- On Windows 7 SP1, you can install PowerShell version 4 from [here](https://www.microsoft.com/en-us/download/details.aspx?id=40855).
+
+Git Setup
+---------
+
+Clone the CoreCLR and CoreFX repositories (either upstream or a fork).
+
+```bat
+C:\git>git clone https://github.com/dotnet/coreclr
+C:\git>git clone https://github.com/dotnet/corefx
+```
+
+This guide assumes that you've cloned the CoreCLR and CoreFX repositories into C:\git using the default repo names. If your setup is different, you'll need to pay attention to the commands you run. The guide will always show you the current directory.
+
+The repository is configured to allow Git to make the right decision about handling CRLF. Specifically, if you are working on **Windows**, please ensure that **core.autocrlf** is set to **true**. On **non-Windows** platforms, please set it to **input**.
+
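For example, the setting can be applied and verified with `git config`; the throwaway repository below just keeps the snippet self-contained (on Windows you would use `true` instead of `input`):

```shell
# Create a temporary repo so no global configuration is touched.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config core.autocrlf input
value="$(git -C "$repo" config core.autocrlf)"
echo "$value"
```
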
+Demo directory
+--------------
+
+In order to keep everything tidy, create a new directory for the files that you will build or acquire.
+
+```bat
+c:\git>mkdir \coreclr-demo\runtime
+c:\git>mkdir \coreclr-demo\ref
+```
+
+Build the Runtime
+=================
+
+To build CoreCLR, run `build.cmd` from the root of the coreclr repository. This will do an x64/Debug build of CoreCLR, its native components, mscorlib.dll, and the tests.
+
+ C:\git\coreclr>build -rebuild
+
+ [Lots of build spew]
+
+ Repo successfully built.
+
+ Product binaries are available at C:\git\coreclr\bin\Product\Windows_NT.x64.debug
+ Test binaries are available at C:\git\coreclr\bin\tests\Windows_NT.x64.debug
+
+**Note:** To avoid building the tests, pass the 'skiptestbuild' option to build.
+
+**build -?** will list supported parameters.
+
+Check the build output.
+
+- Product binaries will be dropped in `bin\Product\<OS>.<arch>.<flavor>` folder.
+- A NuGet package, Microsoft.Dotnet.CoreCLR, will be created under `bin\Product\<OS>.<arch>.<flavor>\.nuget` folder.
+- Test binaries will be dropped under `bin\Tests\<OS>.<arch>.<flavor>` folder
+
+You will see several files. The interesting ones are:
+
+- `corerun`: The command line host. This program loads and starts the CoreCLR runtime and passes the managed program you want to run to it.
+- `coreclr.dll`: The CoreCLR runtime itself.
+- `mscorlib.dll`: The core managed library for CoreCLR, which contains all of the fundamental data types and functionality.
+
+Copy these files into the demo directory.
+
+```bat
+C:\git\coreclr>copy bin\Product\Windows_NT.x64.debug\clrjit.dll \coreclr-demo\runtime
+C:\git\coreclr>copy bin\Product\Windows_NT.x64.debug\CoreRun.exe \coreclr-demo\runtime
+C:\git\coreclr>copy bin\Product\Windows_NT.x64.debug\coreclr.dll \coreclr-demo\runtime
+C:\git\coreclr>copy bin\Product\Windows_NT.x64.debug\mscorlib.dll \coreclr-demo\runtime
+C:\git\coreclr>copy bin\Product\Windows_NT.x64.debug\System.Private.CoreLib.dll \coreclr-demo\runtime
+```
+
+Build the Framework
+===================
+
+Build the framework out of the corefx directory.
+
+ c:\git\corefx>build.cmd
+
+ [Lots of build spew]
+
+ 0 Warning(s)
+ 0 Error(s)
+ Time Elapsed 00:03:14.53
+ Build Exit Code = 0
+
+It's also possible to add -rebuild to build.cmd to force it to delete the previously built assemblies.
+
+For the purposes of this demo, you need to copy a few required assemblies to the demo folder.
+
+```bat
+C:\git\corefx>copy bin\Windows_NT.AnyCPU.Debug\System.Console\System.Console.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\Windows_NT.AnyCPU.Debug\System.Diagnostics.Debug\System.Diagnostics.Debug.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.IO\System.IO.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.IO.FileSystem.Primitives\System.IO.FileSystem.Primitives.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.Runtime\System.Runtime.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.Runtime.InteropServices\System.Runtime.InteropServices.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.Text.Encoding\System.Text.Encoding.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.Text.Encoding.Extensions\System.Text.Encoding.Extensions.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.Threading\System.Threading.dll \coreclr-demo\runtime
+C:\git\corefx>copy bin\AnyOS.AnyCPU.Debug\System.Threading.Tasks\System.Threading.Tasks.dll \coreclr-demo\runtime
+```
+
+You also need to copy reference assemblies, which will be used during compilation.
+
+```bat
+C:\git\corefx>copy bin\ref\System.Runtime\4.0.0.0\System.Runtime.dll \coreclr-demo\ref
+C:\git\corefx>copy bin\ref\System.Console\4.0.0.0\System.Console.dll \coreclr-demo\ref
+```
+
+Compile the Demo
+================
+
+Now you need a Hello World application to run. You can write your own, if you'd like. Here's a very simple one:
+
+```C#
+using System;
+
+public class Program
+{
+ public static void Main()
+ {
+ Console.WriteLine("Hello, Windows");
+ Console.WriteLine("Love from CoreCLR.");
+ }
+}
+```
+
+Personally, I'm partial to the one on corefxlab which will print a picture for you. Download the [corefxlab demo](https://raw.githubusercontent.com/dotnet/corefxlab/master/demos/CoreClrConsoleApplications/HelloWorld/HelloWorld.cs) to `\coreclr-demo`.
+
+Then you just need to build it, with csc, the .NET Framework C# compiler. It may be easier to do this step within the "Developer Command Prompt for VS2015", if csc is not in your path. Because you need to compile the app against the .NET Core surface area, you need to pass references to the contract assemblies you restored using NuGet:
+
+```bat
+csc /nostdlib /noconfig /r:ref\System.Runtime.dll /r:ref\System.Console.dll /out:runtime\hello.exe hello.cs
+```
+
+Run the demo
+============
+
+You're ready to run Hello World! To do that, run corerun, passing the path to the managed exe, plus any arguments. In this case, no arguments are necessary.
+
+```bat
+C:\coreclr-demo>cd runtime
+C:\coreclr-demo\runtime>CoreRun.exe hello.exe
+```
+
+If `CoreRun.exe` fails for some reason, you will see an empty output. To diagnose the issue, you can use `/v` to switch verbose mode on: `CoreRun.exe /v hello.exe`.
+
+Over time, this process will get easier. Thanks for trying out CoreCLR. Feel free to try a more interesting demo.
diff --git a/Documentation/building/windows-test-instructions.md b/Documentation/building/windows-test-instructions.md
new file mode 100644
index 0000000..8e8949e
--- /dev/null
+++ b/Documentation/building/windows-test-instructions.md
@@ -0,0 +1,106 @@
+Building and running tests on Windows
+=====================================
+
+**Building Tests**
+
+To build the tests, navigate to the root of the repo and run:
+
+ C:\git\coreclr>build-test.cmd
+
+*Cleaning Tests*
+
+**Note:** Cleaning should be done before all tests to be sure that the test assets are initialized correctly. To do a clean build of the tests, in a clean command prompt, issue the following command:
+
+ C:\git\coreclr>build-test.cmd -rebuild
+
+*Building tests that will be precompiled*
+
+ C:\git\coreclr>build-test.cmd crossgen
+
+This will use crossgen.exe to precompile the test executables before they are executed.
+
+*Building Other Priority Tests*
+
+ C:\git\coreclr>build-test.cmd -priority=2
+
+The number '2' is just an example. The default value (if no priority is specified) is 0. To clarify, if '2' is specified, all tests with CLRTestPriority 0, 1, AND 2 will be built and consequently run.
+
+**Example**
+
+To run a clean, priority 1, crossgen test pass:
+
+ C:\git\coreclr>build-test.cmd -rebuild crossgen -priority=1
+
+**build-test /?** will list additional supported parameters.
+
+Additionally, there is a Visual Studio solution, `<repo_root>\tests\src\AllTestProjects.sln`, where users can build a particular testcase, or all priority 0 testcases that are within it.
+
+**Building Individual Tests**
+
+Note: `build-test.cmd` or `build.cmd skipnative skipmscorlib` needs to be run at least once
+
+* Native Test: Build the generated Visual Studio solution or makefile corresponding to the test's CMake file.
+
+* Managed Test: You can invoke msbuild on the project directly from Visual Studio Command Prompt.
+
+**Running Tests**
+
+In a clean command prompt: `<repo_root>\tests\runtest.cmd`
+
+**runtest /?** will list supported parameters.
+
+This will generate a report named `TestRun_<arch>_<flavor>.html` (e.g. `TestRun_x64__release.html`) in the current folder. It will also copy all the test dependencies to the folder passed at the command line.
+
+**Investigating Test Failures**
+
+Upon completing a test run, you may find one or more tests have failed.
+
+The output of the test will be available in the test reports directory. By default, that directory is something like `<repo_root>\binaries\tests\x64\debug\Reports\Exceptions\Finalization`.
+
+There are 2 files of interest:
+
+- `Finalizer.output.txt` - Contains all the information logged by the test.
+- `Finalizer.error.txt` - Contains the information reported by CoreRun.exe (which executed the test) when the test process crashes.
+
+**Rerunning a failed test**
+
+If you wish to re-run a failed test, follow these steps:
+
+1. Set an environment variable, `CORE_ROOT`, pointing to the path to product binaries that was passed to runtest.cmd. The command to set this environment variable is also specified in the test report for a failed test.
+2. Next, run the failed test, the command to which is also present in the test report for a failed test. It will be something like `<repo_root>\binaries\tests\x64\debug\Exceptions\Finalization\Finalizer.cmd`.
+
+If you wish to run the test under a debugger (e.g. [WinDbg](http://msdn.microsoft.com/en-us/library/windows/hardware/ff551063(v=vs.85).aspx)), append `-debug <debuggerFullPath>` to the test command. For example:
+
+
+ <repo_root>\binaries\tests\x64\debug\Exceptions\Finalization\Finalizer.cmd -debug <debuggerFullPath>
+
+**Modifying a test**
+
+If test changes are needed, make the change and build the test project. This will binplace the binaries in test binaries folder (e.g. `<repo_root>\binaries\tests\x64\debug\Exceptions\Finalization`). At this point, follow the steps to re-run a failed test to re-run the modified test.
+
+**Authoring Tests (in VS)**
+
+
+1. Use an existing test such as `<repo_root>\tests\src\Exceptions\Finalization\Finalizer.csproj` as a template and copy it to a new folder under `<repo_root>\tests\src`.
+2. Be sure that the AssemblyName has been removed (this causes confusion with the way tests are generally handled behind the scenes by the build system).
+3. [Assign a CLRTestKind/CLRTestPriority.](test-configuration.md)
+4. Add the project of the new test to `<repo_root>\tests\src\AllTestProjects.sln` in VS
+5. Add source files to this newly added project.
+6. Indicate the success of the test by returning `100`.
+7. Add any other projects as a dependency, if needed.
+8. Build the test.
+9. Follow the steps to re-run a failed test to validate the new test.
+
+Note:
+
+1. You can disable building of a test for a particular architecture or configuration by using the DisableProjectBuild tag in the project. For example:
+
+        <PropertyGroup>
+            <DisableProjectBuild Condition=" '$(Platform)' == 'arm64' ">true</DisableProjectBuild>
+        </PropertyGroup>
+
+2. To add NuGet/MyGet references, use this [project.json](https://github.com/dotnet/coreclr/blob/master/tests/src/Common/test_dependencies/project.json).
+
+3. To build against the mscorlib facade, add ``<ReferenceLocalMscorlib>true</ReferenceLocalMscorlib>`` to your project.
diff --git a/Documentation/coding-guidelines/EventLogging.md b/Documentation/coding-guidelines/EventLogging.md
new file mode 100644
index 0000000..a53d6e9
--- /dev/null
+++ b/Documentation/coding-guidelines/EventLogging.md
@@ -0,0 +1,19 @@
+# CoreClr Event Logging Design
+
+## Introduction
+
+Event logging is a mechanism by which CoreClr can provide a variety of information on its state. Logging works via explicit logging calls inserted by the developer within the VM. The event logging mechanism is largely based on [ETW - Event Tracing for Windows](https://msdn.microsoft.com/en-us/library/windows/desktop/bb968803(v=vs.85).aspx).
+
+# Adding Events to the Runtime
+
+- Edit the [Event manifest](../../src/vm/ClrEtwAll.man) to add a new event. For guidelines on adding new events, take a look at the existing events in the manifest and this guide for [ETW Manifests](https://msdn.microsoft.com/en-us/library/dd996930%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396).
+- The build system should automatically generate the required artifacts for the added events.
+- Add entries to the [exclusion list](../../src/vm/ClrEtwAllMeta) if necessary.
+- The event logging mechanism provides the following two functions per event, which can be used within the VM:
+ - **FireEtw**EventName: used to fire the event
+ - **EventEnabled**EventName: used to check whether any consumer has subscribed to the event
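
The shape of that generated pair can be sketched in standalone C++ (a minimal model only: the `FireEtwGCStart`/`EventEnabledGCStart` names, the flag, and the payload are illustrative, not the actual generated code):

```cpp
#include <atomic>
#include <cstdio>

// Illustrative stand-in for the per-event "is anyone listening" state
// that ETW maintains for registered providers.
static std::atomic<bool> g_GCStartListenerPresent{false};

// EventEnabled<EventName>: a cheap check so callers can skip argument
// marshalling entirely when no consumer has subscribed.
inline bool EventEnabledGCStart()
{
    return g_GCStartListenerPresent.load(std::memory_order_relaxed);
}

// FireEtw<EventName>: writes the event payload only if someone is listening.
inline void FireEtwGCStart(unsigned count, unsigned depth)
{
    if (!EventEnabledGCStart())
        return; // no subscriber: no work done
    std::printf("GCStart count=%u depth=%u\n", count, depth);
}
```

This split is why VM code typically guards expensive payload construction with the `EventEnabled` check before calling the `FireEtw` function.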
+
+
+# Adding New Logging System
+
+Though the event logging system was designed for ETW, the build system provides a mechanism, essentially an [adapter script - genXplatEventing.py](../../src/scripts/genXplatEventing.py), so that other logging systems can be added and used by CoreClr. An example of such an extension for the [LTTng logging system](https://lttng.org/) can be found in [genXplatLttng.py](../../src/scripts/genXplatLttng.py).
diff --git a/Documentation/coding-guidelines/clr-code-guide.md b/Documentation/coding-guidelines/clr-code-guide.md
new file mode 100644
index 0000000..84ba5f2
--- /dev/null
+++ b/Documentation/coding-guidelines/clr-code-guide.md
@@ -0,0 +1,1269 @@
+What Every CLR Developer Must Know Before Writing Code
+===
+
+Written in 2006, by:
+
+- Rick Byers ([@RByers](https://github.com/RByers))
+- Jan Kotas ([@jkotas](https://github.com/jkotas))
+- Mike Stall ([@mikestall](https://github.com/mikestall))
+- Rudi Martin ([@Rudi-Martin](https://github.com/Rudi-Martin))
+
+# Contents
+
+* [1 Why you must read this document](#1)
+ * [1.1 Rules of the Code](#1.1)
+ * [1.2 How do I &lt;insert common task&gt;?](#1.2)
+* [2 Rules of the Code (Unmanaged)](#2)
+ * [2.1 Is your code GC-safe?](#2.1)
+ * [2.1.1 How GC holes are created](#2.1.1)
+ * [2.1.2 Your First GC hole](#2.1.2)
+ * [2.1.3 Use GCPROTECT_BEGIN to keep your references up to date](#2.1.3)
+ * [2.1.4 Don't do nonlocal returns from within GCPROTECT blocks](#2.1.4)
+ * [2.1.5 Do not GCPROTECT the same location twice](#2.1.5)
+ * [2.1.6 Protecting multiple OBJECTREF's](#2.1.6)
+ * [2.1.7 Use OBJECTHANDLES for non-scoped protection](#2.1.7)
+ * [2.1.8 Use the right GC Mode – Preemptive vs. Cooperative](#2.1.8)
+ * [2.1.9 Use OBJECTREF to refer to object references as it does automatic sanity checking](#2.1.9)
+ * [2.1.10 How to know if a function can trigger a GC](#2.1.10)
+ * [2.1.10.1 GC_NOTRIGGER/TRIGGERSGC on a scope](#2.1.10.1)
+ * [2.2 Are you using holders to track your resources?](#2.2)
+ * [2.2.1 What are holders and why are they important?](#2.2.1)
+ * [2.2.2 An example of holder usage:](#2.2.2)
+ * [2.2.3 Common Features of Holders](#2.2.3)
+ * [2.2.4 Where do I find a holder?](#2.2.4)
+ * [2.2.5 Can I bake my own holder?](#2.2.5)
+ * [2.2.6 What if my backout code throws an exception?](#2.2.6)
+ * [2.2.7 Pay attention to holder initialization semantics](#2.2.7)
+ * [2.2.8 Some generally useful prebaked holders](#2.2.8)
+ * [2.2.8.1 New'ed memory](#2.2.8.1)
+ * [2.2.8.2 New'ed array](#2.2.8.2)
+ * [2.2.8.3 COM Interface Holder](#2.2.8.3)
+ * [2.2.8.4 Critical Section Holder](#2.2.8.4)
+ * [2.3 Does your code follow our OOM rules?](#2.3)
+ * [2.3.1 What is OOM and why is it important?](#2.3.1)
+ * [2.3.2 Documenting where OOM's can happen](#2.3.2)
+ * [2.3.2.1 Functions that handle OOM's internally](#2.3.2.1)
+ * [2.3.2.2 OOM state control outside of contracts](#2.3.2.2)
+ * [2.3.2.3 Remember...](#2.3.2.3)
+ * [2.4 Are you using SString and/or the safe string manipulation functions?](#2.4)
+ * [2.4.1 SString](#2.4.1)
+ * [2.5 Are you using safemath.h for pointer and memory size allocations?](#2.5)
+ * [2.6 Are you using the right type of Critical Section?](#2.6)
+ * [2.6.1 Use only the official synchronization mechanisms](#2.6.1)
+ * [2.6.2 Using Crsts](#2.6.2)
+ * [2.6.3 Creating Crsts](#2.6.3)
+ * [2.6.4 Entering and Leaving Crsts](#2.6.4)
+ * [2.6.5 Other Crst Operations](#2.6.5)
+ * [2.6.6 Advice on picking a level for your Crst](#2.6.6)
+ * [2.6.7 Can waiting on a Crst generate an exception?](#2.6.7)
+ * [2.6.8 CRITSECT_UNSAFE Flags](#2.6.8)
+ * [2.6.9 Bypassing leveling (CRST_UNORDERED)](#2.6.9)
+ * [2.6.10 So what are the prerequisites and side-effects of entering a Crst?](#2.6.10)
+ * [2.6.11 Using Events and Waitable Handles](#2.6.11)
+ * [2.6.12 Do not get clever with "lockless" reader-writer data structures](#2.6.12)
+ * [2.6.13 Yes, your thread could be running non-preemptively!](#2.6.13)
+ * [2.6.14 Dos and Don'ts for Synchronization](#2.6.14)
+ * [2.7 Are you making hidden assumptions about the order of memory writes?](#2.7)
+ * [2.8 Is your code compatible with managed debugging?](#2.8)
+ * [2.9 Does your code work on 64-bit?](#2.9)
+ * [2.9.1 Primitive Types](#2.9.1)
+ * [2.10 Does your function declare a CONTRACT?](#2.10)
+ * [2.10.1 What can be said in a contract?](#2.10.1)
+ * [2.10.1.1 THROWS/NOTHROW](#2.10.1.1)
+ * [2.10.1.2 INJECT_FAULT(handler-stmt)/FORBID_FAULT](#2.10.1.2)
+ * [2.10.1.3 GC_TRIGGERS/GC_NOTRIGGER](#2.10.1.3)
+ * [2.10.1.4 MODE_PREEMPTIVE/ MODE_COOPERATIVE/ MODE_ANY](#2.10.1.4)
+ * [2.10.1.5 LOADS_TYPE(loadlevel)](#2.10.1.5)
+ * [2.10.1.6 CAN_TAKE_LOCK / CANNOT_TAKE_LOCK](#2.10.1.6)
+ * [2.10.1.7 EE_THREAD_REQUIRED / EE_THREAD_NOT_REQUIRED](#2.10.1.7)
+ * [2.10.1.8 SO_TOLERANT/SO_INTOLERANT](#2.10.1.8)
+ * [2.10.1.9 PRECONDITION(expr)](#2.10.1.9)
+ * [2.10.1.10 POSTCONDITION(expr)](#2.10.1.10)
+ * [2.10.2 Is order important?](#2.10.2)
+ * [2.10.3 Using the right form of contract](#2.10.3)
+ * [2.10.4 When is it safe to use a runtime contract?](#2.10.4)
+ * [2.10.5 Do not make unscoped changes to the ClrDebugState](#2.10.5)
+ * [2.10.6 For more details...](#2.10.6)
+ * [2.11 Is your code DAC compliant?](#2.11)
+
+# <a name="1"/>1 Why you must read this document
+
+Like most large codebases, the CLR codebase has many internal invariants and an extensive debug build infrastructure for detecting problems. Clearly, it is important that developers working on the CLR understand these rules and conventions.
+
+The information contained here is considered the minimum set of knowledge required of developers who work on any part of the CLR. This is the document we wished we had all throughout the CLR's history, especially when fixing a bug that could have been prevented had this information been more readily available.
+
+This document is divided into the following sections.
+
+## <a name="1.1"/>1.1 Rules of the Code
+
+This is the most important section. Think of the chapter headings as a checklist to use while designing and writing your code. This section is divided into sections for managed and unmanaged code as they face quite different issues.
+
+Rules can either be imposed by invariants or team policy.
+
+"Invariants" are actual semantic rules imposed by the architecture, e.g. the GC-safe use of managed object references in unmanaged code. There's nothing discretionary about these. Violate these and you've introduced a customer-visible bug.
+
+"Team Policy" rules are rules we've established as "good practices" – for example, the rule that every function must declare a contract. While a missing contract here or there isn't a shipstopper, violating these rules is still heavily frowned upon and you should expect a bug filed against you unless you can supply a very compelling reason why your code needs an exemption.
+
+Team policy rules are not necessarily less important than invariants. For example, the rule to use [safemath.h][safemath.h] rather than coding your own integer overflow check is a policy rule. But because it deals with security, we'd probably treat it as higher priority than a very obscure (non-security-related) bug.
+
+[safemath.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/safemath.h
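
For context, the kind of check that safemath.h centralizes looks roughly like the sketch below (a hypothetical `SafeAddSizeT` written out by hand; per the policy above, real CLR code should use safemath.h instead of scattering checks like this):

```cpp
#include <cstddef>
#include <limits>

// Hand-rolled overflow-checked addition, for illustration only.
// Returns false (without writing *result) if a + b would overflow size_t.
bool SafeAddSizeT(size_t a, size_t b, size_t* result)
{
    if (a > std::numeric_limits<size_t>::max() - b)
        return false; // would wrap around
    *result = a + b;
    return true;
}
```

Centralizing this logic in one audited header is the point of the policy: overflow checks are easy to get subtly wrong, and a wrong one is a security bug.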
+
+One type of rule you won't find here are purely syntactic "code formatting" rules such as brace placement. While there is value in uniform stylistic conventions, we don't want to "lay down the law" on these to the extent that we do for the more semantic-oriented issues covered here. The rules included in this document are here because breaking them would do one of the following:
+
+- Introduce an actual bug.
+- Significantly increase the risk of a serious bug slipping through.
+- Frustrate our automated bug-detection infrastructure.
+
+## <a name="1.2"/>1.2 How do I &lt;insert common task&gt;?
+
+The chapter headings in this section can be regarded as a FAQ. If you have a specific need, look here for "best practices" guidance on how to get it done. Also, if you're thinking of adding yet another hash table implementation to the code base, check here first as there's a good chance there's already existing code that can be adapted or used as is.
+
+This section will also be divided into managed and unmanaged sections.
+
+# <a name="2"/>2 Rules of the Code (Unmanaged)
+
+## <a name="2.1"/>2.1 Is your code GC-safe?
+
+### <a name="2.1.1"/>2.1.1 How GC holes are created
+
+The term "GC hole" refers to a special class of bugs that bedevils the CLR. The GC hole is a pernicious bug because it is easy to introduce by accident, repros rarely and is very tedious to debug. A single GC hole can suck up weeks of dev and test time.
+
+One of the major features of the CLR is the Garbage Collection system. That means that allocated objects, as seen by a managed application, are never freed explicitly by the programmer. Instead, the CLR periodically runs a Garbage Collector (GC). The GC discards objects that are no longer in use. Also, the GC compacts the heap to avoid unused holes in memory. Therefore, a managed object does not have a fixed address. Objects move around according to the whims of the garbage collector.
+
+To do its job, the GC must be told about every reference to every GC object. The GC must know about every stack location, every register and every non-GC data structure that holds a pointer to a GC object. These external pointers are called "root references."
+
+Armed with this information, the GC can find all objects directly referenced from outside the GC heap. These objects may in turn, reference other objects – which in turn reference other objects and so on. By following these references, the GC finds all reachable ("live") objects. All other objects are, by definition, unreachable and therefore discarded. After that, the GC may move the surviving objects to reduce memory fragmentation. If it does this, it must, of course, update all existing references to the moved object.
+
+Any time a new object is allocated, a GC may occur. GC can also be explicitly requested by calling the GarbageCollect function directly. GC's do not happen asynchronously outside these events but since other running threads can trigger GC's, your thread must act as if GC's _are_ asynchronous unless you take specific steps to synchronize with the GC. More on that later.
+
+A GC hole occurs when code inside the CLR creates a reference to a GC object, neglects to tell the GC about that reference, performs some operation that directly or indirectly triggers a GC, then tries to use the original reference. At this point, the reference points to garbage memory and the CLR will either read out a wrong value or corrupt whatever that reference is pointing to.
+
+### <a name="2.1.2"/>2.1.2 Your First GC hole
+
+The code fragment below is the simplest way to introduce a GC hole into the system.
+
+ //OBJECTREF is a typedef for Object*.
+
+ {
+ MethodTable *pMT = g_pObjectClass->GetMethodTable();
+
+ OBJECTREF a = AllocateObject(pMT);
+ OBJECTREF b = AllocateObject(pMT);
+
+ //WRONG!!! "a" may point to garbage if the second
+ //"AllocateObject" triggered a GC.
+ DoSomething (a, b);
+ }
+
+All it does is allocate two managed objects, and then does something with them both.
+
+This code compiles fine, and if you run simple pre-checkin tests, it will probably "work." But this code will crash eventually.
+
+Why? If the second call to AllocateObject() triggers a GC, that GC discards the object instance you just assigned to "a". This code, like all C++ code inside the CLR, is compiled by a non-managed compiler and the GC cannot know that "a" holds a root reference to an object you want kept live.
+
+This point is worth repeating. The GC has no intrinsic knowledge of root references stored in local variables or non-GC data structures maintained by the CLR itself. You must explicitly tell the GC about them.
+
+### <a name="2.1.3"/>2.1.3 Use GCPROTECT_BEGIN to keep your references up to date
+
+Here's how to fix our buggy code fragment.
+
+ #include "frames.h"
+ {
+ MethodTable *pMT = g_pObjectClass->GetMethodTable();
+
+ //RIGHT
+ OBJECTREF a = AllocateObject(pMT);
+
+ GCPROTECT_BEGIN(a);
+ OBJECTREF b = AllocateObject(pMT);
+
+ DoSomething (a, b);
+
+ GCPROTECT_END();
+ }
+
+Notice the addition of the line GCPROTECT_BEGIN(a). GCPROTECT_BEGIN is a macro whose argument is any OBJECTREF-typed storage location (it has to be an expression that you can legally apply the address-of (&) operator to). GCPROTECT_BEGIN tells the GC two things:
+
+- The GC is not to discard any object referred to by the reference stored in local "a".
+- If the GC moves the object referred to by "a", it must update "a" to point to the new location.
+
+Now, if the second AllocateObject() triggers a GC, the "a" object will still be around afterwards, and the local variable "a" will still point to it. "a" may not contain the same address as before, but it will point to the same object. Hence, DoSomething() receives the correct data.
+
+Note that we didn't similarly protect "b" because the caller has no use for "b" after DoSomething returns. Furthermore, there's no point in keeping "b" updated because DoSomething receives a copy of the reference (not to be confused with a copy of the object), not the reference itself. If DoSomething internally causes a GC as well, it is DoSomething's responsibility to protect its own copies of "a" and "b".
+
+Having said that, no one should complain if you play it safe and GCPROTECT "b" as well. You never know when someone might add code later that makes the protection necessary.
+
+Every GCPROTECT_BEGIN must have a matching GCPROTECT_END, which terminates the protected status of "a". As an additional safeguard, GCPROTECT_END overwrites "a" with garbage so that any attempt to use "a" afterward will fault. GCPROTECT_BEGIN introduces a new C scoping level that GCPROTECT_END closes, so if you use one without the other, you'll probably experience severe build errors.
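
Conceptually, the BEGIN/END pairing behaves like a scope that registers the *address* of the OBJECTREF slot with the GC and poisons the slot on exit. The standalone sketch below models that behavior (it is not the real macro implementation, which works through the thread's frame chain; `ObjectRef`, `GcProtectScope`, and the slot list are illustrative):

```cpp
#include <cstdint>
#include <vector>

using ObjectRef = void*; // stand-in for OBJECTREF

// Stand-in for the per-thread frame chain the GC scans and updates.
static std::vector<ObjectRef*> g_protectedSlots;

struct GcProtectScope
{
    // BEGIN: register the address of the slot, not its current value,
    // so the GC can rewrite it if the object moves.
    explicit GcProtectScope(ObjectRef& slot)
    {
        g_protectedSlots.push_back(&slot);
    }
    // END: unregister, and poison the slot so later use faults loudly.
    ~GcProtectScope()
    {
        *g_protectedSlots.back() = (ObjectRef)(uintptr_t)0xcccccccc;
        g_protectedSlots.pop_back();
    }
};
```

The real macros open and close a C scope for exactly this reason: the protection must be strictly nested, and mismatched BEGIN/END pairs fail at compile time.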
+
+### <a name="2.1.4"/>2.1.4 Don't do nonlocal returns from within GCPROTECT blocks
+
+Never do a "return", "goto" or other non-local return from between a GCPROTECT_BEGIN/END pair. This will leave the thread's frame chain corrupted.
+
+One exception: it is explicitly allowed to leave a GCPROTECT block by throwing a managed exception (usually via the COMPlusThrow() function). The exception subsystem knows about GCPROTECT and correctly fixes up the frame chain as it unwinds.
+
+Why is GCPROTECT not implemented via a C++ smart pointer? The GCPROTECT macro originates in .NET Framework v1. All error handling was done explicitly at that time, without any use of C++ exception handling or stack allocated holders.
+
+### <a name="2.1.5"/>2.1.5 Do not GCPROTECT the same location twice
+
+The following is illegal and will cause some sort of crash:
+
+ // WRONG: Can't GCPROTECT twice.
+ OBJECTREF a = AllocateObject(...);
+ GCPROTECT_BEGIN(a);
+ GCPROTECT_BEGIN(a);
+
+It'd be nice if the GC was robust enough to ignore the second, unnecessary GCPROTECT but I've been assured many times that this isn't possible.
+
+Don't confuse the reference with a copy of the reference. It's not illegal to protect the same reference twice. What is illegal is protecting the same _copy_ of the reference twice. Hence, the following is legal:
+
+ OBJECTREF a = AllocateObject(...);
+ GCPROTECT_BEGIN(a);
+ DoSomething(a);
+ GCPROTECT_END();
+
+ void DoSomething(OBJECTREF a)
+ {
+ GCPROTECT_BEGIN(a);
+ GCPROTECT_END();
+ }
+
+### <a name="2.1.6"/>2.1.6 Protecting multiple OBJECTREF's
+
+You can protect multiple OBJECTREF locations using one GCPROTECT. Group them all into a structure and pass the structure to GCPROTECT_BEGIN. GCPROTECT_BEGIN applies a sizeof to determine how many locations you want to protect. Do not mix any non-OBJECTREF fields into the struct!
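
The sizeof arithmetic can be sketched as follows (standalone C++, with `ObjectRef` standing in for OBJECTREF; this also shows why mixing in a non-OBJECTREF field is fatal, since it would inflate the slot count):

```cpp
#include <cstddef>

using ObjectRef = void*; // stand-in for OBJECTREF

// A struct of nothing but reference slots: GCPROTECT_BEGIN treats it
// as a contiguous array of OBJECTREFs.
struct ProtectedRefs
{
    ObjectRef first;
    ObjectRef second;
    ObjectRef third;
};

// The macro derives the number of slots from the struct's size.
// Any non-ObjectRef field would make this count wrong and misreport
// slots to the GC.
constexpr size_t SlotCount(size_t structSize)
{
    return structSize / sizeof(ObjectRef);
}
```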
+
+### <a name="2.1.7"/>2.1.7 Use OBJECTHANDLES for non-scoped protection
+
+GCPROTECT_BEGIN is very handy, as we've seen, but its protection is limited to a C++ nesting scope. Suppose you need to store a root reference inside a non-GC data structure that lives for an arbitrary amount of time?
+
+The solution is the OBJECTHANDLE. OBJECTHANDLE allocates a location from special blocks of memory that are known explicitly to the GC. Any root reference stored in this location will automatically keep the object live and be updated to reflect object moves. You can retrieve the correct reference by indirecting the location.
+
+Handles are implemented through several layers of abstraction – the "official" interface for public use is the one described here and is exposed through [objecthandle.h][objecthandle.h]. Don't confuse this with [handletable.h][handletable.h] which contains the internals. The CreateHandle() api allocates a new location. ObjectFromHandle() dereferences the handle and returns an up-to-date reference. DestroyHandle() frees the location.
+
+[objecthandle.h]: https://github.com/dotnet/coreclr/blob/master/src/gc/objecthandle.h
+[handletable.h]: https://github.com/dotnet/coreclr/blob/master/src/gc/handletable.h
+
+The following code fragment shows how handles are used. In practice, of course, people use GCPROTECT rather than handles for situations this simple.
+
+ {
+ MethodTable *pMT = g_pObjectClass->GetMethodTable();
+
+ //Another way is to use handles. Handles would be
+ // wasteful for a case this simple but are useful
+ // if you need to protect something for the long
+ // term.
+ OBJECTHANDLE ah;
+ OBJECTHANDLE bh;
+
+ ah = CreateHandle(AllocateObject(pMT));
+ bh = CreateHandle(AllocateObject(pMT));
+
+ DoSomething (ObjectFromHandle(ah),
+                 ObjectFromHandle(bh));
+
+ DestroyHandle(bh);
+ DestroyHandle(ah);
+ }
+
+There are actually several flavors of handles. This section lists the most common ones. ([objecthandle.h][objecthandle.h] contains the complete list.)
+
+- **HNDTYPE_STRONG**: This is the default and acts like a normal reference. Created by calling CreateHandle(OBJECTREF).
+- **HNDTYPE_WEAK_LONG**: Tracks an object as long as one strong reference to it exists but does not itself prevent the object from being GC'd. Created by calling CreateWeakHandle(OBJECTREF).
+- **HNDTYPE_PINNED**: Pinned handles are strong handles which have the added property that they prevent an object from moving during a garbage collection cycle. This is useful when passing a pointer to object innards out of the runtime while GC may be enabled.
+
+NOTE: PINNING AN OBJECT IS EXPENSIVE AS IT PREVENTS THE GC FROM ACHIEVING OPTIMAL PACKING OF OBJECTS DURING EPHEMERAL COLLECTIONS. THIS TYPE OF HANDLE SHOULD BE USED SPARINGLY!
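
The core idea behind handles can be modeled in a few lines of standalone C++ (the `*Sketch` names and the fixed-size table are illustrative, not the real objecthandle.h API): a handle is the address of a slot in GC-known storage, so dereferencing it always yields the object's current location, even after the GC relocates the object and rewrites the slot.

```cpp
#include <cstddef>

using ObjectRef = void*;        // stand-in for OBJECTREF
using ObjectHandle = ObjectRef*; // a handle is the address of a slot

// Stand-in for the special blocks of memory known explicitly to the GC.
static ObjectRef g_handleSlots[64];
static size_t g_nextSlot = 0;

ObjectHandle CreateHandleSketch(ObjectRef obj)
{
    g_handleSlots[g_nextSlot] = obj;
    return &g_handleSlots[g_nextSlot++];
}

// Always indirect through the slot: the GC may have rewritten it.
ObjectRef ObjectFromHandleSketch(ObjectHandle h) { return *h; }

void DestroyHandleSketch(ObjectHandle h) { *h = nullptr; }
```

A real implementation also recycles freed slots and segregates handles by type, but the indirection shown here is the essential mechanism.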
+
+### <a name="2.1.8"/>2.1.8 Use the right GC Mode – Preemptive vs. Cooperative
+
+Earlier, we implied that GC doesn't occur spontaneously. This is true... for a given thread. But the CLR is multithreaded. Even if your thread does all the right things, it has no control over other threads.
+
+Consider two possible ways to schedule GC:
+
+- **Preemptive**: Any thread that needs to do a GC can do one without regard for the state of other threads. The other threads run concurrently with the GC.
+- **Cooperative**: A thread can only start a GC once all other threads agree to allow the GC. The thread attempting the GC is blocked until all other threads reach a state of agreement.
+
+Both have their strengths and drawbacks. Preemptive mode sounds attractive and efficient except for one thing: it completely breaks our previously discussed GC-protection mechanism. Consider the following code fragment:
+
+ OBJECTREF a = AllocateObject(...)
+ GCPROTECT_BEGIN(a);
+ DoSomething(a);
+
+Now, while the compiler can generate any valid code for this, it's very likely it will look something like this:
+
+ call AllocateObject
+ mov [A],eax ;;store result in "a"
+ ... code for GCPROTECT_BEGIN omitted...
+ push [A] ;push argument to DoSomething
+ call DoSomething
+
+This is supposed to work correctly in every case, according to the semantics of GCPROTECT. However, suppose that just after the "push" instruction, another thread gets the time-slice, starts a GC and moves object A. The local variable A will be correctly updated – but the copy of A which we just pushed as an argument to DoSomething() will not. Hence, DoSomething() will receive a pointer to the old location and crash. Clearly, preemptive GC alone will not suffice for the CLR.
+
+How about the alternative: cooperative GC? With cooperative GC, the above problem doesn't occur and GCPROTECT works as intended. Unfortunately, the CLR has to interop with legacy unmanaged code as well. Suppose a managed app calls out to the Win32 MessageBox api which waits for the user to click a button before returning. Until the user does this, all managed threads in the same process are blocked from GC. Not good.
+
+Because neither policy alone suffices for the CLR, the CLR supports both, and you, as a developer, are responsible for switching threads between the modes accordingly. Note that the GC-scheduling mode is a property of an individual thread, not a global system property.
+
+Put precisely: as long as a thread is in cooperative mode, it is guaranteed that a GC will only occur when your thread triggers an object allocation, calls out to interruptible managed code or explicitly requests a GC. All other threads are blocked from GC. As long as your thread is in preemptive mode, then you must assume that a GC can be started any time (by some other thread) and is running concurrently with your thread.
+
+A good rule of thumb is this: a CLR thread runs in cooperative mode any time it is running managed code or any time it needs to manipulate object references in any way. An Execution Engine (EE) thread that is running in preemptive mode is usually running unmanaged code; i.e. it has left the managed world. Process threads that have never entered CLR in any way are effectively running in preemptive mode. Much of the code inside CLR runs in cooperative mode.
+
+While you are running in preemptive mode, OBJECTREF's are strictly hands-off; their values are completely unreliable. In fact, the checked build asserts if you even touch an OBJECTREF in preemptive mode. In cooperative mode, you are blocking other threads from GC so you must avoid long or blocking operations. Also be aware of any critical sections or semaphores you wait on. They must not guard sections that themselves trigger GC.
+
+**Setting the GC mode:** The preferred way to set the GC mode is with the GCX_COOP and GCX_PREEMP macros. These macros operate as holders. That is, you declare them at the start of the block of code you want to execute in a certain mode. Upon any local or non-local exit out of that scope, a destructor automatically restores the original mode.
+
+ { // always open a new C++ scope to switch modes
+ GCX_COOP();
+ Code you want run in cooperative mode
+ } // leaving scope automatically restores original mode
+
+It's perfectly legal to invoke GCX_COOP() when the thread is already in cooperative mode. GCX_COOP will be a NOP in that case. Likewise for GCX_PREEMP.
+
+GCX_COOP and GCX_PREEMP will never throw an exception and return no error status.
+
+There is a special case for purely unmanaged threads (threads that have no Thread structure created for them.) Such threads are considered "permanently preemptive." Hence, GCX_COOP will assert if called on such a thread while GCX_PREEMP will succeed as a NOP.
+
+There are a couple of variants for special situations:
+
+- **GCX_MAYBE_\*(BOOL)**: This version only performs the switch if the boolean parameter is TRUE. Note that the mode restore at the end of the scope still occurs whether or not you passed TRUE. (Of course, this is only important if the mode got switched some other way inside the scope. Usually, this shouldn't happen.)
+- **GCX_\*_THREAD_EXISTS(Thread\*)**: If you're concerned about the repeated GetThread() and null Thread checks inside this holder, use this "performance" version which lets you cache the Thread pointer and pass it to all the GCX_\* calls. You cannot use this to change the mode of another thread. You also cannot pass NULL here.
+
+To switch modes multiple times in a function, you must introduce a new scope for each switch. You can also call GCX_POP(), which performs a mode restore prior to the end of the scope. (The mode restore will happen again at the end of the scope, however. Since mode restore is idempotent, this shouldn't matter.) Do not, however, do this:
+
+ {
+ GCX_COOP();
+ ...
+        GCX_PREEMP(); //WRONG!
+ }
+
+You will get a compile error due to a variable being redeclared in the same scope.
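
The holder behavior described above can be sketched standalone (a conceptual model only; the real macros operate on the current Thread and coordinate with the GC, and the `GcxHolder` name and global mode variable are illustrative):

```cpp
// Two GC-scheduling modes, per thread in the real runtime.
enum class GcMode { Preemptive, Cooperative };

static GcMode g_currentMode = GcMode::Preemptive;

// Holder: constructor records the old mode and switches; destructor
// restores the old mode on any local or non-local scope exit.
struct GcxHolder
{
    GcMode m_saved;
    explicit GcxHolder(GcMode wanted) : m_saved(g_currentMode)
    {
        g_currentMode = wanted; // switching to the current mode is a no-op
    }
    ~GcxHolder() { g_currentMode = m_saved; }
};
```

Because each holder is a local variable, two mode switches in one scope would redeclare the same hidden variable, which is the compile error the example above runs into.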
+
+While the holder-based macros are the preferred way to switch modes, sometimes one needs to leave a mode changed beyond the end of the scope. For those situations, you may use the "raw" unscoped functions:
+
+ GetThread()->DisablePreemptiveGC(); // switch to cooperative mode
+ GetThread()->EnablePreemptiveGC(); // switch to preemptive mode
+
+There is no automatic mode-restore with these functions so the onus is on you to manage the lifetime of the mode. Also, mode changes cannot be nested. You will get an assert if you try to change to a mode you're already in. The "this" argument must be the currently executing thread. You cannot use this to change the mode of another thread.
+
+**Key Takeaway:** Use GCX_COOP/PREEMP rather than unscoped calls to DisablePreemptiveGC() whenever possible.
+
+**Testing/asserting the GC mode:**
+
+You can assert the need to be in a particular mode in the contract by using one of the following:
+
+ CONTRACTL
+ {
+ MODE_COOPERATIVE
+ }
+ CONTRACTL_END
+
+ CONTRACTL
+ {
+ MODE_PREEMPTIVE
+ }
+ CONTRACTL_END
+
+There are also standalone versions:
+
+ {
+ GCX_ASSERT_COOP();
+ }
+
+ {
+ GCX_ASSERT_PREEMP();
+ }
+
+You'll notice that the standalone versions are actually holders rather than simple statements. The intention was that these holders would assert again on scope exit to ensure that any backout holders are correctly restoring the mode. However, that exit check was disabled initially with the idea of enabling it eventually once all the backout code was clean. Unfortunately, the "eventually" has yet to arrive. As long as you use the GCX holders to manage mode changes, this shouldn't really be a problem.
+
+### <a name="2.1.9"/>2.1.9 Use OBJECTREF to refer to object references as it does automatic sanity checking
+
+The checked build inserts automatic sanity-checking every single time an OBJECTREF is manipulated. Under the retail build, OBJECTREF is defined as a pointer exactly as you'd expect. But under the checked build, OBJECTREF is defined as a "smart-pointer" class that sanity-checks the pointer on every operation. Also, the current thread is validated to be in cooperative GC mode.
+
+Thus, the following code fragment:
+
+ OBJECTREF uninitialized;
+ DoSomething(uninitialized);
+
+will produce the following assert:
+
+ "Detected use of a corrupted OBJECTREF. Possible GC hole."
+
+This is because the default constructor for OBJECTREF initializes it to 0xcccccccc. When you pass "uninitialized" to DoSomething(), this invokes the OBJECTREF copy constructor, which notices that the source of the copy is a bad pointer (0xcccccccc). This causes the assert.
+
+OBJECTREF's pointer mimicry isn't perfect. In certain cases, the checked build refuses to build legal-seeming constructs. We just have to work around this. A common case is casting an OBJECTREF to either a void* or a STRINGREF (we actually define a whole family of OBJECTREF-like pointers, for various interesting subclasses of objects.) The construct:
+
+ //WRONG
+ OBJECTREF o = ...;
+ LPVOID pv = (LPVOID)o;
+
+compiles fine under retail but breaks under checked. The usual workaround is something like this:
+
+ pv = (LPVOID)OBJECTREFToObject(o);
+
+### <a name="2.1.10"/>2.1.10 How to know if a function can trigger a GC
+
+The GC behavior of every function in the source base must be documented in its contract. Every function must have a contract that declares one of the following:
+
+ // If you call me, assume a GC can happen
+ void Noisy()
+ {
+ CONTRACTL
+ {
+ GC_TRIGGERS;
+ }
+ CONTRACTL_END
+ }
+
+or
+
+ // If you call me and the thread is in cooperative mode, I guarantee no GC
+ // will occur.
+ void Quiet()
+ {
+ CONTRACTL
+ {
+ GC_NOTRIGGER;
+ }
+ CONTRACTL_END
+ }
+
+A GC_NOTRIGGER function cannot:
+
+- Allocate managed memory
+- Call managed code
+- Enter a GC-safe point
+- Toggle the GC mode <sup>[1]</sup>
+- Block for long periods of time
+- Synchronize with the GC
+- Explicitly trigger a GC (duh)
+- Call any other function marked GC_TRIGGERS
+- Call any other code that does these things
+
+[1] With one exception: GCX_COOP (which effects a preemp->coop->preemp roundtrip) is permitted. The rationale is that GCX_COOP becomes a NOP if the thread was cooperative to begin with so it's safe to allow this (and necessary to avoid some awkward code in our product.)
+
+**Note that for GC to be truly prevented, the caller must also ensure that the thread is in cooperative mode.** Otherwise, all the precautions above are in vain since any other thread can start a GC at any time. Given that, you might be wondering why cooperative mode is not part of the definition of GC_NOTRIGGER. In fact, there is a third thread state called GC_FORBID which is exactly that: GC_NOTRIGGER plus forced cooperative mode. As its name implies, GC_FORBID _guarantees_ that no GC will occur on any thread.
+
+Why do we use GC_NOTRIGGER rather than GC_FORBID? Because forcing every function to choose between GC_TRIGGERS and GC_FORBID is too inflexible given that some callers don't actually care about GC. Consider a simple class member function that returns the value of a field. How should it be declared? If you choose GC_TRIGGERS, then the function cannot be legally called from a GC_NOTRIGGER function even though this is perfectly safe. If you choose GC_FORBID, then every caller must switch to cooperative mode to invoke the function just to prevent an assert. Thus, GC_NOTRIGGER was created as a middle ground and has become far more pervasive and useful than GC_FORBID. Callers who actually need GC stopped will have put themselves in cooperative mode anyway and in those cases, GC_NOTRIGGER actually becomes GC_FORBID. Callers who don't care can just call the function and not worry about modes.
+
+**Note:** There is no GC_FORBID keyword defined for contracts but you can simulate it by combining GC_NOTRIGGER and MODE_COOPERATIVE.
+
+**Important:** The notrigger thread state is implemented as a counter rather than a boolean. This is unfortunate, as a boolean should suffice, and the counter exposes us to nasty ref-counting-style bugs. What is important is that contracts intentionally do not support unscoped trigger/notrigger transitions. That is, a GC_NOTRIGGER inside a contract will **increment** the thread's notrigger count on entry to the function but on exit, **it will not decrement the count; instead, it will restore the count from a saved value.** Thus, any _net_ changes in the trigger state caused within the body of the function will be wiped out. This is good unless your function was designed to make a net change to the trigger state. If you have such a need, you'll just have to work around it somehow, because we actively discourage such things in the first place. Ideally, we'd love to replace that counter with a boolean someday.
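The save/restore behavior can be sketched with a small RAII guard. This is an illustrative model with hypothetical names, not the actual contract implementation:

```cpp
#include <cassert>

// Hypothetical model of the per-thread notrigger counter described above.
static int g_noTriggerCount = 0;

class NoTriggerScope
{
    int m_saved;   // count captured on entry
public:
    NoTriggerScope() : m_saved(g_noTriggerCount)
    {
        g_noTriggerCount++;   // enter the notrigger state
    }
    ~NoTriggerScope()
    {
        // Restore rather than decrement: any net change made inside
        // the scope is deliberately wiped out on exit.
        g_noTriggerCount = m_saved;
    }
};
```

A function body that makes a net change to the count (a bug, per the guidance above) sees that change erased when the scope exits.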
+
+#### <a name="2.1.10.1"/>2.1.10.1 GC_NOTRIGGER/TRIGGERSGC on a scope
+
+Sometimes you want to mark a scope rather than a function. For that purpose, GCX_NOTRIGGER and TRIGGERSGC also exist as standalone holders. These holders are also visible to the static contract scanner.
+
+ {
+ TRIGGERSGC();
+ }
+
+ {
+ GCX_NOTRIGGER();
+ }
+
+One difference between the standalone TRIGGERSGC and the contract GC_TRIGGERS: the standalone version also performs a "phantom" GC that poisons all unreachable OBJECTREFs. The contract version does not do this mainly for checked build perf concerns.
+
+## <a name="2.2"/>2.2 Are you using holders to track your resources?
+
+### <a name="2.2.1"/>2.2.1 What are holders and why are they important?
+
+The CLR team has coined the name **holder** to refer to the infrastructure that encapsulates the common grunt work of writing robust **backout code**. **Backout code** is code that deallocates resources or restores CLR data structure consistency when we abort an operation due to an error or an asynchronous event. Oftentimes, the same backout code will execute on non-error paths for resources allocated for use within a single scope, but error-time backout is still needed even for longer-lived resources.
+
+Way back in V1, error paths were _ad-hoc._ Typically, they flowed through "fail:" labels where the backout code was accumulated.
+
+Due to the no-compromise robustness requirements that the CLR Hosting model (with SQL Server as the initial customer) imposed on us in the .NET Framework v2 release, we have since become much more formal about backout. One reason is that we like to write backout that will execute if you leave the scope because of an exception. We also want to centralize policy regarding exceptions occurring inside backout. Finally, we want an infrastructure that will discourage developer errors from introducing backout bugs in the first place.
+
+Thus, we have centralized cleanup around C++ destructor technology. Instead of declaring a HANDLE, you declare a HandleHolder. The holder wraps a HANDLE and its destructor closes the handle no matter how control leaves the scope. We have already implemented standard holders for common resources (arrays, memory allocated with C++ new, Win32 handles and locks.) The Holder mechanism is extensible so you can add new types of holders as you need them.
+
+### <a name="2.2.2"/>2.2.2 An example of holder usage
+
+The following shows explicit backout vs. holders:
+
+**Wrong**
+
+ HANDLE hFile = ClrCreateFile(szFileName, GENERIC_READ, ...);
+ if (hFile == INVALID_HANDLE_VALUE) {
+ COMPlusThrow(...);
+ }
+
+ DWORD dwFileLen = SafeGetFileSize(hFile, 0);
+ if (dwFileLen == 0xffffffff) {
+ CloseHandle(hFile);
+ COMPlusThrow(...);
+ }
+ CloseHandle(hFile);
+ return S_OK;
+
+**Right**
+
+ HandleHolder hFile(ClrCreateFile(szFileName, GENERIC_READ, ...));
+ if (hFile == INVALID_HANDLE_VALUE)
+ COMPlusThrow(...);
+
+ DWORD dwFileLen = SafeGetFileSize(hFile, 0);
+ if (dwFileLen == 0xffffffff)
+ COMPlusThrow(...);
+
+ return S_OK;
+
+The difference is that hFile is now a HandleHolder rather than a HANDLE and that there are no more explicit CloseHandle calls. That call is now implicit in the holder's destructor and executes no matter how control leaves the scope.
+
+HandleHolder exposes operator overloads so it can be passed to APIs expecting HANDLEs without casting and be compared to INVALID_HANDLE_VALUE. The wrapper knows that INVALID_HANDLE_VALUE is special and won't attempt to close it. The holder also has some safety features. If you declare it without initializing it, it will be autoinitialized to INVALID_HANDLE_VALUE. If you assign a new value to the holder, the current value will be destructed before it is overwritten.
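The behaviors just described (a special "null" value that is never released, default-initialization to that value, and release-before-overwrite on assignment) can be sketched in a few lines. This is a toy model using int as a stand-in for HANDLE and invented names; the real implementation is in [src\inc\holder.h][holder.h]:

```cpp
#include <cassert>
#include <vector>

// Toy model of the HandleHolder behaviors described above.
static std::vector<int> g_closed;      // records which "handles" were closed

const int INVALID_HANDLE = -1;         // stand-in for INVALID_HANDLE_VALUE

class FakeHandleHolder
{
    int m_h;
    void CloseIfValid()
    {
        if (m_h != INVALID_HANDLE)     // the "null" value is never released
            g_closed.push_back(m_h);   // stand-in for CloseHandle
        m_h = INVALID_HANDLE;
    }
public:
    FakeHandleHolder() : m_h(INVALID_HANDLE) {}   // auto-init to the invalid value
    explicit FakeHandleHolder(int h) : m_h(h) {}
    ~FakeHandleHolder() { CloseIfValid(); }       // release no matter how scope exits
    FakeHandleHolder& operator=(int h)
    {
        CloseIfValid();                // current value is released before overwrite
        m_h = h;
        return *this;
    }
    operator int() const { return m_h; }  // lets the holder be passed as a raw handle
};
```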
+
+Suppose you want to auto-close the handle if an error occurs but keep the handle otherwise? Call SuppressRelease() on the holder object. The underlying handle can still be pulled out of the holder but the destructor will no longer close it.
+
+**Wrong:**
+
+ HANDLE hFile = ClrCreateFile(szFileName, GENERIC_READ, ...);
+ if (hFile == INVALID_HANDLE_VALUE) {
+ COMPlusThrow(...);
+ }
+ if (FAILED(SomeOperation())) {
+ CloseHandle(hFile);
+ COMPlusThrow(...);
+ }
+ return hFile;
+
+**Right:**
+
+ HandleHolder hFile = ClrCreateFile(szFileName, GENERIC_READ, ...);
+ if (hFile == INVALID_HANDLE_VALUE) {
+ COMPlusThrow(...);
+ }
+ if (FAILED(SomeOperation())) {
+ COMPlusThrow(...);
+ }
+ // No failures allowed after this!
+ hFile.SuppressRelease();
+ return hFile;
+
+### <a name="2.2.3"/>2.2.3 Common Features of Holders
+
+All holders, no matter how complex or simple, offer these basic services:
+
+- When the holder goes out of scope, via an exception or normal flow, it invokes a RELEASE function supplied by the holder's designer. The RELEASE function is responsible for the cleanup.
+- A holder declared without an explicit initializer will be initialized to a default value. The precise value of the default is supplied by the holder's designer.
+- Holders know about "null" values. The holder guarantees never to call RELEASE or ACQUIRE on a null value. The designer can specify any number of null values or no null value at all.
+- Holders expose a public SuppressRelease() method which eliminates the auto-release in the destructor. Use this for conditional backout.
+- Holders also support an ACQUIRE method when a resource can be meaningfully released and reacquired (e.g. locks.)
+
+In addition, some holders derive from the Wrapper class. Wrappers are like holders but also implement operator overloads for type casting, assignment, comparison, etc. so that the holder proxies the object smart-pointer style. The HandleHolder object is actually a wrapper.
+
+### <a name="2.2.4"/>2.2.4 Where do I find a holder?
+
+First, look for a prebaked holder that does what you want. Some common ones are described below.
+
+If no existing holder fits your need, make one. If it's your first holder, start by reading [src\inc\holder.h][holder.h]. Decide if you want a holder or a wrapper. If you don't do much with a resource except acquire and release it, use a holder. Otherwise, you want the wrapper since its overloaded operators make it much easier to replace the resource variable with the wrapper.
+
+[holder.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/holder.h
+
+Instantiate the holder or wrapper template with the required parameters. You must supply the data type being managed, the RELEASE function, the default value for uninitialized constructions, the IS_NULL function and the ACQUIRE function. Unless you're implementing a critical section holder, you can probably supply a NOP for ACQUIRE. Most resources can't be meaningfully released and reacquired so it's easier to allocate the resource outside the holder and pass it in through its constructor. For convenience, [holder.h][holder.h] defines a DoNothing<Type> template that creates a NOP ACQUIRE function for any given resource type. There are also convenience templates for writing RELEASE functions. See [holder.h][holder.h] for their definitions and examples of their use.
+
+Publish the holder in the most global header file possible. [src\inc\holder.h][holder.h] is ideal for OS-type resources. Otherwise, put it in the header file that owns the type being managed.
+
+### <a name="2.2.5"/>2.2.5 Can I bake my own holder?
+
+When we first put holders into the code, we encouraged developers to inherit from the base holder class rather than writing their own. But the reality has been that many holders only need destruction and SuppressRelease() and it's proven easier for developers to write them from scratch rather than try to master the formidable C++ template magic that goes on in [holder.h][holder.h]. It is better that you write your own holders than give up the design pattern altogether because you don't want to tackle [holder.h].
+
+But however you decide to implement it, if you call your object a "holder", please make sure its external behavior conforms to the conventions listed above in "Common Features of Holders."
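As a concrete example, a from-scratch holder that honors the conventions above (release on destruction, plus SuppressRelease) can be this small. The names here are illustrative, not part of the actual codebase:

```cpp
#include <cassert>

// A from-scratch holder of the kind described above: destruction plus
// SuppressRelease() only, no holder.h machinery.
template <typename T, void (*RELEASE)(T)>
class SimpleHolder
{
    T    m_value;
    bool m_suppressed;
public:
    explicit SimpleHolder(T value) : m_value(value), m_suppressed(false) {}
    ~SimpleHolder()
    {
        if (!m_suppressed)
            RELEASE(m_value);          // release no matter how control leaves the scope
    }
    void SuppressRelease() { m_suppressed = true; }  // for conditional backout
    T GetValue() const { return m_value; }
};

// A stand-in release function that just counts invocations.
static int g_releases = 0;
static void ReleaseInt(int) { ++g_releases; }
```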
+
+### <a name="2.2.6"/>2.2.6 What if my backout code throws an exception?
+
+All holders wrap an implicit NOTHROW contract around your backout code. Thus, you must write your backout code only using primitives that are guaranteed not to throw. If you absolutely have no choice but to violate this (say, you're calling Release() on a COM object that you didn't write), you must catch the exception yourself.
+
+This may sound draconian but consider the real implications of throwing out of your backout code. Backout code, by definition, is code that must complete when throwing out of a block. If it didn't complete, there is no way to salvage the situation and still meet our reliability goals. Either something leaked or CLR state was left inconsistent.
+
+Often, you can avoid failures in backout code by designing a better data structure. For example, implementers of common data structures such as hash tables and collections should provide backout holders for undoing operations such as inserts. When creating globally visible data structures such as EEClass objects, you should initialize the object in private and allocate everything needed before "publishing" it. In some cases, this may require significant rethinking of your data structures and code. But the upshot is that you won't have to undo global data structure changes in backout code.
+
+### <a name="2.2.7"/>2.2.7 Pay attention to holder initialization semantics
+
+Holders consistently release on destruction – that's their whole purpose. Sadly, we are not so consistent when it comes to initialization semantics. Some holders, such as the Crst holder, do an implicit Acquire on initialization. Others, such as the ComHolder, do not (initializing a ComHolder does _not_ do an AddRef.) The BaseHolder class constructor leaves it up to the holder designer to make the choice. This is an easy source of bugs so pay attention to this.
+
+### <a name="2.2.8"/>2.2.8 Some generally useful prebaked holders
+
+#### <a name="2.2.8.1"/>2.2.8.1 New'ed memory
+
+**Wrong:**
+
+ Foo *pFoo = new Foo();
+ delete pFoo;
+
+**Right:**
+
+ NewHolder<Foo> pFoo = new Foo();
+
+#### <a name="2.2.8.2"/>2.2.8.2 New'ed array
+
+**Wrong:**
+
+ Foo *pFoo = new Foo[30];
+    delete [] pFoo;
+
+**Right:**
+
+ NewArrayHolder<Foo> pFoo = new Foo[30];
+
+#### <a name="2.2.8.3"/>2.2.8.3 COM Interface Holder
+
+**Wrong:**
+
+ IFoo *pFoo = NULL;
+ FunctionToGetRefOfFoo(&pFoo);
+ pFoo->Release();
+
+**Right:**
+
+ ComHolder<IFoo> pFoo; // declaring ComHolder does not insert AddRef!
+ FunctionToGetRefOfFoo(&pFoo);
+
+#### <a name="2.2.8.4"/>2.2.8.4 Critical Section Holder
+
+**Wrong:**
+
+    pCrst->Enter();
+    pCrst->Leave();
+
+**Right:**
+
+    {
+        CrstHolder ch(pCrst); // implicit Enter
+    } // implicit Leave
+
+## <a name="2.3"/>2.3 Does your code follow our OOM rules?
+
+### <a name="2.3.1"/>2.3.1 What is OOM and why is it important?
+
+OOM stands for "Out of Memory." The CLR must be fully robust in the face of OOM errors. For us, OOM is not an obscure corner case. SQL Server runs its processes in low-memory conditions as normal practice. OOM exceptions are a regular occurrence when hosted under SQL Server and we are required to handle every single one correctly.
+
+This means that:
+
+- Any operation that fails due to an OOM must allow future retries. This means any changes to global data structures must be rolled back and OOM exceptions cannot be cached.
+- OOM failures must be distinguishable from other error results. OOM's must never be transformed into some other error code. Doing so may cause some operations to cache the error and return the same error on each retry.
+- Every function must declare whether or not it can generate an OOM error. We cannot write OOM-safe code if we have no way to know what calls can generate OOM's. This declaration is done by the INJECT_FAULT and FORBID_FAULT contract annotations.
+
+### <a name="2.3.2"/>2.3.2 Documenting where OOM's can happen
+
+Sometimes, a code sequence requires that no opportunities for OOM occur. Backout code is the most common example. This can become hard to maintain if the code calls out to other functions. Because of this, it is very important that every function document in its contract whether or not it can fail due to OOM. We do this using the (poorly named) INJECT_FAULT and FORBID_FAULT annotations.
+
+To document that a function _can_ fail due to OOM:
+
+**Runtime-based (preferred)**
+
+ void AllocateThingie()
+ {
+ CONTRACTL
+ {
+ INJECT_FAULT(COMPlusThrowOM(););
+ }
+ CONTRACTL_END
+ }
+
+**Static**
+
+ void AllocateThingie()
+ {
+ STATIC_CONTRACT_FAULT;
+ }
+
+To document that a function _cannot_ fail due to OOM:
+
+**Runtime-based (preferred)**
+
+ BOOL IsARedObject()
+ {
+ CONTRACTL
+ {
+ FORBID_FAULT;
+ }
+ CONTRACTL_END
+ }
+
+**Static**
+
+ BOOL IsARedObject()
+ {
+ STATIC_CONTRACT_FORBID_FAULT;
+ }
+
+INJECT_FAULT()'s argument is the code that executes when the function reports an OOM. Typically this is to throw an OOM exception or return E_OUTOFMEMORY. The original intent for this was for our OOM fault injection test harness to insert simulated OOM's at this point and execute this line. At the moment, this argument is ignored but we may still employ this fault injection idea in the future so please code it appropriately.
+
+The CLR asserts if you invoke an INJECT_FAULT function under the scope of a FORBID_FAULT. All our allocation functions, including the C++ new operator, are declared INJECT_FAULT.
+
+#### <a name="2.3.2.1"/>2.3.2.1 Functions that handle OOM's internally
+
+Sometimes, a function handles an internal OOM without needing to notify the caller. For example, perhaps the additional memory was used to implement an internal cache but your function can still do its job without it. Or perhaps the function is a logging function in which case, it can silently NOP – the caller doesn't care. In such cases, wrap the allocation in the FAULT_NOT_FATAL holder which temporarily lifts the FORBID_FAULT state.
+
+ {
+ FAULT_NOT_FATAL();
+ pv = new Foo();
+ }
+
+FAULT_NOT_FATAL() is almost identical to a CONTRACT_VIOLATION() but the name indicates that it is by design, not a bug. It is analogous to TRY/CATCH for exceptions.
+
+#### <a name="2.3.2.2"/>2.3.2.2 OOM state control outside of contracts
+
+If you wish to set the OOM state for a scope rather than a function, use the FAULT_FORBID() holder. To test the current state, use the ARE_FAULTS_FORBIDDEN() predicate.
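The scoping behavior can be modeled with a thread-local count: a FORBID-style holder raises it, and a NOT_FATAL-style holder temporarily zeroes it, restoring the prior state on exit. This is a conceptual sketch with hypothetical names, not the actual contract machinery:

```cpp
#include <cassert>

// Conceptual model of the forbid-fault scope state described above.
thread_local int t_forbidFaultCount = 0;

struct FaultForbidScope                // models FAULT_FORBID()
{
    FaultForbidScope()  { ++t_forbidFaultCount; }
    ~FaultForbidScope() { --t_forbidFaultCount; }
};

struct FaultNotFatalScope              // models FAULT_NOT_FATAL(): lifts the state
{
    int m_saved;
    FaultNotFatalScope() : m_saved(t_forbidFaultCount) { t_forbidFaultCount = 0; }
    ~FaultNotFatalScope() { t_forbidFaultCount = m_saved; }
};

// Models the ARE_FAULTS_FORBIDDEN() predicate.
bool AreFaultsForbidden() { return t_forbidFaultCount > 0; }
```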
+
+#### <a name="2.3.2.3"/>2.3.2.3 Remember...
+
+- Do not use INJECT_FAULT to indicate the possibility of non-OOM errors such as entries not existing in a hash table or a COM object not supporting an interface. INJECT_FAULT indicates OOM errors and no other type.
+- Be very suspicious if your INJECT_FAULT() argument is anything other than throwing an OOM exception or returning E_OUTOFMEMORY. OOM errors must be distinguishable from other types of errors, so if you're merely returning NULL without indicating the type of error, you'd better be a simple memory allocator or some other function that will never fail for any reason other than an OOM.
+- THROWS and INJECT_FAULT correlate strongly but are independent. A NOTHROW/INJECT_FAULT combo might indicate a function that returns HRESULTs, including E_OUTOFMEMORY. A THROWS/FORBID_FAULT combo, however, indicates a function that can throw an exception but not an OutOfMemoryException. While theoretically possible, such a contract is probably a bug.
+
+## <a name="2.4"/>2.4 Are you using SString and/or the safe string manipulation functions?
+
+The native C implementation of strings as raw char* buffers is a well-known breeding ground for buffer overflow bugs. While acknowledging that there's still a ton of legacy char*'s in the code, new code and new data structures should use the SString class rather than raw C strings whenever possible.
+
+### <a name="2.4.1"/>2.4.1 SString
+
+SString is the abstraction to use for unmanaged strings in CLR code. It is important that as much code as possible uses the SString abstraction rather than raw character arrays, because of the danger of buffer overruns related to direct manipulation of arrays. Code which does not use SString must be manually reviewed for the possibility of buffer overrun or corruption during every security review.
+
+This section will provide an overview for SString. For specific details on methods and use, see the file [src\inc\sstring.h][sstring.h]. SString has been in use in our codebase for quite a few years now so examples of its use should be easy to find.
+
+[sstring.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/sstring.h
+
+An SString object represents a Unicode string. It has its own buffer which it internally manages. The string buffer is typically not referenced directly by user code; instead the string is manipulated indirectly by methods defined on SString. There are, of course, several ways to get at the raw string buffer if such functionality is needed to interface with existing APIs. But these should be used only when absolutely necessary.
+
+When SStrings are used as local variables, they are typically used via the StackSString type, which uses a bit of stack space as a preallocated buffer for optimization purposes. When SStrings are used in structures, the SString type may be used directly (if it is likely that the string will be empty), or through the InlineSString template, which allows an arbitrary amount of preallocated space to be declared inline in the structure with the SString. Since InlineSStrings and StackSStrings are subtypes of SString, they have the same API, and can be passed wherever an SString is required.
+
+As parameters, SStrings should always be declared as reference parameters. Similarly, SStrings as return value should also use a "by reference" style.
+
+An SString's contents can be initialized by a "raw" string, or from another SString. A WCHAR based string is assumed to be Unicode, but for a CHAR based string, you must specify the encoding by explicitly invoking one of the tagged constructors with the appropriate encoding (currently Utf8, Ansi, or Console).
+
+In addition, you can specially call out string literals with the Literal tag – this allows the SString implementation to make some additional optimizations during copying and allocation operations. It's important to only use this tag for actual read-only compiler literals. Never use it for other strings which might be freed or modified in the future, even if you believe the SString's lifetime will be shorter than the buffer's.
+
+SStrings' contents are typically operated on through use of one of the iterators provided. An SString::CIterator (obtained from a const SString) is used to look at but not change the string. An SString::Iterator (obtained from a non-const SString) should be used when the string will be modified. Note that it is a slightly heavier operation to create a non-const Iterator, so you should use CIterator where appropriate.
+
+Either kind of iterator acts like (through the magic of C++ operator overloading) a pointer into the string buffer. Performance is also similar, although it may be slightly reduced. An iterator also has similar lifetime constraints to a buffer pointer – if an SString changes sizes, the existing iterators on it cease to be valid. (Fortunately the iterator infrastructure provides some explicit checks to aid in enforcement of this constraint, in a checked build.)
+
+If you need to use the string in the context of an external API (either to get the string's contents to pass out, or to use the SString as a buffer to receive a return result), you may use one of the conversion APIs. Read-only use of the buffer is provided via a simple API; however, if you need to write to the string's buffer, you must use an Open/Close call pair around the operation.
+
+For easy creation of an SString for a string literal, use the SL macro. This can be used around either a normal (ASCII characters only) or wide string constant.
+
+## <a name="2.5"/>2.5 Are you using safemath.h for pointer and memory size allocations?
+
+Integer overflow bugs are an insidious source of buffer overrun vulnerabilities. Here is a simple example of how such a bug can occur:
+
+ void *pInput = whatever;
+ UINT32 cbSizeOfData = GetSizeOfData();
+ UINT32 cbAllocSize = SIZE_OF_HEADER + cbSizeOfData;
+ void *pBuffer = Allocate(cbAllocSize);
+ memcpy(pBuffer + SIZE_OF_HEADER, pInput, cbSizeOfData);
+
+If GetSizeOfData() obtains its result from untrusted data, it could return a huge value just shy of UINT32_MAX. Adding SIZE_OF_HEADER causes a silent overflow, resulting in a very small (and incorrect) value being passed to Allocate() which dutifully returns a short buffer. The memcpy, however, copies a huge number of bytes and overflows the buffer.
+
+The source of the bug is clear. The code should have checked whether adding SIZE_OF_HEADER and cbSizeOfData overflowed before passing the result to Allocate().
+
+We have now standardized on an infrastructure for performing overflow-safe arithmetic on key operations such as calculating allocation sizes. This infrastructure lives in [clr\src\inc\safemath.h][safemath.h].
+
+The _safe_ version of the above code follows:
+
+ #include "safemath.h"
+
+ void *pInput = whatever;
+    UINT32 cbSizeOfData = GetSizeOfData();
+    S_UINT32 cbAllocSize = S_UINT32(SIZE_OF_HEADER) + S_UINT32(cbSizeOfData);
+ if (cbAllocSize.IsOverflow())
+ {
+ return E_OVERFLOW;
+ }
+ void *pBuffer = Allocate(cbAllocSize.Value());
+ memcpy(pBuffer + SIZE_OF_HEADER, pInput, cbSizeOfData);
+
+As you can see, the transformation consists of the following:
+
+- Replace the raw C++ integer type with the "S_" version.
+- Do the arithmetic as usual.
+- Call IsOverflow() on the _final_ result to see if an overflow occurred anytime during the calculations. It's not necessary to check intermediate results if multiple arithmetic operations are chained. [Safemath.h][safemath.h] will propagate the overflow state through the entire chain of operations.
+- If IsOverflow() returned false, then call Value() on the final result to get the raw integer back. Otherwise, there's no value to be returned – invoke your error handling code.
+
+As you'd expect, Value() asserts if IsOverflow() is true.
+
+As you might _not_ expect, Value() also asserts if you never called IsOverflow() to check – whether or not the result actually overflowed. This guarantees you won't forget the IsOverflow() check. If you didn't check, Value() won't give you the result.
+
+Currently, the "S_" types are available only for unsigned ints and SIZE_T. Check in [safemath.h][safemath.h] for what's currently defined. Also, only addition and multiplication are supported although other operations could be added if needed.
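To make the propagation behavior concrete, here is a stripped-down model of such a type, supporting only addition. This is purely illustrative, with an invented name; product code must use the real types in [safemath.h][safemath.h]:

```cpp
#include <cassert>
#include <cstdint>

// Stripped-down model of the safemath.h idea: an unsigned type that records
// overflow, propagates it through chained operations, and refuses to hand
// back a value until IsOverflow() has been consulted.
class SketchUInt32
{
    uint32_t m_value;
    bool     m_overflow;
    mutable bool m_checked;            // tracks whether IsOverflow() was called
public:
    explicit SketchUInt32(uint32_t v) : m_value(v), m_overflow(false), m_checked(false) {}
    friend SketchUInt32 operator+(SketchUInt32 a, SketchUInt32 b)
    {
        SketchUInt32 r(a.m_value + b.m_value);
        // Unsigned addition overflowed iff the result wrapped below an operand;
        // also propagate any overflow already recorded in either operand.
        r.m_overflow = (r.m_value < a.m_value) || a.m_overflow || b.m_overflow;
        return r;
    }
    bool IsOverflow() const { m_checked = true; return m_overflow; }
    uint32_t Value() const
    {
        // Must have checked, and must not have overflowed.
        assert(m_checked && !m_overflow);
        return m_value;
    }
};
```

Note how Value() insists that IsOverflow() was consulted first, mirroring the behavior described above.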
+
+**Key Takeaway: Use safemath.h for computing allocation sizes and pointer offsets.** Don't rely on the fact that the caller may have already validated the data. You never know what new paths might be added to your vulnerable code.
+
+**Key Takeaway: If you're working on existing code that does dynamic memory allocation, check for this bug.**
+
+**Key Takeaway: Do not roll your own overflow checks. Always use safemath.h.** Writing correct overflow-safe arithmetic code is harder than you might think (take a look at the implementation in [safemath.h][safemath.h] if you don't believe me.) Every unauthorized version is another security hotspot that has to be watched carefully. If safemath.h doesn't support the functionality you need, please get the functionality added to safemath.h rather than creating a new infrastructure.
+
+**Key Takeaway: Don't let premature perf concerns stop you from using safemath.h.** Despite the apparent complexity, the optimized codegen for this helper is very efficient and in most cases, at least as efficient as any hand-rolled version you might be tempted to create.
+
+**Note:** If you've worked on other projects that use the SafeInt class, you might be wondering why we don't do that here. The reason is that we needed something that could be used easily from exception-intolerant code.
+
+## <a name="2.6"/>2.6 Are you using the right type of Critical Section?
+
+Synchronization in the CLR is challenging because we must support the strong requirements of the CLR Hosting API. This has two implications:
+
+- Hosting availability goals require that we eliminate all races and deadlocks. We need to maintain a healthy process under significant load for weeks and months at a time. Minuscule races will eventually be revealed.
+- Hosting requires that we often execute on non-preemptively scheduled threads. If we block a non-preemptively scheduled thread, we idle a CPU and possibly deadlock the process.
+
+### <a name="2.6.1"/>2.6.1 Use only the official synchronization mechanisms
+
+First, the most important rule. If you learn nothing else here, learn this:
+
+> DO NOT BUILD YOUR OWN LOCK.
+
+A CLR host must be able to detect and break deadlocks. To do this, it must know at all times who owns locks and who is waiting to acquire a lock. If you bypass a host using your own mechanisms, or even if you use a host's events to simulate a lock, you will defeat a host's ability to trace and break deadlocks. You must also eschew the OS synchronization services such as CRITICAL_SECTION.
+
+We have the following approved synchronization mechanisms in the CLR:
+
+1. **Crst:** This is our replacement for the Win32 CRITICAL_SECTION. We should be using Crst's pretty much everywhere we need a lock in the CLR.
+2. **Events:** A host can provide event handles that replace the Win32 events.
+3. **InterlockedIncrement/Decrement/CompareExchange:** These operations may be used for lightweight ref-counting and initialization scenarios.
+
+Make sure you aren't using events to build the equivalent of a critical section. The problem with this is that we cannot identify the thread that "owns" the critical section and hence, the host cannot trace and break deadlocks. In general, if you're creating a situation that could result in a deadlock, even if only due to bad user code, you must ensure that a CLR host can detect and break the deadlock.
+
+### <a name="2.6.2"/>2.6.2 Using Crsts
+
+The Crst class ([crst.h][crst.h]) is a replacement for the standard Win32 CRITICAL_SECTION. It has all the properties and features of a CRITICAL_SECTION, plus a few extra nice features. We should be using Crst's pretty much everywhere we need a lock in the CLR.
+
+Crst's are also used to implement our locking hierarchy. Every Crst is placed into a numbered group, or _level_. A thread can only request a Crst whose level is lower than any Crst currently held by the thread. I.e., if a thread currently holds a level 3 Crst, it can try to enter a level 2 Crst, but not a level 4 Crst, nor a different level 3 Crst. This prevents the cyclic dependencies that lead to deadlocks.
+
+We used to assign levels manually, but this leads to problems when it comes time to add a new Crst type or modify an existing one. Since the assignment of levels essentially flattens the dependencies between Crst types into one linear sequence, we have lost information on which Crst types really depend on each other (i.e. which types ever interact by being acquired simultaneously on one thread and in which order). This made it hard to determine where to rank a new lock in the sequence.
+
+Instead we now record the explicit dependencies as a set of rules in the src\inc\CrstTypes.def file and use a tool to automatically assign compatible levels to each Crst type. See CrstTypes.def for a description of the rule syntax and other instructions for updating Crst types.
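The leveling rule itself is easy to model: track the levels held by the current thread and assert that each new acquisition is strictly lower. This sketch uses std::mutex and invented names; the real checks live inside the Crst implementation:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Levels held by the current thread, innermost last.
thread_local std::vector<int> t_heldLevels;

class LeveledLock
{
    std::mutex m_mutex;
    int        m_level;
public:
    explicit LeveledLock(int level) : m_level(level) {}
    void Enter()
    {
        // Hierarchy check: every currently held level must be strictly higher.
        for (int held : t_heldLevels)
            assert(m_level < held && "lock hierarchy violation");
        m_mutex.lock();
        t_heldLevels.push_back(m_level);
    }
    void Leave()
    {
        t_heldLevels.pop_back();
        m_mutex.unlock();
    }
};
```

Acquiring in the opposite order (a level 2 lock held, then a level 3 lock requested) trips the assert, which is exactly the cycle-prevention property described above.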
+
+[crst.h]: https://github.com/dotnet/coreclr/blob/master/src/vm/crst.h
+
+### <a name="2.6.3"/>2.6.3 Creating Crsts
+
+To create a Crst:
+
+ Crst *pcrst = new Crst(type [, flags]);
+
+Where "type" is a member of the CrstType enumeration (defined in the automatically generated src\inc\CrstTypes.h file). These types indicate the usage of the Crst, particularly with regard to which other Crsts may be obtained simultaneously. There is a direct mapping from CrstType to level (see CrstTypes.h), though the reverse is not true.
+
+Don't create static instances of Crsts<sup>[2]</sup>. Use CrstStatic class for this purpose, instead.
+
+Simply define a CrstStatic as a static variable, then initialize the CrstStatic when appropriate:
+
+    g_GlobalCrst.Init(type [, flags]);
+
+A CrstStatic must be destroyed with the Destroy() method as follows:
+
+ g_GlobalCrst.Destroy();
+
+[2] In fact, you should generally avoid use of static instances that require construction and destruction. This can have an impact on startup time, it can affect our shutdown robustness, and it will eventually limit our ability to recycle the CLR within a running process.
+
+### <a name="2.6.4"/>2.6.4 Entering and Leaving Crsts
+
+To enter or leave a crst, you must wrap the crst inside a CrstHolder. All operations on crsts are available only through the CrstHolder. To enter the crst, create a local CrstHolder and pass the crst as an argument. The crst is automatically released by the CrstHolder's destructor when control leaves the scope either normally or via an exception:
+
+ {
+ CrstHolder ch(pcrst); // implicit enter
+
+ ... do your thing... may also throw...
+
+ } // implicit leave
+
+**You can only enter and leave Crsts in preemptive GC mode.** Attempting to enter a Crst in cooperative mode will forcibly switch your thread into preemptive mode.
+
+If you need a Crst that you can take in cooperative mode, you must pass a special flag to the Crst constructor to do so. See the information about CRITSECT_UNSAFE_\* flags below. You will also find information about why it's preferable not to take Crsts in cooperative mode.
+
+You can also manually acquire and release crsts by calling the appropriate methods on the holder:
+
+ {
+ CrstHolder ch(pcrst); // implicit enter
+
+ ...
+ ch.Release(); // temporarily leave
+ ...
+ ch.Acquire(); // temporarily enter
+
+ } // implicit leave
+
+Note that holders do not let you nest Acquires or Releases. You will get an assert if you try. Introduce a new scope and a new holder if you need to do this.
+
+If you need to create a CrstHolder without actually entering the critical section, pass FALSE to the holder's "take" parameter like this:
+
+ {
+ CrstHolder ch(pcrst, FALSE); // no implicit enter
+
+ ...
+ } // no implicit leave
+
+If you want to exit the scope without leaving the Crst, call SuppressRelease() on the holder:
+
+ {
+ CrstHolder ch(pcrst); // implicit enter
+ ch.SuppressRelease();
+ } // no implicit leave
+
+### <a name="2.6.5"/>2.6.5 Other Crst Operations
+
+If you want to validate that you own no other locks at the same or lower level, assert the debug-only IsSafeToTake() method:
+
+ _ASSERTE(pcrst->IsSafeToTake());
+
+Entering a crst always calls IsSafeToTake() for you, but calling it manually is useful for functions that acquire a lock only some of the time.
+
+### <a name="2.6.6"/>2.6.6 Advice on picking a level for your Crst
+
+The point of giving your critical section a level is to help us prevent deadlocks by detecting cycles early in the development process. We try to group critical sections that protect low-level data structures and don't use other services into the lower levels, and ones that protect higher-level data structures and broad code paths into higher levels.
+
+If your lock is only protecting a single data structure, and if the methods accessing that data structure don't call into other CLR services that could also take locks, then you should give your lock the lowest possible level. Using the lowest level ensures that someone can't come along later and modify the code to start taking other locks without violating the leveling. This will force us to consider the implications of taking other locks while holding your lock, and in the end will lead to better code.
+
+If your lock is protecting large sections of code that call into many other parts of the CLR, then you need to give your lock a level high enough to encompass all the locks that will be taken. Again, try to pick a level as low as possible.
+
+Add a new definition for your level rather than using an existing definition, even if there is an existing definition with the level you need. Giving each lock its own level in the enum will allow us to easily change the levels of specific locks at a later time.
+
+### <a name="2.6.7"/>2.6.7 Can waiting on a Crst generate an exception?
+
+It depends.
+
+If you initialize the crst as CRST_HOST_BREAKABLE, any attempt to acquire the lock can trigger an exception (intended to kill your thread to break the deadlock). Otherwise, you are guaranteed not to get an exception or failure. Regardless of the flag setting, releasing a lock will never fail.
+
+You can only use a non-host-breakable lock if you can guarantee that the lock will never participate in a deadlock. If you cannot guarantee this, you must use a host-breakable lock and handle the exception. Otherwise, a CLR host will not be able to break deadlocks cleanly.
+
+There are several ways we enforce this.
+
+1. A lock that is CRST_UNSAFE_SAMELEVEL must be HOST_BREAKABLE: SAMELEVEL allows multiple locks at the same level to be taken in any order. This sidesteps the very deadlock avoidance that leveling provides.
+2. You cannot call managed code while holding a non-host-breakable lock. We assume that you can't guarantee what the managed code will do. Thus, you can't guarantee that the managed code won't acquire user locks, which don't participate at all in the leveling scheme. User locks can be acquired in any order, and before or after any internal CLR locks. Hence, you cannot guarantee that the lock won't participate in a deadlock cycle along with the user locks.
+
+You may be wondering why we invest so much effort into the discipline of deadlock avoidance, and then also require everyone to tolerate deadlock breaking by the host. Sometimes we are unhosted, so we must avoid deadlocks. Some deadlocks involve user code (like class constructors) and cannot be avoided. Some exceptions from lock attempts are due to resource constraints, rather than deadlocks.
+
+### <a name="2.6.8"/>2.6.8 CRITSECT_UNSAFE Flags
+
+By default, Crsts can only be acquired and released in preemptive GC mode and threads can only own one lock at any given level at a given time. Some locks need to bypass these restrictions. To do so, you must pass the appropriate flag when you create the critical section. (This is the optional third parameter to the Crst constructor.)
+
+**CRST_UNSAFE_COOPGC**
+
+If you pass this flag, it says that your Crst will always be taken in Cooperative GC mode. This is dangerous because you cannot allow a GC to occur while the lock is held<sup>[3]</sup>. Entering a coop mode lock puts your thread in ForbidGC mode until you leave the lock. For handy reference, some of the things you can't do in ForbidGC mode are:
+
+- Allocate managed memory
+- Call managed code
+- Enter a GC-safe point
+- Toggle the GC mode
+- Block for long periods of time
+- Synchronize with the GC
+- Call any other code that does these things
+
+**CRST_UNSAFE_ANYMODE**
+
+If you pass this flag, your Crst can be taken in either Cooperative or Preemptive mode. The thread's mode will not change as a result of taking the lock; however, the thread will be placed in a GCNoTrigger state. We have a set of assertions to try to ensure that you don't cause problems with the GC due to this freedom. These assertions are the famous "Deadlock situation" messages from our V1 code base. However, it's important to realize that these assertions do not provide full safety, because they rely on code coverage to catch your mistakes.
+
+Note that CRST_UNSAFE_COOPGC and CRST_UNSAFE_ANYMODE are mutually exclusive despite being defined as "or'able" bits.
+
+**CRST_UNSAFE_SAMELEVEL**
+
+All Crsts are ordered to avoid deadlock. The CRST_UNSAFE_SAMELEVEL flag weakens this protection by allowing multiple Crsts at the same level to be taken in any order. This is almost always a bug.
+
+I know of one legitimate use of this flag. It is the Crst that protects class construction (.cctors). The application can legally create cycles in class construction. The CLR has rules for breaking these cycles by allowing classes to see uninitialized data under well-defined circumstances.
+
+In order to use CRST_UNSAFE_SAMELEVEL, you should write a paragraph explaining why this is a legal use of the flag. Add this explanation as a comment to the constructor of your Crst.
+
+Under no circumstances may you use CRST_UNSAFE_SAMELEVEL for a non-host-breakable lock.
+
+[3] More precisely, you cannot allow a GC to block your thread at a GC-safe point. If it does, the GC could deadlock because the GC thread itself blocks waiting for a third cooperative mode thread to reach its GC-safe point... which it can't do because it's trying to acquire the very lock that your first thread owns. This wouldn't be an issue if acquiring a coop-mode lock were itself a GC-safe point. But too much code relies on this not being a GC-safe point to fix this easily.
+
+### <a name="2.6.9"/>2.6.9 Bypassing leveling (CrstUnordered)
+
+CrstUnordered (used in rules inside CrstTypes.def) is a special level that says that the lock does not participate in any of the leveling required for deadlock avoidance. This is the most heinous of the ways you can construct a Crst. Though there are still some uses of this in the CLR, it should be avoided whenever possible.
+
+### <a name="2.6.10"/>2.6.10 So what _are_ the prerequisites and side-effects of entering a Crst?
+
+The following matrix lists the effective contract and side-effects of entering a crst for all combinations of CRST_HOST_BREAKABLE and CRST_UNSAFE_\* flags. The SAMELEVEL flag has no effect on any of these parameters.
+
+| | Default | CRST_HOST_BREAKABLE |
+| ------------------- | ----------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
+| Default | NOTHROW<br> FORBID_FAULT<br>GC_TRIGGERS<br>MODE_ANY<br>(switches thread to preemptive) | THROWS<br>INJECT_FAULT<br>GC_TRIGGERS<br>MODE_ANY<br>(switches thread to preemptive) |
+| CRST_UNSAFE_COOPGC | NOTHROW<br>FORBID_FAULT<br>GC_NOTRIGGER<br>MODE_COOP<br>(puts thread in GCNoTrigger mode) | THROWS<br>INJECT_FAULT<br>GC_NOTRIGGER<br>MODE_COOP<br>(puts thread in GCNoTrigger mode) |
+| CRST_UNSAFE_ANYMODE | NOTHROW<br>FORBID_FAULT<br>GC_NOTRIGGER<br>MODE_ANY<br>(puts thread in GCNoTrigger mode) | THROWS<br>INJECT_FAULT<br>GC_NOTRIGGER<br>MODE_ANY<br>(puts thread in GCNoTrigger mode) |
+
+### <a name="2.6.11"/>2.6.11 Using Events and Waitable Handles
+
+In typical managed app scenarios, services like WszCreateEvent are thin wrappers over OS services like ::CreateEvent. But in hosted scenarios, these calls may be redirected through an abstraction layer to the host. If that's the case, they may return handles that behave somewhat like OS events, but do not support coordination with unmanaged code. Nor can we provide WaitForMultipleHandles support on these handles. You are strictly limited to waiting on a single handle.
+
+If you need to coordinate with unmanaged code, or if you need to do WaitForMultipleHandles ANY/ALL, you will have to avoid WszCreateEvent. If you really know what you are doing, go directly to the OS to obtain these handles. Everyone else should seek advice from someone who thoroughly understands the implications to our host. Obviously the general rule is that everyone should go through our hosted abstraction.
+
+Sometimes you might find yourself building the equivalent of a critical section, but using an event directly. The problem here is that we cannot identify the thread that owns the lock, because the owner isn't identified until it "leaves" the lock by calling SetEvent or Pulse. Consider whether a Crst might be more appropriate.
+
+### <a name="2.6.12"/>2.6.12 Do not get clever with "lockless" reader-writer data structures
+
+Earlier, we had several hashtable structures that attempted to be "clever" and allow lockless reading. Of course, these structures didn't take into account multiprocessors and other memory models. Even on single-proc x86, stress uncovered exotic race conditions. This wasted a lot of developer time debugging stress crashes.
+
+We finally stopped being clever and added proper synchronization, with no serious perf degradation.
+
+So if you are tempted to get clever in this way again, **stop and do something else until the urge passes.**
+
+### <a name="2.6.13"/>2.6.13 Yes, your thread could be running non-preemptively!
+
+Under hosted scenarios, your thread could actually be scheduled non-preemptively (not to be confused with "GC preemptive mode"). Blocking a thread without yielding back to the host could have consequences ranging from CPU starvation (perf) to an actual deadlock. You are particularly vulnerable when calling OS APIs that block.
+
+Unfortunately, there is no official list of "safe" OS APIs. The safest approach is to stick to the officially approved synchronization mechanisms documented in this chapter and be extra careful when invoking OS APIs.
+
+### <a name="2.6.14"/>2.6.14 Dos and Don'ts for Synchronization
+
+- Don't build your own lock or use OS locks. Only use Crst or host events and waitable handles. A host must know who owns what to detect and break deadlocks.
+- Don't use events to simulate locks or any other synchronization mechanism that could lead to deadlocks. Again, if a host doesn't know about a deadlock situation, it can't break it.
+- Don't use a CRITICAL_SECTION anywhere inside the CLR. Use Crst. One exception: if there are bootstrap or shutdown issues that require synchronization beyond the period when the CLR is initialized, you may use CRITICAL_SECTION (e.g., g_LockStartup).
+- Do pick the lowest possible level for your Crst.
+- Don't create static instances of Crst. Use CrstStatic instead.
+- Do assert IsSafeToTake() if your function only takes a crst some of the time.
+- Do use the default Crst rather than the CRST_UNSAFE_\* alternatives. They're named that for a reason.
+- Do choose correctly between host-breakable and non-breakable crsts. Crsts that don't protect calls to managed code and participate fully in the leveling scheme can be non-breakable. Otherwise, you must use breakable.
+- Don't take locks in cooperative mode if you can avoid it. This can delay or stall the GC. You are in a ForbidGC region the entire time you hold the lock.
+- Don't block a thread without yielding back to the host. Your "thread" may actually be a nonpreemptive thread. Always stick to the approved synchronization primitives.
+- Do document your locking model. If your locking model involves protecting a resource with a critical section, maybe you don't have to mention that in a comment. But if you have an elaborate mechanism where half your synchronization comes from GC guarantees and being in cooperative mode, while the other half is based on taking a spin lock in preemptive mode – then you really need to write this down. Nobody (not even you) can debug or maintain your code unless you have left a detailed comment.
+
+## <a name="2.7"/>2.7 Are you making hidden assumptions about the order of memory writes?
+
+_Issues: x86 processors have a very predictable memory order that 64-bit chips and multiprocessor systems do not guarantee. We've gotten burned in the past by attempts to be clever at writing thread-safe data structures without crsts. The best advice here is "don't be so clever; the perf improvements usually don't justify the risk." (Look for Vance's writeup on memory models for a start.)_
+
+## <a name="2.8"/>2.8 Is your code compatible with managed debugging?
+
+The managed debugging services have some unique properties within the CLR, and they take a heavy dependency on the rest of the system. This makes it very easy to break managed debugging without even touching a line of debugger code. Here are some key trivia and tips to help you play well with the managed-debugging services.
+
+Be aware of things that make the debugger subsystem different than other subsystems:
+
+- The debugger runs mostly out-of-process.
+- The debugger generally inspects things at a very intimate level. For example, the debugger can see private fields, the offsets of those fields, and what registers an object may be stored in.
+- The debugger needs to be able to stop and synchronize the debuggee, in a way similar to the GC. That means all those GC-contracts, GC-triggers, GC-toggling, etc., may heavily affect the debugger's synchronization too.
+- Whereas most subsystems can just patiently wait for a GC to complete, the debugger will need to do complicated work during a GC-suspension.
+
+Here are some immediate tips for working well with the managed-debugging services:
+
+- Check if you need to DAC-ize your code for debugging! DACizing means adding special annotations so that the debugger can re-use your code to read key CLR data structures from out-of-process. This is especially applicable for code that inspects runtime data structures (running callstacks; inspecting a type; running assembly or module lists; enumerating jitted methods; doing IP2MD lookups; etc). Code that will never be used by the debugger does not have to be DAC-ized. However, when in doubt, it's safest to just DAC-ize your code.
+- Don't disassemble your own code. Breakpoints generally work by writing a "break opcode" (int3 on x86) into the instruction stream. Thus, when you disassemble your code, you may get the breakpoint opcode instead of your original opcode. Currently, we have to work around this by having all runtime disassembly ask the debugger if there's a break opcode at the targeted address, and that's painful.
+- Avoid self-modifying code. Avoid this for the same reasons that you shouldn't disassemble your own code. If you modify your own code, that would conflict with the debugger adding breakpoints there.
+- Do not change behavior when under the debugger. An app should behave identically when run outside or under the debugger. This is absolutely necessary else we get complaints like "my program only crashes when run under the debugger". This is also necessary because somebody may attach a debugger to an app after the fact. Specific examples of this:
+ - Don't assume that just because an app is under the debugger that somebody is trying to debug it.
+  - Don't add additional run-time error checks when under the debugger. For example, avoid code like: if (IsDebuggerPresent() && (argument == NULL)) { throw MyException(); }
+ - Avoid massive perf changes when under the debugger. For example, don't use an interpreted stub just because you're under the debugger. We then get bugs like [my app is 100x slower when under a debugger](http://blogs.msdn.com/b/jmstall/archive/2006/01/17/pinvoke-100x-slower.aspx).
+ - Avoid algorithmic changes. For example, do not make the JIT generate non-optimized code just because an app is under the debugger. Do not make the loader policy resolve to a debuggable-ngen image just because an app is under the debugger.
+- Separate your code into a) side-effect-free (non-mutating) read-only accessors and b) functions that change state. The motivation is that the debugger needs to be able to read-state in a non-invasive way. For example, don't just have GetFoo() that will lazily create a Foo if it's not available. Instead, split it out like so:
+ - GetFoo() - fails if a Foo does not exist. Being non-mutating, this should also be GC_NOTRIGGER. Non-mutating will also make it much easier to DAC-ize. This is what the debugger will call.
+ - and GetOrCreateFoo() that is built around GetFoo(). The rest of the runtime can call this.
+ - The debugger can then just call GetFoo(), and deal with the failure accordingly.
+- If you add a new stub (or way to call managed code), make sure that you can source-level step-in (F11) it under the debugger. The debugger is not psychic. A source-level step-in needs to be able to go from the source-line before a call to the source-line after the call, or managed code developers will be very confused. If you make that call transition be a giant 500 line stub, you must cooperate with the debugger for it to know how to step-through it. (This is what StubManagers are all about. See [src\vm\stubmgr.h](https://github.com/dotnet/coreclr/blob/master/src/vm/stubmgr.h)). Try doing a step-in through your new codepath under the debugger.
+- **Beware of timeouts** : The debugger may completely suspend your process at arbitrary points. In most cases, the debugger will do the right thing (and suspend your timeout too), but not always. For example, if you have some other process waiting for info from the debuggee, it [may hit a timeout](http://blogs.msdn.com/b/jmstall/archive/2005/11/11/contextswitchdeadlock.aspx).
+- **Use CLR synchronization primitives (like Crst)**. In addition to all the reasons listed in the synchronization section, the CLR-aware primitives can cooperate with the debugging services. For example:
+ - The debugger needs to know when threads are modifying sensitive data (which correlates to when the threads lock that data).
+ - Timeouts for CLR synchronization primitives may operate better in the face of being debugged.
+- **Optimized != Non-debuggable:** While performance is important, you should make sure your perf changes do not break the debugger. This is especially important in stepping, which requires the debugger to know exactly where we are going to execute managed code. For example, when we started using IL stubs for reverse pinvoke calls in the .NET Framework 2, the debugger was no longer notified that a thread was coming back to managed code, which broke stepping. You can probably find a way to make your feature area debuggable without sacrificing performance.
+
+**Examples of dependencies** : Here's a random list of ways that the debugger depends on the rest of the runtime.
+
+- Debugger must be able to inspect CLR data-structures, so your code must be DAC-ized. Examples include: running a module list, walking the thread list, taking a callstack, recognizing stubs, and doing an IP2MD lookup. You can break the debugger by just breaking the DAC (changing DAC-ized code so that it is no longer dac-ized correctly).
+- Type-system: Debugger must be able to traverse the type-system.
+- Need notifications from VM: loader, exception, jit-complete, etc.
+- Anything that affects codegen (Emit, dynamic languages, IBC): the debugger needs to know where the code is and how it's laid out.
+- GC, threading – debugger must be GC-aware. For example, we must protect the user from trying to inspect the GC-heap in the middle of a GC. The debugger must also be able to do a synchronization that may compete with a GC-synchronization.
+- Step-in through a stub: Any time you add a new stub or new way of calling managed code, you might break stepping.
+- Versioning: You could write a debugger in managed code targeting CLR version X, but debugging a process that's loaded CLR version Y. Now that's a versioning nightmare.
+
+## <a name="2.9"/>2.9 Does your code work on 64-bit?
+
+### <a name="2.9.1"/>2.9.1 Primitive Types
+
+Because the CLR is ultimately compiled on several different platforms, we have to be careful about the primitive types which are used in our code. Some compilers can have slightly different declarations in standard header files, and different processor word sizes can require values to have different representations on different platforms.
+
+Because of this, we have gathered the definitions of all the "blessed" CLR types into a single header file, [clrtypes.h](https://github.com/dotnet/coreclr/blob/master/src/inc/clrtypes.h). In general, you should only use primitive types which are defined in this file. As an exception, you may use built-in primitive types like int and short when precision isn't particularly interesting.
+
+The types are grouped into several categories.
+
+- Fixed-representation integral types (INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64). These typedefs always have the same representation on each platform. Each type is named with the number of bits in the representation.
+- Pointer-sized integral types (SIZE_T, SSIZE_T). These types change size across platforms, depending on the native pointer size. Use SIZE_T whenever you need to cast pointers to and from integral types. SSIZE_T is the signed version of SIZE_T; use it if you are computing the difference of two arbitrary pointers.
+- Large count-sized integral types (COUNT_T, SCOUNT_T). These are used when you would normally use a SIZE_T or SSIZE_T on a 32-bit machine, but you know you won't ever need more than 32 bits, even on a 64-bit machine. Use this type where practical to avoid bloated data sizes.
+- Semantic content types (BOOL, BYTE). Use these types to give additional semantic context to an integral type. BYTE indicates "raw data", and BOOL indicates a value which can be either TRUE or FALSE.
+- Character data types (CHAR, SCHAR, UCHAR, WCHAR, ASCII, ANSI, UTF8). These have fixed sizes and represent single characters in strings. CHAR may be either signed or unsigned. Note that CHAR/SCHAR/UCHAR specify no semantics about character set; use ASCII, ANSI, and UTF8 to indicate when a specific encoding is used. It is worth mentioning that manipulation of strings as raw character arrays is discouraged; instead, code should use the SString class wherever possible.
+- Pointer to executable code (PCODE). Use this for any pointer to (managed) executable code.
+
+All standard integral types have *_MIN and *_MAX values declared as well.
+
+## <a name="2.10"/>2.10 Does your function declare a CONTRACT?
+
+Every function in the CLR must declare a contract. A contract enumerates important behavioral facts, such as whether a function throws or whether it can trigger a GC. It is also a general container for expressing preconditions and postconditions specific to that function.
+
+Contracts help determine which functions can safely invoke others. These constraints are enforced in two ways:
+
+- Statically, using a special tool that analyzes callgraphs and flags violations.
+- Runtime assertions.
+
+These two approaches are complementary. Static analysis is always preferable, but the tool cannot reliably find all call paths and cannot check custom preconditions. Runtime checks are only as good as our code coverage.
+
+Here is a typical contract:
+
+ LPVOID Foo(char *name, Blob *pBlob)
+ {
+ CONTRACTL
+ {
+ THROWS; // This function may throw
+ INJECT_FAULT(COMPlusThrowOM()); // This function may fail due to OOM
+ GC_TRIGGERS; // This function may trigger a GC
+ MODE_COOPERATIVE; // Must be in GC-cooperative mode to call
+ CAN_TAKE_LOCK; // This function may take a Crst, spinlock, etc.
+ EE_THREAD_REQUIRED; // This function expects an EE Thread object in the TLS
+ PRECONDITION(CheckPointer(name)); // Invalid to pass NULL
+ PRECONDITION(CheckPointer(pBlob, NULL_OK)); // Ok to pass NULL
+ }
+ CONTRACTL_END;
+
+ ...
+ }
+
+There are several flavors of contracts. This example shows the most common type (CONTRACTL, where "L" stands for "lite.")
+
+At runtime (on a checked build), the contract does the following:
+
+At the start of Foo(), it validates that it's safe to throw, safe to generate an out of memory error, safe to trigger gc, that the GC mode is cooperative, and that your preconditions are true.
+
+On a retail build, CONTRACT expands to nothing.
+
+### <a name="2.10.1"/>2.10.1 What can be said in a contract?
+
+As you can see, a contract is a laundry list of "items" that either assert some requirement on the current thread state or impose a requirement on downstream callees. The following is a whirlwind tour of the supported annotations. The nuances of each one are explained in more detail in their individual chapters.
+
+#### <a name="2.10.1.1"/>2.10.1.1 THROWS/NOTHROW
+
+Declares whether an exception can be thrown out of this function. Declaring **NOTHROW** puts the thread in a NOTHROW state for the duration of the function call. You will get an assert if you throw an exception or call a function declared THROWS. An EX_TRY/EX_CATCH construct however will lift the NOTHROW state for the duration of the TRY body.
+
+#### <a name="2.10.1.2"/>2.10.1.2 INJECT_FAULT(_handler-stmt_)/FORBID_FAULT
+
+This is a poorly named item. INJECT_FAULT declares that the function can **fail** due to an out of memory (OOM) condition. FORBID_FAULT means that the function promises never to fail due to OOM. FORBID_FAULT puts the thread in a FORBID_FAULT state for the duration of the function call. You will get an assert if you allocate memory (even with the C++ new operator) or call a function declared INJECT_FAULT.
+
+#### <a name="2.10.1.3"/>2.10.1.3 GC_TRIGGERS/GC_NOTRIGGER
+
+Declares whether the function is allowed to trigger a GC. GC_NOTRIGGER puts the thread in a NOTRIGGER state where any call to a GC_TRIGGERS function will assert.
+
+**Observation:** THROWS does not necessarily imply GC_TRIGGERS. COMPlusThrow does not trigger GC.
+
+#### <a name="2.10.1.4"/>2.10.1.4 MODE_PREEMPTIVE/ MODE_COOPERATIVE/ MODE_ANY
+
+This item asserts that the thread is in a particular mode or declares that the function is mode-agnostic. It does not change the state of the thread in any way.
+
+#### <a name="2.10.1.5"/>2.10.1.5 LOADS_TYPE(_loadlevel_)
+
+This item asserts that the function may invoke the loader and cause a type to be loaded up to (and including) the indicated loadlevel. Valid load levels are taken from the ClassLoadLevel enumeration in [classLoadLevel.h](https://github.com/dotnet/coreclr/blob/master/src/vm/classloadlevel.h).
+
+The CLR asserts if any attempt is made to load a type past the current limit set by LOADS_TYPE. A call to any function that has a LOADS_TYPE contract is treated as an attempt to load a type up to that limit.
+
+#### <a name="2.10.1.6"/>2.10.1.6 CAN_TAKE_LOCK / CANNOT_TAKE_LOCK
+
+These declare whether a function or callee takes any kind of EE or user lock: Crst, SpinLock, reader-writer lock, CLR critical section, or even your own home-grown spin lock (e.g., ExecutionManager::IncrementReader).
+
+In TLS we keep track of the current intent (whether to lock), and actual reality (what locks are actually taken). Enforcement occurs as follows:
+
+[contract.h]: https://github.com/dotnet/coreclr/blob/master/src/inc/contract.h
+
+- SCAN
+ - A CANNOT_TAKE_LOCK function calling a CAN_TAKE_LOCK function is illegal (just like THROWS/NOTHROWS)
+- Dynamic checking:
+ - A CANNOT_TAKE_LOCK function calling a CAN_TAKE_LOCK function is illegal
+ - *_LOCK_TAKEN / *_LOCK_RELEASED macros (contract.h):
+ - Sprinkled at all places we take/release actual or conceptual locks
+ - Asserts if taking a lock in a CANNOT_TAKE_LOCK scope
+ - Keeps count of locks currently taken by thread
+ - Remembers stack of lock pointers for diagnosis
+ - ASSERT_NO_EE_LOCKS_HELD(): Handy way for you to verify no locks are held right now on this thread (i.e., lock count == 0)
+
+#### <a name="2.10.1.7"/>2.10.1.7 EE_THREAD_REQUIRED / EE_THREAD_NOT_REQUIRED
+
+These declare whether a function or callee deals with the case "GetThread() == NULL".
+
+EE_THREAD_REQUIRED simply asserts that GetThread() != NULL.
+
+EE_THREAD_NOT_REQUIRED is a noop by default. You must "set COMPlus_EnforceEEThreadNotRequiredContracts=1" for this to be enforced. Setting the envvar forces a C version of GetThread() to be used, instead of the optimized assembly versions. This C GetThread() always asserts in an EE_THREAD_NOT_REQUIRED scope regardless of whether there actually is an EE Thread available or not. The reason is that if you claim you don't require an EE Thread, then you have no business asking for it (even if you get lucky and there happens to be an EE Thread available).
+
+Of course, there are exceptions to this. In particular, if there is a clear code path for GetThread() == NULL, then it's ok to call GetThread() in an EE_THREAD_NOT_REQUIRED scope. You declare your intention by using GetThreadNULLOk():
+
+ Thread* pThread = GetThreadNULLOk();
+ if (pThread != NULL)
+ {
+ pThread->m_dwAVInRuntimeImplOkayCount++;
+ }
+
+Rule: You should only use GetThreadNULLOk if it is patently obvious from the call site that NULL is dealt with directly. Obviously, this would be bad:
+
+ GetThreadNULLOk()->BeginCriticalRegion();
+
+This is also frowned upon, as it's unclear whether a NULL Thread is handled:
+
+ MyObj myObj(GetThreadNULLOk());
+
+In more complex situations, a caller may be able to vouch for an EE Thread's existence, while its callee cannot. So you can set up a scope that temporarily stops doing the EE_THREAD_NOT_REQUIRED verification as follows:
+
+ CONTRACTL
+ {
+ EE_THREAD_NOT_REQUIRED;
+ } CONTRACTL_END;
+
+ Thread* pThread = GetThreadNULLOk();
+ if (pThread == NULL)
+ return;
+
+ // We know there's an EE Thread now, so it's safe to call GetThread()
+ // and expect a non-NULL return.
+ BEGIN_GETTHREAD_ALLOWED;
+ CallCodeThatRequiresThread();
+ END_GETTHREAD_ALLOWED;
+
+BEGIN/END_GETTHREAD_ALLOWED simply instantiate a holder that temporarily disables the assert on each GetThread() call. A non-holder version is also available which can generate less code if you're wrapping a NOTHROW region: BEGIN/END_GETTHREAD_ALLOWED_IN_NO_THROW_REGION. In fact, GetThreadNULLOk() is implemented by just calling GetThread() from within a BEGIN/END_GETTHREAD_ALLOWED_IN_NO_THROW_REGION block.
+
+You should only use BEGIN/END_GETTHREAD_ALLOWED(_IN_NO_THROW_REGION) if:
+
+- It is provably impossible for GetThread() to ever return NULL from within that scope, or
+- All code within that scope directly deals with GetThread()==NULL.
+
+If the latter is true, it's generally best to push BEGIN/END_GETTHREAD_ALLOWED down the callee chain so all callers benefit.
+
+#### <a name="2.10.1.8"/>2.10.1.8 SO_TOLERANT/SO_INTOLERANT
+
+These are related to stack probes. SO_TOLERANT means the function is written in such a way that it is safe to throw a StackOverflow exception between any two instructions. It doesn't update global state, doesn't modify data structures, and doesn't call out to the operating system.
+
+If you don't specify SO_TOLERANT, the function is treated as SO_INTOLERANT.
+
+The CLR asserts if you invoke an SO_INTOLERANT function outside the scope of a stack probe. The probe's purpose is to check in advance if sufficient stack is available and trigger the SO exception before venturing into SO_INTOLERANT code.
+
+#### <a name="2.10.1.9"/>2.10.1.9 PRECONDITION(_expr_)
+
+This is pretty self-explanatory. It is basically an **_ASSERTE**. Both _ASSERTEs and PRECONDITIONs are used widely in the codebase. The expression can evaluate to either a Boolean or a Check.
+
+#### <a name="2.10.1.10"/>2.10.1.10 POSTCONDITION(_expr_)
+
+This is an expression that's tested on a _normal_ function exit. It will not be tested if an exception is thrown out of the function. Postconditions can access the function's locals provided that the locals were declared at the top level scope of the function. C++ objects will not have been destructed yet.
+
+Because of the limitations of our macro infrastructure, this item imposes some syntactic ugliness on the function. More on this below.
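
The normal-exit-only behavior can be sketched with an RAII object that skips its check when an exception is unwinding the scope. This is a hypothetical analog (the names and use of C++17's std::uncaught_exceptions are mine), not the real contract.h machinery:

```cpp
#include <cassert>
#include <exception>
#include <functional>

// Hypothetical sketch of POSTCONDITION semantics: the check fires on a
// normal return, but is skipped when an exception unwinds the scope.
class PostconditionCheck
{
public:
    explicit PostconditionCheck(std::function<bool()> check) : m_check(check) {}
    ~PostconditionCheck()
    {
        // Skip the check if this destructor is running due to exception unwind.
        if (std::uncaught_exceptions() == 0)
        {
            assert(m_check());
        }
    }
private:
    std::function<bool()> m_check;
};

int ComputePositive(bool shouldThrow)
{
    int result = 0;
    // The check reads the top-level local 'result' by reference, mirroring
    // how POSTCONDITION can access a function's top-level-scope locals.
    PostconditionCheck post([&]() { return result > 0; });
    if (shouldThrow)
        throw std::exception();   // postcondition is NOT evaluated on this path
    result = 42;
    return result;                // postcondition IS evaluated on this path
}

// Helper showing that the throwing path escapes without tripping the check.
bool ThrowingPathSkipsCheck()
{
    try { ComputePositive(true); }
    catch (...) { return true; }
    return false;
}
```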
+
+### <a name="2.10.2"/>2.10.2 Is order important?
+
+Preconditions and postconditions will execute in the order declared. The "intrinsic" items will execute before any preconditions regardless of where they appear.
+
+### <a name="2.10.3"/>2.10.3 Using the right form of contract.
+
+Contracts come in several forms:
+
+- CONTRACTL: This is the most common type. It does runtime checks as well as being visible to the static scanner. It is suitable for all runtime contracts except those that use postconditions. When in doubt, use this form.
+- CONTRACT(returntype): This is an uglier version that's needed if you include a POSTCONDITION. You must supply the correct function return type for this form, and it cannot be "void" (use CONTRACT_VOID instead). You must also use the special RETURN macro rather than the normal return keyword.
+- CONTRACT_VOID: Use this if you need a postcondition and the return type is void. CONTRACT(void) will not work.
+- STATIC_CONTRACT_\*: This form generates no runtime code but still emits the hidden tags visible to the static contract scanner. Use this only if checked-build performance would suffer greatly by putting a runtime contract there, or if for some technical reason a runtime-based contract is not possible.
+- LIMITED_METHOD_CONTRACT: A static contract equivalent to NOTHROW/GC_NOTRIGGER/FORBID_FAULT/MODE_ANY/CANNOT_TAKE_LOCK. Use this form only for trivial one-liner functions. Remember it does not do runtime checks so it should not be used for complex functions.
+- WRAPPER_NO_CONTRACT: A static no-op contract for functions that trivially wrap another. This was invented back when we didn't have static contracts and we now wish it hadn't been invented. Please don't use this in new code.
+
+### <a name="2.10.4"/>2.10.4 When is it safe to use a runtime contract?
+
+Contracts do not require that the current thread have a Thread structure. Even those annotations that explicitly check Thread bits (the GC and MODE annotations) will correctly handle the NULL Thread case.
+
+Contracts can be, and are, used outside of the files that build the CLR. However, the GC_TRIGGERS and MODE families of items are not available outside of the CLR.
+
+You cannot use runtime contracts if:
+
+- Your code is callable from the implementation of FLS (Fiber Local Storage). This may result in an infinite recursion as the contract infrastructure itself uses FLS.
+- Your code makes a net change to the ClrDebugState. Only the contract infrastructure should be doing this but see below for more details.
+
+### <a name="2.10.5"/>2.10.5 Do not make unscoped changes to the ClrDebugState.
+
+The ClrDebugState is the per-thread data structure that houses all of the flag bits set and tested by contracts (e.g., NOTHROW and GC_NOTRIGGER). You should never modify this data directly. Always go through contracts or the specific holders (such as GCX_NOTRIGGER).
+
+This data is meant to be changed in a scoped manner only. In particular, the CONTRACT destructor always restores the _entire_ ClrDebugState from a copy saved on function entry. This means that any net changes made by the function body itself will be wiped out when the function exits via local _or_ non-local control. The same caveat is true for holders such as GCX_NOTRIGGER.
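
The scoped save/restore described above amounts to the following RAII pattern. This is a simplified sketch with hypothetical names, not the actual ClrDebugState implementation:

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread ClrDebugState flag bits.
struct DebugState
{
    bool noThrow   = false;
    bool noTrigger = false;
};

thread_local DebugState t_debugState;

// Analog of the CONTRACT destructor: snapshot the entire state on entry
// and restore it on exit, wiping out any net changes made in between.
class ScopedDebugState
{
public:
    ScopedDebugState() : m_saved(t_debugState) {}
    ~ScopedDebugState() { t_debugState = m_saved; }
private:
    DebugState m_saved;
};

bool NetChangeIsWipedOut()
{
    t_debugState = DebugState();     // start from a clean state
    {
        ScopedDebugState scope;
        t_debugState.noThrow = true; // unscoped change inside the scope...
    }
    return !t_debugState.noThrow;    // ...is gone once the scope exits
}
```

Because the destructor restores the whole snapshot on both local and non-local exits, any direct write to the state inside the scope silently disappears, which is exactly why only the contract infrastructure and holders should touch it.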
+
+### <a name="2.10.6"/>2.10.6 For more details...
+
+See the big block comment at the start of [src\inc\contract.h][contract.h].
+
+## <a name="2.11"/>2.11 Is your code DAC compliant?
+
+At a high level, DAC is a technique to enable execution of CLR algorithms from out-of-process (e.g., on a memory dump). Core CLR code is compiled in a special mode (with DACCESS_COMPILE defined) where all pointer dereferences are intercepted.
+
+Various tools (most notably the debugger and SOS) rely on portions of the CLR code being properly "DACized". Writing code in this way can be tricky and error-prone. Use the following references for more details:
+
+- The best documentation is in the code itself. See the large comments at the top of [src\inc\daccess.h](https://github.com/dotnet/coreclr/blob/master/src/inc/daccess.h).
diff --git a/Documentation/coding-guidelines/clr-jit-coding-conventions.md b/Documentation/coding-guidelines/clr-jit-coding-conventions.md
new file mode 100644
index 0000000..0bab276
--- /dev/null
+++ b/Documentation/coding-guidelines/clr-jit-coding-conventions.md
@@ -0,0 +1,2001 @@
+# CLR JIT Coding Conventions
+May 2015
+
+# Overview
+
+Consistent code conventions are important for several reasons:
+
+* *Most importantly:* To make it easy to read and understand the code. Remember: you may be the only one writing the code initially, but many people will need to read, understand, modify, and fix the code over its lifetime. These conventions attempt to make this task much easier. If the code consistently follows these standards, it improves developer understanding across the entire source base.
+* To make it easy to debug the code, with both the Visual Studio and windbg debuggers. It should be easy to set breakpoints, view locals, and display and view data structures.
+* To make it easier to search and browse the code, both with Visual Studio IntelliSense, and with simple tools like "grep" and Notepad. This implies, for example, that names must be unique, and not require C++ parser intelligence (for example, Visual Studio IntelliSense) to distinguish.
+* To attempt to improve code quality through consistency, and requiring patterns that are less likely to result in bugs either initially, or after code modification.
+
+For these reasons, C++ code that is part of the Common Language Runtime (CLR) Just-In-Time compiler (JIT) will follow these conventions.
+
+Note that these conventions are different from the CLR C++ Coding Conventions, documented elsewhere and used for the VM code, though they do have strong similarities. This is due to historical differences in the JIT and VM code, the teams involved in forming these conventions, as well as to technical differences in the code itself.
+
+> Note: the JIT currently doesn't follow some of these conventions very widely. The non-conformant code should be updated, eventually.
+
+> Note: we now use jit-format to format our code. All changes it makes supersede the conventions in this doc. Please see the [jit-format documentation](https://github.com/dotnet/jitutils/blob/master/doc/getstarted.md#formatting-jit-source) for instructions on running jit-format.
+
+# How to use this document
+
+* All new code written in the JIT should adhere to these conventions.
+* Existing code that does not follow these conventions should be converted to these conventions when it is modified.
+ * Typically, conversion to these conventions would be done on a function basis.
+ * You need to balance this suggestion against the very real value of submitting a minimal change for any individual fix or feature. Consider doing convention changes as a separate change, to avoid polluting the changes for a bug fix (or other change) with convention changes (which would make it harder to identify exactly which changes are strictly required for the bug fix).
+* Code reviewers should look for adherence to the conventions.
+
+# Contents
+
+* [4 Principles](#4)
+* [5 Spaces, not tabs](#5)
+* [6 Source code line width](#6)
+* [7 Commenting](#7)
+ * [7.1 General](#7.1)
+ * [7.1.1 Comment style](#7.1.1)
+ * [7.1.2 Spelling and grammar](#7.1.2)
+ * [7.1.3 Bug IDs](#7.1.3)
+ * [7.1.4 Email names](#7.1.4)
+ * [7.1.5 TODO](#7.1.5)
+ * [7.1.6 Performance](#7.1.6)
+ * [7.2 File header comment](#7.2)
+ * [7.3 Commenting code blocks](#7.3)
+ * [7.4 Commenting variables](#7.4)
+ * [7.5 Commenting `#ifdefs`](#7.5)
+* [8 Naming Conventions](#8)
+ * [8.1 General](#8.1)
+ * [8.2 Hungarian or other prefix/postfix naming](#8.2)
+ * [8.3 Macro names](#8.3)
+ * [8.4 Local variables](#8.4)
+ * [8.5 Global variables](#8.5)
+ * [8.6 Function parameters](#8.6)
+ * [8.7 Non-static C++ member variables](#8.7)
+ * [8.8 Static member variables](#8.8)
+ * [8.9 Functions, including member functions](#8.9)
+ * [8.10 Classes](#8.10)
+ * [8.11 Enums](#8.11)
+* [9 Function Structure](#9)
+ * [9.1 In a header file](#9.1)
+ * [9.1.1 Comments for function declarations](#9.1.1)
+ * [9.2 In an implementation file](#9.2)
+ * [9.2.1 Function size](#9.2.1)
+ * [9.3 Function definitions](#9.3)
+ * [9.4 Function header comment](#9.4)
+ * [9.4.1 Example](#9.4.1)
+ * [9.5 Specific function information](#9.5)
+ * [9.5.1 Constructor with member initialization list](#9.5.1)
+* [10 Local Variable Declarations](#10)
+ * [10.1 Pointer declarations](#10.1)
+* [11 Spacing](#11)
+ * [11.1 Logical and arithmetic expressions](#11.1)
+ * [11.2 Continuing statements on multiple lines](#11.2)
+ * [11.3 Function call](#11.3)
+ * [11.4 Arrays](#11.4)
+* [12 Control Structures](#12)
+ * [12.1 Braces for `if`](#12.1)
+ * [12.2 Braces for looping structures](#12.2)
+ * [12.3 `switch` statements](#12.3)
+ * [12.4 Examples](#12.4)
+* [13 C++ Classes](#13)
+* [14 Preprocessor](#14)
+ * [14.1 Conditional compilation](#14.1)
+ * [14.1.1 `#if FEATURE`](#14.1.1)
+ * [14.1.2 Disabling code](#14.1.2)
+ * [14.1.3 Debug code](#14.1.3)
+ * [14.2 `#define` constants](#14.2)
+ * [14.3 Macro functions](#14.3)
+ * [14.3.1 Macro functions versus C++ inline functions](#14.3.1)
+ * [14.3.2 Line continuation](#14.3.2)
+ * [14.3.3 Multi-statement macro functions](#14.3.3)
+ * [14.3.4 Control flow](#14.3.4)
+ * [14.3.5 Scope](#14.3.5)
+ * [14.3.6 Examples](#14.3.6)
+* [15 Language Usage Rules](#15)
+ * [15.1 C/C++ general](#15.1)
+ * [15.1.1 Casts](#15.1.1)
+ * [15.1.2 Globals](#15.1.2)
+ * [15.1.3 `bool` versus `BOOL`](#15.1.3)
+ * [15.1.4 `NULL` and `nullptr`](#15.1.4)
+ * [15.1.5 Use of zero](#15.1.5)
+ * [15.1.6 Nested assignment](#15.1.6)
+ * [15.1.7 `if` conditions](#15.1.7)
+ * [15.1.8 `const`](#15.1.8)
+ * [15.1.9 Ternary operators](#15.1.9)
+ * [15.1.10 Use of `goto`](#15.1.10)
+ * [15.2 Source file organization](#15.2)
+ * [15.3 Function declarations](#15.3)
+ * [15.3.1 Default arguments](#15.3.1)
+ * [15.3.2 Overloading](#15.3.2)
+ * [15.3.3 Enums versus primitive parameter types](#15.3.3)
+ * [15.3.4 Functions returning pointers](#15.3.4)
+ * [15.3.5 Reference arguments](#15.3.5)
+ * [15.3.6 Resource release](#15.3.6)
+ * [15.3.7 OUT parameters](#15.3.7)
+ * [15.4 STL usage](#15.4)
+ * [15.5 C++ class design](#15.5)
+ * [15.5.1 Public data members](#15.5.1)
+ * [15.5.2 Friend functions](#15.5.2)
+ * [15.5.3 Constructors](#15.5.3)
+ * [15.5.4 Destructors](#15.5.4)
+ * [15.5.5 Operator overloading](#15.5.5)
+ * [15.5.6 Copy constructor and assignment operator](#15.5.6)
+ * [15.5.7 Virtual functions](#15.5.7)
+ * [15.5.8 Inheritance](#15.5.8)
+ * [15.5.9 Global class objects](#15.5.9)
+ * [15.6 Exceptions](#15.6)
+ * [15.7 Code tuning for performance optimization](#15.7)
+ * [15.8 Obsoleting functions, classes and macros](#15.8)
+
+# <a name="4"/>4 Principles
+
+As stated above, the primary purpose of these conventions is to improve readability and understandability of the source code, by making it easier for any developer, now or in the future, to easily read, understand, and modify any portion of the source code.
+
+It is assumed that developers should be able to use the Visual Studio editor and debugger to the fullest extent possible. Thus, the conventions should allow us to leverage Visual Studio IntelliSense, editing and formatting, debugging, and so forth. The conventions will not preclude the use of other editors. For example, function declaration commenting style should be such that IntelliSense will automatically display that comment when typing that function name at a use site. Indenting style should be such that using Visual Studio automatic formatting rules creates correctly formatted code.
+
+# <a name="5"/>5 Spaces, not tabs
+
+Use spaces, not tabs. Files should not contain tab characters.
+
+Indenting is 4 characters per indent level.
+
+In Visual Studio, go to "Tools | Options ... | Text Editor | All Languages | Tabs", edit the tab size setting to be 4 spaces, and select the "Insert spaces" radio-button to enable conversion of tabs to spaces.
+
+# <a name="6"/>6 Source code line width
+
+A source code line should be limited to a reasonable length, so it fits in a reasonably-sized editor window without scrolling or wrapping. We use 120 characters as the maximum source code line width.
+
+> Rationale: Modern widescreen monitors can easily display source files much wider than 120 characters, however we don't encourage (or allow) that for a number of reasons:
+>
+> 1. Very long lines tend to make the code more difficult to read
+> 2. If the need for long lines is because there is a lot of scope-based indentation, that is an indication that refactoring is necessary to reduce the number of nested scopes.
+> 3. Many people place other windows side-by-side with source code (such as additional source code windows, or Visual Studio tool windows like the Code Definition Window), or use side-by-side "diff" programs. Thus, making (most) code visible when viewed side-by-side, without scrolling, is advantageous.
+> 4. Even if there are occasional uses for wide lines, it is expected that most lines will be much shorter, leaving a considerable amount of wasted space that could be used for other purposes (see #3).
+
+Many editors support display of a vertical line at a specified column position. Enable this in your editor to easily know when you write past the specified maximum column position. Visual Studio has this feature if you install the "Productivity Power Tools": https://visualstudiogallery.msdn.microsoft.com/3a96a4dc-ba9c-4589-92c5-640e07332afd.
+
+# <a name="7"/>7 Commenting
+
+## <a name="7.1"/>7.1 General
+
+A comment should never just restate what the code does. Instead it should answer a question about the code, such as why, when, or how. Comments that say we must do something need to also state why we must do this.
+
+Avoid using abbreviations or acronyms; they harm readability.
+
+### <a name="7.1.1"/>7.1.1 Comment style
+
+We prefer end-of-line style `//` comments to original C `/* */` comments.
+
+One important exception is when adding a small comment within an argument list to help document the argument, e.g., `PrintIt(/* AlignIt */ true);` (However, see section FIXTHIS for the suggested alternative to this form.)
+
+### <a name="7.1.2"/>7.1.2 Spelling and grammar
+
+Check for spelling and grammar errors in your comments: just because you can understand the comment when you write it doesn't mean somebody else will parse it in the same way. Carefully consider the poor reader, especially those for whom English is not their first language.
+
+### <a name="7.1.3"/>7.1.3 Bug IDs
+
+Don't put the bug or issue identifier (ID) of a fixed bug or completed feature in the source code comments. Such IDs become obsolete when (and not if) the bug tracking system changes, and are an indirect source of information. Also, understanding the bug report might be difficult in the absence of fresh context. Rather, the essence of the bug fix should be distilled into an appropriate comment. The precise condition that a case covers should be specified in the comment, as for all code.
+
+In particular, putting a bug ID in a comment is often a shortcut people use to avoid writing a complete, descriptive comment about the code. The code and comments should stand alone.
+
+Bug IDs of active bugs may be used in the source code to prevent other people from having to debug the problem again, only to figure out that a bug has already been opened for it. These bug IDs are expected to be removed as soon as the bug is fixed.
+
+One thing that would be useful is for a particular case in code to be associated with a test case that exercises the case. This only makes sense if the case in question is localized to one or a few locations in the code, not pervasive and spread throughout the code. However, we don't currently have a good mechanism for referencing test cases (or other external metadata).
+
+### <a name="7.1.4"/>7.1.4 Email names
+
+Email names or full names should not be used in the source code as people move on to other projects, leave the company, leave another company when working on the JIT in the open source world, or simply stop working on the JIT for some reason. For example, a comment that states, "Talk to JohnDoe to understand this code" isn't helpful after JohnDoe has left the company or is otherwise not available.
+
+### <a name="7.1.5"/>7.1.5 TODO
+
+"TODO" comments in the code should be used to identify areas in the code that:
+
+* May require tuning for code quality (runtime performance) or throughput.
+* Need some cleanup for better maintainability or readability.
+* Are either known or thought possibly to have a bug.
+
+Tracking bugs should be associated with the TODO items, but they are also in the source so that they are visible to the Open Source community, which may not have access to the same bug database.
+
+This is the format to be used:
+
+```c++
+// TODO[-Arch][-Platform][-CQ|-Throughput|-Cleanup|-Bug|-Bug?]: description of the issue
+```
+
+* One type modifier (CQ, Throughput, Cleanup, Bug or Bug?) must be specified.
+* The -Arch and -Platform modifiers are optional, and should generally specify actual architectures in all-caps (e.g. AMD64, X86, ARM, ARM64), and then in Pascal casing for platforms and architecture classes (e.g. -ARMArch, -LdStArch, -XArch, -Unix, -Windows).
+* This list is not intended to be exhaustive.
+
+Examples:
+```c++
+ // TODO-LdStArch-Bug: Should regTmp be a dst on the node or an internal reg?
+ // Either way, it is not currently being handled by Lowering.
+
+ // TODO-CQ: based on whether src type is aligned use movaps instead.
+
+ // TODO-Cleanup: Add a comment about why this is unreached() for RyuJIT backend.
+
+ // TODO-Arm64-Bug: handle large constants! Probably need something like the ARM
+ // case above: if (arm_Valid_Imm_For_Instr(ins, val)) ...
+```
+
+### <a name="7.1.6"/>7.1.6 Performance
+
+Be sure to comment the performance characteristics (memory and time) of an API, class, sensitive block of code, line of code that looks simple but actually does something complex, etc.
+
+## <a name="7.2"/>7.2 File header comment
+
+C and C++ source files (header files and implementation files) must include a file header comment at the beginning of the file that describes the file, gives the file owner, and gives some basic information about the purpose of the file, related documents, etc. The format of this header is as follows:
+
+```c++
+//
+// Copyright (c) Microsoft. All rights reserved.
+// Licensed under the MIT license. See LICENSE file in the project root for full license information.
+//
+// <summary of the purpose of the file, description of the component, overview of the architecture and API, etc.>
+//
+```
+
+Major components usually occupy their own file. The top of the file is a good place to document the design of that component, including any information that would be helpful to a new reader of the code. A reference can be made to an actual design and implementation document (specification), but that document must be co-located with the source code, and not on some server that is unlikely to remain active for as long as the source will live.
+
+## <a name="7.3"/>7.3 Commenting code blocks
+
+Properly commented code blocks allow code to be scanned through and read like a book. There are a number of different commenting conventions that can be used for blocks of code, ranging from comments with significant whitespace to help visually distinguish major code segments to follow, down to single end-of-line comments to annotate individual statements or expressions. Choose the comment style that creates the most readable code.
+
+Major blocks can be commented using the following convention (in each example, the "&lt;comment>" line represents any number of lines with actual comment text):
+
+```c++
+<blank line>
+//
+// <comment>
+//
+<blank line>
+<code>
+```
+
+Or:
+
+```c++
+<blank line>
+// <comment>
+<blank line>
+<code>
+```
+
+Minor blocks can be commented using the following convention:
+
+```c++
+// <comment>
+<code>
+```
+
+Beware, however, of creating a visually dense block of comments-and-code without whitespace that is difficult to read.
+
+If the code line is short enough, and the comment is short enough, it can all be on the same line:
+
+```c++
+<code> // <comment>
+```
+
+Major comments should be used to comment a body of code, and minor comments should be used to clarify the intent of sub-bodies within a main body of code.
+
+Comments should be written with proper punctuation, including periods to end sentences. Be careful to check the spelling of your comments.
+
+The following example illustrates correct usage of major and minor comments in a body of code:
+
+```c++
+CorElementType MetaSig::NextArgNormalized()
+{
+ // Cache where the walk starts.
+ m_pLastType = m_pWalk;
+
+ //
+ // Now get the next element if one exists.
+ //
+
+ if (m_iCurArg == GetArgCount())
+ {
+ // We are done walking the entire signature.
+ return ELEMENT_TYPE_END;
+ }
+ else
+ {
+ // Skip the current argument and go the next.
+ m_iCurArg++;
+ CorElementType mt = m_pWalk.PeekElemType(m_pModule,
+ m_typeContext);
+ m_pWalk.SkipExactlyOne();
+ return mt;
+ }
+}
+```
+
+## <a name="7.4"/>7.4 Commenting variables
+
+All global variables and C++ class data members must be commented at the point of declaration.
+
+It is recommended that local variable declarations also be commented. One should not have to scan all uses of the variable to determine the exact meaning of the variable, including possible values, when it may or may not be initialized, etc. If the name of the variable is sufficient to describe the intent of the variable, then variable comments are unnecessary. However, do keep in mind that someone reading the code for the first time will not know all the rules you might have had in mind.
+
+The following conventions should be used:
+
+```c++
+// <comment>
+<variable declaration>
+```
+
+Or, for sufficiently concise comments:
+
+```c++
+<variable declaration> // <comment>
+```
+
+The following variable declarations provide an example of how to properly follow this convention when commenting variables:
+
+```c++
+class Thread
+{
+ // This is the maximum stack depth of allowed for any thread.
+ // This can be set by a config setting, but might be overridden
+ // if it's out of range.
+ static int s_MaxStackDepth;
+
+ bool m_fStressHeap; // Are we doing heap-stressing?
+};
+```
+
+## <a name="7.5"/>7.5 Commenting `#ifdefs`
+
+Do specify the macro name in a comment at the end of the closing `#endif` of a long or nested `#if`/`#ifdef`.
+
+```c++
+#ifdef VIEWER
+#ifdef MAC
+...
+#endif // MAC
+#else // !VIEWER
+...
+#endif // !VIEWER
+```
+
+This is so, when you see a page of code with just the `#endif` somewhere in the page, but the `#ifdef` is somewhere off the top of the page, you don't need to page back to see if the `#ifdef`'ed code is relevant to you; you can just look at the comment on the `#endif`. It also helps when there is nesting of `#ifdef`s, as above.
+
+The comment on a `#else` should indicate the condition that will cause the following code to be compiled. The comment on a `#endif` should indicate the condition that caused the immediately preceding code block to be compiled. In both cases, don't simply repeat the condition of the original `#if` – that does help with matching, but doesn't help interpret the exact condition of interest.
+
+Right:
+```c++
+#if defined(FOO) && defined(BAR)
+ ...
+#endif // defined(FOO) && defined(BAR)
+#ifdef FOO
+ ...
+#else // !FOO
+ ...
+#endif // !FOO
+```
+
+Wrong:
+```c++
+#ifdef FOO
+ ...
+#else // FOO
+ ...
+#endif // FOO
+```
+
+Do comment why the conditional `#ifdef` is needed (just as you might comment an `if` statement).
+
+A comment for code within an `#ifdef` should also appear within the `#ifdef`. For example:
+
+Right:
+```c++
+#ifdef _TARGET_ARM_
+ // This case only happens on ARM...
+ if (...)
+ ...
+#endif // _TARGET_ARM_
+```
+
+Wrong:
+```c++
+// This case only happens on ARM...
+#ifdef _TARGET_ARM_
+ if (...)
+ ...
+#endif // _TARGET_ARM_
+```
+
+# <a name="8"/>8 Naming Conventions
+
+Names should be sufficiently descriptive to immediately indicate the purpose of the function or variable.
+
+## <a name="8.1"/>8.1 General
+
+It is useful for names to be unique, to make it easier to search for them in the code. For example, it might make sense for every class to implement a debug-only `dump()` function. It's a simple name, and descriptive. However, if you do a simple textual search ("grep" or "findstr", or "Find in Files" in Visual Studio) for "dump", you will find far too many to be useful. Additionally, Visual Studio IntelliSense often gets confused when using "Go To Reference" or "Find All References" for such a common word that appears in many places (especially for our imprecise IntelliSense projects), rendering IntelliSense browsing also less useful.
+
+Functions and variables should be named at their level-of-intent.
+
+Good:
+```c++
+int ClientConnectionsRemaining[MAX_CLIENT_CONNECTIONS + 1];
+```
+
+Bad:
+```c++
+int Connections[MAX];
+```
+
+Do not use negative names, as thinking about the negation condition becomes difficult.
+
+Good:
+```c++
+bool isVerificationEnabled, allowStressHeap;
+```
+
+Bad:
+```c++
+bool isVerificationDisabled, dontStressHeap;
+```
+
+## <a name="8.2"/>8.2 Hungarian or other prefix/postfix naming
+
+We do not follow "Hungarian" naming, with a detailed set of name prefix requirements. We do suggest or require prefixes in a few cases, described below.
+
+* Global variables should be prefixed by "g_". (Note that the JIT has very few global variables.)
+* Non-static C++ class member variables should be prefixed by "m_".
+* Static C++ class member variables should be prefixed by "s_".
+
+Two common Hungarian conventions that we do not encourage are:
+
+* Prefixing boolean variables by "f" (for "flag"). Instead, consider using an appropriate verb prefix, such as "is" or "has". For example, `bool isFileEmpty`.
+* Prefixing pointer variables by "p" (one "p" for each level of pointer). There may be situations where this helps clarity, but it is not required.
+
+It is often helpful to choose a short, descriptive prefix for all members of a class, e.g. "lv" for local variables, or "emit" for the emitter. This short prefix also helps make the name unique, and easier to find with "grep". Thus, you might have "m_lvFoo" for a non-static member variable of a class that is using a "lv" prefix.
+
+## <a name="8.3"/>8.3 Macro names
+
+All macros and macro constants should have uppercase names. Words within a name must be separated by underscores. The following statements illustrate some macro and macro constant names:
+
+```c++
+#define PAGE_SIZE 4096
+#define CONTAINING_RECORD(_address, _type, _field) \
+ ((_type*)((LONG)(_address) - \
+ (LONG)(&((_type*)0)->_field)))
+```
+
+The use of inline functions is strongly encouraged instead of macros, for type-safety, to avoid problems of double evaluation of arguments, and to ease debugging.
+
+The first example here, PAGE_SIZE, should probably be written instead as:
+
+```c++
+const int g_PageSize = 4096;
+```
+
+which eliminates the need for a `#define` at all.
+
+Macro parameter names should start with one leading underscore. No other names (local variables names, member names, class names, parameters names, etc.) should begin with one leading underscore. This prevents problems from occurring where a macro parameter name accidentally matches a variable name at the point of macro expansion.
+
+All macro parameter names should be surrounded by parentheses in the macro definition to guard against unintended token interactions.
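
Both hazards mentioned above, double evaluation and missing parentheses, can be demonstrated directly. This is a self-contained illustration with hypothetical names, not JIT code:

```cpp
#include <cassert>

// Double evaluation: the macro expands its argument twice, while the
// equivalent inline function evaluates it exactly once.
#define SQUARE_MACRO(_x) ((_x) * (_x))

inline int SquareInline(int x)
{
    return x * x;
}

int g_evalCount = 0;

int CountedThree()
{
    g_evalCount++;
    return 3;
}

int MacroEvalCount()
{
    g_evalCount = 0;
    int unused = SQUARE_MACRO(CountedThree()); // argument evaluated twice
    (void)unused;
    return g_evalCount;
}

int InlineEvalCount()
{
    g_evalCount = 0;
    int unused = SquareInline(CountedThree()); // argument evaluated once
    (void)unused;
    return g_evalCount;
}

// Missing parentheses: without parentheses around _x, operator precedence
// silently changes the result for a compound argument.
#define DOUBLE_BAD(_x)  (_x * 2)
#define DOUBLE_GOOD(_x) ((_x) * 2)
```

For example, `DOUBLE_BAD(1 + 2)` expands to `(1 + 2 * 2)` and yields 5 rather than the expected 6, which is exactly the class of bug the parenthesization rule prevents.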
+
+All this being said, there still are some cases where macros are useful, such as the JIT phase list or instruction table, where data is defined in a header file by a series of macro functions that must be defined before #including the header file. This allows the macro function to be defined in different ways and the header file to be included multiple times to create different definitions from the same data. Look, for example, at instrsxarch.h. (Unfortunately, this technique confuses Visual Studio Intellisense.)
+
+## <a name="8.4"/>8.4 Local variables
+
+Local function variables should be named using camelCasing:
+
+* Multiple words should be concatenated directly, without using underscores in between.
+* The first letter of all words (except the first) should be upper case. The other letters should be lower case.
+* The first letter of the first word should be upper case if there is a (lower-case) prefix. Otherwise it should be lower case.
+* Acronyms should be treated as words, and only the first letter may be upper case.
+
+The following variable declarations illustrate variable names that adhere to this convention:
+
+```c++
+CorElementType returnType; // simple camelCasing
+TypeHandle thConstraintType; // constraint for the type argument
+MethodDesc* pOverridingMD; // the override for the given method
+```
+
+## <a name="8.5"/>8.5 Global variables
+
+Global variables follow the same rules as local variable names, but should be prefixed by a "g_". The following global variable declarations illustrate variable names that adhere to this convention:
+
+```c++
+EEConfig* g_Config; // Configuration manager interface
+bool g_isVerifierEnabled; // Is the verifier enabled?
+```
+
+## <a name="8.6"/>8.6 Function parameters
+
+Function parameters follow the same rules as local variables:
+
+```c++
+Point(int x, int y) : m_x(x), m_y(y) {}
+```
+
+## <a name="8.7"/>8.7 Non-static C++ member variables
+
+Non-static C++ member variables should follow the same rules as local variable names, but should be prefixed by "m_".
+
+```c++
+class MetaSig
+{
+ // The module containing the metadata of the signature blob
+ Module* m_Module;
+
+ // The size of the signature blob in bytes. This is kSizeNotSpecified if
+ // the size is not specified.
+ UINT32 m_SigSize;
+
+ // This contains the offsets of the stack arguments.
+ // It is valid only after the entire signature has been walked.
+ // It contains the offset of only the first few arguments.
+ short m_StackOffsets[MAX_CACHED_SIG_SIZE + 1];
+};
+```
+
+## <a name="8.8"/>8.8 Static member variables
+
+Static C++ member variables should follow the same rules as non-static member variable names, but should be prefixed by "s_" instead of "m_".
+
+```c++
+class Thread
+{
+ static int s_MaxThreadRecord; // Set by config file
+ static HANDLE s_FakeThreadHandle; // Initialized after startup
+};
+```
+
+## <a name="8.9"/>8.9 Functions, including member functions
+
+Functions should be named in PascalCasing format:
+
+* Multiple words should be concatenated directly, without using underscores in-between.
+* The first letter of all words (including the first) should be upper case. The other letters should be lower case.
+
+The following C++ method declarations illustrate method names that adhere to this convention:
+
+```c++
+IExecutionEngine* GetExecutionEngine();
+void SortArgs();
+void RecordStkLevel(unsigned stkLvl);
+```
+
+It is also acceptable, and in fact encouraged, to prefix such names with a "tag" related to the function name's component or group, such as:
+
+```c++
+unsigned lvaGetMaxSpillTempSize();
+bool lvaIsPreSpilled(unsigned lclNum, regMaskTP preSpillMask);
+```
+
+This makes it more likely that the names are globally unique. The tag can start with either a lower-case or an upper-case letter.
+
+## <a name="8.10"/>8.10 Classes
+
+C++ class names should be named in PascalCasing format. A "C" prefix should not be used (thus, use `FooBar`, not `CFooBar` as is used in some conventions). The following C++ class declaration demonstrates proper adherence to this convention:
+
+```c++
+class SigPointer : public SigParser
+{
+ ...
+};
+```
+
+Interfaces should use a prefix of "I" (capital letter "i"):
+
+```c++
+class ICorStaticInfo : public virtual ICorMethodInfo
+{
+ ...
+};
+```
+
+## <a name="8.11"/>8.11 Enums
+
+Enum type names are PascalCased, like function names.
+
+Enum values should be all-caps, and prefixed with a short prefix that is unique to the enum. This makes them easier to "grep" for.
+
+```c++
+enum RoundLevel
+{
+ ROUND_NEVER = 0, // Never round
+ ROUND_CMP_CONST = 1, // Round values compared against constants
+ ROUND_CMP = 2, // Round comparands and return values
+ ROUND_ALWAYS = 3, // Round always
+
+ COUNT_ROUND_LEVEL,
+ DEFAULT_ROUND_LEVEL = ROUND_NEVER
+};
+```
+
+> The JIT is currently very inconsistent with respect to enum type and element naming standards. Perhaps we should adopt the VM standard of prefixing enum names with "k", and using PascalCasing for names, with a Hungarian-style per-enum prefix. For example, kRoundLevelNever or kRLNever / kRLCmpConst.
+
+# <a name="9"/>9 Function Structure
+
+The term "function" here refers to both C-style functions as well as C++ member functions, unless otherwise specified.
+
+There are two primary file types:
+
+* A header file. This is named using the .h suffix. It contains declarations, especially those declarations needed by other components. (Declarations only needed by a single file should generally be placed in the implementation file.)
+* An implementation file. This contains function definitions, etc. It is named using the .cpp suffix.
+
+Some code uses a third type of file, an "implementation header file" (or .hpp file) in which inline functions are put, which is `#include`ed into the appropriate .h file. We don't use that. Instead, we trust that our retail (ship) build type will use the compiler's whole program optimization feature (such as link-time code generation) to do cross-module inlining. Thus, we organize the implementations logically without worrying about inlining.
+
+It is acceptable to put very small inline member function implementations directly in the header file, at the point of declaration.
+
+## <a name="9.1"/>9.1 In a header file
+
+This is the format for the declaration of a function in a header file.
+
+For argument lists that fit within the max line length:
+
+```c++
+[static] [virtual] [__declspec(), etc]
+return-type FunctionName(<type-name>* <argument-name>, ...) [const];
+```
+
+For argument lists that don't fit within the max line length, or where it is deemed to be more readable:
+
+```c++
+[static] [virtual] [__declspec(), etc]
+return-type FunctionName(<type-name>* <argument-name>,
+ <type-name> <argument-name>,
+ ...
+ <type-name> <argument-name>) [const];
+```
+
+For multi-line function declarations, both argument type names and argument names should be aligned.
+
+Most declarations with more than one argument will be more readable by using the second format.
+
+Functions with no arguments should just use an empty set of parentheses, and should not use `void` for the argument list.
+
+Right:
+```c++
+void Foo();
+```
+
+Wrong:
+```c++
+void Foo( ); // Don't put space between the parentheses
+void Foo(void); // Don't use "void"
+```
+
+All arguments can be on the same line if the line fits within the maximum line length.
+
+Right:
+```c++
+void Foo(int i);
+T Min<T>(T a, T b);
+```
+
+### <a name="9.1.1"/>9.1.1 Comments for function declarations
+
+Function declarations in a header file should have a few lines of documentation using single-line comments, indicating the intent of the prototyped function.
+
+Detailed documentation should be saved for the Function Header Comment above the function definition in the implementation file. This makes it easy to scan the API of the class in the header file.
+
+However, note that Visual Studio IntelliSense will pick up the comment that immediately precedes the function declaration, for use when calling the function. Thus, the declaration comment should be detailed enough to aid writing a call-site using IntelliSense.
+
+```c++
+class MetaSig
+{
+public:
+
+ // Used to avoid touching metadata for mscorlib methods.
+ MetaSig(MethodDesc* pMD,
+ BinderMethodID methodId);
+
+ // Returns type of current argument, then advances the
+ // argument index.
+ CorElementType NextArg();
+
+ // Returns type of current argument. Primitive valuetypes
+ // like System.Int32 are normalized to the form "int32".
+ CorElementType NextArgNormalized(UINT32* pSize);
+
+ // Checks if the calling convention of pSig is varargs.
+ static
+ bool IsVarArg(Module* pModule,
+ PCCOR_SIGNATURE pSig);
+};
+```
+
+## <a name="9.2"/>9.2 In an implementation file
+
+Typically, each header file in the project has a corresponding implementation file, named using the .cpp suffix, that contains the function implementations.
+
+The signature of a function definition in the implementation file should use the same format used in the header file.
+
+Generally, the order of the functions in the implementation file should match the order in the header file.
+
+### <a name="9.2.1"/>9.2.1 Function size
+
+It is recommended that function bodies (from the opening brace to the closing brace) be no more than 200 lines of text (including any empty lines and lines with just a single brace in the function body). A large function is difficult to scan and understand.
+
+Use your best judgment here.
+
+## <a name="9.3"/>9.3 Function definitions
+
+If the header file uses simple comments for the function prototypes, then the function definition in the implementation file should include a full, descriptive function header. If the header file uses full function header comments for the function prototypes, then the function definition in the implementation file can use a few descriptive lines of comments. That is, there should be a full descriptive comment for the function, but only in one place. The recommendation is to place the detailed comments at the definition site. One primary reason for this choice is that most code readers spend much more time looking at the implementation files than they do looking at the header files.
+
+Note that for virtual functions, the declaration site for the virtual function should provide a sufficient comment to specify the contract of that virtual function. The various implementation sites should provide implementation-specific details.
+
+Default argument values may be repeated as comments. Example:
+
+```c++
+void Foo(int i /* = 0 */)
+{
+ <function body>
+}
+```
+
+Be careful to update all call sites when changing the default parameters of a function!
+
+Static member functions must repeat the "static" keyword as a comment. Example:
+
+```c++
+// static
+BOOL IsVarArg(Module* pModule,
+ PCCOR_SIGNATURE pSig)
+{
+ <function body>
+}
+```
+
+## <a name="9.4"/>9.4 Function header comment
+
+All functions, except trivial accessors and wrappers, should have a function header comment which describes the behavior and the implementation details of the function. The format of the function header in an implementation file is as shown below.
+
+Within the comment, argument names (and other program-related names) should be surrounded by double quotes, to emphasize that they are program objects, and not simple English words. This helps clarify those cases where a function argument might be parsed (by a human) in either way.
+
+Any of the sections that do not apply to a method may be skipped. For example, if a method has no arguments, the "Arguments" section can be left out.
+
+If you can formulate any assumptions as asserts in the code itself, you should do so. The "Assumptions" section is intended to encapsulate things that are harder (or impossible) to formulate as asserts, or to provide a place to write a more easily read English description of any assumptions that exist, even if they can be written with asserts.
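+
+As a sketch (the function and its names here are hypothetical, not from the codebase), an assumption that can be asserted goes in the code, while one that cannot be cheaply checked stays in the "Assumptions" section:
+
+```c++
+#include <cassert>
+#include <cstddef>
+
+// Copies "count" elements from "pSrc" to "pDst".
+//
+// Assumptions:
+//    "pSrc" and "pDst" do not overlap (hard to assert cheaply, so it
+//    is documented here instead).
+void CopyElements(int* pDst, const int* pSrc, size_t count)
+{
+    // Assumptions that *can* be formulated as asserts go in the code.
+    assert(pDst != nullptr);
+    assert(pSrc != nullptr);
+
+    for (size_t i = 0; i < count; i++)
+    {
+        pDst[i] = pSrc[i];
+    }
+}
+
+int main()
+{
+    int src[3] = {1, 2, 3};
+    int dst[3] = {0, 0, 0};
+    CopyElements(dst, src, 3);
+    assert(dst[2] == 3);
+    return 0;
+}
+```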
+
+```c++
+//------------------------------------------------------------------------
+// <Function name>: <Short description of the function>
+//
+// Arguments:
+// <argument1-name> - Description of argument 1
+// <argument2-name> - Description of argument 2
+// ... as many as the number of function arguments
+//
+// Return Value:
+// Description of the values this function could return
+// and under what conditions. When the return value is
+// described as a function of the arguments, those arguments
+// should be mentioned specifically by name.
+//
+// Assumptions:
+// Any entry and exit conditions, such as required preconditions of
+// data structures, memory to be freed by caller, etc.
+//
+// Notes:
+// More detailed notes about the function.
+// What errors can the function return?
+// What other methods are related or alternatives to be considered?
+
+<function definition>
+```
+
+### <a name="9.4.1"/>9.4.1 Example
+
+The following is a sample of a completed function definition:
+
+```c++
+//------------------------------------------------------------------------
+// IsVarArg: Checks if the calling convention of the signature "pSig" is varargs.
+//
+// Arguments:
+// pModule - The module which contains the metadata of "pSig".
+// pSig - The pointer to the signature blob.
+//
+// Return Value:
+// true - The calling convention of the method is varargs.
+// false - The method is not varargs, or there was some error.
+//
+// Assumptions:
+// The caller must have ensured that the format of "pSig" is
+// consistent, and that the size of the signature does not extend
+// past the end of the metadata blob.
+//
+// Notes:
+// Call-site signature blobs include ELEMENT_TYPE_SENTINEL.
+// This method does not check for the presence of the sentinel.
+
+// static
+BOOL MetaSig::IsVarArg(Module* pModule,
+ PCCOR_SIGNATURE pSig)
+{
+ <function body>
+}
+```
+
+## <a name="9.5"/>9.5 Specific function information
+
+### <a name="9.5.1"/>9.5.1 Constructor with member initialization list
+
+This is the format to use for specifying the member initialization list
+
+```c++
+ClassName::ClassName(<type-name>* <argument-name>,
+ <type-name> <argument-name>,
+ ...
+ <type-name> <argument-name>)
+ : m_member1(val1)
+ , m_member2(val2)
+ , m_member3(val3)
+{
+ <function_body>
+}
+```
+
+Note that the order of the initializers is defined by C++ to be member declaration order, and some compilers will report an error if the order is incorrect.
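+
+A minimal sketch (the `Range` class is hypothetical) of why the order matters: members are initialized in declaration order, so an initializer should only depend on members declared before it:
+
+```c++
+#include <cassert>
+
+class Range
+{
+public:
+    // The initializer list matches the declaration order below:
+    // "m_start" is declared (and thus initialized) before "m_length".
+    // If "m_length" were declared first, it would be initialized
+    // first, regardless of the order written here.
+    Range(int start, int end)
+        : m_start(start)
+        , m_length(end - start)
+    {
+    }
+
+    int Length() const
+    {
+        return m_length;
+    }
+
+private:
+    int m_start;
+    int m_length;
+};
+
+int main()
+{
+    Range r(10, 25);
+    assert(r.Length() == 15);
+    return 0;
+}
+```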
+
+# <a name="10"/>10 Local Variable Declarations
+
+Generally, variables should be declared at or close to the location of their first initialization, especially for large functions. For small functions, it is fine to declare all the locals at the start of the method.
+
+Each variable should be declared on a separate line.
+
+It is preferable to provide an initialization of a variable when it is declared.
+
+Variable names should be unique within a function. This is to make it easier to do a simple textual search for the declaration and all the uses of a name in a function, without worrying about scoping.
+
+Variables that are conditionally assigned or passed as OUT parameters to a function must be declared close to their first use. Values that are passed by reference as OUT parameters should be initialized prior to being passed.
+
+Variable names should generally be aligned, as are function parameter names, to improve readability. Variable initializers should also be aligned, if it improves readability.
+
+```c++
+char* name = getName();
+int iPosition = 0;
+int errorCode = 0;
+FindIllegalCharacter(name, &iPosition, &errorCode);
+```
+
+## <a name="10.1"/>10.1 Pointer declarations
+
+For pointer declaration, there should be no space between the type and the `*`, and one or more spaces between the `*` and the following symbol being declared. This emphasizes that the `*` is logically part of the type declaration (even though the C/C++ parser binds the `*` to the name).
+
+Right:
+```c++
+int* pi;
+int* pq;
+```
+
+Wrong:
+```c++
+int * pi;    // space on both sides of the *
+int *pi;     // the * is attached to the name, not the type
+int    *pi;  // the alignment is on the name, not the *
+int *pi, *pq;
+```
+
+Each local pointer variable must be declared on its own line. Combined with the fact that most local variables will be declared and initialized on the same line, this naturally prevents the confusing syntax of mixing pointer and non-pointer declarations like:
+
+```c++
+int* piVal1, i2, *piVal3, i4;
+```
+
+For return types, there should be no space between the type and the `*`, and one or more spaces between the `*` and the function name.
+
+Right:
+```c++
+Module* GetModule();
+```
+
+Wrong:
+```c++
+Module *GetModule(); // no space between * and function name
+Module * GetModule(); // one space between type and *
+```
+
+For pointer types used in casts, there should be no space between the type and the `*`.
+
+Right:
+```c++
+BYTE* pByte = (BYTE*)pBuffer;
+```
+
+Wrong:
+```c++
+BYTE * pByte = (BYTE *)pBuffer; // one space between type and *
+```
+
+Reference variables should use similar spacing.
+
+Right:
+```c++
+void Foo(const BYTE& byte);
+```
+
+Double stars should appear together with no spaces.
+
+Right:
+```c++
+int** ppi;
+```
+
+Wrong:
+```c++
+int ** ppi;
+int * * ppi;
+```
+
+# <a name="11"/>11 Spacing
+
+## <a name="11.1"/>11.1 Logical and arithmetic expressions
+
+The following example illustrates correct spacing of parentheses for expressions:
+
+```c++
+if (a < ((!b) + c))
+{
+ ...
+}
+```
+
+There is a space between the `if` and the open parenthesis. This is used to distinguish C statements from functions (where the name and open parenthesis have no intervening space).
+
+There is no space after the open parenthesis following the `if` or before the closing parenthesis.
+
+Binary operators are separated on both sides with spaces.
+
+There should be parentheses around all unary and binary expressions if they are contained within other expressions. We prefer to over-specify parentheses instead of requiring developers to memorize the complete C++ precedence rules. This is especially true for `&&` and `||`, whose precedence relationship is often forgotten.
+
+Complex expressions should be broken down to use local variables to express and identify the semantics of the individual components.
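+
+As a sketch (the function and flag names here are hypothetical), naming the components of a condition documents the intent and removes any doubt about precedence:
+
+```c++
+#include <cassert>
+
+bool CanInline(int codeSize, int maxCodeSize, bool hasEH, bool isForceInline)
+{
+    // Instead of one dense expression, each component gets a name.
+    const bool isSmallEnough  = (codeSize <= maxCodeSize);
+    const bool isSimpleEnough = !hasEH;
+
+    return isForceInline || (isSmallEnough && isSimpleEnough);
+}
+
+int main()
+{
+    assert(CanInline(10, 100, false, false));   // small and simple
+    assert(!CanInline(200, 100, true, false));  // too big, has EH
+    assert(CanInline(200, 100, true, true));    // forced
+    return 0;
+}
+```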
+
+Although `sizeof` is a built-in operator, do not put a space between `sizeof` and the open parenthesis. We require the argument to `sizeof` to be surrounded by parentheses.
+
+Wrong:
+```c++
+if(a < b) // no space after "if"
+if (a&&b) // no space around operator "&&"
+if (a || ( b && c )) // space after "(" and before ")"
+if (a < b + c) // binary expression "b + c" is not parenthesized
+x = sizeof f; // "sizeof" requires parentheses: sizeof(f)
+```
+
+## <a name="11.2"/>11.2 Continuing statements on multiple lines
+
+When wrapping statements, binary operators are left hanging after the left expression, so that the continuation is obvious. The right expression is indented to match the left expression. Additional spaces between the parentheses may be inserted as necessary in order to clarify how a complex conditional expression is expected to be evaluated. In fact, additional spaces are encouraged so that it is easy to read the condition.
+
+Right:
+```c++
+if ((condition1) ||
+ ((condition2A) &&
+ (condition2B)))
+
+if ( (condition1) ||
+ ( (condition2A) &&
+ (condition2B) ) )
+
+if ( ((condition1A) || (condition1B)) &&
+ ((condition2A) || (condition2B)) )
+```
+
+Wrong:
+```c++
+if ((condition1)
+ || ((condition2A)
+ && (condition2B)))
+
+if ((condition1)
+|| ((condition2A)
+ && (condition2B)))
+```
+
+## <a name="11.3"/>11.3 Function call
+
+When calling a function, use the following formatting:
+
+```c++
+Value = FunctionName(argument1, argument2, argument3);
+```
+
+There is no space between the function name and the open parenthesis.
+
+There is no space between the open parenthesis and the first argument.
+
+There is no space between the last argument and the closing parenthesis.
+
+There is a space between every comma and the next argument.
+
+There is no space between an argument and the comma following it.
+
+If all the arguments won't fit in the maximum line-width or you wish to add per-argument comments, enter each argument on its own line. A line must either contain all the arguments, or exactly one argument. All arguments should be aligned. It is preferred that the alignment is with the first argument that comes immediately after the opening parenthesis of the function call. If that makes the call too wide for the screen, the first argument can start on the next line, indented one tab stop. This avoids potential line-length conflicts, avoids having to realign all the arguments each time the method-call expression changes, and allows per-argument comments.
+
+Right:
+```c++
+Value = FunctionName(argument1,
+ (argument2A + argument2B) / 2,
+ *argument3, // comment about arg3
+ argument4);
+```
+
+Acceptable:
+```c++
+Value = FunctionName(
+ argument1,
+ (argument2A + argument2B) / 2,
+ *argument3, // comment about arg3
+ argument4);
+
+Value = TrulyVeryLongAndVerboseFunctionNameThatTakesUpALotOfHorizontalSpace(
+ argument1,
+ (argument2A + argument2B) / 2,
+ *argument3, // comment about arg3
+ argument4);
+```
+
+The following are examples of incorrect usage of spaces.
+
+Wrong:
+```c++
+Foo( i ); // space before first argument
+Foo (i); // space after function name
+Foo(i,j); // space is missing between arguments
+```
+
+For arguments that are themselves function calls you should consider assigning them to a new temporary local variable and passing the local variable.
+
+There are a couple of reasons for this:
+
+* The C++ language allows the compiler the freedom to evaluate the arguments of a function in any order, thus making the program non-deterministic when the compiler changes.
+* When debugging the program, the step into procedure becomes tedious as you have to step-in/step-out of every nested argument call before stepping into the final call.
+
+Right:
+```c++
+GenTreePtr asgStmt = gtNewStmt(asg, ilOffset);
+*pAfterStmt = fgInsertStmtAfter(block, *pAfterStmt, asgStmt);
+```
+
+## <a name="11.4"/>11.4 Arrays
+
+Array indices should not have spaces around them.
+
+Right:
+```c++
+int val = array[i] + array[j * k];
+```
+
+Wrong:
+```c++
+int val = array[ i ] + array[ j * k ];
+```
+
+# <a name="12"/>12 Control Structures
+
+The structure for control-flow structures like `if`, `while`, and `do-while` blocks is as follows:
+
+```c++
+// Comment about the upcoming control structure
+<statement>
+{
+ <statement 1>
+ ...
+ <statement n>
+}
+```
+
+The opening curly for the statement block is aligned with the preceding statement block or control-flow structure, and is on the next line. Statements within a block are indented 4 spaces. Curly braces are always required, with one exception allowed for very simple `if` statements (see below).
+
+Each distinct statement must be on a separate line. While this improves readability, it also allows for breakpoints to easily be set on any statement, since the debuggers use per-line source-level breakpoints.
+
+It is generally a good idea to leave a blank line after a control structure, for readability.
+
+## <a name="12.1"/>12.1 Braces for `if`
+
+Braces are required for all `else` blocks of all `if` statements. However, "then" blocks (the true case of an `if` statement) may omit braces if:
+
+* the "then" block is a single-line assignment statement, function call, `return` statement, or `continue` statement, *and*
+* there is no `else` block.
+
+Braces are required for all other `if` statements.
+
+Right:
+```c++
+if (x == 5)
+ printf("5\n");
+
+if (x == 5)
+{
+ printf("5\n");
+}
+
+if (x == 5)
+{
+ printf("5\n");
+}
+else
+{
+ printf("not 5\n");
+}
+
+if (x != 5)
+{
+ if (x == 6)
+ printf("6\n");
+}
+
+if (x == 5)
+ return;
+
+if (x == 5)
+ continue;
+```
+
+Wrong:
+```c++
+if (x == 5)
+ printf("5\n");
+else
+ printf("not 5\n");
+
+if (x == 5)
+ printf("5\n");
+else
+{
+ printf("not 5\n");
+ printf("Might be 6\n");
+}
+
+if (x != 5)
+ if (x != 6)
+ printf("Neither 5 nor 6\n");
+
+if (x != 5)
+ for (int i = 0; i < 10; i++)
+ {
+ printf("x*i = %d\n", (x * i));
+ }
+```
+
+## <a name="12.3"/>12.3 Braces for looping structures
+
+Similar spacing should be used for `for`, `while` and `do-while` statements. These examples show correct placement of braces:
+
+```c++
+for (int i = 0; i < 100; i++)
+{
+ printf("i=%d\n", i);
+}
+
+for (int i = SomeVeryLongFunctionName(); // Each part of the "for" is aligned
+ SomeVeryComplexExpression();
+ i++)
+{
+ printf("i=%d\n", i);
+}
+
+while (i < 100)
+{
+ printf("i=%d\n", i);
+}
+
+do
+{
+ printf("i=%d\n", i);
+}
+while (i < 100);
+```
+
+Note that a loop body *must* have braces; an empty loop body with just a semicolon cannot be used, as it can easily be missed when reading.
+
+Right:
+```c++
+Foo* p;
+for (p = start; p != q; p = p->Next)
+{
+ // Empty loop body
+}
+```
+
+Right:
+```c++
+for (Foo* p = start; p != q; p = p->Next)
+{
+ // Empty loop body
+}
+```
+
+Wrong:
+```c++
+Foo* p;
+for (p = start; p != q; p = p->Next);
+```
+
+## <a name="12.4"/>12.4 `switch` statements
+
+For `switch` statements, each `case` label must be aligned to the same column as the `switch` (and the opening brace). The code body for each `case` label should be indented one level. Note that this implies that each case label must exist on its own line; do not place multiple case labels on the same line.
+
+A default case is always required. Use the `unreached()` macro to indicate that it is unreachable, if necessary.
+
+Local variables should not be declared in the scope defined by the `switch` statement.
+
+A nested statement block can be used for the body of a `case` statement if you need to declare local variables, especially if you need local variable initializers. The braces should be indented one level from the `case` label, so as to not be at the same level as the `switch` braces. As with all statement blocks, the statements within the block should be indented one level from the opening and closing braces.
+
+Fall-through between two case statements should be indicated with the `__fallthrough;` annotation. It should be surrounded by blank lines to make it maximally visible.
+
+`case` labels (except the first) should generally be preceded by a blank line, to increase readability.
+
+```c++
+//
+// Comment about the purpose of the switch
+//
+switch (Counter)
+{
+case 1:
+ // Comment about the action required for case 1
+ [code body]
+
+ __fallthrough;
+
+case 2:
+ // Comment about the action required for case 1 or 2
+ [code body]
+ break;
+
+case 3:
+ {
+ // Comment about the action required for case 3
+ <local variable declaration>
+ [code body]
+ }
+ break;
+
+default:
+ unreached();
+}
+```
+
+## <a name="12.5"/>12.5 Examples
+
+The following skeletal statements illustrate the proper indentation and placement of braces for control structures. In all cases, indentations consist of four spaces each.
+
+```c++
+// static
+int MyClass::FooBar(int iArgumentOne /* = 0 */,
+ ULONG* pArgumentTwo)
+{
+ // Braces for "if" statements are generally preferred.
+ if (iArgumentOne == 0)
+ {
+ return 0;
+ }
+
+ // All "for" loops must have braces.
+ int counter;
+ for (counter = 0; counter < 10; counter++)
+ {
+ ThenBlock();
+ }
+
+ // Simple "if" statements with assignments or function calls
+ // in the "then" block may skip braces.
+ if (counter == 0)
+ ThenCode();
+
+ // All if-else statements require braces.
+ if (counter == 2)
+ {
+ counter += 100;
+ }
+ else
+ {
+ counter += 200;
+ }
+
+ return 0;
+}
+```
+
+# <a name="13"/>13 C++ Classes
+
+The format for a C++ class declaration is as follows.
+
+The labels for the `public`, `protected`, and `private` sections should be aligned with the opening brace.
+
+The labels for the `public`, `protected`, and `private` sections should occur in this order.
+
+This is the format to use:
+
+```c++
+//--------------------------------------------
+//
+// <Class name>: <Description of the class>
+//
+// Assumptions:
+// Is the class thread-safe? How does it handle memory allocation?
+// etc.
+//
+// Notes:
+// More detailed notes about the class.
+// Alternative or related types to compare
+//
+class <class name>
+{
+public:
+ <optional blank line>
+ <public methods>
+ <public data members. Ideally, there will be none>
+
+protected:
+ <optional blank line>
+ <protected methods>
+ <protected data members>
+
+private:
+ <optional blank line>
+ <private methods>
+ <private data members>
+};
+```
+
+Example:
+```c++
+//--------------------------------------------
+//
+// MetaSig: encapsulate a meta-data signature blob. This could be the
+// signature of functions, fields or local variables. It provides
+// facilities to inspect the properties of the signature like
+// calling convention and number of arguments, as well as to iterate
+// the arguments.
+//
+// Assumptions:
+// The caller must have ensured that the format of the signature is
+// consistent, and that the size of the signature does not extend
+// past the end of the metadata blob.
+//
+// Notes:
+// Note that the elements of the signature representing primitive
+// valuetype can be accessed either in their raw form
+// (e.g., System.Int32) or in their normalized form (e.g., int32).
+// Note that parsing of generic signatures requires the caller to
+// provide the SigTypeContext to use to interpret the
+// type arguments.
+// Also look at SigPointer if you need to parse a single
+// element of a signature.
+//
+class MetaSig
+{
+public:
+
+ //
+ // Constructors
+ //
+ MetaSig(PCCOR_SIGNATURE szMetaSig,
+ DWORD cbMetaSig,
+ Module* pModule);
+
+ // Used to avoid touching metadata for mscorlib methods.
+ MetaSig(MethodDesc* pMD,
+ BinderMethodID methodId);
+
+ //
+ // Argument iterators
+ //
+
+ // Returns type of current argument, then advances the
+ // argument index.
+ CorElementType NextArg();
+
+ // Returns type of current argument. Primitive valuetypes like
+ // System.Int32 are normalized to the form "int32".
+ CorElementType NextArgNormalized(UINT32* pSize);
+
+ //
+ // Helper methods
+ //
+
+ // Checks if the calling convention of "pSig" is varargs.
+ static
+ BOOL IsVarArg(Module* pModule,
+ PCCOR_SIGNATURE pSig);
+
+private:
+
+ // The module containing the metadata of the signature blob.
+ Module* m_module;
+
+ // The size of the signature blob. This is SizeNotSpecified if
+ // the size is not specified.
+ UINT32 m_sigSizeBytes;
+
+ // This contains the offsets of the stack arguments.
+ // It is valid only after the entire signature has been walked.
+ // It contains the offset of only the first few arguments.
+ short m_stackOffsets[MAX_CACHED_SIG_SIZE + 1];
+};
+```
+
+# <a name="14"/>14 Preprocessor
+
+## <a name="14.1"/>14.1 Conditional compilation
+
+Prefer `#if` over `#ifdef` for conditional compilation. This allows setting the macro to 0 to disable it. `#ifdef` will not work in this case, and instead requires ensuring that the macro is not defined.
+
+One exception: we use `#ifdef DEBUG` for debug-only code (see Section FIXME).
+
+Right:
+```c++
+#if MEASURE_MEM_ALLOC
+```
+
+Wrong:
+```c++
+#ifdef MEASURE_MEM_ALLOC
+```
+
+Note, however, that you must be diligent in only using `#if`, because `#ifdef FOO` will be true whether FOO is defined to 0 or 1!
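+
+To make the pitfall concrete (the `FEATURE_SAMPLE` macro here is hypothetical):
+
+```c++
+#include <cassert>
+
+#define FEATURE_SAMPLE 0 // the feature is explicitly disabled
+
+#ifdef FEATURE_SAMPLE
+const bool g_ifdefTookBranch = true; // taken: the macro IS defined (to 0)
+#else
+const bool g_ifdefTookBranch = false;
+#endif
+
+#if FEATURE_SAMPLE
+const bool g_ifTookBranch = true;
+#else
+const bool g_ifTookBranch = false; // taken: the macro's value is 0
+#endif
+
+int main()
+{
+    assert(g_ifdefTookBranch); // "#ifdef" treated the feature as enabled!
+    assert(!g_ifTookBranch);   // "#if" correctly saw it as disabled
+    return 0;
+}
+```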
+
+`#if` and other preprocessor directives should not be indented at all, and should be placed at the very start of the line.
+
+If you have conditional `#if`/`#ifdef` in the source, explain what they do, just like you would comment an `if` statement.
+
+Minimize conditional compilation by defining good abstractions, partitioning files better, or defining appropriate constants or macros.
+
+### <a name="14.1.1"/>14.1.1 `#if FEATURE`
+
+If a new or existing feature is being added or modified then use a `#define FEATURE_XXX` to both highlight the code used to implement this and to allow the JIT to be compiled both with and without the feature.
+
+```c++
+#define FEATURE_VALNUM_CSE 0 // disable Value Number CSE optimization logic
+#define FEATURE_LEXICAL_CSE 1 // enable Lexical CSE optimization logic
+
+#if FEATURE_VALNUM_CSE
+void Compiler::optValnumCSEinit()
+```
+
+Note that periodically we do need to go through and remove FEATURE_* defines that are always enabled, and will never be disabled.
+
+### <a name="14.1.2"/>14.1.2 Disabling code
+
+It is generally discouraged to permanently disable code by commenting it out or by putting `#if 0` around it, in an attempt to keep it around for reference. This reduces the hygiene of the code base over time and such disabled code is rarely actually useful. Instead, such disabled code should be entirely deleted. If you do disable code without deleting it, then you must add a comment as to why the code is disabled, and why it is better to leave the code disabled than it is to delete it.
+
+One exception is that it is often useful to `#if 0` code that is useful for debugging an area, but is not otherwise useful. Even in this case, however, it is probably better to introduce a COMPlus_* variable to enable the special debugging mode.
+
+### <a name="14.1.3"/>14.1.3 Debug code
+
+Use `#ifdef DEBUG` for debug-only code. Do not use `#ifdef _DEBUG` (with a leading underscore).
+
+Use the `INDEBUG(x)` macro (and related macros) judiciously, for code that only runs in DEBUG, to avoid `#ifdef`s.
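+
+As a sketch of that pattern, assuming a simplified re-definition of `INDEBUG` (the real macro lives in the JIT's headers):
+
+```c++
+#include <cassert>
+
+// Simplified re-definition for illustration: in DEBUG builds INDEBUG(x)
+// expands to x; otherwise it expands to nothing.
+#ifdef DEBUG
+#define INDEBUG(x) x
+#else
+#define INDEBUG(x)
+#endif
+
+struct Emitter
+{
+    unsigned m_count;
+    INDEBUG(unsigned m_debugChecks;) // debug-only member without an #ifdef block
+};
+```
+
+In a non-DEBUG build the `m_debugChecks` member simply disappears, with no `#ifdef`/`#endif` pair cluttering the declaration.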
+
+Use the `JITDUMP(x)` macro for printing things to the JIT dump output. Note that these messages only get printed when the `verbose` variable is set, which is when `COMPlus_JitDump=*`, or when `COMPlus_JitDump=XXX` and we are JITting function XXX (and similarly for `COMPlus_NgenDump` when NGENing). Do not use `JITDUMP` for output in a debug-only function that might be useful to call from the debugger. In that case, define a function that uses `printf` (in the JIT, `printf` is a JIT-specific implementation, not the C runtime function), which can be called from the debugger, and invoke that function like this:
+
+```c++
+DBEXEC(verbose, MyDumpFunction());
+```
+
+A common pattern in the JIT is the following:
+
+```c++
+#ifdef DEBUG
+ if (verbose)
+ printf("*************** In genGenerateCode()\n");
+#endif // DEBUG
+```
+
+This could be written on fewer lines as:
+
+```c++
+JITDUMP("*************** In genGenerateCode()\n");
+```
+
+However, the former is preferred because it is trivial to set an unconditional breakpoint on the "printf" that triggers when we are compiling the function that matches what COMPlus_JitDump is set to – a very common debugging technique. Note that conditional breakpoints could be used, but they are more cumbersome, and are very difficult to get right in windbg.
+
+If many back-to-back `JITDUMP` statements are going to be used, it is preferred that they be written as a single `#ifdef DEBUG`/`if (verbose)` block using `printf()`.
+
+Wrong:
+```c++
+JITDUMP(" TryOffset: 0x%x\n", clause.TryOffset);
+JITDUMP(" TryLength: 0x%x\n", clause.TryLength);
+JITDUMP(" HandlerOffset: 0x%x\n", clause.HandlerOffset);
+JITDUMP(" HandlerLength: 0x%x\n", clause.HandlerLength);
+```
+
+Right:
+```c++
+#ifdef DEBUG
+if (verbose)
+{
+ printf(" TryOffset: 0x%x\n", clause.TryOffset);
+ printf(" TryLength: 0x%x\n", clause.TryLength);
+ printf(" HandlerOffset: 0x%x\n", clause.HandlerOffset);
+ printf(" HandlerLength: 0x%x\n", clause.HandlerLength);
+}
+#endif // DEBUG
+```
+
+Always put debug-only code under `#ifdef DEBUG` (or the equivalent). Do not assume the compiler will get rid of your debug-only code in a non-debug build flavor. This also documents more clearly that you intend the code to be debug-only.
+
+## <a name="14.2"/>14.2 `#define` constants
+
+Use `const` or `enum` instead of `#define` for constants when possible. The value will still be constant-folded, but the `const` adds type safety.
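+
+For example (illustrative names, not from the JIT sources):
+
+```c++
+#include <cassert>
+
+// A typed constant is still folded at compile time, and misuse is a
+// compile-time error rather than a silent conversion.
+const unsigned PREFIX_CACHE_DEFAULT_SIZE = 16;
+const unsigned PREFIX_CACHE_MAX_SIZE     = 2 * 1024;
+
+static_assert(PREFIX_CACHE_DEFAULT_SIZE <= PREFIX_CACHE_MAX_SIZE,
+              "default cache size must not exceed the maximum");
+```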
+
+If you do use `#define` constants, the values of multiple constant defines should be aligned.
+
+```c++
+#define PREFIX_CACHE_DEFAULT_SIZE 16
+#define PREFIX_CACHE_MAX_SIZE (2 * 1024)
+#define DEVICE_NAME L"MyDevice"
+```
+
+## <a name="14.3"/>14.3 Macro functions
+
+Expressions (except very simple constants) should be enclosed in parentheses to prevent incorrect multiple expansion of the macro arguments.
+
+Enclose all argument instances in parentheses.
+
+Macro arguments should be named with two leading underscores, to prevent their names from being confused with normal source code names, such as variable or function names.
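+
+A hypothetical `SQUARE` macro shows why both parenthesization rules matter:
+
+```c++
+#include <cassert>
+
+// Without parentheses, operator precedence silently rewrites the caller's
+// expression; SQUARE_BAD(1 + 2) expands to (1 + 2 * 1 + 2), which is 5.
+#define SQUARE_BAD(__x)  (__x * __x)
+#define SQUARE_GOOD(__x) ((__x) * (__x))
+
+inline int SquareBadOfThree()  { return SQUARE_BAD(1 + 2);  } // 5, not 9
+inline int SquareGoodOfThree() { return SQUARE_GOOD(1 + 2); } // 9
+```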
+
+### <a name="14.3.1"/>14.3.1 Macro functions versus C++ inline functions
+
+All macro functions should be replaced with a C++ inline function or C++ inline template function if possible. This allows type checking of arguments, and avoids the problem of macro-expansion of macro arguments.
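+
+For example, a `MIN` macro can be replaced by a template (`JitMin` is an illustrative name):
+
+```c++
+#include <cassert>
+
+// Arguments are type checked and evaluated exactly once, unlike the
+// macro version, which would evaluate the chosen argument twice.
+template <typename T>
+inline T JitMin(T x, T y)
+{
+    return (x < y) ? x : y;
+}
+```
+
+With `#define MIN(__x, __y) (((__x) < (__y)) ? (__x) : (__y))`, a call such as `MIN(counter++, limit)` increments `counter` twice whenever it is the smaller value; `JitMin(counter++, limit)` increments it exactly once.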
+
+### <a name="14.3.2"/>14.3.2 Line continuation
+
+All the `\` at the end of a multi-line macro definition should be aligned with each other.
+
+There must be no `\` on the last line of a multi-line macro definition.
+
+### <a name="14.3.3"/>14.3.3 Multi-statement macro functions
+
+Functional macro definitions with multiple statements or with `if` statements should use `do { <statements> } while(0)` to ensure that the statements will always be compiled together as a single statement block. This ensures that those who mistake the macro for a function don't accidentally split the statements into multiple scopes when the macro is used. Example: consider a macro used like this:
+
+```c++
+if (fCond)
+ SOME_MACRO();
+```
+
+Wrong:
+```c++
+#define SOME_MACRO() \
+ Statement1; \
+ Statement2;
+```
+
+Right:
+```c++
+#define SOME_MACRO() \
+ do \
+ { \
+ Statement1; \
+ Statement2; \
+ } while(0)
+```
+
+The braces ensure the statement block isn't split. The `do { ... } while(0)` ensures that uses of the macro always end with a semicolon.
+
+### <a name="14.3.4"/>14.3.4 Control flow
+
+Avoid using control flow inside of preprocessor functions. Since these read like function calls in the source it is best if they also act like function calls. The expectation should be that all arguments will get evaluated one time and we should avoid strange behavior such as only evaluating an argument if a prior argument evaluates to true or evaluating some argument multiple times.
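+
+A sketch of the problem, using a hypothetical macro:
+
+```c++
+#include <cassert>
+
+// Wrong: the second argument is evaluated only when the first is true.
+// At the call site this reads like a function call, so the hidden
+// short-circuit surprises the reader.
+#define TRACK_IF(__cond, __expr) \
+    do                           \
+    {                            \
+        if (__cond)              \
+        {                        \
+            (void)(__expr);      \
+        }                        \
+    } while (0)
+
+inline int EvaluationCount(bool cond)
+{
+    int evaluations = 0;
+    TRACK_IF(cond, ++evaluations);
+    return evaluations; // 0 when cond is false
+}
+```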
+
+### <a name="14.3.5"/>14.3.5 Scope
+
+Macros that require a pair of macros due to the introduction of a scope are strongly discouraged in the JIT. These do exist in the VM, where the convention is a `_BEGIN`/`_END` suffix on a common all-caps macro name.
+
+```c++
+#define PAL_CPP_EHUNWIND_BEGIN {
+#define PAL_CPP_EHUNWIND_END }
+```
+
+### <a name="14.3.6"/>14.3.6 Examples
+
+```c++
+#define MIN(__x, __y) (((__x) < (__y)) ? (__x) : (__y))
+
+#define CONTAINING_RECORD(__address, __type, __field)   \
+    ((__type*)((size_t)(__address) -                    \
+               (size_t)(&((__type*)0)->__field)))
+
+#define STRESS_ASSERT(__cond)                       \
+    do                                              \
+    {                                               \
+        if (!(__cond) && g_pConfig->IsStressOn())   \
+            DebugBreak();                           \
+    }                                               \
+    while(0)
+```
+
+# <a name="15"/>15 Language Usage Rules
+
+The following rules are not related to formatting; they provide guidance to improve semantic clarity.
+
+## <a name="15.1"/>15.1 C/C++ general
+
+### <a name="15.1.1"/>15.1.1 Casts
+
+Instead of C-style casts, use `static_cast<>`, `const_cast<>` and `reinterpret_cast<>` for pointers as they are more expressive and type-safe.
+
+### <a name="15.1.2"/>15.1.2 Globals
+
+Avoid global variables as they pollute the global namespace and require careful handling to ensure thread safety. Prefer static class variables.
+
+### <a name="15.1.3"/>15.1.3 `bool` versus `BOOL`
+
+`bool` is a built-in C++ language type. `bool` variables contain the value `true` or `false`. When stored (such as a member of a struct), it is one byte in size, and `true` is stored as one, `false` as zero.
+
+`BOOL` is a typedef, either of `bool` (from clrtypes.h) or `int` (from Windows header files), whose value is one of the `#define` macros `TRUE` (1) or `FALSE` (0).
+
+Use `bool`. Only use `BOOL` when calling an existing API that uses it.
+
+Right:
+```c++
+bool isComplete = true;
+```
+
+Wrong:
+```c++
+BOOL isComplete = TRUE;
+```
+
+### <a name="15.1.4"/>15.1.4 `NULL` and `nullptr`
+
+Use the C++11 `nullptr` keyword when assigning a "null" to a pointer variable, or comparing a pointer variable against "null". Do not use `NULL`.
+
+Right:
+```c++
+int* p = nullptr;
+if (p == nullptr)
+ ...
+```
+
+Wrong:
+```c++
+int* p = NULL;
+if (p == NULL)
+ ...
+int* p = 0;
+if (p == 0)
+ ...
+```
+
+### <a name="15.1.5"/>15.1.5 Use of zero
+
+Integers should be explicitly checked against 0. Pointers should be explicitly checked against `nullptr`. Types that have a legal zero value should use a named zero value, not an explicit zero. For example, `regMaskTP` is a register mask type. Use `RBM_NONE` instead of a constant zero for it.
+
+Right:
+```c++
+int i;
+int* p = Foo();
+if (p == nullptr)
+ ...
+if (p != nullptr)
+ ...
+if (i == 0)
+ ...
+if (i != 0)
+ ...
+```
+
+Wrong:
+```c++
+int i;
+int* p = Foo();
+if (!p)
+ ...
+if (p)
+ ...
+if (p == 0)
+ ...
+if (p != 0)
+ ...
+if (!i)
+ ...
+if (i)
+ ...
+```
+
+### <a name="15.1.6"/>15.1.6 Nested assignment
+
+Do not use assignments within `if` or other control-flow statements.
+
+Right:
+```c++
+int x = strlen(szMethodName);
+if (x > 5)
+```
+
+Wrong:
+```c++
+if ((x = strlen(szMethodName)) > 5)
+```
+
+### <a name="15.1.7"/>15.1.7 `if` conditions
+
+Do not place constants first in comparison checks as a trick to avoid accidental assignment in a condition (unless that ordering reads more naturally); an assignment within a condition is a compiler error in our builds anyway.
+
+Right:
+```c++
+if (x == 5)
+```
+
+Wrong:
+```c++
+if (5 == x)
+```
+
+### <a name="15.1.8"/>15.1.8 `const`
+
+Use of the `const` qualifier is encouraged.
+
+It is specifically encouraged to mark class member functions as `const`, especially small "accessors", for example:
+
+```c++
+var_types TypeGet() const { return gtType; }
+```
+
+### <a name="15.1.9"/>15.1.9 Ternary operators
+
+The ternary operator `?:` is best used to make quick and simple decisions inside function invocations. Don't use it as a replacement for the `if` statement; putting individual statements on their own lines makes it easy to set debugging breakpoints on them. Nested ternary operators are strongly discouraged. Using the ternary operator for simple assignment of a single condition is fine. It is recommended that the "then" and "else" operands of the ternary operator not have side effects.
+
+Right:
+```c++
+if (a == b)
+{
+ Foo();
+}
+else
+{
+ Bar();
+}
+```
+
+Wrong:
+```c++
+(a == b) ? Foo() : Bar(); // top-level ?: disallowed
+```
+
+Acceptable:
+```c++
+x = (a == b) ? 7 : 9;
+Foo((a == b) ? "hi" : "bye");
+```
+
+Wrong:
+```c++
+x = (a == b) ? ((c == d) ? 1 : 2) : 3; // nested ?: disallowed
+```
+
+### <a name="15.1.10"/>15.1.10 Use of `goto`
+
+The `goto` statement should be avoided.
+
+If you *must* use `goto`, the `goto` label must be all-caps, with words separated by underscores. The label should be surrounded by empty lines and otherwise made very visible, with prominent comments and/or placing it in column zero.
+
+Example:
+```c++
+case GT_LSH: ins = INS_SHIFT_LEFT_LOGICAL; goto SHIFT;
+case GT_RSH: ins = INS_SHIFT_RIGHT_ARITHM; goto SHIFT;
+case GT_RSZ: ins = INS_SHIFT_RIGHT_LOGICAL; goto SHIFT;
+
+SHIFT:
+```
+
+You should think very hard about other ways to code this to avoid using a `goto`. One of the biggest problems is that the `goto` label can be targeted from anyplace in the function, which makes understanding the code very difficult.
+
+## <a name="15.2"/>15.2 Source file organization
+
+The general guideline is that header files should not be bigger than 1000 lines and implementation files should not be bigger than 5000 lines of code (including comments, function headers, etc.). Files larger than this should be split up and organized in some better logical fashion.
+
+A class declaration should contain no implementation code. This is intended to make it easy to browse the API of the class. Note that our shipping "retail" build uses Visual C++ Link Time Code Generation, which can perform cross-module inlining. It is acceptable to define small accessor functions in the class declaration, for simplicity.
+
+Maintain clear visual separation and identification of "segments" of API, and in particular of the private area of declarations. Logical chunks of APIs should be separated with comments like this:
+
+```c++
+<blank line>
+//
+// Description of the following chunk of API
+//
+<blank line>
+```
+
+## <a name="15.3"/>15.3 Function declarations
+
+### <a name="15.3.1"/>15.3.1 Default arguments
+
+Avoid default argument values unless the argument has very little semantic impact, especially when adding a new argument to an existing method. Avoiding default values forces all call sites to think about the argument value to use, and prevents call sites from silently opting into unexpected behavior.
+
+### <a name="15.3.2"/>15.3.2 Overloading
+
+Never overload functions on a primitive type (e.g. `Foo(int i)` and `Foo(long l)`).
+
+Avoid operator overloading unless the overload matches the "natural" semantics of the operator when applied to integral types.
+
+### <a name="15.3.3"/>15.3.3 Enums versus primitive parameter types
+
+Use enums rather than primitive types for function arguments: enums promote type safety and make the function signature more descriptive.
+
+Specifically, declare and use enum types with two values instead of boolean for function arguments – the enum conveys more information to the reader at the callsite.
+
+Bad:
+```c++
+Foo(true);
+Foo(false);
+```
+
+Good:
+```c++
+enum DuplicateSpecification
+{
+ DS_ALLOW_DUPS,
+ DS_UNIQUE_ONLY
+};
+void Foo(DuplicateSpecification useDups);
+Foo(DS_ALLOW_DUPS);
+Foo(DS_UNIQUE_ONLY);
+```
+
+This is especially true if the function has multiple boolean arguments.
+
+Bad:
+```c++
+Bar(true, false);
+```
+
+Good:
+```c++
+Bar(DS_ALLOW_DUPS, FORMAT_FIT_TO_SCREEN);
+```
+
+### <a name="15.3.4"/>15.3.4 Functions returning pointers
+
+When writing a function that returns a pointer, think carefully about whether a `nullptr` return value is ambiguous between success (with a legitimately null result) and failure.
+
+### <a name="15.3.5"/>15.3.5 Reference arguments
+
+Never use non-const reference arguments as the call-site has no indication that the argument may change. Const reference arguments may be used as they do not have the above problem, and are also required for operators.
+
+### <a name="15.3.6"/>15.3.6 Resource release
+
+If you call a function to release a resource and pass it a pointer or handle, you must then set the pointer to `nullptr` or the handle to `INVALID_HANDLE_VALUE`. This ensures that the pointer or handle will not be accidentally used in code that follows.
+
+```c++
+CloseHandle(hMyFile);
+hMyFile = INVALID_HANDLE_VALUE;
+```
+
+### <a name="15.3.7"/>15.3.7 OUT parameters
+
+Functions with OUT parameters must initialize them (e.g., to 0 or `nullptr`) on entry to the function. If the function fails, this protects the caller from accidental use of potentially uninitialized values.
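+
+A sketch of the pattern (`TryGetBuffer` is an illustrative name, not a JIT API):
+
+```c++
+#include <cassert>
+
+inline bool TryGetBuffer(int key, int** ppResult)
+{
+    *ppResult = nullptr; // initialize the OUT parameter on entry
+
+    if (key < 0)
+    {
+        return false; // failure: the caller still sees a well-defined nullptr
+    }
+
+    static int s_buffer = 42;
+    *ppResult = &s_buffer;
+    return true;
+}
+```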
+
+## <a name="15.4"/>15.4 STL usage
+
+> JIT STL usage rules need to be specified.
+
+## <a name="15.5"/>15.5 C++ class design
+
+### <a name="15.5.1"/>15.5.1 Public data members
+
+Do not declare public data members. Instead, public accessor functions should be exposed to access class members.
+
+### <a name="15.5.2"/>15.5.2 Friend functions
+
+Avoid friend functions - they expose internals of the class to the friend function, making subsequent changes to the class more fragile. However, it is notably worse to make everything public.
+
+### <a name="15.5.3"/>15.5.3 Constructors
+
+If you declare a constructor, make sure to initialize all the class data members.
+
+### <a name="15.5.4"/>15.5.4 Destructors
+
+The JIT uses a specialized memory allocator that does not release memory until compilation is complete. Thus, it is generally bad to declare or require destructors and the calling of `delete`, since memory will never be reclaimed, and JIT developers used to never dealing with deallocation are also likely to omit calls to `delete`.
+
+### <a name="15.5.5"/>15.5.5 Operator overloading
+
+Define operators such as `=`, `==`, and `!=` only if you really want and use this capability, and can make them super-efficient.
+
+Never define an operator to do anything other than the standard semantics for built-in types.
+
+Never hide expensive work behind an operator. If it's not super efficient then make it an explicit method call.
+
+### <a name="15.5.6"/>15.5.6 Copy constructor and assignment operator
+
+The compiler will automatically create a default copy constructor and assignment operator for a class. If that is undesirable, use the C++11 delete functions feature to prevent that, as so:
+
+```c++
+private:
+ // No copy constructor or operator=
+ MyClass(const MyClass& info) = delete;
+ MyClass& operator=(const MyClass&) = delete;
+```
+
+### <a name="15.5.7"/>15.5.7 Virtual functions
+
+Explicitly declare an overridden virtual method as `virtual`, for clarity.
+
+Virtual functions have overhead, so don't use them unless you need polymorphism.
+
+However, note that virtual functions are often a cleaner, clearer, and faster solution than alternatives.
+
+### <a name="15.5.8"/>15.5.8 Inheritance
+
+Don't use inheritance just because it will work. Use it sparingly and judiciously, when it makes sense to the situation. Deeply nested hierarchies can be confusing to understand.
+
+Be careful with inheritance vs. containment. When in doubt, use containment.
+
+Don't use multiple implementation inheritance.
+
+### <a name="15.5.9"/>15.5.9 Global class objects
+
+Never declare a global instance of a class that has a constructor. Such constructors run in a non-deterministic order which is bad for reliability, and they have to be executed during process startup which is bad for startup performance. It is better to use lazy initialization in such a case.
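+
+A minimal sketch of lazy initialization (illustrative names; a real implementation would also consider thread safety):
+
+```c++
+#include <cassert>
+
+class StartupConfig
+{
+public:
+    StartupConfig() : m_limit(10) {} // imagine this constructor being expensive
+    int Limit() const { return m_limit; }
+
+private:
+    int m_limit;
+};
+
+inline StartupConfig* GetStartupConfig()
+{
+    // A plain pointer is zero-initialized at load time; no constructor
+    // runs until the first call actually needs the object.
+    static StartupConfig* s_pConfig = nullptr;
+    if (s_pConfig == nullptr)
+    {
+        s_pConfig = new StartupConfig();
+    }
+    return s_pConfig;
+}
+```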
+
+## <a name="15.6"/>15.6 Exceptions
+
+Exceptions should be thrown only on true error paths, not in the general execution of a function. Exceptions are quite expensive on some platforms. What is an error path is a subjective choice depending on the scenario.
+
+Do not catch all exceptions blindly. Catching all exceptions may mask a genuine bug in your code. Also, on Win32, some exceptions such as out-of-stack, cannot be safely resumed without careful coding.
+
+Use care when referencing local variables within "catch" and "finally" blocks, as their values may be in an undefined state if an exception occurs.
+
+## <a name="15.7"/>15.7 Code tuning for performance optimization
+
+In general, code should be written to be readable first, and optimized for performance second. Don't optimize for the compiler! This will help to keep the code understandable, maintainable, and less prone to bugs.
+
+In the case of tight loops and code that has been analyzed to be a performance bottleneck, performance optimizations take a higher priority. Talk to the performance team if in doubt.
+
+## <a name="15.8"/>15.8 Obsoleting functions, classes and macros
+
+The Visual C++ compiler has support built in for marking various user defined constructs as deprecated. This functionality is accessed via one of two mechanisms:
+
+```c++
+#pragma deprecated(identifier1 [, identifier2 ...])
+```
+
+This mechanism allows you to deprecate pretty much any identifier. In particular it can be used to mark a macro as obsolete:
+
+```c++
+#define FOO(x) x
+#pragma deprecated(FOO)
+```
+
+Attempts to utilize FOO will result in compiler warning C4995 being raised:
+
+```c++
+obs.cpp(18) : warning C4995: 'FOO': name was marked as #pragma deprecated
+```
+
+Note that this warning will fire at the point in the code where the macro is expanded, which may include instances where another macro is used which happens to utilize the deprecated macro (the warning will not fire at the definition of the outer macro). In order to correctly obsolete such a macro it may be necessary to refactor your code to avoid its use by any outer macros which are not being obsoleted.
+
+Another obsoleting mechanism is:
+
+```c++
+__declspec(deprecated("descriptive text"))
+```
+
+This mechanism can be used with classes and methods. It cannot be applied to macros. It is more flexible than the #pragma mechanism since it provides a way to output additional information with the warning message and can be applied to a specific overload of a given method:
+
+```c++
+#ifdef _MSC_VER
+__declspec(deprecated("This method is deprecated, use the version that takes a StackCrawlMark instead"))
+#endif
+static Module* GetCallersModule(int skip);
+```
+
+Attempting to call the method annotated above will result in a C4996 warning being raised:
+
+```c++
+d:\dd\puclr\ndp\clr\src\vm\marshalnative.cpp(431) : warning C4996: 'SystemDomain::GetCallersModule' was declared deprecated
+```
+
+Code that legitimately still needs to use deprecated functionality (or is being grandfathered in as new functions are deprecated) can use the normal C++ mechanism to suppress the deprecation warnings:
+
+```c++
+#ifdef _MSC_VER
+#pragma warning(push)
+#pragma warning(disable:4996) // Suppress warning on call to
+ // deprecated method
+#endif
+
+Module* pModule = SystemDomain::GetCallersModule(1);
+
+#ifdef _MSC_VER
+#pragma warning(pop)
+#endif
+```
+
+Note that all these techniques are specific to the Microsoft C++ compiler and must therefore be conditionally compiled out for non-Windows builds, as shown in the examples above.
diff --git a/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md b/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md
new file mode 100644
index 0000000..f332724
--- /dev/null
+++ b/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md
@@ -0,0 +1,287 @@
+# .NET Cross-Plat Performance and Eventing Design
+
+## Introduction
+
+As we bring up CoreCLR on the Linux and OS X platforms, it’s important that we determine how we’ll measure and analyze performance on these platforms. On Windows we use an event based model that depends on ETW, and we have a good amount of tooling that builds on this approach. Ideally, we can extend this model to Linux and OS X and re-use much of the Windows tooling.
+
+# Requirements
+
+Ideally, we'd like to have the following functionality on each OS that we bring-up:
+
+- Collection of machine-wide performance data including CPU sampling, threading information (e.g. context switches), and OS specific events / system call tracing.
+- Collection of CLR-specific events that we have today exposed as ETW events.
+- Collection of EventSource events by a default OS-specific collector, as we do today with ETW on Windows.
+- User-mode call stacks for both performance and tracing data.
+- Portability of traces across machines, so that analysis can occur off box.
+- Data viewable on collection OS.
+- Stretch: Data can be understood by TraceEvent, which opens up interesting analysis scenarios.
+ - Using PerfView and existing tooling.
+ - Ability to use CAP (Automated analysis) on non-Windows data.
+
+# Scoping to the Current Landscape
+
+Given that we’ve built up a rich set of functionality on Windows, much of which depends on ETW and is specific to the OS, we’re going to see some differences across the other operating systems.
+
+Our goal should be to enable data collection and analysis as well as we can across the supported operating systems by betting on the right technologies, so that as the landscape on these operating systems evolves, .NET is well positioned to take advantage of the changes without revisiting the fundamental technology choices we’ve made. While this choice will likely make some types of investigations more difficult due to the absence of features we depend on under Windows, it positions us better for the future and aligns us with the OS communities.
+
+# Linux
+
+## Proposed Design
+
+Given that the performance and tracing tool space on Linux is quite fragmented, there is not one tool that meets all of our requirements. As such, we'll use two tools when necessary to gather both performance data and tracing data.
+
+**For performance data collection we'll use perf_events**, an in-tree performance tool that provides access to hardware counters, software counters and system call tracing. perf_events will be the primary provider of system-wide performance data such as CPU sampling and context switches.
+
+**For tracing we'll use LTTng**. LTTng supports usermode tracing with no kernelspace requirements. It allows for strongly typed static events with PID and TID information. The system is very configurable and allows for enabling and disabling of individual events.
+
+## Tools Considered
+
+### Perf_Events
+
+#### Pros
+
+- Kernel level tracing of hardware counters (CPU samples, context switches, etc.), software counters and system calls.
+- Machine-wide or process-wide. No process attach required.
+- Collection of owned processes without elevated permissions.
+- Provides user-mode stack traces via frame pointers and libunwind.
+- Extensible support for [JIT symbol resolution](https://git.kernel.org/cgit/linux/kernel/git/namhyung/linux-perf.git/tree/tools/perf/Documentation/jit-interface.txt).
+- In-tree: Basically available for almost every distro.
+- Data is stored in perf tool file format (perf.data) – can be opened by a viewer such as “perf report”.
+
+#### Cons
+
+- No user-mode static tracing. Only dynamic user-mode tracing, using “breakpoints” with no event payloads.
+
+### LTTng
+
+#### Pros
+
+- User-mode static tracing with no kernel modules required.
+- Strongly-typed static event support.
+- No pre-registration of static event types required. Events can be enabled before they are known to LTTng.
+- System call tracing supported with optional kernel module. User-mode does not require a kernel module.
+- Machine-wide or process-wide. No process attach required.
+- Collection of owned processes without elevated permissions.
+- Events can be tagged with context such as PID, TID.
+- Out-of-tree but binaries available for many common distros.
+- Data stored in Common Trace Format – designed for interop.
+
+#### Cons
+
+- No built-in callstack collection.
+
+### SystemTap
+
+#### Pros
+
+- Supports user-mode static tracing including call stacks.
+- Static tracing does not require pre-registration of the event or payload definition when the app starts, which makes EventSource support simple.
+- Out-of-tree but binaries available for many common distros.
+
+#### Cons
+
+- Complex kernel module is generated and compiled on-the-fly based on the tracing script.
+- Static tracing includes a fixed set of static tracing APIs with limited overloads (e.g. int, string), so it cannot be considered strongly typed tracing.
+- User-mode stack trace support requires debug information to support unwinding. No story for JIT compiled code.
+- User-mode stack traces are only supported on x86 and x64.
+- Data is stored as unstructured text.
+
+### DTrace4Linux
+
+#### Pros
+
+- Would allow for tracing code and collection script re-use across Linux and OS X.
+
+#### Cons
+
+- Source only – no binary redist.
+- Small subset of actual DTrace functionality.
+- One person’s work rather than many contributions from the community.
+
+### FTrace
+
+#### Pros
+
+- High performance function tracer.
+- In-tree: Basically available for almost every distro.
+
+#### Cons
+
+- Tracing in kernel-mode only.
+- No performance data capture.
+
+### Extended Berkeley Packet Filter (eBPF)
+
+#### Pros
+
+- Should support user-mode static tracing.
+- Possible integration with perf_event.
+
+#### Cons
+
+- Not currently available – currently being integrated into the kernel.
+- Final featureset not clear yet.
+
+## Infrastructure Bring-Up Action Items
+
+- Investigate: Determine if clock skew across the trace files will be an issue.
+- Investigate: Are traces portable, or do they have to be opened on collection machine?
+- Investigate: Do we need rundown or can we use /tmp/perf-$pid.map? How does process/module rundown work?
+- Implement: Enable frame pointers on JIT compiled code and helpers to allow stacks to be walked. (PR # [468](https://github.com/dotnet/coreclr/pull/468))
+- Implement: Stack walking using existing stackwalker (libunwind and managed code).
+- Implement: JIT/NGEN call frame resolution - /tmp/perf-$pid.map
+- Implement: Trace collection tool.
+ - Responsible for handling all of the complexity.
+ - Start and stop tracing based on information requested.
+ - **OPEN ISSUE:** Handle rundown if required.
+ - Collect any information needed for off-box viewing (e.g. /tmp/perf-$pid.map).
+ - Compress into one archive that can be copied around easily.
+- Implement: Viewing of data in PerfView
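+
+Per the perf jit-interface documentation linked earlier, the `/tmp/perf-$pid.map` file mentioned in the items above is plain text with one line per JIT-compiled symbol: start address and size in hex, followed by the symbol name. Addresses and names below are made up for illustration:
+
+```
+794b3000 80 MyApp.Program::Main()
+794b3080 40 System.Console::WriteLine(string)
+```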
+
+# OS X
+
+## Proposed Design
+
+On OS X, the performance tooling space is much less fragmented than Linux. However, this also means that there are many fewer options.
+
+**For performance data collection and tracing, we’ll use Instruments.** Instruments is the Apple-built and supported performance tool for OS X. It has a wide range of collection abilities including CPU sampling, context switching, system call tracing, power consumption, memory leaks, etc. It also has support for custom static and dynamic tracing using DTrace as a back-end, which we can take advantage of to provide a logging mechanism for CLR events and EventSource.
+
+Unfortunately, there are some features that Instruments/DTrace do not provide, such as resolution of JIT compiled call frames. Given the existing tooling choices, and the profiler preferences of the OS X community of developers, it likely makes the most sense to use Instruments as our collection and analysis platform, even though it does not support the full set of features that we would like. It’s also true that the number of OS X specific performance issues is likely to be much smaller than the set of all performance issues, which means that in many cases, Windows or Linux can be used, which will provide a more complete story for investigating performance issues.
+
+## Tools Considered
+
+### Instruments
+
+#### Pros
+
+- Available for all recent versions of OS X.
+- Provided free by Apple as part of XCode.
+- Wide range of performance collection options, both using a GUI and on the command line.
+- Can be configured to have relatively low overhead at collection time (unfortunately not the default).
+- Supports static and dynamic tracing via DTrace probes.
+- Supports machine wide and process specific collection.
+- Supports kernel and user-mode call stack collection.
+
+#### Cons
+
+- No support for JIT compiled frame resolution.
+- Closed source - no opportunities for contribution of "missing" features.
+- Closed file format - likely difficult to open a trace in PerfView.
+
+### DTrace
+
+#### Pros
+
+- In-box as part of the OS.
+- Supports static tracing using header files generated by dtrace.
+- Supports dynamic tracing and limited argument capture.
+- Supports kernel and user-mode call stack collection.
+
+#### Cons
+
+- No support for JIT compiled frame resolution - Third party call stack frame resolution feature (jstack) does not work on OS X.
+- Minimal to no investment - DTrace only kept functional for Instruments scenarios.
+- No opportunities for contribution of "missing" features.
+
+## Infrastructure Bring-Up Action Items
+
+- Implement: Enable frame pointers on JIT compiled code and helpers to allow stacks to be walked. (PR # [468](https://github.com/dotnet/coreclr/pull/468))
+- Implement: Trace collection tool
+ - NOTE: Use deferred mode to minimize overhead.
+ - Investigate: Using iprofiler to collect data instead of the instruments UI.
+
+# CLR Events
+
+On Windows, the CLR has a number of ETW events that are used for diagnostic and performance purposes. These events need to be enabled on Linux and OS X so that we can collect and use them for performance investigations.
+
+## Platform Agnostic Action Items
+
+- Implement: Abstract ETW calls to an inline-able platform abstraction layer.
+ - **OPEN ISSUE:** Can / should we re-use PAL?
+- Implement: Stack walker event implementation for x-plat – this is likely the same code for both Linux and OS X.
+
+## Linux Action Items
+
+- Implement: Build mechanics to translate ETW manifest into LTTng tracepoint definitions.
+- Implement: Generate calls to tracepoints in the PAL (see above).
+
+## OS X Action Items
+
+- Implement: Build mechanics to translate ETW manifest into DTrace probe definitions.
+- Implement: Generate calls to probes in PAL (see above).
+
+# EventSource Proposal
+
+Ideally, EventSource operates on Linux and OS X just like it does on Windows. Namely, there is no special registration of any kind that must occur. When an EventSource is initialized, it does everything necessary to register itself with the appropriate logging system (ETW, LTTng, DTrace), such that its events are stored by the logging system when configured to do so.
+
+EventSource should emit events to the appropriate logging system on each operating system. Ideally, we can support the following functionality on all operating systems:
+
+- No pre-registration of events or payload definitions.
+- Enable/disable individual events or sets of events.
+- Strongly typed payload fields.
+
+**Supporting all of these requirements will mean a significant investment.** Today, LTTng and DTrace support all of these requirements, but only for tracepoints that are defined statically at compile time. This is done by providing tooling that takes a tool-specific manifest and generates C code that can then be compiled into the application.
+
+As an example of the kind of work we’ll need to do: LTTng generates helpers that are then called as C module constructors and destructors to register and unregister tracepoint provider definitions. If we want to provide the same level of functionality for EventSource events, we’ll need to understand the generated code and then write our own helpers and register/unregister calls.
+
+While doing this work puts us in an ideal place from a performance and logging verbosity point-of-view, we should make sure that the work done is getting us the proper amount of benefit (e.g. is pay-for-play). As such, **we should start with a much simpler design, and move forward with this more complex solution once we’ve proven that the benefit is clear**.
+
+## Step # 1: Static Event(s) with JSON Payload
+
+As a simple stop-gap solution to get EventSource support on Linux and OS X, we can implement a single static tracepoint / probe (or one per verbosity level) that is used to emit all EventSource events, regardless of which EventSource raises them. The payload will be a JSON string representing the arguments of the event.
+
+## Step # 2: Static Event Generation with Strongly-Typed Payloads
+
+Once we have basic EventSource functionality working, we can continue the investigation into how we’d register/unregister and use strongly typed static tracepoints using LTTng and DTrace, and how we’d call them when an EventSource fires the corresponding event.
+
+## Compatibility Concerns
+
+In general, we should be transparent about this plan and not guarantee any compatibility between the two steps, other than ensuring that our tools continue to work across the transition.
+
+## Step # 1 Bring-Up Action Items
+
+- Implement: A static EventSource tracepoint / probe as a CLR event.
+- Implement: JSON serialization of the event payload.
+- Implement: EventListener implementation for each platform that calls out to the tracepoint / probe.
+
+# Proposed Priorities
+
+Given the significant work required to bring all of this infrastructure up, this is likely to be a long-term investment. As such, it makes sense to aim at the most impactful items first, and continually evaluate where we are along the road.
+
+## Scenarios
+
+We’ll use the following scenarios when defining priorities:
+
+- P1: Performance analysis in support of bring-up of the .NET Core runtime and framework on Linux and OS X.
+- P2: Performance analysis of ASP.NET running on .NET Core on Linux and OS X.
+
+To support these scenarios, we need the following capabilities:
+
+- P1: Collection and analysis of CPU, threading, syscalls, native memory. Support for JIT compiled call frame resolution.
+- P2: Collection and analysis of managed memory, managed thread pool, async, causality, JIT events.
+
+We expect that the following assumptions will hold for the majority of developers and applications:
+
+- Development occurs on Windows or OS X.
+- Application deployment occurs on Windows or Linux.
+
+## Work Items
+
+### Priority 1
+
+- Enable basic performance data collection on Linux with perf_events:
+ - Implement a collection script that makes collection easy for anyone.
+ - Enable JIT compiled code resolution for call stacks in perf_events.
+
+### Priority 2
+
+- Enable more advanced performance data collection for runtime components on Linux:
+ - CLR in-box event support – Emits diagnostic / performance events for GC, JIT, ThreadPool, etc.
+ - Linux EventSource support – Support for FrameworkEventSource, Tasks, Async Causality, and custom EventSource implementations.
+ - Data collection on Linux via LTTng.
+
+### Future
+
+- Enable Linux traces to be analyzed using PerfView / TraceEvent on Windows.
+- Evaluate options for viewing Linux traces on OS X.
+- Enable more advanced performance data collection for runtime components on OS X via CLR in-box events and EventSource.
diff --git a/Documentation/design-docs/first-class-structs.md b/Documentation/design-docs/first-class-structs.md
new file mode 100644
index 0000000..fd6a376
--- /dev/null
+++ b/