author | stakx <stakx@eml.cc> | 2018-05-28 17:25:14 +0200 |
---|---|---|
committer | Jan Kotas <jkotas@microsoft.com> | 2018-05-28 08:25:14 -0700 |
commit | 3c4aa12471b0b94634678fa51e97bd7f43396e80 (patch) | |
tree | 0fc9ea68e961d30a49e230e5aa09a22b9f36d257 /Documentation | |
parent | 6bf04a47badd74646e21e70f4e9267c71b7bfd08 (diff) | |
BOTR: Ensure generic params/args are rendered properly (#18140)
* BOTR: Ensure generic params/args are rendered properly
* Use inline code formatting for managed code
* Use inline code formatting for unmanaged code
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/botr/type-loader.md | 90 |
1 files changed, 45 insertions, 45 deletions
diff --git a/Documentation/botr/type-loader.md b/Documentation/botr/type-loader.md
index 60a13cd22b..3f2bafb050 100644
--- a/Documentation/botr/type-loader.md
+++ b/Documentation/botr/type-loader.md
@@ -81,7 +81,7 @@ There are usually many calls to the type loader during JITting. Consider:
         return new MyClass();
     }
 
-In the IL, MyClass is referred to using a metadata token. In order to generate a call to the **JIT\_New** helper which takes care of the actual instantiation, the JIT will ask the type loader to load the type and return a handle to it. This handle will be then directly embedded in the JITted code as an immediate value. The fact that types and members are usually resolved and loaded at JIT time and not at run-time also explains the sometimes confusing behavior easily hit with code like this:
+In the IL, MyClass is referred to using a metadata token. In order to generate a call to the `JIT_New` helper which takes care of the actual instantiation, the JIT will ask the type loader to load the type and return a handle to it. This handle will be then directly embedded in the JITted code as an immediate value. The fact that types and members are usually resolved and loaded at JIT time and not at run-time also explains the sometimes confusing behavior easily hit with code like this:
 
     object CreateClass()
     {
@@ -92,60 +92,60 @@ In the IL, MyClass is referred to using a metadata token. In order to generate a
         }
     }
 
-If **MyClass** fails to load, for example because it's supposed to be defined in another assembly and it was accidentally removed in the newest build, then this code will still throw **TypeLoadException**. The reason that the catch block did not catch it is that it never ran! The exception occurred during JITting and would only be catchable in the method that called **CreateClass** and caused it to be JITted. In addition, it may not be always obvious at which point the JITting is triggered due to inlining, so users should not expect and rely on deterministic behavior.
+If `MyClass` fails to load, for example because it's supposed to be defined in another assembly and it was accidentally removed in the newest build, then this code will still throw `TypeLoadException`. The reason that the catch block did not catch it is that it never ran! The exception occurred during JITting and would only be catchable in the method that called `CreateClass` and caused it to be JITted. In addition, it may not be always obvious at which point the JITting is triggered due to inlining, so users should not expect and rely on deterministic behavior.
 
 ## Key Data Structures
 
-The most universal type designation in the CLR is the **TypeHandle**. It's an abstract entity which encapsulates a pointer to either a **MethodTable** (representing "ordinary" types like **System.Object** or **List<string>** ) or a **TypeDesc** (representing byrefs, pointers, function pointers, arrays, and generic variables). It constitutes the identity of a type in that two handles are equal if and only if they represent the same type. To save space, the fact that a **TypeHandle** contains a **TypeDesc** is indicated by setting the second lowest bit of the pointer to 1 (i.e. (ptr | 2)) instead of using additional flags<sup>2</sup>. **TypeDesc** is "abstract" and has the following inheritance hierarchy.
+The most universal type designation in the CLR is the `TypeHandle`. It's an abstract entity which encapsulates a pointer to either a `MethodTable` (representing "ordinary" types like `System.Object` or `List<string>`) or a `TypeDesc` (representing byrefs, pointers, function pointers, arrays, and generic variables). It constitutes the identity of a type in that two handles are equal if and only if they represent the same type. To save space, the fact that a `TypeHandle` contains a `TypeDesc` is indicated by setting the second lowest bit of the pointer to 1 (i.e. (ptr | 2)) instead of using additional flags<sup>2</sup>. `TypeDesc` is "abstract" and has the following inheritance hierarchy.
 
 ![Figure 2](../images/typeloader-fig2.png)
 
 Figure 2 The TypeDesc hierarchy
 
-**TypeDesc**
+**`TypeDesc`**
 
 Abstract type descriptor. The concrete descriptor type is determined by flags.
 
-**TypeVarTypeDesc**
+**`TypeVarTypeDesc`**
 
-Represents a type variable, i.e. the **T** in **List<T>** or in **Array.Sort<T>** (see the part about generics below). Type variables are never shared between multiple types or methods so each variable has its one and only owner.
+Represents a type variable, i.e. the `T` in `List<T>` or in `Array.Sort<T>` (see the part about generics below). Type variables are never shared between multiple types or methods so each variable has its one and only owner.
 
-**FnPtrTypeDesc**
+**`FnPtrTypeDesc`**
 
 Represents a function pointer, essentially a variable-length list of type handles referring to the return type and parameters. It's not that common to see this descriptor because function pointers are not supported by C#. However, managed C++ uses them.
 
-**ParamTypeDesc**
+**`ParamTypeDesc`**
 
-This descriptor represents a byref and pointer types. Byrefs are the results of the **ref** and **out** C# keywords applied to method parameters<sup>3</sup> whereas pointer types are unmanaged pointers to data used in unsafe C# and managed C++.
+This descriptor represents a byref and pointer types. Byrefs are the results of the `ref` and `out` C# keywords applied to method parameters<sup>3</sup> whereas pointer types are unmanaged pointers to data used in unsafe C# and managed C++.
 
-**ArrayTypeDesc**
+**`ArrayTypeDesc`**
 
-Represents array types. It is derived from **ParamTypeDesc** because arrays are also parameterized by a single parameter (the type of their element). This is opposed to generic instantiations whose number of parameters is variable.
+Represents array types. It is derived from `ParamTypeDesc` because arrays are also parameterized by a single parameter (the type of their element). This is opposed to generic instantiations whose number of parameters is variable.
 
-**MethodTable**
+**`MethodTable`**
 
 This is by far the central data structure of the runtime. It represents any type which does not fall into one of the categories above (this includes primitive types, and generic types, both "open" and "closed"). It contains everything about the type that needs to be looked up quickly, such as its parent type, implemented interfaces, and the v-table.
 
-**EEClass**
+**`EEClass`**
 
-**MethodTable** data are split into "hot" and "cold" structures to improve working set and cache utilization. **MethodTable** itself is meant to only store "hot" data that are needed in program steady state. **EEClass** stores "cold" data that are typically only needed by type loading, JITing or reflection. Each **MethodTable** points to one **EEClass**.
+`MethodTable` data are split into "hot" and "cold" structures to improve working set and cache utilization. `MethodTable` itself is meant to only store "hot" data that are needed in program steady state. `EEClass` stores "cold" data that are typically only needed by type loading, JITing or reflection. Each `MethodTable` points to one `EEClass`.
 
-Moreover, **EEClasse**s are shared by generic types. Multiple generic type **MethodTable**s can point to single **EEClass**. This sharing adds additional constrains on data that can be stored on **EEClass**.
+Moreover, `EEClass`es are shared by generic types. Multiple generic type `MethodTable`s can point to single `EEClass`. This sharing adds additional constrains on data that can be stored on `EEClass`.
 
-**MethodDesc**
+**`MethodDesc`**
 
-It is no surprise that this structure describes a method. It actually comes in a few flavors which have their corresponding **MethodDesc** subtypes but most of them really are out of the scope of this document. Suffice it to say that there is one subtype called **InstantiatedMethodDesc** which plays an important role for generics. For more information please see [**Method Descriptor Design**](method-descriptor.md).
+It is no surprise that this structure describes a method. It actually comes in a few flavors which have their corresponding `MethodDesc` subtypes but most of them really are out of the scope of this document. Suffice it to say that there is one subtype called `InstantiatedMethodDesc` which plays an important role for generics. For more information please see [**Method Descriptor Design**](method-descriptor.md).
 
-**FieldDesc**
+**`FieldDesc`**
 
-Analogous to **MethodDesc** , this structure describes a field. Except for certain COM interop scenarios, the EE does not care about properties and events at all because they boil down to methods and fields at the end of the day, and it's just compilers and reflection who generate and understand them in order to provide that syntactic sugar kind of experience.
+Analogous to `MethodDesc` , this structure describes a field. Except for certain COM interop scenarios, the EE does not care about properties and events at all because they boil down to methods and fields at the end of the day, and it's just compilers and reflection who generate and understand them in order to provide that syntactic sugar kind of experience.
 
-<sup>2</sup> This is useful for debugging. If the value of a **TypeHandle**
-ends with 2, 6, A, or E, then it's not a **MethodTable** and the extra
+<sup>2</sup> This is useful for debugging. If the value of a `TypeHandle`
+ends with 2, 6, A, or E, then it's not a `MethodTable` and the extra
 bit has to be cleared in order to successfully inspect the
-**TypeDesc**.
+`TypeDesc`.
 
-<sup>3</sup> Note that the difference between **ref** and **out** is just in a
+<sup>3</sup> Note that the difference between `ref` and `out` is just in a
 parameter attribute. As far as the type system is concerned, they are
 both the same type.
 
@@ -153,18 +153,18 @@ both the same type.
 
 When the type loader is asked to load a specified type, identified for example by a typedef/typeref/typespec **token** and a **Module** , it does not do all the work atomically at once. The loading is done in phases instead. The reason for this is that the type usually depends on other types and requiring it to be fully loaded before it can be referred to by other types would result in infinite recursion and deadlocks. Consider:
 
-    classA<T> : C<B<T>>
+    class A<T> : C<B<T>>
     { }
 
-    classB<T> : C<A<T>>
+    class B<T> : C<A<T>>
     { }
 
-    classC<T>
+    class C<T>
     { }
 
-These are valid types and apparently **A** depends on **B** and **B** depends on **A**.
+These are valid types and apparently `A` depends on `B` and `B` depends on `A`.
 
-The loader initially creates the structure(s) representing the type and initializes them with data that can be obtained without loading other types. When this "no-dependencies" work is done, the structure(s) can be referred from other places, usually by sticking pointers to them into another structures. After that the loader progresses in incremental steps and fills the structure(s) with more and more information until it finally arrives at a fully loaded type. In the above example, the base types of **A** and **B** will be approximated by something that does not include the other type, and substituted by the real thing later.
+The loader initially creates the structure(s) representing the type and initializes them with data that can be obtained without loading other types. When this "no-dependencies" work is done, the structure(s) can be referred from other places, usually by sticking pointers to them into another structures. After that the loader progresses in incremental steps and fills the structure(s) with more and more information until it finally arrives at a fully loaded type. In the above example, the base types of `A` and `B` will be approximated by something that does not include the other type, and substituted by the real thing later.
 
 The exact half-loaded state is described by the so-called load level, starting with CLASS\_LOAD\_BEGIN, ending with CLASS\_LOADED, and having a couple of intermediate levels in between. There are rich and useful comments about individual load levels in the [classloadlevel.h](https://github.com/dotnet/coreclr/blob/master/src/vm/classloadlevel.h) source file. Notice that although types can be saved in NGEN images, the representing structures cannot be simply mapped or blitted into memory and used without additional work called "restoring". The fact that a type came from an NGEN image and needs to be restored is also captured by its load level.
 
@@ -174,7 +174,7 @@ Runtime][generics-design] for more detailed explanation of load levels.
 
 ## 2.2 Generics
 
-In the generics-free world, everything is nice and everyone is happy because every ordinary (not represented by a **TypeDesc**) type has one **MethodTable** pointing to its associated **EEClass** which in turn points back to the **MethodTable**. All instances of the type contain a pointer to the **MethodTable** as their first field at offset 0, i.e. at the address seen as the reference value. To conserve space, **MethodDescs** representing methods declared by the type are organized in a linked list of chunks pointed to by the **EEClass**<sup>4</sup>.
+In the generics-free world, everything is nice and everyone is happy because every ordinary (not represented by a `TypeDesc`) type has one `MethodTable` pointing to its associated `EEClass` which in turn points back to the `MethodTable`. All instances of the type contain a pointer to the `MethodTable` as their first field at offset 0, i.e. at the address seen as the reference value. To conserve space, `MethodDesc`s representing methods declared by the type are organized in a linked list of chunks pointed to by the `EEClass`<sup>4</sup>.
 
 ![Figure 3](../images/typeloader-fig3.png)
 
@@ -183,24 +183,24 @@ Figure 3 Non-generic type with non-generic methods
 <sup>4</sup> Of course, when managed code runs, it does not call methods by
 looking them up in the chunks. Calling a method is a very "hot" operation
 and normally needs to access only information in the
-**MethodTable**.
+`MethodTable`.
 
 ### 2.2.1 Terminology
 
 **Generic Parameter**
 
-A placeholder to be substituted by another type; the **T** in the declaration of **List<T>**. Sometimes called formal type parameter. A generic parameter has a name and optional generic constraints.
+A placeholder to be substituted by another type; the `T` in the declaration of `List<T>`. Sometimes called formal type parameter. A generic parameter has a name and optional generic constraints.
 
 **Generic Argument**
 
-A type being substituted for a generic parameter; the **int** in **List<int>**. Note that a generic parameter can also be used as an argument. Consider:
+A type being substituted for a generic parameter; the `int` in `List<int>`. Note that a generic parameter can also be used as an argument. Consider:
 
     List<T> GetList<T>()
     {
         return new List<T>();
     }
 
-The method has one generic parameter **T** which is used as a generic argument for the generic list class.
+The method has one generic parameter `T` which is used as a generic argument for the generic list class.
 
 **Generic Constraint**
 
@@ -262,27 +262,27 @@ type, they have the typical instantiation in mind. Example:
     public class A<S, T, U> {}
 
 The C# `typeof(A<,,>)` compiles to ldtoken A\'3 which makes the
-runtime load **A`3** instantiated at **S** , **T** , **U**.
+runtime load ``A`3`` instantiated at `S` , `T` , `U`.
 
 **Canonical Instantiation**
 
 An instantiation where all generic arguments are
-**System.\_\_Canon**. **System.\_\_Canon** is an internal type defined
+`System.__Canon`. `System.__Canon` is an internal type defined
 in **mscorlib** and its task is just to be well-known and different
 from any other type which may be used as a generic argument.
 Types/methods with canonical instantiation are used as representatives
 of all instantiations and carry information shared by
-all instantiations. Since **System.\_\_Canon** can obviously not
+all instantiations. Since `System.__Canon` can obviously not
 satisfy any constraints that the respective generic parameter may have
 on it, constraint checking is special-cased with respect to
-**System.\_\_Canon** and ignores these violations.
+`System.__Canon` and ignores these violations.
 
 ### 2.2.2 Sharing
 
 With the advent of generics, the number of types loaded by the
 runtime tends to be higher. Although generic types with different
-instantiations (for example **List<string>** and **List<object>**)
-are different types each with its own **MethodTable** , it turns out
+instantiations (for example `List<string>` and `List<object>`)
+are different types each with its own `MethodTable`, it turns out
 that there is a considerable amount of information that they can
 share. This sharing has a positive impact on the memory footprint and
 consequently also performance.
@@ -292,21 +292,21 @@ consequently also performance.
 Figure 4 Generic type with non-generic methods - shared EEClass
 
 Currently all instantiations containing reference types share the same
-**EEClass** and its **MethodDescs**. This is feasible because all
+`EEClass` and its `MethodDesc`s. This is feasible because all
 references are of the same size - 4 or 8 bytes - and hence the layout of
 all these types is the same. The figure illustrates this for
-**List<object>** and **List<string>**. The canonical **MethodTable**
+`List<object>` and `List<string>`. The canonical `MethodTable`
 was created automatically before the first reference type
 instantiation was loaded and contains data which is hot but not
 instantiation specific like non-virtual slots or
-**RemotableMethodInfo**. Instantiations containing only value types
+`RemotableMethodInfo`. Instantiations containing only value types
 are not shared and every such instantiated type gets its own unshared
-**EEClass**.
+`EEClass`.
 
-**MethodTables** representing generic types loaded so far are cached
+`MethodTable`s representing generic types loaded so far are cached
 in a hash table owned by their loader module<sup>5</sup>. This hash table
 is consulted before a new instantiation is constructed, making sure
-that there will never be two or more **MethodTable** instances
+that there will never be two or more `MethodTable` instances
 representing the same type. See [Design and Implementation of Generics