class: center, middle .center[![:scale 100%](images\banner.png)] --- class: center, middle .center[![:scale 100%](images\aboutme.png)] --- exclude: true class: center, middle # GC in .NET --- exclude: true ## GC in .NET There are various runtimes but a few GCs: * .NET * .NET Core * Mono * .NET Compact Framework * Silverlight (?) -- exclude: true * .NET Core is open-sourced! - https://github.com/dotnet/coreclr/tree/master/src/gc --- exclude: true ## GC in .NET * But... * gc.cpp -> 2 classes * gc.cpp -> 37624 lines, ~1.2 MB * `#ifdef`, `#ifdef`, `#ifdef`, … --- exclude: true ### Server/Workstation GC `gc.cpp` has <40 kLOC of C++ `.\src\gc\gcsvr.cpp` defines `SERVER_GC` constant and `SVR` namespace: ```cpp #define SERVER_GC 1 namespace SVR { #include "gcimpl.h" // <-- defines MULTIPLE_HEAPS #include "gc.cpp" } ``` `.\src\gc\gcwks.cpp` defines `WKS` namespace: ```cpp namespace WKS { #include "gcimpl.h" #include "gc.cpp" } ``` --- exclude: true ### Server/Workstation GC ...and then the whole `gc.cpp` begins... ```cpp heap_segment* gc_heap::get_segment_for_loh (size_t size * #ifdef MULTIPLE_HEAPS , gc_heap* hp #endif //MULTIPLE_HEAPS ) { * #ifndef MULTIPLE_HEAPS gc_heap* hp = 0; #endif //MULTIPLE_HEAPS heap_segment* res = hp->get_segment (size, TRUE); ``` --- exclude: true ### Non-Concurrent/Background GC * `.\src\gc\gc.cpp` consumes `BACKGROUND_GC` constant * always defined in both SVR and WKS versions * dynamic flag checked ```cpp void GCStatistics::AddGCStats(const gc_mechanisms& settings, size_t timeInMSec) { * #ifdef BACKGROUND_GC * if (settings.concurrent) { bgc.Accumulate((uint32_t)timeInMSec*1000); cntBGC++; } else if (settings.background_p) { // ... ``` --- exclude: true ## GC in .NET * *"My dad collects garbage at Microsoft"* - Patric Dussud's son -- exclude: true * Lead Developer - Maoni Stephens -- exclude: true * History * JScript * -> "JVM" 0.1 * -> "JVM" 0.2 (Lisp to C++) * ~> C++ (CLR) ??? Patric Dussud - Zaczęło się od JScript napisanego w cztery osoby przez kilka weekendów. Kolega (?) miał napisać wersję new/delete, on GC - i choć się nie wyrobił, kolega stwierdził, że jego kod jest na tyle wolny (20x), że GC wolniejszy raczej nie będzie. To był bardzo prosty, tzw. "conservative GC". Potem patrząc na JVM napisał kolejną wersję, opartą na tej JScriptowej, też "conservative" - przed jakąś konferencją chcieli zaimplementować de facto własny JVM. Potem po konsultacjach z kolegą (Devid Moon z Symbolics, zajmujący się generacyjnymi GC) i obczytaniu się, próbując napisać "najlepszy możliwy GC", zaimplementował prototyp w Lispie (bo Lisp miał w tamtym czasie "the best debugging tools"). Potem napisał konwerter, który przekonwertował kod do C++. Potem oczywiście doszlifowany, ale to była (jest) rdzenna część algorytmiczna. --- class: center, middle .center[![:scale 80%](images\poster2.png)] --- class: center, middle .center[![:scale 80%](images\poster1.png)] --- class: middle .center[![:scale 100%](images\vmmapgc.png)] --- class: center, middle # Partitioning --- ### Partitioning By object: * size * type/kind * mutability * lifetime * ... --- ### Partitioning By object: * **size** * type/kind * mutability * **lifetime** * ... --- ### Partitioning - size * copying cost! * different quantities * many, many small objects often allocated * rare large objects --- class: middle, center # Large Object Heap --- ### Large Object Heap * all objects bigger/equal than 85,000 bytes * rare objects like big arrays (mostly), huge strings, ... * in .NET 32-bit runtime double arrays > 1000 elements * CLR uses it by itself, so it is not empty be default * since .NET Core 3.0 `GCLOHThreshold` available * so sad all those recruitment questions... :( * since .NET 1.1 drastic LOH improvement: ended using `malloc`, segments are used instead --- class: center, middle # Small Object Heap --- ### Small Object Heap * all objects smaller than 85,000 bytes * MOST of the objects in our applications * sensible to split them even further --- ### Partitioning - lifetime * weak/strong generational hypothesis .center[![:scale 55%](images\weakstronghyph.png)] --- ### Partitioning - logical summary .center[![:scale 75%](images\paritioning.png)] --- ### Partitioning - through generations .center[![:scale 40%](images\partition01.png)] --- ### Partitioning - through generations (cont.) .center[![:scale 40%](images\partition02.png)] --- class: center, middle ## Partitioning - physical --- ### Partitioning - physical (aka all-at-once) .center[![:scale 55%](images\partition03.png)] --- ### Partitioning - physical (aka all-at-once) .center[![:scale 80%](images\partition04.png)] --- ### Partitioning - physical (aka all-at-once) .center[Segment sizes] .center[![:scale 75%](images\partition05.png)] -- So clearly we have here... .center[![:scale 80%](images\partition04.png)] --- exclude: true ### Partitioning - physical in Workstation GC .center[![:scale 75%](images\partition07.png)] - (a) all-at- once configuration, - (b) two-stage configuration (the same as each-block configuration) --- ### Partitioning - physical (aka all-at-once) .center[![:scale 75%](images\partition06.png)] --- exclude: true ### Partitioning - physical in Server GC .center[![:scale 75%](images\partition08.png)] --- exclude: true ### Partitioning - segment "anatomy" .center[![:scale 55%](images\partition09.png)] -- exclude: true .right[![:scale 20%](images\partition03.png)] --- class: center, middle # .NET memory - allocations --- class: center, middle ## Stack vs Heap vs …? --- class: center, middle .center[![:scale 70%](images\stack01.png)] --- class: center, middle .center[![:scale 70%](images\stack02.png)] --- class: center, middle .center[![:scale 70%](images\stack03.png)] --- class: center, middle .center[![:scale 70%](images\stack04.png)] --- class: center, middle .center[![:scale 70%](images\stack05.png)] --- class: middle, center ## Bump pointer allocator --- ### Bump pointer allocator .center[![:scale 60%](images\allocator01.png)] --- ### Bump pointer allocator .center[![:scale 60%](images\allocator02.png)] --- ### Bump pointer allocator .center[![:scale 60%](images\allocator00.png)] --- ### Bump pointer allocator .center[![:scale 60%](images\allocator03.png)] .center[*Allocation quantum* – 8 kB (1-8kB)] --- class: center, middle .center[![:scale 80%](images\allocator04.png)] --- exclude: true class: center, middle .center[![:scale 80%](images\allocator07_before.png)] .left[ "Dummy" bump pointer allocation and fragmentation problem: * (a) Sweeping Garbage Collection produces fragmentation and if allocation context is not aware of free memory - sad :(, * (b) Compact Garbage Collection reclaims memory by pushing back allocation context but requires a lot of memory copying ] --- class: center, middle .center[![:scale 80%](images\allocator05.png)] "Smart" bump pointer allocation reuses free space! --- class: center, middle .center[![:scale 80%](images\allocator06.png)] --- class: center, middle .center[![:scale 80%](images\allocator07.png)] Compacting still makes sense - from time to time! --- class: middle, center ## Free-list allocator --- exclude: true ### Free-list allocator * searching through a free items list to find a gap big enough * *best-fit* - the smallest block fitting (little leftovers) * *first-fit* - the first block fitting (faster but leftovers) * buckets - first-fit into of buckets of various size ranges * in .NET free list is (partially) stored on the heap itself * "free object" with a predefined MT * keeps size as an array * keeps address of the next "free object" (single-linked list) * keeps special "undo" address * for sizes >= 2*minimum object size .center[![:scale 50%](images\freelist01.png)] --- ### Free-list allocator (cont.) - Buckets as metadata .center[![:scale 70%](images\freelist02.png)] .center[![:scale 40%](images\freelist03.png)] For gen 0 and 1 - free item is being discarded (becomes unusable fragmentation) if it fails to fit the required size. --- exclude: true ### Free-list allocator (cont.) Undo is used to... undo planned free-items usage. Just operation on single-linked list. .center[![:scale 60%](images\freelist04-undo.png)] --- class: center, middle ## Allocation... creating a new object --- ### Creating a new object ```cs var obj = new SomeClass(); ``` becomes ```cs newobj instance void SomeClass::.ctor() ``` Question: * who resets object's fields to defaults? * who decides where to allocate (SOH/LOH)? --- #### `newobj`'s JIT decision path .center[![:scale 55%](images\allocator08.png)] --- ### Small Object Heap allocation * mostly - **bump-pointer allocation** inside the current allocation context * `JIT_TrialAllocSFastMP_InlineGetThread` * fallbacks to `JIT_NEW` in case of allocation context being full --- ```asm ; As input, rcx contains MethodTable pointer ; As result, rax contains new object address LEAF_ENTRY JIT_TrialAllocSFastMP_InlineGetThread, _TEXT ; Read object size into edx ; m_BaseSize is guaranteed to be a multiple of 8. mov edx, [rcx + OFFSET__MethodTable__m_BaseSize] ; Read Thread Local Storage address into r11 INLINE_GETTHREAD r11 ; Read alloc_limit into r10 mov r10, [r11 + OFFSET__Thread__m_alloc_context__alloc_limit] ; Read alloct_ptr into rax mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] add rdx, rax ; rdx = alloc_ptr + size cmp rdx, r10 ; is rdx smaller than alloc_limit ja AllocFailed ; Update alloc_ptr in TLS mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx ; Store MT under alloc_ptr address (constituting new object) mov [rax], rcx ret AllocFailed: jmp JIT_NEW ; fast-path failed, jump to slow-path LEAF_END JIT_TrialAllocSFastMP_InlineGetThread, _TEXT ``` --- #### `JIT_NEW` helper The same as used for objects with finalizer or in LOH. * "slower" C++ bump-pointer allocator (because it is generic for both SOH/LOH) * if fails, the whole story begins - the true "slow-path": * trying to use existing, unused space in. It will: * Try to use free list to find a suitable free gap for a new allocation context - **free-list allocation of a new allocation context** * Try to adjust allocation limit inside already Commited memory * Try to Commit more memory from Reserved memory and adjust allocation limit inside. * If all above fails, GC will be triggered * If all above fails - `OutOfMemoryException` :( .center[![:scale 50%](images\allocator05.png)] --- class: center, middle .center[![:scale 50%](images\allocator10.png)] --- class: center, middle .center[![:scale 75%](images\allocator09.png)] --- ### Large Object Heap allocation * **free-list allocation** and simplified bump-pointer at the end of the segment * no use of allocation context * ... thus synchronization overhead * ... and memory zeroing overhead * only "slow-path": * try to use free list to find a suitable free gap for an object * in each segment containing LOH: * try to adjust allocation limit inside already Committed memory, * try to Commit more memory from Reserved memory and adjust allocation limit inside * if all above fails, GC will be triggered. * If all above fails - `OutOfMemoryException` :( --- ### Tracing * Mark * Sweep * Compact --- class: center, middle # Mark --- ### Mark .center[![:scale 60%](images\mark01.png)] --- .center[![:scale 60%](images\mark02.png)] --- .center[![:scale 60%](images\mark03.png)] --- .center[![:scale 60%](images\mark04.png)] --- .center[![:scale 60%](images\mark05.png)] --- .center[![:scale 60%](images\mark06.png)] --- .center[![:scale 60%](images\mark07.png)] --- .center[![:scale 60%](images\mark08.png)] --- .center[![:scale 60%](images\mark09.png)] --- .center[![:scale 60%](images\mark10.png)] --- .center[![:scale 60%](images\mark11.png)] --- .center[![:scale 60%](images\mark12.png)] -- Roots: -- * stack * registers * static data * thread static data * internal structures * string interning * finalization queue --- ### Roots - statics ```cs public class ExampleClass { public static int StaticPrimitive; public static S StaticStruct; public static R StaticObject = new R(); } public class R { public int Value; } public struct S { public int Value; } ``` -- * Static data have per AppDomain scope - there may be multiple if same assembly loaded into multiple AppDomains * Where do they live: * primitive data - *High Frequency Heap* of the corresponding application domain (part of its *Loader Heap*). * reference type instances - on the regular GC Heap (additionally referenced by the internal “statics table”) * user-defined value-type instances - same! * so... in what generation they will land? --- ### Roots - statics .center[![:scale 65%](images\roots-statics.png)] --- ### Roots - string interning .center[![:scale 60%](images\roots-interning.png)] --- ### Thread local storage - `ThreadStatic` & `ThreadLocal
` ```csharp [ThreadStatic] public static string threadStatic = ...; public static ThreadLocal
threadlocal = ...; ``` --- ### Roots - Thread Local Storage .center[![:scale 60%](images\roots-TLS.png)] --- class: center, middle # Sweep --- ### Sweep .center[![:scale 65%](images\mark01.png)] .center[![:scale 35%](images\mark12.png)] --- ### Sweep .center[![:scale 65%](images\sweep01.png)] .center[![:scale 35%](images\mark12.png)] -- .center[.red[Fragmentation!]] --- class: center, middle # Compact --- ### Compact - in-place .center[![:scale 65%](images\compact08.png)] .center[![:scale 35%](images\mark12.png)] --- class: center, middle # Plan-Sweep/Compact --- ### Plan-Sweep/Compact .center[![:scale 80%](images\gcplan01.png)] -- .center[![:scale 80%](images\gcplan02.png)] -- .center[![:scale 80%](images\gcplan03.png)] -- .center[![:scale 80%](images\gcplan04.png)] *Relocation offset* for each plug, *size* for each gap (in fact, also plug-with-gap) --- ### Plan-Sweep/Compact (cont.) .center[![:scale 80%](images\gcplan05.png)] --- ### Plan-Sweep/Compact (cont.) .center[![:scale 35%](images\gcplan06.png)] -- .center[![:scale 80%](images\gcplan07.png)] -- .center[![:scale 80%](images\gcplan08.png)] .center[Binary tree of plugs info.] --- ### Plan-Sweep/Compact (cont.) - sidenote ```cmd > !dumpheap -stat The garbage collector data structures are not in a valid state for traversal. It is either in the "plan phase," where objects are being moved around, or we are at the initialization or shutdown of the gc heap. Commands related to displaying, finding or traversing objects as well as gc heap segments may not work properly. !dumpheap and !verifyheap may incorrectly complain of heap consistency errors. ``` --- ### Plan-Sweep/Compact (cont.) - bricktable .center[![:scale 80%](images\gcplan09.png)] .center[Brick size is 2,048 B for 32bit and 4,096 B for 64-bit runtimes] -- .right[![:scale 60%](images\gcplan10.png)] --- ### Plan-Sweep/Compact (cont.) .right[![:scale 60%](images\gcplan10.png)] What will be the **new address** of the object at address X? * calculate the brick table entry based on address X - by simply dividing it by a brick size * if brick table entry is <0 - jump into proper brick table entry and repeat. * if brick table entry is >0 - start to traverse plug tree to find proper plug. * get relocation offset from the plug and subtract it from X. --- ### Plan-Sweep/Compact & Pinning Sources: * `GCHandleType.Pinned` * `fixed` block ```cs byte[] buffer = new byte[255]; fixed (byte* p = buffer) { IntPtr ptr = (IntPtr)p; // do you stuff here } ``` * some internal Interop (marshal LPWSTR as String), some I/O libraries --- ### Pinning .center[![:scale 35%](images\pinning01.png)] --- ### Pinning .center[![:scale 35%](images\pinning02.png)] --- ### Pinning .center[![:scale 35%](images\pinning03.png)] --- ### Pinning .center[![:scale 35%](images\pinning04.png)] --- ### Pinning .center[![:scale 35%](images\pinning05.png)] --- exclude: true ### Pinning * *local pinned variables* - objects that are local variables, often created implicitly by using **`fixed`** keyword * short lifetime - memory dump or Heap Snapshot may show a very little * `PinObjectAtGCTime` ETW event emitted from GC when pinning (at Mark phase) * *pinned handles* - objects that are pinned explicitly by **pinned handle reference** * include some internal objects held by CLR * created by `GCHandle.Allocate` * usually longer lifetime - memory dump or Heap Snapshot may show more * also `PinObjectAtGCTime` ETW event emitted from GC when pinning (at Mark phase) * **remark**: but only for the generation(s) that the GC is collecting (since handle table is generation aware) --- ### Plan phase - pinning .center[![:scale 80%](images\gcplanpin01.png)] -- .center[![:scale 80%](images\gcplanpin02.png)] --- ### Plan phase - pinning *Pinned plug* - special marked plug * zeroed relocation offset * additional info about free space before it (in case of compaction) .center[![:scale 80%](images\gcplanpin03.png)] Easy case because there was a gap before it. --- ### Plan phase - pinning (cont.) Pinned plug after normal plug .center[![:scale 80%](images\gcplanpin04.png)] --- ### Plan phase - pinning (cont.) Pinned plug before normal plug .center[![:scale 80%](images\gcplanpin05.png)] --- ### Plan phase - pinning (cont.) Pinned plug is located before at least two marked objects .center[![:scale 80%](images\gcplanpin06.png)] --- ### Plan phase - pinning (cont.) Pinned plug is located inside larger block of marked objects .center[![:scale 80%](images\gcplanpin07.png)] Pinning consequences summary: * copying pre and post plugs introduces memory traffic * pinned plug can be extended by a single object so more memory is being pinned than it could be * during Plan phase some objects on the Managed Heap are "destroyed" making it not "walkable" in a normal way --- exclude: true ### Ref return (C# 7.0+) Return type as *managed pointer* - obviously with limitations: > *"The return value must have a lifetime that extends beyond the execution of the method. In other words, it cannot be a local variable in the method that returns it. It can be an instance or static field of a class, or it can be an argument passed to the method"* - MSDN -- exclude: true Bad: ```cs public static ref int ReturnByRefValueTypeInterior(int index) { int localInt = 7; return ref localInt; // Compilation error: Cannot return local 'localInt' by // reference because it is not a ref local } ``` -- exclude: true Good: ```cs public static ref int GetArrayElementByRef(int[] array, int index) { return ref array[index]; } ``` --- exclude: true ### Ref return (cd.) Bad or good? ```cs public static ref int ReturnByRefReferenceTypeInterior(int index) { int[] localArray = new[] { 1, 2, 3 }; return ref localArray[index]; } ``` --- exclude: true ### Ref return - an example ```cs public class RefValueBookCollection { public ValueBook[] books = { new ValueBook { Title = "Call of the Wild, The", Author = "Jack London" }, new ValueBook { Title = "Tale of Two Cities, A", Author = "Charles Dickens" } }; private ValueBook nobook = default; public ref ValueBook GetBookByTitle(string title) { for (int ctr = 0; ctr < books.Length; ctr++) if (title == books[ctr].Title) return ref books[ctr]; return ref nobook; } public static void Run() { var collection = new RefValueBookCollection(); ref var book = ref collection.GetBookByTitle("Call of the Wild, The"); book.Author = "Konrad Kokosa"; Console.WriteLine(collection.books[0].Author); } } ``` -- exclude: true **Question:** what is the result? -- exclude: true ```bash Konrad Kokosa ``` --- exclude: true class: middle, center # *Byref* internals --- exclude: true ### *Byref* internals ```cs public static ref int ReturnByRefReferenceTypeInterior(int index) { int[] localArray = new[] { 1, 2, 3 }; return ref localArray[index]; } ref int local = ref ReturnByRefReferenceTypeInterior(2); ``` .center[![:scale 60%](images\managedpointer01-1.png)] -- exclude: true * `local` is seen as so-called *interior pointer* - a pointer **into** an object * it is just an ordinary pointer... * unless GC happens - it now must be interpreted as the containing object root --- exclude: true ### *Byref* internals (cd.) .pull-right[![:scale 80%](images\managedpointer01-1.png)] .pull-left[ ```cs ref int local = ref ReturnByRef(2); ``` ] -- exclude: true .float-right[Interpretation:] -- exclude: true .center[![:scale 80%](images\bricks01.png)] *brick* - 4kB memory region: * during *Mark* phase - we serch inside **the whole region** to find a corresponding object for an *interior pointer* --- exclude: true ### *Byref* internals (cd.) Interpretation: .center[![:scale 80%](images\bricks02.png)] *brick* - 4kB memory region: * during *Compact* phase - we just search inside a specific *plug* found through *plug tree* --- exclude: true ### *Byref* internals (cd.) ```cs [MethodImpl(MethodImplOptions.NoInlining)] public static void Test(ref int data) { data = 11; // it now has to be a root! } ``` -- exclude: true ```asm PassingByref.Test(Int32 ByRef) L0000: mov dword [rcx], 0xb L0006: ret ``` -- exclude: true .pull-left[ ```asm PassingByref.Test(Int32 ByRef) push rdi push rsi sub rsp,28h mov rsi,rcx 00000009 interruptible 00000009 +rsi(interior) ... 0000003a not interruptible 0000003a -rsi(interior) add rsp,28h pop rsi pop rdi ret ``` ] -- exclude: true .pull-right[ A method may be: * *fully interruptible* - it can be suspended at "every line" * *partially interruptible* - it can be suspended only at "safe-point" (mainly method calls) * not interruptible at all For the first two - there is a risk of the GC suspension overhead. ] --- class: center, middle # That's all! Thank .red[**you**]! Any questions?!