class: center, middle # Writing a custom, real-world .NET GC ### Konrad Kokosa --- ### Agenda .large[ * **why** we even care about custom .NET GC? * **what** we have to do to write it? * **how** we can write something **working**? * **how** we can write something **useful**? * well... at least start writing... ] --- class: center, middle ## Who has attended or watched my **DotNext Moscow** talk?! --- class: center, middle # Welcome To The World Of Custom GCs! -- ### which barely exist in .NET... --- class: center, middle, section # Why... -- ## ... we even care about custom .NET GC? --- ### Java .center[] --- ### Java ```cmd -server -Xms24G -Xmx24G -XX:PermSize=512m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5 -XX:InitiatingHeapOccupancyPercent=70 ``` ```cmd -server -Xss4096k -Xms12G -Xmx12G -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -Xmaxf1 -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC -XX:CMSFullGCsBeforeCompaction=10 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+CMSParallelRemarkEnabled -XX:GCTimeRatio=19 -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -XX:+PrintGCTaskTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution -Xloggc:gc.log ``` --- exclude: true class: center, middle  #### Jack of all trades is master of none. 
--- class: center  --- exclude: true ## Level 0 customization - .NET GC modes | | Concurrent (false) | Concurrent (true) | | ----------- |:-------------------------- |:-----------------------| | Workstation | Non-Concurrent Workstation | .good[Background Workstation] | | Server | Non-Concurrent Server | .good[Background Server] | --- ## Level 0 customization - .NET GC modes * Non-Concurrent Workstation * .good[Background Workstation] * Non-Concurrent Server * .good[Background Server] --- exclude: true class: center, middle ## How is it **currently** written? --- exclude: true ### Server/Workstation GC `gc.cpp` has <40 kLOC of C++ `.\src\gc\gcsvr.cpp` defines `SERVER_GC` constant and `SVR` namespace: ```cpp #define SERVER_GC 1 namespace SVR { #include "gcimpl.h" // <-- defines MULTIPLE_HEAPS #include "gc.cpp" } ``` `.\src\gc\gcwks.cpp` defines `WKS` namespace: ```cpp namespace WKS { #include "gcimpl.h" #include "gc.cpp" } ``` --- exclude: true ### Server/Workstation GC ...and then the whole `gc.cpp` begins... ```cpp heap_segment* gc_heap::get_segment_for_loh (size_t size * #ifdef MULTIPLE_HEAPS , gc_heap* hp #endif //MULTIPLE_HEAPS ) { * #ifndef MULTIPLE_HEAPS gc_heap* hp = 0; #endif //MULTIPLE_HEAPS heap_segment* res = hp->get_segment (size, TRUE); ``` --- exclude: true ### Non-Concurrent/Background GC * `.\src\gc\gc.cpp` consumes `BACKGROUND_GC` constant * always defined in both SVR and WKS versions * dynamic flag checked ```cpp void GCStatistics::AddGCStats(const gc_mechanisms& settings, size_t timeInMSec) { * #ifdef BACKGROUND_GC * if (settings.concurrent) { bgc.Accumulate((uint32_t)timeInMSec*1000); cntBGC++; } else if (settings.background_p) { // ... ``` --- ### Level 1 customization - Additional GC knobs: * `GCNoAffinitize` and `GCHeapAffinitizeMask`: ```xml
<!-- illustrative App.config values; exact attributes/values are an assumption, check the docs -->
<configuration>
  <runtime>
    <GCNoAffinitize enabled="false"/>
    <GCHeapAffinitizeMask enabled="1023"/>
  </runtime>
</configuration>
``` * Latency Modes * Latency Optimization Goals CoreCLR comment: *"Latency modes required user to have specific GC knowledge (e.g., budget, full-blocking GC). We are trying to move away from them as it makes a lot more sense for users to tell us what’s the most important out of the performance aspects that make sense to them"* * CLR Hosting --- ### Level 1 customization - Additional GC knobs (cont.) https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/clr-configuration-knobs.md#garbage-collector-configuration-knobs .center[] --- ### Level 1 customization - Additional GC knobs (cont.) * `GCCompactRatio` - Specifies the ratio compacting GCs vs sweeping * `gcForceCompact` - When set to true, always do compacting GC * `GCGen0MaxBudget` - Specifies the largest gen0 allocation budget * `GCgen0size` - Specifies the smallest gen0 size * `GCHeapHardLimit` - Specifies the maximum commit size for the GC heap * `GCLOHCompact` - Specifies the LOH compaction mode * `GCLOHThreshold` - Specifies the size that will make objects go on LOH * `GCSegmentSize` - Specifies the managed heap segment size * `gcTrimCommitOnLowMemory` - When set we trim the committed space more aggressively for the ephemeral seg. This is used for running many instances of server processes where they want to keep as little memory committed as possible * `GCConfigLogEnabled` - Specifies if you want to turn on config logging in GC * `GCConfigLogFile` - Specifies the name of the GC config log file --- class: center, middle ## Level `int.MaxValue` customization - Custom GC in .NET --- ### Custom GC in .NET - What can be done with it? 
-- * .large[we can replace **the whole** GC mechanism] -- * .large[we **can't** simply replace some parts of it] -- * .large[The Ultimate Goal - a GC better than the current default :D] -- * .large[The Realistic Goal - write anything that works :)] --- class: center, middle  --- ## Let's write a Minimum Viable Product - Zero GC .large[ * only allocating * no Garbage Collection at all ] --- class: center, middle  --- class: center, middle  --- ## ... like they did in the JVM - Epsilon GC .center[https://openjdk.java.net/jeps/318] .center[] --- class: middle, center  --- ## Epsilon GC/Zero GC motivation -- * **Performance testing** - *"differential performance analysis for other, real GCs. Having a no-op GC can help to filter out GC-induced performance artifacts"* -- * **Memory pressure testing** - *"Having a GC that accepts only the bounded number of allocations, and fails on heap exhaustion, simplifies testing"* -- * **VM interface testing** - *"helps to understand the absolute minimum required from the VM-GC interface to have a functional allocator"* -- * **Extremely short lived jobs** - *"A short-lived job might rely on exiting quickly to free the resources (e.g. heap memory)"* -- * **Last-drop latency improvements** - for *"(almost) completely garbage-free applications, accepting the GC cycle might be a design issue"* -- and... -- * **Template** - base for future, real custom GCs -- * **Learning** -- * **Fun** --- class: section ### Summary .large[**Why** we even care about custom .NET GC?] -- * .large[because the default configuration possibilities are quite limited] -- * .large[because it is a promise to create something more:] -- * .large[fine-tuned to specific needs] * .large[more configurable] * .large[more ...] -- * .large[because we learn a lot about the runtime and the GC] -- * .large[NOT because the default one is bad, it is pretty awesome in fact] --- class: center, middle, section # What... -- ## ... we have to do to write it? 
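---

### Sidebar - the GC interface version handshake

Before any custom GC code runs, the runtime and the standalone GC must agree on an interface version. The gist of such a compatibility check can be sketched as follows - this is illustrative logic with simplified types, not CoreCLR's actual implementation, and `IsCompatibleGCVersion` is a hypothetical helper:

```cpp
#include <cassert>

// Simplified stand-in for the VersionInfo struct from gcinterface.h
struct VersionInfo {
    int MajorVersion;
    int MinorVersion;
    int BuildVersion;
};

// Hypothetical compatibility rule: major versions must match exactly,
// and the GC may not require a newer minor version than the runtime offers.
bool IsCompatibleGCVersion(const VersionInfo& runtime, const VersionInfo& gc) {
    return gc.MajorVersion == runtime.MajorVersion
        && gc.MinorVersion <= runtime.MinorVersion;
}
```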
--- class: center, middle # Custom GC #### (aka Local GC) --- .center[] --- State at DotNext Moscow: .center[] --- State now: .center[] --- ### Usage Since .NET Core 2.1: * `set COMPlus_GCName=f:\CoreCLR.ZeroGC\x64\Release\OurCustomGC.dll` Since .NET Core 3.0 preview: * `set COMPlus_GCName=OurCustomGC.dll` --- ### Implementing * regular C++ library (e.g. created in Visual Studio) * include only three files from CoreCLR: ```cpp #include "debugmacros.h" #include "gcenv.base.h" #include "gcinterface.h" ``` * implement two simple exported methods * `GC_Initialize` * `GC_VersionInfo` * implement the rest of the GC: * `IGCHeap` - responsible for... everything * `IGCHandleManager` and `IGCHandleStore` - responsible for handling... handles --- # Is it difficult? -- ### It is not difficult... but it requires **very deep** knowledge about the runtime and... the GC --- class: middle .center[] --- ### Implementing - cont. Specifying which GC API version our custom GC supports: ```cpp extern "C" DLLEXPORT void GC_VersionInfo( /* Out */ VersionInfo* result ) { result->MajorVersion = GC_INTERFACE_MAJOR_VERSION; result->MinorVersion = GC_INTERFACE_MINOR_VERSION; result->BuildVersion = 0; } ``` Currently it is still 2.1 (even in 3.0 preview) --- ### Implementing - cont. Specifying pointers to our custom `IGCHeap` and `IGCHandleManager` implementations: ```cpp extern "C" DLLEXPORT HRESULT GC_Initialize( /* In */ IGCToCLR* clrToGC, /* Out */ IGCHeap** gcHeap, /* Out */ IGCHandleManager** gcHandleManager, /* Out */ GcDacVars* gcDacVars ) { IGCHeap* heap = new ZeroGCHeap(clrToGC); IGCHandleManager* handleManager = new ZeroGCHandleManager(); *gcHeap = heap; *gcHandleManager = handleManager; return S_OK; } ``` --- ### Implementing - cont. 
Specifying pointers to our custom `IGCHeap` and `IGCHandleManager` implementations: ```cpp extern "C" DLLEXPORT HRESULT GC_Initialize( /* In */ IGCToCLR* clrToGC, * /* Out */ IGCHeap** gcHeap, * /* Out */ IGCHandleManager** gcHandleManager, /* Out */ GcDacVars* gcDacVars ) { IGCHeap* heap = new ZeroGCHeap(clrToGC); IGCHandleManager* handleManager = new ZeroGCHandleManager(); * *gcHeap = heap; * *gcHandleManager = handleManager; return S_OK; } ``` --- ### Implementing - cont. Remembering `IGCToCLR`: ```cpp extern "C" DLLEXPORT HRESULT GC_Initialize( * /* In */ IGCToCLR* clrToGC, /* Out */ IGCHeap** gcHeap, /* Out */ IGCHandleManager** gcHandleManager, /* Out */ GcDacVars* gcDacVars ) { * IGCHeap* heap = new ZeroGCHeap(clrToGC); IGCHandleManager* handleManager = new ZeroGCHandleManager(); *gcHeap = heap; *gcHandleManager = handleManager; return S_OK; } ``` It provides such convenient APIs as: * `SuspendEE` and `RestartEE` methods for thread suspensions * `GcScanRoots` for root scanning * ... --- ### `IGCHeap` Meet The King: ```cpp class ZeroGCHeap : public IGCHeap { private: IGCToCLR* gcToCLR; public: ZeroGCHeap(IGCToCLR* gcToCLR) { this->gcToCLR = gcToCLR; } // Inherited via IGCHeap ... 75 methods! } ``` --- exclude: true ```cpp // Inherited via IGCHeap virtual bool IsValidSegmentSize(size_t size) override; virtual bool IsValidGen0MaxSize(size_t size) override; virtual size_t GetValidSegmentSize(bool large_seg = false) override; virtual void SetReservedVMLimit(size_t vmlimit) override; virtual void WaitUntilConcurrentGCComplete() override; virtual bool IsConcurrentGCInProgress() override; virtual void TemporaryEnableConcurrentGC() override; virtual void TemporaryDisableConcurrentGC() override; virtual bool IsConcurrentGCEnabled() override; virtual HRESULT WaitUntilConcurrentGCCompleteAsync(int millisecondsTimeout) override; // Use in native threads. TRUE if succeed. 
FALSE if failed or timeout virtual bool FinalizeAppDomain(void* pDomain, bool fRunFinalizers) override; virtual void SetFinalizeQueueForShutdown(bool fHasLock) override; virtual size_t GetNumberOfFinalizable() override; virtual bool ShouldRestartFinalizerWatchDog() override; virtual Object* GetNextFinalizable() override; virtual void SetFinalizeRunOnShutdown(bool value) override; virtual int GetGcLatencyMode() override; virtual int SetGcLatencyMode(int newLatencyMode) override; virtual int GetLOHCompactionMode() override; virtual void SetLOHCompactionMode(int newLOHCompactionMode) override; virtual bool RegisterForFullGCNotification(uint32_t gen2Percentage, uint32_t lohPercentage) override; virtual bool CancelFullGCNotification() override; virtual int WaitForFullGCApproach(int millisecondsTimeout) override; virtual int WaitForFullGCComplete(int millisecondsTimeout) override; virtual unsigned WhichGeneration(Object* obj) override; virtual int CollectionCount(int generation, int get_bgc_fgc_coutn = 0) override; virtual int StartNoGCRegion(uint64_t totalSize, bool lohSizeKnown, uint64_t lohSize, bool disallowFullBlockingGC) override; virtual int EndNoGCRegion() override; virtual size_t GetTotalBytesInUse() override; virtual HRESULT GarbageCollect(int generation = -1, bool low_memory_p = false, int mode = collection_blocking) override; virtual unsigned GetMaxGeneration() override; virtual void SetFinalizationRun(Object* obj) override; ``` --- exclude: true ```cpp // Inherited via IGCHeap virtual bool IsValidSegmentSize(size_t size) override; virtual bool IsValidGen0MaxSize(size_t size) override; virtual size_t GetValidSegmentSize(bool large_seg = false) override; virtual void SetReservedVMLimit(size_t vmlimit) override; virtual void WaitUntilConcurrentGCComplete() override; virtual bool IsConcurrentGCInProgress() override; virtual void TemporaryEnableConcurrentGC() override; virtual void TemporaryDisableConcurrentGC() override; virtual bool IsConcurrentGCEnabled() 
override; virtual HRESULT WaitUntilConcurrentGCCompleteAsync(int millisecondsTimeout) override; // Use in native threads. TRUE if succeed. FALSE if failed or timeout virtual bool FinalizeAppDomain(void* pDomain, bool fRunFinalizers) override; virtual void SetFinalizeQueueForShutdown(bool fHasLock) override; virtual size_t GetNumberOfFinalizable() override; virtual bool ShouldRestartFinalizerWatchDog() override; virtual Object* GetNextFinalizable() override; virtual void SetFinalizeRunOnShutdown(bool value) override; virtual int GetGcLatencyMode() override; virtual int SetGcLatencyMode(int newLatencyMode) override; virtual int GetLOHCompactionMode() override; virtual void SetLOHCompactionMode(int newLOHCompactionMode) override; virtual bool RegisterForFullGCNotification(uint32_t gen2Percentage, uint32_t lohPercentage) override; virtual bool CancelFullGCNotification() override; virtual int WaitForFullGCApproach(int millisecondsTimeout) override; virtual int WaitForFullGCComplete(int millisecondsTimeout) override; virtual unsigned WhichGeneration(Object* obj) override; virtual int CollectionCount(int generation, int get_bgc_fgc_coutn = 0) override; virtual int StartNoGCRegion(uint64_t totalSize, bool lohSizeKnown, uint64_t lohSize, bool disallowFullBlockingGC) override; virtual int EndNoGCRegion() override; virtual size_t GetTotalBytesInUse() override; * virtual HRESULT GarbageCollect(int generation = -1, bool low_memory_p = false, int mode = collection_blocking) override; virtual unsigned GetMaxGeneration() override; virtual void SetFinalizationRun(Object* obj) override; ``` --- exclude: true ```cpp virtual bool RegisterForFinalization(int gen, Object* obj) override; virtual HRESULT Initialize() override; virtual bool IsPromoted(Object* object) override; virtual bool IsHeapPointer(void* object, bool small_heap_only = false) override; virtual unsigned GetCondemnedGeneration() override; virtual bool IsGCInProgressHelper(bool bConsiderGCStart = false) override; virtual 
unsigned GetGcCount() override; virtual bool IsThreadUsingAllocationContextHeap(gc_alloc_context* acontext, int thread_number) override; virtual bool IsEphemeral(Object* object) override; virtual uint32_t WaitUntilGCComplete(bool bConsiderGCStart = false) override; virtual void FixAllocContext(gc_alloc_context* acontext, bool lockp, void* arg, void* heap) override; virtual size_t GetCurrentObjSize() override; virtual void SetGCInProgress(bool fInProgress) override; virtual bool RuntimeStructuresValid() override; virtual size_t GetLastGCStartTime(int generation) override; virtual size_t GetLastGCDuration(int generation) override; virtual size_t GetNow() override; virtual Object* Alloc(gc_alloc_context* acontext, size_t size, uint32_t flags) override; virtual Object* AllocLHeap(size_t size, uint32_t flags) override; virtual Object* AllocAlign8(gc_alloc_context* acontext, size_t size, uint32_t flags) override; virtual void PublishObject(uint8_t* obj) override; virtual void SetWaitForGCEvent() override; virtual void ResetWaitForGCEvent() override; virtual bool IsObjectInFixedHeap(Object* pObj) override; virtual void ValidateObjectMember(Object* obj) override; virtual Object* NextObj(Object* object) override; virtual Object* GetContainingObject(void* pInteriorPtr, bool fCollectedGenOnly) override; virtual void DiagWalkObject(Object* obj, walk_fn fn, void* context) override; virtual void DiagWalkHeap(walk_fn fn, void* context, int gen_number, bool walk_large_object_heap_p) override; virtual void DiagWalkSurvivorsWithType(void* gc_context, record_surv_fn fn, void* diag_context, walk_surv_type type) override; virtual void DiagWalkFinalizeQueue(void* gc_context, fq_walk_fn fn) override; virtual void DiagScanFinalizeQueue(fq_scan_fn fn, ScanContext* context) override; ``` --- exclude: true ```cpp virtual bool RegisterForFinalization(int gen, Object* obj) override; virtual HRESULT Initialize() override; virtual bool IsPromoted(Object* object) override; virtual bool 
IsHeapPointer(void* object, bool small_heap_only = false) override; virtual unsigned GetCondemnedGeneration() override; virtual bool IsGCInProgressHelper(bool bConsiderGCStart = false) override; virtual unsigned GetGcCount() override; virtual bool IsThreadUsingAllocationContextHeap(gc_alloc_context* acontext, int thread_number) override; virtual bool IsEphemeral(Object* object) override; virtual uint32_t WaitUntilGCComplete(bool bConsiderGCStart = false) override; virtual void FixAllocContext(gc_alloc_context* acontext, bool lockp, void* arg, void* heap) override; virtual size_t GetCurrentObjSize() override; virtual void SetGCInProgress(bool fInProgress) override; virtual bool RuntimeStructuresValid() override; virtual size_t GetLastGCStartTime(int generation) override; virtual size_t GetLastGCDuration(int generation) override; virtual size_t GetNow() override; * virtual Object* Alloc(gc_alloc_context* acontext, size_t size, uint32_t flags) override; * virtual Object* AllocLHeap(size_t size, uint32_t flags) override; * virtual Object* AllocAlign8(gc_alloc_context* acontext, size_t size, uint32_t flags) override; virtual void PublishObject(uint8_t* obj) override; virtual void SetWaitForGCEvent() override; virtual void ResetWaitForGCEvent() override; virtual bool IsObjectInFixedHeap(Object* pObj) override; virtual void ValidateObjectMember(Object* obj) override; virtual Object* NextObj(Object* object) override; virtual Object* GetContainingObject(void* pInteriorPtr, bool fCollectedGenOnly) override; virtual void DiagWalkObject(Object* obj, walk_fn fn, void* context) override; virtual void DiagWalkHeap(walk_fn fn, void* context, int gen_number, bool walk_large_object_heap_p) override; virtual void DiagWalkSurvivorsWithType(void* gc_context, record_surv_fn fn, void* diag_context, walk_surv_type type) override; virtual void DiagWalkFinalizeQueue(void* gc_context, fq_walk_fn fn) override; virtual void DiagScanFinalizeQueue(fq_scan_fn fn, ScanContext* context) override; 
``` --- exclude: true ```cpp virtual void DiagScanHandles(handle_scan_fn fn, int gen_number, ScanContext* context) override; virtual void DiagScanDependentHandles(handle_scan_fn fn, int gen_number, ScanContext* context) override; virtual void DiagDescrGenerations(gen_walk_fn fn, void* context) override; virtual void DiagTraceGCSegments() override; virtual bool StressHeap(gc_alloc_context* acontext) override; virtual segment_handle RegisterFrozenSegment(segment_info *pseginfo) override; virtual void UnregisterFrozenSegment(segment_handle seg) override; virtual void ControlEvents(GCEventKeyword keyword, GCEventLevel level) override; virtual void ControlPrivateEvents(GCEventKeyword keyword, GCEventLevel level) override; virtual void GetMemoryInfo(uint32_t * highMemLoadThreshold, uint64_t * totalPhysicalMem, uint32_t * lastRecordedMemLoad, size_t * lastRecordedHeapSize, size_t * lastRecordedFragmentation) override; virtual void SetSuspensionPending(bool fSuspensionPending) override; virtual void SetYieldProcessorScalingFactor(uint32_t yieldProcessorScalingFactor) override; ``` --- class: center, middle ## So, what MUST we implement? --- class: middle .center[] --- class: middle .center[] --- class: middle .center[] --- class: section ### Summary .large[**What** we have to do to write it?] -- * .large[implement only a few simple classes as a C++ library] -- * .large[in fact we can omit quite a lot of methods * by providing dummy implementations] --- class: center, middle, section # How... -- ## ... we can write something **working**? 
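---

### Sidebar - why return `(Object*)(address + 1)`?

A minimal allocator has to reserve room for the object header that precedes every .NET object reference, then return the address just past it. The trick can be sketched (and unit-tested) in isolation - `FakeObjHeader` and `ZeroAlloc` are simplified stand-ins, not the real CoreCLR types:

```cpp
#include <cstdlib>
#include <cstdint>
#include <cassert>

// Simplified stand-in for CoreCLR's ObjHeader (sync block index etc.)
struct FakeObjHeader {
    intptr_t syncBlock;
};

// Sketch of a Zero-GC-style allocation: zeroed memory, header in front,
// returned pointer refers to the first byte AFTER the header.
void* ZeroAlloc(size_t size) {
    size_t sizeWithHeader = size + sizeof(FakeObjHeader);
    FakeObjHeader* address = (FakeObjHeader*)calloc(sizeWithHeader, sizeof(char));
    if (address == nullptr) return nullptr;
    return (void*)(address + 1); // pointer arithmetic skips exactly one header
}
```

Pointer arithmetic on `FakeObjHeader*` advances by whole headers, so `address + 1` lands exactly `sizeof(FakeObjHeader)` bytes into the zeroed block.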
--- class: center, middle  --- ### Zero GC Most `IGCHeap` methods may be dummy: ```cpp bool ZeroGCHeap::RuntimeStructuresValid() { return true; } bool ZeroGCHeap::IsPromoted(Object * object) { return false; } unsigned ZeroGCHeap::GetCondemnedGeneration() { return 0; } ``` --- exclude: true ### Zero GC `IGCHeap::GarbageCollect` * called by the runtime in rare cases: * `GC.Collect` * low-memory notification * not called by the GC itself -- Trivial implementation: ```cpp HRESULT ZeroGCHeap::GarbageCollect(int generation, bool low_memory_p, int mode) { return NOERROR; } ``` --- ### Zero GC `IGCHeap` - allocations: ```cpp Object* ZeroGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags) { // return address of a new object // trigger GC if necessary } Object* ZeroGCHeap::AllocLHeap(size_t size, uint32_t flags) { // return address of a new object // trigger GC if necessary } ``` --- exclude: true ### Zero GC `IGCHeap` - allocations: ```cpp Object* ZeroGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags) { // return address of a new object * // trigger GC if necessary } Object* ZeroGCHeap::AllocLHeap(size_t size, uint32_t flags) { // return address of a new object * // trigger GC if necessary } ``` --- exclude: true ### Zero GC `IGCHeap` - allocations: ```cpp Object* ZeroGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags) { // return address of a new object } Object* ZeroGCHeap::AllocLHeap(size_t size, uint32_t flags) { // return address of a new object } ``` --- exclude: true ### Zero GC `IGCHeap` - allocations: ```cpp Object* ZeroGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags) { int sizeWithHeader = size + sizeof(ObjHeader); ObjHeader* address = (ObjHeader*)calloc(sizeWithHeader, sizeof(char)); return (Object*)(address + 1); } Object* ZeroGCHeap::AllocLHeap(size_t size, uint32_t flags) { int sizeWithHeader = size + sizeof(ObjHeader); ObjHeader* address = (ObjHeader*)calloc(sizeWithHeader, 
sizeof(char)); return (Object*)(address + 1); } ``` --- ### Zero GC `IGCHeap` - allocations: ```cpp Object* ZeroGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags) { int sizeWithHeader = size + sizeof(ObjHeader); ObjHeader* address = (ObjHeader*)calloc(sizeWithHeader, sizeof(char)); * return (Object*)(address + 1); } ``` .center[] --- ### Zero GC `IGCHeap` - creating handles (pinning, strong, ...): ```cpp bool ZeroGCHandleManager::Initialize() { g_gcGlobalHandleStore = new ZeroGCHandleStore(); return true; } OBJECTHANDLE ZeroGCHandleManager::CreateGlobalHandleOfType(Object * object, HandleType type) { return g_gcGlobalHandleStore->CreateHandleOfType(object, type); } ``` ```cpp int handlesCount = 0; OBJECTHANDLE handles[65535]; OBJECTHANDLE ZeroGCHandleStore::CreateHandleOfType(Object * object, HandleType type) { handles[handlesCount] = (OBJECTHANDLE__*)object; return (OBJECTHANDLE)&handles[handlesCount++]; } ``` --- ### Zero GC `IGCHandleManager` - storing handles: ```cpp void ZeroGCHandleManager::StoreObjectInHandle(OBJECTHANDLE handle, Object * object) { Object** handleObj = (Object**)handle; *handleObj = object; } bool ZeroGCHandleManager::StoreObjectInHandleIfNull(OBJECTHANDLE handle, Object* object) { Object** handleObj = (Object**)handle; if (*handleObj == NULL) { *handleObj = object; return true; } return false; } ``` --- class: middle, center ### And that's mostly all! Complete *ZeroGC - Calloc-based* implementation: https://github.com/kkokosa/CoreCLR.ZeroGC --- ### How to run our shiny *ZeroGC - Calloc-based*? Just set `COMPlus_GCName=ZeroGC.Calloc.dll` and I hope you are lucky... ```cmd cd ..\Samples.ConsoleApp set COMPlus_GCName=ZeroGC.Calloc.dll dotnet run -c Release ``` or better, use `launchSettings.json` ```json { "profiles": { "ConsoleApp": { "commandName": "Project", "nativeDebugging": true, "environmentVariables": { "COMPlus_GCName": "ZeroGC.Calloc.dll" } } } } ``` * but still debugging is "quite" cumbersome... 
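---

### Sidebar - poor man's GC tracing

Since native debugging of a custom GC inside the runtime is cumbersome, a simple append-to-file trace helper goes a long way. This is a hypothetical helper (the `GCLog` name and log file name are made up), not part of the GC interface:

```cpp
#include <cstdio>
#include <cstdarg>
#include <cstring>
#include <cassert>

// Hypothetical tracing helper for a custom GC: appends one formatted line
// per call to a side file, so traces survive even a runtime crash.
void GCLog(const char* format, ...) {
    FILE* f = fopen("zerogc.log", "a");
    if (f == nullptr) return;
    va_list args;
    va_start(args, format);
    vfprintf(f, format, args);
    va_end(args);
    fputc('\n', f);
    fclose(f);
}
```

Sprinkling calls like `GCLog("Alloc %zu bytes", size);` into `ZeroGCHeap::Alloc` makes it easy to see what the runtime asks the GC for, without attaching a debugger.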
--- class: middle, center ### "Mostly" --- ### What more do we need to implement? .large[ * Caveat - write barriers cannot simply be omitted * because the .NET runtime JIT-injects their usage * requires Workstation GC mode - Server GC injects `JIT_WriteBarrier_SVR64` that omits ephemeral checks and crashes the runtime :( ] --- exclude: true class: middle, center ### Caveat #1 - write barriers --- exclude: true class: center, middle ### Remembered sets (card tables) .center[] --- exclude: true ```asm LEAF_ENTRY JIT_WriteBarrier_PostGrow64, _TEXT align 8 mov [rcx], rdx NOP_3_BYTE ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Lower mov rax, 0F0F0F0F0F0F0F0F0h ; Check the lower and upper ephemeral region bounds cmp rdx, rax jb Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Upper mov r8, 0F0F0F0F0F0F0F0F0h cmp rdx, r8 jae Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_CardTable mov rax, 0F0F0F0F0F0F0F0F0h ; Touch the card table entry, if not already dirty. shr rcx, 0Bh cmp byte ptr [rcx + rax], 0FFh jne UpdateCardTable REPRET UpdateCardTable: mov byte ptr [rcx + rax], 0FFh ret align 16 Exit: REPRET LEAF_END_MARKED JIT_WriteBarrier_PostGrow64, _TEXT ``` --- exclude: true ```asm LEAF_ENTRY JIT_WriteBarrier_PostGrow64, _TEXT align 8 * mov [rcx], rdx NOP_3_BYTE ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Lower mov rax, 0F0F0F0F0F0F0F0F0h ; Check the lower and upper ephemeral region bounds cmp rdx, rax jb Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Upper mov r8, 0F0F0F0F0F0F0F0F0h cmp rdx, r8 jae Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_CardTable mov rax, 0F0F0F0F0F0F0F0F0h ; Touch the card table entry, if not already dirty. 
shr rcx, 0Bh cmp byte ptr [rcx + rax], 0FFh jne UpdateCardTable REPRET UpdateCardTable: mov byte ptr [rcx + rax], 0FFh ret align 16 Exit: REPRET LEAF_END_MARKED JIT_WriteBarrier_PostGrow64, _TEXT ``` --- exclude: true ```asm LEAF_ENTRY JIT_WriteBarrier_PostGrow64, _TEXT align 8 * mov [rcx], rdx NOP_3_BYTE ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Lower mov rax, 0F0F0F0F0F0F0F0F0h ; Check the lower and upper ephemeral region bounds cmp rdx, rax jb Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Upper mov r8, 0F0F0F0F0F0F0F0F0h cmp rdx, r8 jae Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_CardTable mov rax, 0F0F0F0F0F0F0F0F0h ; Touch the card table entry, if not already dirty. shr rcx, 0Bh cmp byte ptr [rcx + rax], 0FFh jne UpdateCardTable REPRET UpdateCardTable: * mov byte ptr [rcx + rax], 0FFh ret align 16 Exit: REPRET LEAF_END_MARKED JIT_WriteBarrier_PostGrow64, _TEXT ``` --- exclude: true ```asm LEAF_ENTRY JIT_WriteBarrier_PostGrow64, _TEXT align 8 * mov [rcx], rdx NOP_3_BYTE ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Lower * mov rax, 0F0F0F0F0F0F0F0F0h * ; Check the lower and upper ephemeral region bounds * cmp rdx, rax * jb Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_Upper * mov r8, 0F0F0F0F0F0F0F0F0h * cmp rdx, r8 * jae Exit nop ; padding for alignment of constant PATCH_LABEL JIT_WriteBarrier_PostGrow64_Patch_Label_CardTable mov rax, 0F0F0F0F0F0F0F0F0h ; Touch the card table entry, if not already dirty. 
shr rcx, 0Bh cmp byte ptr [rcx + rax], 0FFh jne UpdateCardTable REPRET UpdateCardTable: * mov byte ptr [rcx + rax], 0FFh ret align 16 Exit: REPRET LEAF_END_MARKED JIT_WriteBarrier_PostGrow64, _TEXT ``` --- exclude: true ### Zero GC `IGCHeap` - fooling write barriers: ```cpp HRESULT ZeroGCHeap::Initialize() { // Not used currently MethodTable* freeObjectMethodTable = gcToCLR->GetFreeObjectMethodTable(); WriteBarrierParameters args = {}; args.operation = WriteBarrierOp::Initialize; args.is_runtime_suspended = true; args.requires_upper_bounds_check = false; args.card_table = new uint32_t[1]; args.lowest_address = reinterpret_cast
<uint8_t*>(~0); args.highest_address = reinterpret_cast
<uint8_t*>(1); args.ephemeral_low = reinterpret_cast
<uint8_t*>(~0); args.ephemeral_high = reinterpret_cast
<uint8_t*>(1); gcToCLR->StompWriteBarrier(&args); return NOERROR; } ``` --- exclude: true ### Zero GC `IGCHeap` - fooling write barriers: ```cpp HRESULT ZeroGCHeap::Initialize() { // Not used currently MethodTable* freeObjectMethodTable = gcToCLR->GetFreeObjectMethodTable(); WriteBarrierParameters args = {}; args.operation = WriteBarrierOp::Initialize; args.is_runtime_suspended = true; args.requires_upper_bounds_check = false; args.card_table = new uint32_t[1]; args.lowest_address = reinterpret_cast
<uint8_t*>(~0); args.highest_address = reinterpret_cast
<uint8_t*>(1); * args.ephemeral_low = reinterpret_cast
<uint8_t*>(~0); * args.ephemeral_high = reinterpret_cast
<uint8_t*>(1); gcToCLR->StompWriteBarrier(&args); return NOERROR; } ``` -- exclude: true Still: * requires Workstation GC mode - Server GC injects `JIT_WriteBarrier_SVR64` that omits ephemeral checks and crashes the runtime :( --- class: center, middle ## What's next? -- Calloc-based allocator is slow (each object triggers an OS call and memory zeroing) -- **Bump-pointer allocator** instead of slooow `calloc` --- class: center, middle  --- class: center, middle  --- class: middle .center[] --- .center[] ```cpp Allocator::Allocate(amount) { if (alloc_ptr + amount <= alloc_limit) { // This is the fast path - we have enough memory to bump the pointer PTR result = alloc_ptr; alloc_ptr += amount; return result; } else { // This is the slow path - new allocation context will be created ... } } ``` --- exclude: true class: middle, center .center[] Thread-affinity of **the allocation context structure** - ensured by the runtime --- exclude: true ### Bump-pointer GC allocator - step #1: ```cpp // Normally both SOH and LOH allocations go through there Object * ZeroGCHeap::Alloc( * gc_alloc_context * acontext, size_t size, uint32_t flags) { // Per thread acontext... // acontext->alloc_ptr // acontext->alloc_limit } ``` --- exclude: true ### Bump-pointer GC allocator - step #2: ```cpp // Normally both SOH and LOH allocations go through there Object * ZeroGCHeap::Alloc( gc_alloc_context * acontext, size_t size, uint32_t flags) { uint8_t* result = acontext->alloc_ptr; uint8_t* advance = result + size; if (advance <= acontext->alloc_limit) { acontext->alloc_ptr = advance; return (Object* )result; } ... 
}
```

---

exclude: true

### Bump-pointer GC allocator - step #3:

```cpp
// Normally both SOH and LOH allocations go through here
Object * ZeroGCHeap::Alloc(
    gc_alloc_context * acontext, size_t size, uint32_t flags)
{
    uint8_t* result = acontext->alloc_ptr;
    uint8_t* advance = result + size;
    if (advance <= acontext->alloc_limit)
    {
        acontext->alloc_ptr = advance;
        return (Object*)result;
    }
    int growthSize = 16 * 1024 * 1024;
    uint8_t* newPages = (uint8_t*)VirtualAlloc(NULL, growthSize,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    uint8_t* allocationStart = newPages;
    acontext->alloc_ptr = allocationStart + size;
    acontext->alloc_limit = newPages + growthSize;
    return (Object*)(allocationStart);
}
```

---

exclude: true

### Bump-pointer GC allocator - step #4:

```cpp
// Normally both SOH and LOH allocations go through here
Object * ZeroGCHeap::Alloc(
    gc_alloc_context * acontext, size_t size, uint32_t flags)
{
    uint8_t* result = acontext->alloc_ptr;
    uint8_t* advance = result + size;
    if (advance <= acontext->alloc_limit)
    {
        acontext->alloc_ptr = advance;
        return (Object*)result;
    }
*   int beginGap = 24;
    int growthSize = 16 * 1024 * 1024;
    uint8_t* newPages = (uint8_t*)VirtualAlloc(NULL, growthSize,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
*   uint8_t* allocationStart = newPages + beginGap;
    acontext->alloc_ptr = allocationStart + size;
    acontext->alloc_limit = newPages + growthSize;
    return (Object*)(allocationStart);
}
```

---

exclude: false

### Bump-pointer GC allocator

```cpp
// Normally both SOH and LOH allocations go through here
Object * ZeroGCHeap::Alloc(
    gc_alloc_context * acontext, size_t size, uint32_t flags)
{
    uint8_t* result = acontext->alloc_ptr;
    uint8_t* advance = result + size;
    if (advance <= acontext->alloc_limit)
    {
        acontext->alloc_ptr = advance;
        return (Object*)result;
    }
    int beginGap = 24;
    int growthSize = 16 * 1024 * 1024;
    uint8_t* newPages = (uint8_t*)VirtualAlloc(NULL, growthSize,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    uint8_t* allocationStart = newPages + beginGap;
    acontext->alloc_ptr = allocationStart + size;
    acontext->alloc_limit = newPages + growthSize;
    return (Object*)(allocationStart);
}
```

---

exclude: true

### Bump-pointer GC allocator - let's ignore those LOHs (thread-safety!):

```cpp
// This variation is used in the rare circumstance when you want to allocate
// an object on the large object heap but the object is not big enough to
// naturally go there.
Object * ZeroGCHeap::AllocLHeap(size_t size, uint32_t flags)
{
    int sizeWithHeader = size + sizeof(ObjHeader);
    ObjHeader* address = (ObjHeader*)calloc(sizeWithHeader, sizeof(char*));
    return (Object*)(address + 1);
}
```

---

exclude: true
class: middle, center

### Caveat #2 - allocation context is reused by the runtime (JIT!)

---

exclude: true
class: middle

.center[]

---

exclude: true

Fast path in EE (not changeable)

```asm
; IN: rcx: MethodTable*
; OUT: rax: new object
LEAF_ENTRY JIT_TrialAllocSFastMP_InlineGetThread, _TEXT
    mov edx, [rcx + OFFSET__MethodTable__m_BaseSize]
    ; m_BaseSize is guaranteed to be a multiple of 8.
    INLINE_GETTHREAD r11
    mov r10, [r11 + OFFSET__Thread__m_alloc_context__alloc_limit]
    mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr]
    add rdx, rax
    cmp rdx, r10
    ja AllocFailed
    mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx
    mov [rax], rcx
    ret
AllocFailed:
    jmp JIT_NEW
LEAF_END JIT_TrialAllocSFastMP_InlineGetThread, _TEXT
```

---

exclude: true

Fast path in EE (not changeable)

```asm
; IN: rcx: MethodTable*
; OUT: rax: new object
LEAF_ENTRY JIT_TrialAllocSFastMP_InlineGetThread, _TEXT
    mov edx, [rcx + OFFSET__MethodTable__m_BaseSize]
    ; m_BaseSize is guaranteed to be a multiple of 8.
    INLINE_GETTHREAD r11
*   mov r10, [r11 + OFFSET__Thread__m_alloc_context__alloc_limit]
*   mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr]
*
*   add rdx, rax
*
*   cmp rdx, r10
    ja AllocFailed
    mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx
    mov [rax], rcx
    ret
AllocFailed:
    jmp JIT_NEW
LEAF_END JIT_TrialAllocSFastMP_InlineGetThread, _TEXT
```

---

class: section

### Summary

--

* .large[it is quite easy to write an only-allocating GC :)]

--

* .large[the main caveat is write barriers]

--

* .large[the bump-pointer allocator nicely cooperates with the allocation contexts supported by the runtime/JIT]

---

class: middle, center, section

# How...

--

## we can write something **useful**?

---

class: center, middle

## Please meet the Upsilon GC!

--

(*) And yes, the pun on the JVM's Epsilon GC is intended! ;)

---

class: center, middle



---

class: center, middle



---

class: center, middle



---

### But...

--

* we are doing real development now

--

* real development requires **debugging** of both our **custom GC** and the **CoreCLR** itself

--

* so we need a Debug CoreCLR build!

--

* remember to check out a specific tag - HEAD may not compile at all!

--

```cmd
.\coreclr [master ≡]> dotnet --version
3.0.100-preview3-010431

.\coreclr [master ≡]> git checkout tags/v3.0.0-preview3-27422-72

.\coreclr [(v3.0.0-preview3-27422-72) ≡]> build -Debug -x64 -skiptests
...
Result: Build succeeded.
    0 Warning(s)
    0 Error(s)
BUILD: Build succeeded.  Finished at 10:27:11.23
BUILD: Product binaries are available at F:\GithubProjects\DotNetCore\coreclr\bin\Product\Windows_NT.x64.Debug
```

---

class: center, middle

### How to utilize/debug our custom runtime?

---

### Approach zero - use CoreRun.exe

.large[
* CoreRun is a hosting app built with the runtime itself
* it is very easy to run but...
* ... quite cumbersome in configuring references
]

---

### Approach one - self-contained with copy-pasted local binaries

Self-contained with a custom source of the runtime:

```xml
<PublishProtocol>FileSystem</PublishProtocol>
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
<TargetFramework>netcoreapp3.0</TargetFramework>
<PublishDir>published\Debug\netcoreapp3.0\win-x64\</PublishDir>
<RuntimeIdentifier>win-x64</RuntimeIdentifier>
<SelfContained>true</SelfContained>
<_IsPortable>false</_IsPortable>
```

`dotnet publish` publishes to `.\published\Debug\netcoreapp3.0\win-x64\`

--

#### And then:

Copy-paste from `.\coreclr\bin\Product\Windows_NT.x64.Debug\` to `\published\Debug\netcoreapp3.0\win-x64\`, **overwriting** the whole self-contained runtime.

---

### Approach one - simple test

```cs
static void Main(string[] args)
{
    var coreAssemblyInfo = System.Diagnostics.FileVersionInfo.GetVersionInfo(typeof(object).Assembly.Location);
    Console.WriteLine($"Hello World from Core {coreAssemblyInfo.ProductVersion}");
    Console.WriteLine($"The location is {typeof(object).Assembly.Location}");
    Console.ReadKey();
}
```

Default:

```cmd
.\bin\Debug\netcoreapp3.0\win-x64\publish\ConsoleApp.exe
Hello World from Core 4.6.27422.72 @BuiltBy: vsagent-a0000SS @Branch: internal/release/3.0 @SrcCode: https://github.com/dotnet/coreclr/tree/817e36a41f451092efa26c1478abfcd5c69a3bcf
The location is F:\GithubProjects\CoreCLR.ZeroGC\src\Samples.ConsoleApp\bin\Debug\netcoreapp3.0\win-x64\publish\System.Private.CoreLib.dll
```

Our custom runtime:

```cmd
.\bin\Debug\netcoreapp3.0\win-x64\publish\ConsoleApp.exe
Hello World from Core 4.6.27521.0 @BuiltBy: kkoko_000-KONRAD @SrcCode: https://github.com/dotnet/coreclr/tree/817e36a41f451092efa26c1478abfcd5c69a3bcf
The location is F:\GithubProjects\CoreCLR.ZeroGC\src\Samples.ConsoleApp\bin\Debug\netcoreapp3.0\win-x64\publish\System.Private.CoreLib.dll
```

---

### Approach two - self-contained with custom NuGet source

```
dotnet new nugetconfig
...
```

---

class: middle, center

## And so... ADD begins

--

### Assert Driven Development

---

### Assert Driven Development

.large[
* in Debug, the runtime is much less forgiving and many `ASSERT`s will kick us back
* but it is good - it literally drives us through development
* ... in the absence of documentation
]

---

class: center, middle



---

class: center, middle



---

### The first run - the first ASSERT

```cpp
Assert failure(PID 6264 [0x00001878], Thread: 7880 [0x1ec8]): !"Write Barrier violation.
Must use SetObjectReference() to assign OBJECTREF's into the GC heap!" CORECLR! OBJECTREF::OBJECTREF + 0x71 (0x00007ffe`166822f1) CORECLR! AllocateArrayEx + 0xABA (0x00007ffe`16ad116a) CORECLR! AllocateArrayEx + 0x177 (0x00007ffe`16ad1717) CORECLR! AllocateObjectArray + 0x2AC (0x00007ffe`16ad248c) CORECLR! LargeHeapHandleBucket::LargeHeapHandleBucket + 0x34D (0x00007ffe`169a648d) CORECLR! LargeHeapHandleTable::AllocateHandles + 0x4F9 (0x00007ffe`169b8199) CORECLR! BaseDomain::AllocateObjRefPtrsInLargeTable + 0x39B (0x00007ffe`169baa0b) CORECLR! AppDomain::AllocateStaticFieldObjRefPtrs + 0x30 (0x00007ffe`16928a80) CORECLR! Module::AllocateRegularStaticHandles + 0x238 (0x00007ffe`16928a08) CORECLR! SystemDomain::LoadBaseSystemClasses + 0x3A7 (0x00007ffe`169d9b27) File: f:\githubprojects\dotnetcore\coreclr\src\vm\object.cpp Line: 1355 Image: F:\GithubProjects\CoreCLR.ZeroGC\src\Samples.ConsoleApp\bin\Debug\netcoreapp3.0\win-x64\publish\ConsoleApp.exe ``` --- ### The first run - the first ASSERT ```cpp OBJECTREF AllocateArrayEx(MethodTable *pArrayMT, INT32 *pArgs, DWORD dwNumArgs, BOOL bAllocateInLargeHeap DEBUG_ARG(BOOL bDontSetAppDomain)) { ... orArray = (ArrayBase *) Alloc(totalSize, FALSE, pArrayMT->ContainsPointers()); ... #ifdef _DEBUG // Ensure the typehandle has been interned prior to allocation. // This is important for OOM reliability. OBJECTREF objref = ObjectToOBJECTREF((Object *) orArray); GCPROTECT_BEGIN(objref); orArray->GetTypeHandle(); GCPROTECT_END(); orArray = (ArrayBase *) OBJECTREFToObject(objref); #endif ... } ``` --- ### The first run - the first ASSERT ```cpp OBJECTREF AllocateArrayEx(MethodTable *pArrayMT, INT32 *pArgs, DWORD dwNumArgs, BOOL bAllocateInLargeHeap DEBUG_ARG(BOOL bDontSetAppDomain)) { ... orArray = (ArrayBase *) Alloc(totalSize, FALSE, pArrayMT->ContainsPointers()); ... *#ifdef _DEBUG // Ensure the typehandle has been interned prior to allocation. // This is important for OOM reliability. 
* OBJECTREF objref = ObjectToOBJECTREF((Object *) orArray); GCPROTECT_BEGIN(objref); orArray->GetTypeHandle(); GCPROTECT_END(); orArray = (ArrayBase *) OBJECTREFToObject(objref); *#endif ... } ``` --- ### The first run - the first ASSERT ```cpp OBJECTREF::OBJECTREF(Object *pObject) { ... DEBUG_ONLY_FUNCTION; if ((pObject != 0) && ((IGCHeap*)GCHeapUtilities::GetGCHeap())->IsHeapPointer( (BYTE*)this )) { _ASSERTE(!"Write Barrier violation. Must use SetObjectReference() to assign OBJECTREF's into the GC heap!"); } m_asObj = pObject; VALIDATEOBJECT(m_asObj); if (m_asObj != 0) { ENABLESTRESSHEAP(); } Thread::ObjectRefNew(this); } ``` --- ### The first run - the first ASSERT ```cpp OBJECTREF::OBJECTREF(Object *pObject) { ... DEBUG_ONLY_FUNCTION; if ((pObject != 0) && * ((IGCHeap*)GCHeapUtilities::GetGCHeap())->IsHeapPointer( (BYTE*)this )) { _ASSERTE(!"Write Barrier violation. Must use SetObjectReference() to assign OBJECTREF's into the GC heap!"); } m_asObj = pObject; VALIDATEOBJECT(m_asObj); if (m_asObj != 0) { ENABLESTRESSHEAP(); } Thread::ObjectRefNew(this); } ``` and ```cpp bool UpsilonGCHeap::IsHeapPointer(void * object, bool small_heap_only) { return object != 0; } ``` --- ### The first run - the first ASSERT First improvement - now I track created segments: ```cpp int segmentsCount = 0; uint8_t* segments[1024]; ``` ```cpp bool UpsilonGCHeap::IsHeapPointer(void * object, bool small_heap_only) { if (segmentsCount == 0) return false; for (int i = 0; i < segmentsCount; ++i) { uint8_t* address = (uint8_t*)object; if (address >= segments[i] && address < segments[i] + GrowthSize) return true; } return false; } ``` --- ### The second run - the first bug ```cs private static void RunAllocations() { long beginAllocs = GC.GetAllocatedBytesForCurrentThread(); DoSomeAllocs(); long endAllocs = GC.GetAllocatedBytesForCurrentThread(); Console.WriteLine($"Allocated {endAllocs - beginAllocs}"); Console.WriteLine($"Total allocated {endAllocs}"); } ``` But... 
`GC.GetAllocatedBytesForCurrentThread()` was returning **negative** values.

---

### The second run - the first bug

```cpp
FCIMPL0(INT64, GCInterface::GetAllocatedBytesForCurrentThread)
{
    FCALL_CONTRACT;

    INT64 currentAllocated = 0;
    Thread *pThread = GetThread();
    gc_alloc_context* ac = pThread->GetAllocContext();
*   currentAllocated = ac->alloc_bytes + ac->alloc_bytes_loh - (ac->alloc_limit - ac->alloc_ptr);

    return currentAllocated;
}
FCIMPLEND
```

---

### The second run - the first bug

Second improvement - a working `GetAllocatedBytesForCurrentThread`:

```cpp
Object * UpsilonGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags)
{
    uint8_t* result = acontext->alloc_ptr;
    uint8_t* advance = result + size;
    if (advance <= acontext->alloc_limit)
    {
        acontext->alloc_ptr = advance;
        return (Object*)result;
    }
    int beginGap = 24;
    uint8_t* newPages = (uint8_t*)VirtualAlloc(NULL, GrowthSize,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    uint8_t* allocationStart = newPages + beginGap;
    acontext->alloc_ptr = allocationStart + size;
    acontext->alloc_limit = newPages + GrowthSize;
*   acontext->alloc_bytes += GrowthSize;
*   registerSegment(newPages);
    printf("GCLOG: Segment created %p-%p\n", acontext->alloc_ptr, acontext->alloc_limit);
    return (Object*)(allocationStart);
}
```

---

class: middle, center

## And the story continues...

---

### But...

--

* .large[**Upsilon GC** is on the Mark & Sweep island]

--

* .large[it is not "Zero GC" anymore]

--

* .large[thus we need a Mark phase and a Sweep phase]

--

* .large[and let's assume a non-concurrent approach]

--

* .large[thus *Upsilon GC "Zero"* will:]

--

* .large[trigger GC at allocation path (`UpsilonGCHeap::Alloc`)]

--

* .large[mark all objects in all segments]

--

* .large[sweep unused objects into *free space*]

--

* .large[reuse *free space* as new allocation contexts]

---

### Upsilon GC "Zero" - trigger GC at allocation path

That's easy...
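Both the trigger condition and the `alloc_bytes` bookkeeping from the previous fix can be checked outside the runtime. A minimal stand-alone model (all types and names below are toy stand-ins, not the real runtime's; `malloc` replaces `VirtualAlloc` and the begin gap and LOH term are omitted):

```cpp
#include <cstdint>
#include <cstdlib>

// Toy stand-in for the runtime's gc_alloc_context - illustrative only.
struct ToyAllocContext
{
    uint8_t* alloc_ptr   = nullptr;
    uint8_t* alloc_limit = nullptr;
    int64_t  alloc_bytes = 0;    // credited whenever a segment is handed out
};

constexpr size_t GrowthSize = 64 * 1024;  // toy segment size
static bool gcTriggered = false;          // records where a GC would start

uint8_t* ToyAlloc(ToyAllocContext& ctx, size_t size)
{
    if (ctx.alloc_ptr != nullptr)
    {
        uint8_t* advance = ctx.alloc_ptr + size;
        if (advance <= ctx.alloc_limit)
        {
            uint8_t* result = ctx.alloc_ptr;  // fast path: just bump the pointer
            ctx.alloc_ptr = advance;
            return result;
        }
    }
    if (ctx.alloc_limit != nullptr)
        gcTriggered = true;  // a filled context already exists - GC would start here

    uint8_t* seg = (uint8_t*)std::malloc(GrowthSize);  // stand-in for VirtualAlloc (leaked on purpose)
    ctx.alloc_ptr   = seg + size;
    ctx.alloc_limit = seg + GrowthSize;
    ctx.alloc_bytes += GrowthSize;  // keeps the formula below non-negative
    return seg;
}

// Same shape as the runtime's GetAllocatedBytesForCurrentThread formula
// (without the LOH term in this toy):
int64_t AllocatedBytes(const ToyAllocContext& ctx)
{
    return ctx.alloc_bytes - (ctx.alloc_limit - ctx.alloc_ptr);
}
```

Without the `alloc_bytes += GrowthSize` line the formula goes negative as soon as the first segment is handed out - exactly the bug above.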
```cpp
Object * UpsilonGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags)
{
    uint8_t* result = acontext->alloc_ptr;
    uint8_t* advance = result + size;
    if (advance <= acontext->alloc_limit)
    {
        acontext->alloc_ptr = advance;
        return (Object*)result;
    }
*   if (acontext->alloc_limit != nullptr)
*   {
*       // Some allocation context filled, start GC
*       ...
*   }
    int beginGap = 24;
    uint8_t* newPages = (uint8_t*)VirtualAlloc(NULL, GrowthSize,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    uint8_t* allocationStart = newPages + beginGap;
    acontext->alloc_ptr = allocationStart + size;
    acontext->alloc_limit = newPages + GrowthSize;
    acontext->alloc_bytes += GrowthSize;
    registerSegment(newPages);
    printf("GCLOG: Segment created %p-%p\n", acontext->alloc_ptr, acontext->alloc_limit);
    return (Object*)(allocationStart);
}
```

---

### Upsilon GC "Zero" - mark all objects in all segments

For a non-concurrent Mark we need to suspend all threads, do the marking and resume:

```cpp
...
if (acontext->alloc_limit != nullptr)
{
    // Some allocation context filled, start GC
    gcToCLR->SuspendEE(SUSPEND_FOR_GC);

    // ...

    gcToCLR->RestartEE(true);
}
```

---

class: middle, center

## And...

---

class: center, middle



---

### Upsilon GC "Zero" - mark all objects in all segments

```cpp
bool UpsilonGCHeap::gcInProgress = false;
```

```cpp
bool UpsilonGCHeap::IsGCInProgressHelper(bool bConsiderGCStart)
{
    return gcInProgress;
}
```

```cpp
...
if (acontext->alloc_limit != nullptr)
{
    // Some allocation context filled, start GC
    gcInProgress = true;
    gcToCLR->SuspendEE(SUSPEND_FOR_GC);

    // ...

    gcInProgress = false;
    gcToCLR->RestartEE(/* bFinishedGC */ true);
}
```

---

class: middle, center

## And...

---

class: center, middle



---

### Upsilon GC "Zero" - mark all objects in all segments

This is called at the proper time by `RestartEE` itself:

```cpp
void UpsilonGCHeap::SetGCInProgress(bool fInProgress)
{
    gcInProgress = fInProgress;
}
```

So it works!

```cpp
...
if (acontext->alloc_limit != nullptr)
{
    // Some allocation context filled, start GC
    gcInProgress = true;
    gcToCLR->SuspendEE(SUSPEND_FOR_GC);

    // ...

    gcToCLR->RestartEE(/* bFinishedGC */ true);
}
```

---

### Upsilon GC "Zero" - mark all objects in all segments

Let's start with the stack roots:

```cpp
...
if (acontext->alloc_limit != nullptr)
{
    // Some allocation context filled, start GC
    gcInProgress = true;
    gcToCLR->SuspendEE(SUSPEND_FOR_GC);

*   ScanContext sc;
*   printf("GCLOG: Scan stack roots\n");
*   gcToCLR->GcScanRoots(UpsilonGCHeap::MarkReachableRoot, 0, 0, &sc);

    gcToCLR->RestartEE(/* bFinishedGC */ true);
}
```

```cpp
void UpsilonGCHeap::MarkReachableRoot(Object** ppObject, ScanContext* sc, uint32_t flags)
{
    Object* obj = *ppObject;
    if (obj == nullptr)
        return;
    MethodTable* pMT = (*ppObject)->GetMethodTable();
    printf("GCLOG: Reachable root at %p MT %p (flags: %d)\n", obj, pMT, flags);
}
```

---

### Upsilon GC "Zero" - mark all objects in all segments

And add the handle roots:

```cpp
...
if (acontext->alloc_limit != nullptr)
{
    // Some allocation context filled, start GC
    gcInProgress = true;
    gcToCLR->SuspendEE(SUSPEND_FOR_GC);

    ScanContext sc;
    printf("GCLOG: Scan stack roots\n");
    gcToCLR->GcScanRoots(UpsilonGCHeap::MarkReachableRoot, 0, 0, &sc);
*   handleManager->ScanHandles(UpsilonGCHeap::MarkReachableRoot, &sc);

    gcToCLR->RestartEE(/* bFinishedGC */ true);
}
```

```cpp
void UpsilonGCHandleStore::ScanHandles(promote_func* pf, ScanContext* sc)
{
    for (int i = 0; i < handlesCount; ++i)
    {
        if (handles[i] != nullptr)
        {
            pf((Object**)&handles[i], sc, 0);
        }
    }
}
```

---

class: center, middle



---

class: center, middle



---

class: center, middle



---

### Upsilon GC "Zero" - mark all objects in all segments

But where are the "handle types" here?
```cpp
void UpsilonGCHandleStore::ScanHandles(promote_func* pf, ScanContext* sc)
{
    for (int i = 0; i < handlesCount; ++i)
    {
        if (handles[i] != nullptr)
        {
            pf((Object**)&handles[i], sc, 0);
        }
    }
}
```

--

Nowhere :) Currently I'm just ignoring them - an important feature still missing in Upsilon GC!

```cpp
OBJECTHANDLE UpsilonGCHandleStore::CreateHandleOfType(Object * object, HandleType type)
{
    handles[handlesCount] = (OBJECTHANDLE__*)object;
    return (OBJECTHANDLE)&handles[handlesCount++];
}
```

---

### Upsilon GC "Zero" - mark all objects in all segments

So we have:

```cpp
...
if (acontext->alloc_limit != nullptr)
{
    // Some allocation context filled, start GC
    gcInProgress = true;
    gcToCLR->SuspendEE(SUSPEND_FOR_GC);

    ScanContext sc;
    printf("GCLOG: Scan stack roots\n");
    gcToCLR->GcScanRoots(UpsilonGCHeap::MarkReachableRoot, 0, 0, &sc);
    printf("GCLOG: Scan handles roots\n");
    handleManager->ScanHandles(UpsilonGCHeap::MarkReachableRoot, &sc);

    gcToCLR->RestartEE(/* bFinishedGC */ true);
}
```

```cpp
void UpsilonGCHeap::MarkReachableRoot(Object** ppObject, ScanContext* sc, uint32_t flags)
{
    Object* obj = *ppObject;
    if (obj == nullptr)
        return;
    MethodTable* pMT = (*ppObject)->GetMethodTable();
    printf("GCLOG: Reachable root at %p MT %p (flags: %d)\n", obj, pMT, flags);
}
```

---

class: middle, center

## And...

---

class: center, middle



---

### Upsilon GC "Zero" - mark all objects in all segments

Currently we are only discovering roots - we must mark them transitively:

```cpp
void UpsilonGCHeap::MarkReachableRoot(Object** ppObject, ScanContext* sc, uint32_t flags)
{
    Object* obj = *ppObject;
    if (obj == nullptr)
        return;
    MethodTable* pMT = (*ppObject)->GetMethodTable();
    printf("GCLOG: Reachable root at %p MT %p (flags: %d)\n", obj, pMT, flags);
*   MarkObjectTransitively(obj, sc, flags);
}
```

---

### Upsilon GC "Zero" - mark all objects in all segments

The beginning is easy...
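The mark flag itself is just the lowest bit of the object's method-table pointer - always zero otherwise, since method tables are pointer-aligned. The bit-twiddling can be tried completely outside the runtime; a toy sketch (all names here are stand-ins for the real `Object`/`RawGetMethodTable`):

```cpp
#include <cstdint>

// Toy model of the mark-bit trick: MethodTable* is at least pointer-aligned,
// so its lowest bit is free to reuse as the mark flag.
constexpr std::uintptr_t GC_MARKED = 0x1;

struct ToyObject
{
    std::uintptr_t m_pMethTab;   // raw method-table pointer, low bit = mark

    bool IsMarked() const  { return (m_pMethTab & GC_MARKED) != 0; }
    void SetMarked()       { m_pMethTab |= GC_MARKED; }
    void ClearMarked()     { m_pMethTab &= ~GC_MARKED; }

    // Anyone reading the method table must strip the mark bit first:
    std::uintptr_t GetMethodTable() const { return m_pMethTab & ~GC_MARKED; }
};
```

This is also why the end of the GC must clear the bit again - the runtime reads the method-table pointer constantly and expects it unpolluted.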
```cpp
void UpsilonGCHeap::MarkObjectTransitively(Object* obj, ScanContext* sc, uint32_t flags)
{
    if (obj->IsMarked())
    {
        printf("GCLOG: Mark - already marked\n");
        return;
    }
    obj->SetMarked();

    // Traverse outgoing references
}
```

```cpp
#define GC_MARKED (size_t)0x1

class Object
{
    MethodTable* m_pMethTab;

public:
    // ...
    bool IsMarked()
    {
        return !!(((size_t)RawGetMethodTable()) & GC_MARKED);
    }
    void SetMarked()
    {
        RawSetMethodTable((MethodTable*)(((size_t)RawGetMethodTable()) | GC_MARKED));
    }
};
```

---

### Upsilon GC "Zero" - mark all objects in all segments

And now the true story begins...

```cpp
void UpsilonGCHeap::MarkObjectTransitively(Object* obj, ScanContext* sc, uint32_t flags)
{
    if (obj->IsMarked())
    {
        printf("GCLOG: Mark - already marked\n");
        return;
    }
    obj->SetMarked();

    MethodTable* pMT = obj->RawGetMethodTable();
    if (pMT->IsCollectible())
    {
        printf("GCLOG: Mark - collectible type\n");
        // TODO
    }
    if (pMT->ContainsPointers())
    {
        printf("GCLOG: Mark - containing pointers type at %p MT %p\n", obj, pMT);
        // TODO
    }
}
```

---

### Upsilon GC "Zero" - mark all objects in all segments

Lack of an "enumerate all references in an object" API:

```cpp
if (pMT->ContainsPointers())
{
    printf("GCLOG: Mark - containing pointers type at %p MT %p\n", obj, pMT);
    int start_useful = 0;
    uint8_t* start = (uint8_t*)obj;
    uint32_t size = MethodTable::GetTotalSize(obj);
    CGCDesc* map = CGCDesc::GetCGCDescFromMT(pMT);
    CGCDescSeries* cur = map->GetHighestSeries();
    ptrdiff_t cnt = (ptrdiff_t)map->GetNumSeries();
    if (cnt >= 0)
    {
        CGCDescSeries* last = map->GetLowestSeries();
        uint8_t** parm = 0;
        do
        {
            assert(parm <= (uint8_t**)((obj) + cur->GetSeriesOffset()));
            parm = (uint8_t**)((obj) + cur->GetSeriesOffset());
            uint8_t** ppstop = (uint8_t**)((uint8_t*)parm + cur->GetSeriesSize() + (size));
            if (!start_useful || (uint8_t*)ppstop > (start))
            {
                if (start_useful && (uint8_t*)parm < (start))
                    parm = (uint8_t**)(start);
                while (parm < ppstop)
                {
                    // every *parm slot is an outgoing reference -
                    // this is where the marking would recurse
                    parm++;
                    ...
```

---

class: section

### Summary

--

* .large[it is ADD development :(]

--

* .large[it is easy to utilize the suspend/restart EE capability]

--

* .large[it is easy to find stack roots]

--

* .large[it is even easier to find handle roots (as we manage them)]

--

* .large[it is really hard to traverse object references...]

--

* .large[the rest is *The Sea of Unknown* now]

---

class: center, middle

# So... what's next?

---

class: center, middle



---

class: center, middle



---

class: section

### Upsilon GC Summary

.large[Eventually it should:]

--

* .large[trigger GC at allocation path (`UpsilonGCHeap::Alloc`)]

--

* .large[mark all objects in all segments]

--

* .large[track free space in all segments (Sweep)]

--

* .large[copy live objects from segments above a free-space threshold into a new one (compaction by copying)]

--

* .large[and we will see...]

---

class: center, middle



---

class: center, middle



---

### Literature:

* The Garbage Collection Handbook (http://gchandbook.org) - Richard Jones, Antony Hosking, Eliot Moss

.center[]

* Pro .NET Memory Management (https://prodotnetmemory.com) - Konrad Kokosa

.center[]

* http://tooslowexception.com/zero-garbage-collector-for-net-core/
* http://tooslowexception.com/zero-garbage-collector-for-net-core-2-1-and-asp-net-core-2-1/

---

class: center, middle

.center[]

---

class: center, middle

# That's all!

Thank .red[**you**]!

Any questions?!