class: center, middle # High-performance code design patterns in `C#` ### Konrad Kokosa #### @konradkokosa https://prodotnetmemory.com --- class: center, middle .center[] --- class: center, middle ## .red[High-performance] code design patterns in `C#` --- class: center, middle ## High-performance in `C#`?! --- ### High-performance in `C#` Having said that, this talk is... -- * not about APM -- * not about frontend & architecture (like horizontal scaling) -- * not about caching (like Redis), DB -- * about C# **code-level** performance --- class: center, middle ## High-performance code .red[design patterns] in `C#` --- ## Design Pattern A design pattern: -- - is **a common solution to a problem** that occurs in many different contexts -- - contains expert knowledge about "best practices" -- - allows us to avoid "reinventing the wheel" -- - is based on more general, underlying .red[principles] -- - like in case of OOP design patterns (SOLID, modularity, loose coupling, ...) --- exclude: true class: center, middle # Principles -- exclude: true #### Those principles/rules are valuable by their own, feel free to use them (like SOLID, clean code,. ...) --- exclude: true class: center, middle ### Principle 0 - Memory/cpu disrepancy #### The Mother Of All Principles --- exclude: true #### Principle 0 - Memory/cpu disrepancy .pull-left[] .pull-right[] --- exclude: true class: middle, center ### Principle 1 - Fit the cache line ### Principle 2 - Fit the highest cache level ### Principle 3 - Design for sequential access ### Principle 4 - Avoid GC overhead ### Principle 5 - Avoid calls ### Principle 6 - Design against *false sharing* --- ### Design Patterns Each *design pattern* is defined in a standard template: -- * **Name**: Is... the name. -- * **Problem:** What problem this pattern tries to solve? -- * **Solution:** How do we solve the problem? -- * **Benefits:** are there any addional benefits of applying this pattern? Besides solving above-mentioned problem. -- * **Consequences:** Are there any potential drawbacks of applying this patterns? --- class: center, middle # Design Patterns --- class: center, middle ## Pattern #1 - Frugal Object --- ### Pattern #1 - Frugal Object **Problem:** We need to store efficiently set of data that can take various forms. **Solution:** Instead of creating an object for each form separately, create kind of *discriminated union*. **Benefits:** ... **Consequences:** ... --- #### Pattern #1 - Frugal Object - ASP.NET Core example `Extensions/src/Primitives/src/StringValues.cs`: ```cs /// Represents zero/null, one, or many strings in an efficient way. public readonly struct StringValues : IList
, ... { private readonly object _values; ... public string this[int index] { [MethodImpl(MethodImplOptions.AggressiveInlining)] get { var value = _values; if (index == 0 && value is string str) { return str; } else if (value != null) { // Not string, not null, can only be string[] return Unsafe.As
(value)[index]; // may throw } else { return OutOfBounds(); // throws } } } } ``` --- #### Pattern #1 - Frugal Object .center[] --- #### Pattern #1 - Frugal Object .center[] --- #### Pattern #1 - Frugal Object .center[] --- exclude: true #### Pattern #1 - Frugal Object - WPF example `./src/Microsoft.DotNet.Wpf/src/Shared/MS/Utility/FrugalList.cs`: ```cs // These classes implement a frugal storage model for lists of type
. // Performance measurements show that Avalon has many lists that contain // a limited number of entries, and frequently zero or a single entry. ... // The code is also structured to perform well from a CPU standpoint. Perf // analysis shows that the reduced number of processor cache misses makes // FrugalList faster than ArrayList or List
, especially for lists of 6 // or fewer items. Timing differ with the size of
. ``` ```cs internal sealed class SingleItemList
: FrugalListBase
{ public override T EntryAt(int index) { return _loneEntry; } ... private T _loneEntry; } ``` --- exclude: true #### Pattern #1 - Frugal objects - WPF example ```cs internal sealed class ThreeItemList
: FrugalListBase
{ public override T EntryAt(int index) { switch (index) { case 0: return _entry0; case 1: return _entry1; case 2: return _entry2; default: throw new ArgumentOutOfRangeException(nameof(index)); } } ... private T _entry0; private T _entry1; private T _entry2; } ``` --- exclude: true #### Pattern #1 - Frugal objects - WPF example ```cs internal class FrugalObjectList
{ public T this[int index] { get { // If no entry, default(T) is returned if ((null != _listStore) && ((index < _listStore.Count) && (index >= 0))) { return _listStore.EntryAt(index); } throw new ArgumentOutOfRangeException(nameof(index)); } set { ... } } internal FrugalListBase
_listStore; } ``` --- ### Pattern #1 - Frugal Object **Problem:** We need to store efficiently set of data that can take various forms. **Solution:** Instead of creating an object for each form separately, create kind of *discriminated union*. **Benefits:** (Sometimes) smaller memory usage and better data locality. Better JIT optimizations - bound checks etc. **Consequences:** Sometimes more complex API comparing to the plain approach. Some performance overhead due to type checks and/or additional wrappers. --- class: center, middle ## Pattern #2 - Pooling --- ### Pattern #2 - Pooling **Problem:** There is a non-trivial cost of creating objects very often. This cost may be direct (allocation/initialization overhead), indirect (added GC overhead, fragmentation, ...) or both. **Solution:** Instead of creating an object each time, reuse some objects from a pre-allocated pool. **Benefits:** ... **Consequences:** ... --- #### Pattern #2 - Pooling Array pooling: * instead of creating an array each time, reuse pooled array: ```cs var pool = ArrayPool
.Shared; int[] buffer = pool.Rent(minLength); try { Consume(buffer); } finally { pool.Return(buffer); } ``` * `Rent` returns an array with **at least** the specified length * `Shared` instance has: * 17 buckets, with arrays of lengths: 16, 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 3,2768, 65,536, 131,072, 262,144, 524,288 and 1,048,576, each bucket contains maximum of 50 arrays * they are created on demand * **not cleared** when rented or returned (but `Return(T[] array, bool clearArray = false)`) * **trimming mechanism** - explicit `Trim()` method ??? `ArrayPool
.Shared` uses `TlsOverPerCoreLockedStacksArrayPool
` class - there is a small per-thread cache of each array size and a cache shared by all threads split into per-core stacks (hence its name). --- #### Pattern #2 - Pooling ```cs [Benchmark] public double VersionObjectArray() { DataClass[] data = new DataClass[_itemsCount]; for (int i = 0; i < _itemsCount; ++i) { data[i] = new DataClass(); data[i].Age = _items[i].Age; data[i].Gender = Helper(_items[i].Description) ? Gender.Female : Gender.Male; } return ProcessBatch(data); } ``` -- ```bash | Method | Mean | Gen 0 | Gen 1 | Gen 2 | Allocated | |---------------------- |-----------:|-------:|------:|------:|----------:| | VersionObjectArray | 1,384.1 ns | 0.9689 | - | - | 4056 B | ``` --- #### Pattern #2 - Pooling ```cs [Benchmark] public double VersionClassArrayPool() { DataClass[] data = ArrayPool
.Shared.Rent(_itemsCount); for (int i = 0; i < _itemsCount; ++i) { data[i] ??= new DataClass(); data[i].Age = _items[i].Age; data[i].Gender = Helper(_items[i].Description) ? Gender.Female : Gender.Male; } double result = ProcessBatch(data, _itemsCount); ArrayPool
.Shared.Return(data); return result; } ``` -- ```bash | Method | Mean | Gen 0 | Gen 1 | Gen 2 | Allocated | |---------------------- |-----------:|-------:|------:|------:|----------:| | VersionObjectArray | 1,384.1 ns | 0.9689 | - | - | 4056 B | | VersionClassArrayPool | 848.8 ns | 0.0076 | - | - | 32 B | ``` --- #### Pattern #2 - Pooling * abstract `MemoryPool
` - represents a pool of memory blocks * `MemoryPool
.Shared` - based on `ArrayPool` * `SlabMemoryPool : MemoryPool
` - the memory pool behind Kestrel. * Object pooling: * quite often not necessary - cost of handling outweighs benefits * maybe for heavy, costly in creation objects * or tremendously often created * no standard implementation yet * internal [`ObjectPool
`](https://github.com/dotnet/roslyn/blob/master/src/Dependencies/PooledObjects/ObjectPool%601.cs) from Roslyn * ... * [Core support for object pooling](https://github.com/dotnet/coreclr/issues/23342) CoreCLR issue... is closed --- ### Pattern #2 - Pooling **Problem:** There is a non-trivial cost of creating objects very often. This cost may be direct (allocation/initialization overhead), indirect (added GC overhead, fragmentation, ...) or both. **Solution:** Instead of creating an object each time, reuse some objects from a pre-allocated pool. **Benefits:** Smaller GC overhead, no allocation/initialization cost, better data locality. **Consequences:** Trimming policy may be not trivial. --- class: center, middle ## Pattern #3 - Zero-copy (Avoid copying) --- ### Pattern #3 - Zero-copy (Avoid copying) **Problem:** Oparations on "sub-data" require a temporary object to represent it, which introduces a lot of short-living objects. Typical examples are `string.Substring()` or sub-arrays/sub-lists. **Solution:** Introduce special types that allow *slicing* - represnting sub-regions without a memory copy. **Benefits:** ... **Consequences:** ... --- #### Pattern #3 - Zero-copy (Avoid copying) - `Span
` It contains a *managed pointer* and a length: ```cs public ref struct Span
{ * internal ref T _pointer; private int _length; ... } ``` .center[] --- #### Pattern #3 - Zero-copy (Avoid copying) - `Span
` So we have our beloved `Span
`: -- ```cs var array = new int[64]; Span
span1 = new Span
(array); Span
span2 = new Span
(array, start: 8, length: 4); ``` -- ```cs Span
span3 = stackalloc[] { 1, 2, 3, 4, 5 }; ``` -- ```cs IntPtr memory = Marshal.AllocHGlobal(64); void* ptr = memory.ToPointer(); Span
span4 = new Span
(ptr, 64); ``` -- ```cs string str = "Hello world"; ReadOnlySpan
span5 = str.AsSpan(); ``` -- Zero-copy slicing 😍: -- ```cs var span6 = span.Slice(0, 4); ``` --- #### Pattern #3 - Zero-copy (Avoid copying) - `Span
` * `Span
` contains *managed pointer*, it cannot appear on the Managed Heap -- * the main concern - `async` methods with their *boxing* state machines -- ```cs // Does not compile with an error: // "Parameters or locals of type 'Span
' cannot be declared in async methods or lambda expressions" public async Task DoAsync(Span
data) { Do(data.Slice(0, 10)); await DoOtherAsync(); } ``` -- Use `Memory
` instead - general-purpose slicing of arrays, strings and ... ```cs // Does compile - Span is not a local variable here! public async Task DoAsync(Memory
data) { DoSync(data.Span.Slice(0, 10)); await DoOtherAsync(); } private void DoSync(Span
data) => Console.WriteLine(data.Length); ``` --- exclude: true #### Pattern #3 - Zero-copy (Avoid copying) - `Span
` ```cs public unsafe Span
rocksdb_get_span( IntPtr db, IntPtr read_options, byte[] key, long keyLength, out IntPtr errptr, ColumnFamilyHandle cf = null) { ... var resultPtr = rocksdb_get(db, read_options, key, skLength, out UIntPtr valueLength, out errptr) ... return new Span
((void*)resultPtr, (int)valueLength); } public unsafe void dangerous_rocksdb_release_span(in Span
span) { ref byte ptr = ref MemoryMarshal.GetReference(span); IntPtr intPtr = new IntPtr(Unsafe.AsPointer(ref ptr)); rocksdb_free(intPtr); } ``` --- #### Pattern #3 - Zero-copy (Avoid copying) Other examples: * System.IO.Pipelines * `ref` returning and argument passing --- exclude: true #### Pattern #3 - Zero-copy (Avoid copying) - ref returning collection ```cs public class RefValueBookCollection { public ValueBook[] books = { new ValueBook { Title = "Call of the Wild, The", Author = "Jack London" }, new ValueBook { Title = "Tale of Two Cities, A", Author = "Charles Dickens" } }; private ValueBook nobook = default; public ref ValueBook GetBookByTitle(string title) { for (int ctr = 0; ctr < books.Length; ctr++) if (title == books[ctr].Title) return ref books[ctr]; return ref nobook; } public static void Run() { var collection = new RefValueBookCollection(); ref var book = ref collection.GetBookByTitle("Call of the Wild, The"); book.Author = "Konrad Kokosa"; Console.WriteLine(collection.books[0].Author); } } ``` --- ### Pattern #3 - Zero-copy (Avoid copying) **Problem:** Oparations on "sub-data" require a temporary object to represent it, which introduces a lot of short-living objects. Typical examples are `string.Substring()` or sub-arrays/sub-lists. **Solution:** Introduce special types that allow *slicing* - represnting sub-regions without a memory copy. **Benefits:** You can operate on subset of data without overhead of memory copy - especially useful in various "descendant" parsers. **Consequences:** API and code is being polluted with technical-specific types, instead of using generic, intuitive ones. --- class: center, middle ## Pattern #4 - Struct of Arrays --- ### Pattern #4 - Struct of Arrays **Problem:** Processing a lot of data is not efficient due to the memory access time. **Solution:** Design data structures and processing steps to leverage data locality and sequential access. Most typically, in the form of plain arrays of data. **Benefits:** ... **Consequences:** ... --- #### Pattern #4 - Struct of Arrays "Array of Structs": ```cs class CustomerClassRepository { List
customers = new List
(); public void UpdateScorings() { foreach (var customer in customers) customer.UpdateScoring(); } } public class CustomerClass { private double Earnings; private DateTime DateOfBirth; private bool IsSmoking; private double Scoring; private HealthData Health; private AuxiliaryData Auxiliary; private Company Employer; public void UpdateScoring() { Scoring = Earnings * (IsSmoking ? 0.8 : 1.0) * ProcessAge(DateOfBirth); } private double ProcessAge(DateTime dateOfBirth) => ...; } ``` --- #### Pattern #4 - Struct of Arrays "Array of Structs": .center[] -- ```bash Method | Mean | Allocated | LlcMisses/Op | LlcReference/Op | ------------- |-----------:|----------:|-------------:|----------------:| ObjectStyle | 2,152.5 us | 0 B | 24680 | 30031 | ``` --- #### Pattern #4 - Struct of Arrays "Array of Structs" (better): ```cs class CustomerValueRepository { List
customers = new List
(); public void UpdateScorings() { foreach (var customer in customers) customer.UpdateScoring(); } } [StructLayout(LayoutKind.Sequential)] public struct CustomerValue { double Earnings; double Scoring; int YearOfBirth; bool IsSmoking; int HealthDataId; int AuxiliaryDataId; int EmployerId; public void UpdateScoring() { Scoring = Earnings * (IsSmoking ? 0.8 : 1.0) * ProcessAge(YearOfBirth); } private double ProcessAge(int yearOfBirth) => ...; } ``` --- #### Pattern #4 - Struct of Arrays "Array of Structs" (better): .center[] -- ```bash Method | Mean | Allocated | LlcMisses/Op | LlcReference/Op | ------------- |-----------:|----------:|-------------:|----------------:| ObjectStyle | 2,152.5 us | 0 B | 24680 | 30031 | StructStyle | 1,442.0 us | 0 B | 1986 | 3937 | ``` --- #### Pattern #4 - Struct of Arrays ```cs class CustomerRepositoryDOD { int NumberOfCustomers; double[] Scoring; double[] Earnings; bool[] IsSmoking; int[] YearOfBirth; DateTime[] DateOfBirth; public void UpdateScorings() { for (int i = 0; i < NumberOfCustomers; ++i) Scoring[i] = Earnings[i] * (IsSmoking[i] ? 0.8 : 1.0) * ProcessAge(YearOfBirth[i]); } public double ProcessAge(int yearOfBirth) => ...; } ``` .center[] --- #### Pattern #4 - Struct of Arrays .center[] ```cs Method | Mean | Allocated | LlcMisses/Op | LlcReference/Op | ------------- |-----------:|----------:|-------------:|----------------:| ObjectStyle | 2,152.5 us | 0 B | 24680 | 30031 | StructStyle | 1,442.0 us | 0 B | 1986 | 3937 | DoDStyle | 213.9 us | 0 B | 272 | 816 | ``` -- ##.center[.red[10x faster!]] --- #### Pattern #4 - Struct of Arrays Extensions: * tree processing - flatten tree navigating it in pre-order depth-first manner... * Entity Component System --- exclude: true #### Pattern #4 - Struct of Arrays (Entity Component System) .center[] --- exclude: true #### Pattern #4 - Struct of Arrays (Entity Component System) .center[] ??? Examples from the real-world: * Unity ECS * Entitas --- ### Pattern #4 - Struct of Arrays **Problem:** Processing a lot of data is not efficient due to the memory access time. **Solution:** Design data structures and processing steps to leverage data locality and sequential access. Most typically, in the form of plain arrays of data. **Benefits:** Much better memory access performance. **Consequences:** Much worse design, violating object-oriented principles like encapsulation. Can be slightly worked around with the help of ECS. --- class: center, middle ## Pattern #5 - Stack-based data --- ### Pattern #5 - Stack-based data **Problem:** We allocate a lot of small, temporary objects, which puts a high pressure on the GC. **Solution:** We avoid using Manged Heap by utilizing various possibilities like: `structs`, `ref structs`, `stackalloc`. **Benefits:** ... **Consequences:** ... --- ### Pattern #5 - Stack-based data * structs * good data locality * can be *enregistered* .green[:-)] * can be *boxed* .red[:-(] * we love `ValueTuple` or `ValueTask` -- * stackalloc -- * ref structs -- * fixed sized buffers --- #### Pattern #5 - Stack-based data - `stackalloc` `stackalloc` operator allocates a block of memory on the stack: ```cs stackalloc T[E] ``` where `T` must be an *unmanaged type* and `E` must be an expression of type `int` PS. Doesn't have to be pinned. --- #### Pattern #5 - Stack-based data - `stackalloc` ```cs unsafe { byte* ptr = stackalloc byte[128]; } ``` Or C# 7.0+: ```cs Span
span = stackalloc byte[128]; ``` -- What about `StackOverflowException`? -- exclude: true ``` RuntimeHelpers.EnsureSufficientExecutionStack() ``` *"ensures that the remaining stack space is large enough to execute **the average .NET Framework function**"* --- #### Pattern #5 - Stack-based data - `stackalloc` ```cs [Benchmark] public double VersionStructStackalloc() { Span
data = stackalloc DataStruct[_itemsCount]; for (int i = 0; i < _items.Count; ++i) { data[i].Age = _items[i].Age; data[i].Gender = Helper(_items[i].Description) ? Gender.Female : Gender.Male; } return ProcessBatch(data); } ``` -- ```bash | Method | Mean | Gen 0 | Gen 1 | Gen 2 | Allocated | |------------------------ |-----------:|-------:|------:|------:|----------:| | VersionObjectArray | 1,384.1 ns | 0.9689 | - | - | 4056 B | | VersionClassArrayPool | 848.8 ns | 0.0076 | - | - | 32 B | | VersionStructStackalloc | 538.9 ns | - | - | - | - | ``` -- * ~33% faster! -- * Mostly, the cost of stack memory zeroing... --- #### Pattern #5 - Stack-based data - `stackalloc` ```cs [Benchmark] [LocalsInit(false)] public double VersionStructStackallocEx() { Span
data = stackalloc DataStruct[_itemsCount]; for (int i = 0; i < _items.Count; ++i) { data[i].Age = _items[i].Age; data[i].Gender = Helper(_items[i].Description) ? Gender.Female : Gender.Male; } return ProcessBatch(data); } ``` --- #### Pattern #5 - Stack-based data - `stackalloc` ```cs [Benchmark] *[LocalsInit(false)] public double VersionStructStackallocEx() { Span
data = stackalloc DataStruct[_itemsCount]; for (int i = 0; i < _items.Count; ++i) { data[i].Age = _items[i].Age; data[i].Gender = Helper(_items[i].Description) ? Gender.Female : Gender.Male; } return ProcessBatch(data); } ``` -- ```bash | Method | Mean | Gen 0 | Gen 1 | Gen 2 | Allocated | |-------------------------- |-----------:|-------:|------:|------:|----------:| | VersionObjectArray | 1,384.1 ns | 0.9689 | - | - | 4056 B | | VersionClassArrayPool | 848.8 ns | 0.0076 | - | - | 32 B | | VersionStructStackalloc | 538.9 ns | - | - | - | - | | VersionStructStackallocEx | 475.2 ns | - | - | - | - | ``` -- * Again, ~10% faster. -- * Using [LocalsInit.Fody](https://github.com/ltrzesniewski/LocalsInit.Fody) plugin for Fody code weaver: *"lets you control the value of the `localsinit` flag on methods. This is most useful to eliminate the zero-initialization of memory returned by `stackalloc`"* --- exclude: true #### Pattern #5 - Stack-based data - Ref struct (aka *byref-like type*) ```cs public ref struct RefBook { public string Title; public string Author; } ``` -- exclude: true * a bunch of limitations: -- exclude: true * it cannot be a field of a class or struct (boxing!) -- exclude: true * it cannot be a static field -- exclude: true * it cannot be an array type element (boxing!) -- exclude: true * it cannot be *boxed* explicitly -- exclude: true * it cannot implement an interface -- exclude: true * it cannot be a generic type argument -- exclude: true * it cannot be a local variable in `async` method -- exclude: true * it cannot be a part of *closure* -- exclude: true * everything **to not occur on the Managed Heap** so it **can contain *managed pointer*** --- exclude: true #### Pattern #5 - Stack-based data - Ref struct (aka *byref-like type*) So what does it give us? -- exclude: true * they are never *heap-allocated* -- exclude: true * fast allocation/deallocation guaranteed (stack/CPU) -- exclude: true * guaranteed no GC overhead - never leaks through boxing -- exclude: true * can contain *managed pointer* -- exclude: true * guaranteed safe thread access -- exclude: true * only single thread can use it -- exclude: true * no need for any synchronization -- exclude: true * guaranteed lifetime -- exclude: true * as long as you have it as local variable - for every subsequent call --- exclude: true #### Pattern #5 - Stack-based data - Ref struct (aka *byref-like type*) It **can contain *managed pointer***, thus also `Span
`: ```cs public ref struct RefData { public int Id; public Span
Data; } ``` It can be a field of other *ref struct*: ```cs public ref struct RefBook { public string Title; public RefData Publisher; } ``` It can be a local variable and method's argument (including *byref*): ```cs public void Test(ref RefBook refBook) { ref RefData refPublisher = ref refBook.Publisher; } ``` --- #### Pattern #5 - Stack-based data - Fixed size buffers .pull-left[ ```cs struct StructWithArray { public char[] Text; public int F2; // other fields... } ``` ] -- .pull-right[ ```cs unsafe struct StructWithFixedBuffer { public fixed char Text[128]; public int F2; // other fields... } ``` ] -- .pull-left[  ] -- .pull-right[  ] -- .clear-left[ And we can `stackalloc` them! Or box in an array! ] --- #### Pattern #5 - Stack-based data - Fixed size buffers ```cs unsafe ref struct FileSystemEntry { private fixed char _fileNameBuffer[NameBufferSize]; private ReadOnlySpan
_fileName; ... public ReadOnlySpan
FileName { get { fixed (char* c = _fileNameBuffer) { Span
buffer = new Span
(c, NameBufferSize); _fileName = _directoryEntry.GetName(buffer); } return _fileName; } } } ``` --- ### Pattern #5 - Stack-based data **Problem:** We allocate a lot of small, temporary objects, which puts a high pressure on the GC. **Solution:** We avoid using Manged Heap by utilizing various possibilities like: `structs`, `ref structs`, `stackalloc`. **Benefits:** No GC overhead makes data processing much faster. **Consequences:** Operating on the stack may be dangerous due to "uncatchable" `StackOverflowException`. Moreover, using the stack has its own cost of memory zeroing. --- class: center, middle ## Pattern #6 - Buffered Builder --- ### Pattern #6 - Buffered Builder **Problem:** We generate a lot of temporary objects due to many operations on immutable data. **Solution:** We create "builder" to manipulate data of immutable nature as "mutable". Ultra popular example is `StringBuilder`. **Benefits:** ... **Consequences:** ... --- #### Pattern #6 - Buffered Builder .pull-left[ ```cs [HttpGet] [Route("values/concatenated/{count}")] public string GetConcatenated(int count) { Random rand = new Random(); var result = "
"; for (int i = 0; i <= count; i++) { result += "
"; } result += "
"; return result; } ``` ] .pull-right[ ```cs [HttpGet] [Route("values/builder/{count}")] public string GetBuilder(int count) { Random rand = new Random(); var sb = new StringBuilder("
"); for (int i = 0; i <= count; i++) { sb.Append("
"); } sb.Append("
"); return sb.ToString(); } ``` ] --- #### Pattern #6 - Buffered Builder .center[] --- #### Pattern #6 - Buffered Builder .center[] --- #### Pattern #6 - Buffered Builder .center[] --- #### Pattern #6 - Buffered Builder For example - `BigInteger`. Proposal [Expose a BigInteger.Builder to help avoid so many copies/allocations](https://github.com/dotnet/corefx/issues/37204): *"`System.Numerics.BigInteger` is an immutable struct that frequently works with "big data". This can cause it to be very ineffecient to use given that you need to create a new immutable struct for every operation. As such, I propose we expose a new `BigInteger.Builder` type which allows for multiple operations to be performed on a mutable class before ultimately serializing the final result back to a regular `BigInteger`"* ```cs public sealed class Builder : ... { public static Builder Abs(Builder value); public static Builder Add(Builder left, Builder right); public static Builder Divide(Builder dividend, Builder divisor); public static Builder Multiply(Builder left, Builder right); public static Builder Negate(Builder value); public static Builder Pow(Builder value, int exponent); public static Builder Remainder(Builder dividend, Builder divisor); ... public BigInteger ToBigInteger(); } ``` --- ```cs public ref struct ValueStringBuilder { private char[] _arrayToReturnToPool; private Span
_chars; private int _pos; public ValueStringBuilder(Span
initialBuffer) { _arrayToReturnToPool = null; _chars = initialBuffer; _pos = 0; } public ref char this[int index] => ref _chars[index]; public void Append(char c) { int pos = _pos; if (pos < _chars.Length) { _chars[pos] = c; _pos = pos + 1; } else GrowAndAppend(c); } private void GrowAndAppend(char c) { Grow(1); Append(c); } ... ``` ??? Combination of this and #4 and #5: ValueStringBuilder (stacklloc internal buffer) --- ```cs ... private void Grow(int requiredAdditionalCapacity) { Debug.Assert(requiredAdditionalCapacity > 0); char[] poolArray = ArrayPool
.Shared.Rent(Math.Max(_pos + requiredAdditionalCapacity, _chars.Length * 2)); _chars.CopyTo(poolArray); char[] toReturn = _arrayToReturnToPool; _chars = _arrayToReturnToPool = poolArray; if (toReturn != null) { ArrayPool
.Shared.Return(toReturn); } } public void Dispose() { char[] toReturn = _arrayToReturnToPool; if (toReturn != null) { ArrayPool
.Shared.Return(toReturn); } } public void Append(string str) { foreach (char c in str) Append(c); } public override string ToString() => new string(_chars.Slice(0, _pos)); ... } ``` --- #### Pattern #6 - Buffered Builder `ValueStringBuilder` example usage: ```cs static string UseValueStringBuilder(string data) { Span
initialBuffer = stackalloc char[32]; using var builder = new ValueStringBuilder(initialBuffer); builder.Append("
"); builder.Append(data); builder.Append("
"); return builder.ToString(); } ``` -- Benchmarks: ```bash | Method | Mean | Gen 0/1k Op | Gen 1 | Gen 2 | Allocated/Op | |---------------------- |---------:|------------:|------:|------:|-------------:| | UseValueStringBuilder | 44.23 ns | 0.0134 | - | - | 56 B | | UseStringBuilder | 51.53 ns | 0.0459 | - | - | 192 B | ``` --- ### Pattern #6 - Buffered Builder **Problem:** We generate a lot of temporary objects due to many operations on immutable data. **Solution:** We create "builder" to manipulate data of immutable nature as "mutable". Ultra popular example is `StringBuilder`. **Benefits:** A lot less pressure on the GC, better data locality (internal buffer). **Consequences:** Slightly more complex API than just using plain immutable data. --- class: center, middle ### Pattern candidates: Lightweight wrapper/RAII, Metaprogramming, Caching, Batching, SIMD (esp. Hardware Intrinsics in C#) --- class: center, middle ## https://ProDotNetMemory.com/slides/PerformancePatternsLong --- class: center, middle # And... that's all! Thank .red[**you**]! Any questions?! --- class: center, middle ## What's .red[the point] of using C# for high-performance, instead of using C/C++?! -- ### 1. only small percent of code requires such low-level - why use C/C++ for all the system then? -- ### 2. (pro) C# job market is much bigger than (pro) C/C++ job market -- ### 3. safety -- ### Example: [Ixy](https://github.com/ixy-languages/ixy-languages) network device drivers written in high-level languages