Bit packing like a mad man
Amaury SECHET
@deadalnix
Memory is slow
• About 300 cycles to hit memory
• Bandwidth still increasing
• Latency only marginally improving
Memory is slow - Caching
• Add faster memory on CPU.
• Various size and speed
– Signal needs time to travel
– L1: 3-4 cycles, 32 KB
• Instruction
• Data
– L2: 8-14 cycles, 256 KB
– L3: tens of cycles, a few MB, often shared
– Cache line: 64 bytes
But first a small story…
The king is throwing a party
He has 1000 bottles
in his cellar
An evil man poisoned
a bottle with his
secret recipe with 11
herbs and spices !
• The poison will kill anyone
even in small doses.
• It takes several hours for
someone to die from
poisoning.
• The King has 1000 servants
and 20 prisoners.
• He would like to avoid killing
servants if possible, but
killing prisoners is fine.
• What should the king do ?
The answer
• The king can use 10 prisoners.
• Number each bottle in binary
• Each prisoner will drink from multiple bottles
– Prisoner n drinks from every bottle whose nth binary digit is 1
• The pattern of prisoners dying gives the poisoned bottle's number in binary.
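The answer can be checked with a short sketch. The names `drinks` and `poisonedBottle` are made up for this illustration, and bottles are assumed to be numbered from 0 to 999.

```d
enum prisonerCount = 10; // 2^10 = 1024 covers the 1000 bottles

/// Does prisoner `n` (0-based) drink from `bottle`?
bool drinks(uint bottle, uint n) {
	return ((bottle >> n) & 1) != 0;
}

/// Rebuild the poisoned bottle's number from who died.
uint poisonedBottle(const bool[prisonerCount] died) {
	uint bottle = 0;
	foreach (n, dead; died) {
		if (dead) {
			bottle |= 1u << n;
		}
	}
	return bottle;
}

unittest {
	enum uint poisoned = 617; // arbitrary choice for the demo
	bool[prisonerCount] died;
	foreach (n; 0 .. prisonerCount) {
		died[n] = drinks(poisoned, n);
	}
	assert(poisonedBottle(died) == poisoned);
}
```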
The king’s party was a real success !
Bit packing
• Reduce memory waste
• Increase cache utilization
• Minimal CPU cost
• Not a replacement for better algorithms
– Instantiating fewer objects saves a lot of memory !
Alignment
• Ensure that load/store do not
– Cross cache line
– Cross pages boundaries
• Unaligned access: severe penalties
– Bad performance on some CPUs, loss of atomicity
• Hardware is doing 2 accesses
– Hard error on others (SIGBUS or alike)
• Defined by ABI
Alignment – Rule of thumb
• Integral types smaller than size_t
– T.sizeof
• Integral types bigger than size_t
– size_t.sizeof
– Compiler will decompose memory accesses
• Structs
– Max(alignment of each field)
– Add padding to respect alignment
Struct padding
struct S {
	bool f1;
	uint f2;
	bool f3;
}
f1 | pad | f2 | f3 | pad
12 bytes, 6 wasted
Struct padding
struct S {
	uint f2;
	bool f1;
	bool f3;
}
f2 | f1 | f3 | pad
8 bytes, 2 wasted
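Both layouts can be pinned down with static asserts. This is a sketch for common targets where `uint` is 4-byte aligned; the struct names `Padded` and `Reordered` are only for illustration.

```d
struct Padded {
	bool f1; // offset 0, followed by 3 bytes of padding
	uint f2; // offset 4
	bool f3; // offset 8, followed by 3 bytes of tail padding
}

struct Reordered {
	uint f2; // offset 0
	bool f1; // offset 4
	bool f3; // offset 5, followed by 2 bytes of tail padding
}

// The compiler rejects the build if a layout assumption stops holding.
static assert(Padded.sizeof == 12);
static assert(Padded.f2.offsetof == 4);
static assert(Reordered.sizeof == 8);
static assert(Reordered.alignof == uint.alignof);
```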
Padding tips
• Start with fields with high alignment
• Know where pads are
• Enforce assumptions using static assert
– alignof
– sizeof
• Classes, like structs, but
– Implicit fields
• Vtable
• Monitor
– At least pointer size alignment
Information density
• How much actual information ?
• Bool
– 1 bit of information
– 8 bits of storage
• Object
– 45 bits of information
– 64 bits of storage
• Dump memory and zip it
– Aim for that size
Bit packing
• Trade memory consumption for CPU
– Usually a good deal
• Use one integral as storage
– Store several elements in that integral
– Use bitwise operations to manipulate elements
• std.bitmanip can help
Struct packing
import std.bitmanip;
struct S {
	mixin(bitfields!(
		uint, "f1", 30,
		bool, "f2", 1,
		bool, "f3", 1,
	));
}
f1 | f2 | f3
4 bytes, 0 wasted
• f1 is now 30 bits instead of 32 bits
– Its maximum value is now about 1 billion (2^30 - 1)
• Fields aren’t atomic anymore
• bitfields does all the magic
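A small usage sketch of std.bitmanip's `bitfields` with the same three fields. The struct name `Packed` is an assumption for this example.

```d
import std.bitmanip : bitfields;

struct Packed {
	mixin(bitfields!(
		uint, "f1", 30,
		bool, "f2", 1,
		bool, "f3", 1));
}

// The three fields share a single 32-bit word.
static assert(Packed.sizeof == 4);

unittest {
	Packed p;
	p.f1 = (1u << 30) - 1; // largest value that fits in 30 bits
	p.f3 = true;
	assert(p.f1 == (1u << 30) - 1);
	assert(!p.f2 && p.f3);
}
```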
Bit packing integrals
[Figure: Data, a 32-bit word with bit positions 32, N + S, N and 0 marked; entry occupies bits N to N + S]
enum ReadMask = (1 << S) - 1;
enum WriteMask = ReadMask << N;
@property uint entry() {
	return (data >> N) & ReadMask;
}
@property void entry(uint val) in {
	assert((val & ReadMask) == val);
} do {
	data = (data & ~WriteMask) | ((val << N) & WriteMask);
}
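To try the integral accessors in isolation, they can be wrapped in a struct with concrete values. This is a sketch: `PackedEntry`, N = 5 and S = 12 are arbitrary choices, not values from the talk.

```d
struct PackedEntry {
	enum N = 5;  // bit offset of the entry
	enum S = 12; // width of the entry in bits
	enum uint ReadMask = (1u << S) - 1;
	enum uint WriteMask = ReadMask << N;

	uint data;

	@property uint entry() const {
		return (data >> N) & ReadMask;
	}

	@property void entry(uint val) {
		assert((val & ReadMask) == val, "value does not fit in S bits");
		// Clear the old entry, then merge the shifted new value in.
		data = (data & ~WriteMask) | ((val << N) & WriteMask);
	}
}

unittest {
	auto p = PackedEntry(0xFFFF_FFFF);
	p.entry = 0xABC;
	assert(p.entry == 0xABC);
	// Bits outside the entry are left untouched.
	assert((p.data | PackedEntry.WriteMask) == 0xFFFF_FFFF);
}
```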
Bit packing bools
[Figure: Data, a 32-bit word with bit positions 32, N + 1, N and 0 marked; entry occupies bit N]
enum Mask = 1 << N;
@property bool entry() {
	return (data & Mask) != 0;
}
@property void entry(bool val) {
	if (val) {
		data = data | Mask;
	} else {
		data = data & ~Mask;
	}
}
Note: data ^ Mask will flip the bit.
It is sometimes faster than setting it.
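The XOR flip from the note can be sketched as follows. `Flags`, `toggle` and the choice of bit 3 are assumptions for the example.

```d
struct Flags {
	enum uint Mask = 1u << 3; // hypothetical: the flag lives at bit 3
	uint data;

	@property bool entry() const {
		return (data & Mask) != 0;
	}

	/// Flip the flag with a single XOR, without branching on its value.
	void toggle() {
		data ^= Mask;
	}
}

unittest {
	auto f = Flags(0b0101);
	f.toggle();
	assert(f.entry && f.data == 0b1101);
	f.toggle();
	assert(!f.entry && f.data == 0b0101);
}
```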
Bitfield layout
• 2 special spots
– Rightmost : mask only
– Leftmost : shift only
• Large elements require large mask
– Put them in the leftmost spot
• Bools always use masks
– Can be checked in leftmost with signed < 0
– Don’t put them in special spots unless very hot
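A sketch of why the two end spots are cheaper. The 8/24 split and the names `Spots`, `low`, `high` and `topBit` are made up for illustration.

```d
struct Spots {
	uint data;

	// Rightmost field (bits 0 to 7): a mask is enough, no shift.
	uint low() const {
		return data & 0xFF;
	}

	// Leftmost field (bits 8 to 31): a shift is enough, no mask.
	uint high() const {
		return data >> 8;
	}
}

// A bool stored in the top bit can be read as a sign test.
bool topBit(uint data) {
	return cast(int) data < 0;
}

unittest {
	auto s = Spots(0x1234_56AB);
	assert(s.low == 0xAB);
	assert(s.high == 0x1234_56);
	assert(topBit(0x8000_0000) && !topBit(0x7FFF_FFFF));
}
```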
Bitfield layout
• We want :
– One flag
– One 2-bit enum E
– A 29-bit integral
• What is the best layout ?
Bitfield layout
enum E { E0, E1, E2, E3 }
struct S {
	import std.bitmanip;
	mixin(bitfields!(
		E, "e", 2,
		bool, "flag", 1,
		uint, "integral", 29,
	));
}
Codegen:
e = cast(E) (data & 0x03);
flag = (data & 0x04) != 0;
integral = data >> 3;
Unused bits
• Sometimes, the whole bitfield is not needed
– Create a nameless field
• uint, "", 29
– Make it usable by other structs/subclasses
• uint, "_derived", 29
• Ideally make it private/protected
• Or use in private struct elements
• Need to implement the remaining fields manually
• Feature request: bitfield with explicit storage
Unused bits - example
class Symbol : Node {
	Name name;
	Name mangle;
	import std.bitmanip;
	mixin(bitfields!(
		Step, "step", 2,
		Linkage, "linkage", 3,
		Visibility, "visibility", 3,
		InTemplate, "inTemplate", 1,
		bool, "hasThis", 1,
		bool, "hasContext", 1,
		bool, "isPoisoned", 1,
		bool, "isAbstract", 1,
		bool, "isProperty", 1,
		uint, "derived", 18,
	));
}
class Field : Symbol {
	// ...
	this(..., uint index, ... ) {
		// ...
		this.derived = index;
		// Always true for fields.
		this.hasThis = true;
	}
	@property index() const {
		// Only 262 143 fields possible !
		return derived;
	}
}
Tagging pointers - @trusted
• Least significant bits are known to be 0
– How many depends on alignment
– Log2(T.alignof)
– At least 3 bits on Objects (2 on 32-bit systems)
• Once again, std.bitmanip can help
– taggedPointer/taggedClassRef
– Checks alignment constraints at compile time
– Misaligned pointers are not safe
Tagging pointers - @trusted
enum Color { Black, Red }
struct Link(T) {
	import std.bitmanip;
	mixin(taggedPointer!(
		T*, "child",
		Color, "color", 1,
	));
}
struct Node(T) {
	Link!T left;
	Link!T right;
}
[Figure: the pointed object in memory, with the tagged "child" pointer aimed inside it]
• The actual pointer points at the object
• The tagged pointer points within the object
• The GC knows about interior pointers
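A minimal usage sketch of std.bitmanip's `taggedPointer`, assuming an `int*` payload (4-byte alignment leaves 2 free low bits, 1 is used here). `IntLink` is a name made up for this example.

```d
import std.bitmanip : taggedPointer;

enum Color { Black, Red }

struct IntLink {
	mixin(taggedPointer!(
		int*, "child",
		Color, "color", 1));
}

// The tag rides in the pointer's low bits, so no extra storage is used.
static assert(IntLink.sizeof == (void*).sizeof);

unittest {
	auto value = new int;
	*value = 42;

	IntLink l;
	l.child = value;
	l.color = Color.Red;

	// Both parts read back independently.
	assert(l.child is value);
	assert(*l.child == 42);
	assert(l.color == Color.Red);
}
```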
Tagging pointers - @system
• Allocate in the lower 32 bits of address space
– Truncate pointer to 32 bits
– Limited to 4 GB
– Jemalloc can do that for you
– Used by HHVM for codegen
• On x86-64, the most significant 16 bits of user-space pointers are zeros
– Hijack them !
– Confuse the GC !
– Try to not SEGFAULT
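The high-bit trick can be sketched on plain integers, without dereferencing anything. It assumes 48-bit user-space addresses; `packTag`, `addressOf` and `tagOf` are hypothetical helpers, and a real version would also have to keep the GC informed.

```d
// Assumption: addresses fit in the low 48 bits.
enum ulong AddressMask = (1UL << 48) - 1;

/// Store a 16-bit tag in the unused top bits of an address.
ulong packTag(ulong address, ushort tag) {
	assert((address & ~AddressMask) == 0, "address uses the top 16 bits");
	return address | (cast(ulong) tag << 48);
}

/// Recover the address; this must be done before any dereference.
ulong addressOf(ulong packed) {
	return packed & AddressMask;
}

ushort tagOf(ulong packed) {
	return cast(ushort) (packed >> 48);
}

unittest {
	auto p = packTag(0x7FFF_1234_5678, 0xBEEF);
	assert(addressOf(p) == 0x7FFF_1234_5678);
	assert(tagOf(p) == 0xBEEF);
}
```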
Intermission – Germany loves D !
They even put stickers on their cars !
Let’s use a context
• Useful for cold but often reused data
• For instance, identifiers in a compiler
– Usually don’t care about the actual value
• The context stores identifiers and provides a unique id
– 32 bits vs 128 bits
– Equality can be tested with an int compare
– Can be its own hash for hashtable lookups
• Make the GC happy
– Fewer pointers
– More noscan !
Let’s use a context
struct Name {
private:
	uint id;
	this(uint id) {
		this.id = id;
	}
public:
	string toString(const Context c) const {
		return c.names[id];
	}
	immutable(char)* toStringz(const Context c) const {
		auto s = toString(c);
		assert(s.ptr[s.length] == '\0', "Expected a zero terminated string");
		return s.ptr;
	}
}
class Context {
private:
	string[] names;
	uint[string] lookups;
public:
	auto getName(const(char)[] str) {
		if (auto id = str in lookups) {
			return Name(*id);
		}
		// As we are cloning, make sure it is 0 terminated as to pass to C.
		import std.string;
		auto s = str.toStringz()[0 .. str.length];
		auto id = lookups[s] = cast(uint) names.length;
		names ~= s;
		return Name(id);
	}
}
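A stripped-down sketch of the same interning idea, to show that equal identifiers get equal ids. `Interner`, `intern` and `text` are hypothetical names, not the SDC API.

```d
struct Interner {
	private string[] names;       // id -> text
	private uint[string] lookups; // text -> id

	/// Return the id of `s`, registering it on first use.
	uint intern(string s) {
		if (auto existing = s in lookups) {
			return *existing;
		}
		auto id = cast(uint) names.length;
		lookups[s] = id;
		names ~= s;
		return id;
	}

	string text(uint id) const {
		return names[id];
	}
}

unittest {
	Interner ctx;
	auto a = ctx.intern("foo");
	auto b = ctx.intern("bar");
	auto c = ctx.intern("foo");
	assert(a == c); // same identifier: one int compare
	assert(a != b);
	assert(ctx.text(b) == "bar");
}
```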
Let’s use a context
Context prefill
• Useful to pin some id at compile time
• Can be used without lookup in the context
• Generated identifiers
• object.d
• Linkage/Version/Scope/Attribute
Context prefill
enum Reserved = [
	"__ctor", "__dtor", "__postblit", "__vtbl",
];
enum Prefill = [
	// Linkages
	"C", "D", "C++", "Windows", "System",
	// Generated
	"init", "length", "max", "min",
	"ptr", "sizeof", "alignof",
	// Scope
	"exit", "success", "failure",
	// Defined in object
	"object", "size_t", "ptrdiff_t", "string",
	"Object",
	"TypeInfo", "ClassInfo",
	"Throwable", "Exception", "Error",
	// Attribute
	"property", "safe", "trusted", "system", "nogc",
	// ...
];
auto getNames() {
	import d.lexer;
	auto identifiers = [""];
	foreach(k, _; getOperatorsMap()) {
		identifiers ~= k;
	}
	foreach(k, _; getKeywordsMap()) {
		identifiers ~= k;
	}
	return identifiers ~ Reserved ~ Prefill;
}
enum Names = getNames();
Context prefill
auto getLookups() {
	uint[string] lookups;
	foreach(uint i, id; Names) {
		lookups[id] = i;
	}
	return lookups;
}
enum Lookups = getLookups();
template BuiltinName(
	string name,
) {
	private enum id = Lookups
		.get(name, uint.max);
	static assert(
		id < uint.max,
		name ~ " is not a builtin name.",
	);
	enum BuiltinName = Name(id);
}
More context !
• Track locations in a compiler
– They are everywhere
• Register file in the context
– Allocate a range of values from N to N + sizeof(file)
– A position for each byte in the file !
• Add a flag for mixin (D) / macros (C++)
– Register expansions in the context.
More context !
• Use cases:
– Emit debug info
– Error messages
• Performance does not matter for errors
• Access pattern mostly predictable for debug
• Find file/line from location using
– One element cache
– Linear search (8 elements)
– Binary search
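The binary search step could look like the sketch below. `FileTable`, `addFile` and `fileOf` are invented for the example; the talk does not show the real data structure, and the one-element cache and linear search stages are left out.

```d
struct FileTable {
	uint[] starts; // first position of each file, in ascending order
	uint next;     // first position not allocated yet

	/// Reserve `length` positions for a new file and return its index.
	size_t addFile(uint length) {
		starts ~= next;
		next += length;
		return starts.length - 1;
	}

	/// Index of the file that contains `position`.
	size_t fileOf(uint position) const {
		assert(position < next, "position was never allocated");
		size_t lo = 0, hi = starts.length;
		// Invariant: starts[lo] <= position, and every file from hi on starts after it.
		while (hi - lo > 1) {
			auto mid = lo + (hi - lo) / 2;
			if (starts[mid] <= position) {
				lo = mid;
			} else {
				hi = mid;
			}
		}
		return lo;
	}
}

unittest {
	FileTable t;
	t.addFile(10); // positions 0 to 9
	t.addFile(5);  // positions 10 to 14
	assert(t.fileOf(9) == 0);
	assert(t.fileOf(10) == 1);
}
```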
More context !
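[Figure: the position space. File 1, File 2 and File 3, then empty space, fill the range 0 to 2B; Mixin 1, Mixin 2 and Mixin 3, then empty space, fill the range -2B to -1.]
The context stores file boundaries and line positions within files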
More context !
• A position is a 31-bit number + a flag
– Up to 2 GB of source code + 2 GB of macros/mixins
• A pair of positions is a location
– Used for tokens/expressions/symbols/statements
• The lexer only needs to bump the position value
for each token by the length of the token
• Strategy used by clang / SDC
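A sketch of such a position type, assuming the flag sits in the top bit. `Position`, `Location` and their members are names chosen for the example, not clang's or SDC's actual definitions.

```d
struct Position {
	enum uint MixinFlag = 1u << 31; // top bit marks mixin/macro positions
	private uint raw;

	static Position inFile(uint value) {
		assert(value < MixinFlag, "value needs more than 31 bits");
		return Position(value);
	}

	static Position inMixin(uint value) {
		assert(value < MixinFlag, "value needs more than 31 bits");
		return Position(value | MixinFlag);
	}

	bool isMixin() const {
		return (raw & MixinFlag) != 0;
	}

	uint offset() const {
		return raw & ~MixinFlag;
	}

	/// What the lexer does after each token: bump by the token's length.
	Position advance(uint tokenLength) const {
		return Position(raw + tokenLength);
	}
}

/// A location is just a pair of positions.
struct Location {
	Position start;
	Position stop;
}

static assert(Position.sizeof == 4);
static assert(Location.sizeof == 8);
```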
Polymorphism
Tagged reference
• Useful to encapsulate several reference types
• Can provide methods forwarding to elements
– Use reflection to do so
– Avoid vtable lookups/cascaded loads
– No common layout in the referenced object
• Number of elements limited by alignment
– Easy to get up to 8 on X64
• LLVM’s call/invoke
Tagged reference
template TagFields(uint i, U...) {
	import std.conv;
	static if (U.length == 0) {
		enum TagFields = "\n\t" ~ T.stringof ~ " = "
			~ to!string(i) ~ ",";
	} else {
		enum S = U[0].stringof;
		static assert(
			(S[0] & 0x80) == 0,
			S ~ " must not start with an unicode.",
		);
		static assert(
			U[0].sizeof <= size_t.sizeof,
			"Elements must be of pointer size or smaller.",
		);
		import std.ascii;
		enum Name = (S == "typeof(null)")
			? "Undefined"
			: toUpper(S[0]) ~ S[1 .. $];
		enum TagFields = "\n\t" ~ Name ~ " = "
			~ to!string(i) ~ "," ~ TagFields!(i + 1, U[1 .. $]);
	}
}
mixin("enum Tag {" ~ TagFields!(0, U) ~ "\n}");
import std.traits;
alias Tags = EnumMembers!Tag;
import std.typetuple;
alias TagTuple = TypeTuple!(uint, "tag", EnumSize!Tag);
Tagged reference
struct TaggedRef(U...) {
private:
	import std.bitmanip;
	mixin(taggedPointer!(
		void*, "ptr", TagTuple));
public:
	auto get(Tag E)() in {
		assert(tag == E);
	} do {
		static union Helper {
			void* __ptr;
			U u;
		}
		return Helper(ptr).u[E];
	}
	template opDispatch(string s, T...) {
		auto opDispatch(A...)(A args) {
			final switch(tag) {
				foreach(T; Tags) {
					case T:
						auto r = get!T();
						return mixin("r." ~ s)(args);
				}
			}
		}
	}
}
Value Type Polymorphism
• All subtypes fit under a given size budget
• A tag is used to differentiate them
• The whole thing is wrapped in a nice API
• Being able to hide atrocities behind a nice
façade, that’s the power of D
• Example: Representing D types
Value Type Polymorphism
template SizeOfBitField(T...) {
	// Arguments come in (type, name, size) triples.
	static if (T.length < 3) {
		enum SizeOfBitField = 0;
	} else {
		enum SizeOfBitField =
			T[2] + SizeOfBitField!(T[3 .. $]);
	}
}
enum EnumSize(E) =
	computeEnumSize!E();
size_t computeEnumSize(E)() {
	size_t size = 0;
	import std.traits;
	foreach (m; EnumMembers!E) {
		// Number of bits needed to represent this member.
		size_t ms = 0;
		while ((m >> ms) != 0) {
			ms++;
		}
		import std.algorithm;
		size = max(size, ms);
	}
	return size;
}
Value Type Polymorphism
struct TypeDescriptor(K, T...) {
	enum DataSize = ulong.sizeof * 8 - 3 - EnumSize!K - SizeOfBitField!T;
	import std.bitmanip;
	mixin(bitfields!(
		K, "kind", EnumSize!K,
		TypeQualifier, "qualifier", 3,
		ulong, "data", DataSize,
		T,
	));
	static assert(TypeDescriptor.sizeof == ulong.sizeof);
	this(K k, TypeQualifier q, ulong d = 0) {
		kind = k;
		qualifier = q;
		data = d;
	}
}
Value Type Polymorphism
• A type is a TypeDescriptor + an indirection field
• Data depends on the kind
– If it doesn’t fit, use indirection field
• There are many type kinds:
– Builtin
– Struct
– Class
– Alias
– Function
– …
• Common API switches on kind to do the right thing
Value Type Polymorphism
[Figure: layout of a type: data | Qualifier | Kind, followed by Indirection]
• 128 bits budget
• Indirection is used when
• The type needs extra space (Function)
• The type needs to refer to a symbol (Aggregate, Alias)
• Otherwise null
• Replaced the type class hierarchy advantageously
• Significant memory consumption reduction
• Significantly faster runtime (about 20%)
Value Type Polymorphism
• You can nest, effectively creating hierarchies
• For instance, Identifiable is
– A type
– An expression
– A symbol
• More packing !
Value Type Polymorphism
[Figure: layout of an Identifiable: data | Qualifier | Kind, followed by Indirection/Expression/Symbol, with a Tag]
• Tag is used to discriminate between
• Type
• Expression
• Symbol
• Tag is zeroed out to find the type
• Saved 70 Mb (!) of template bloat in SDC
Value Type Polymorphism
import d.semantic.identifier;
Identifiable i = ...;
i.apply!(delegate Expression(identified) {
	alias T = typeof(identified);
	static if (is(T : Expression)) {
		return identified;
	} else {
		return getError(
			identified,
			location,
			t.name.toString(pass.context) ~ " isn't callable",
		);
	}
})();
Value Type Polymorphism
Identifiable
Type Expression Symbol
Builtin Class AliasStruct Pointer Function …
Value Type - ABI
• Struct up to 2 fields
– Up to pointer sized
– Slice !
– No float/integral mixing
• Common anti-pattern: 2 pointers + a bool
– std.bigint.BigInt is a slice + a bool
– Passed in memory instead of registers
• More than one pointer tends to use 2
– Use either 1 or 2 pointer sized struct
Classless Polymorphism
Classless Polymorphism
• Create a base struct
• All substructs use it as first field
• Contains a tag describing the type
– The tag can be part of a bitfield
• Use a mixin in all substructs
– Include static assert to check this is done right
– Alias this the base
Classless Polymorphism
• Each leaf of the hierarchy has a tag value
• Each non-leaf has a range of tag values
• The root matches all values
• The hierarchy must be known at compile time
• Use a bunch of mixin templates
– Add the boilerplate
– A ton of static asserts
Classless Polymorphism
struct Child {
	mixin Parent!Root;
}
struct Root {
	mixin Childs!(Child, SubStruct);
}
struct SubStruct {
	mixin GrandChilds!(
		Root,
		SubChild,
	);
}
struct SubChild {
	mixin Parent!SubStruct;
}
Classless Polymorphism
[Figure: memory layout of each struct]
Root: Root
Child: Root | Child’s fields
SubStruct: Root | SubStruct’s fields
SubChild: Root | SubStruct’s fields | SubChild’s fields
Classless Polymorphism
• Children share the parent’s part of the layout
– It is safe to upcast
– Done via alias this
• Downcast to a leaf: check tag’s value
– Cheap
– Easy pattern matching
• Downcast to substruct: check tag range
– Cheap
• No typeid pointer chasing
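The casts above can be sketched by hand, without the mixin templates from the talk. The `Tag` values and the helpers `asSubStruct` and `asSubChild` are assumptions; tags are ordered so that SubStruct's subtree forms a contiguous range.

```d
enum Tag : ubyte { Child, SubStruct, SubChild }

struct Root {
	Tag tag;
}

struct SubStruct {
	Root base;       // parent first, so it sits at offset 0
	alias base this; // upcast to Root is implicit
	int payload;
}

struct SubChild {
	SubStruct base;
	alias base this;
	int extra;
}

static assert(SubStruct.base.offsetof == 0);
static assert(SubChild.base.offsetof == 0);

/// Downcast to a non-leaf: accept any tag in its range.
SubStruct* asSubStruct(Root* r) {
	return (r.tag >= Tag.SubStruct && r.tag <= Tag.SubChild)
		? cast(SubStruct*) r
		: null;
}

/// Downcast to a leaf: compare against a single tag value.
SubChild* asSubChild(Root* r) {
	return (r.tag == Tag.SubChild) ? cast(SubChild*) r : null;
}

unittest {
	SubChild sc;
	sc.base.tag = Tag.SubChild; // reaches Root.tag through alias this
	sc.extra = 7;
	Root* r = &sc.base.base;
	assert(asSubStruct(r) !is null);
	assert(asSubChild(r).extra == 7);
}
```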
Virtualish Dispatch
• No virtual table
• Get function pointer in a table
– One table per method
– One entry per leaf type
– Using the tag as an index
• Used by HHVM for PHP arrays
– Creative datastructure
– Is a vector/hashmap/set/tuple/whatever…
Regular Virtual Dispatch
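[Figure: T1’s object holds a vtable pointer followed by T1’s data, and its vtable is f1 f2 f3 f4; T2’s object holds a vtable pointer followed by T2’s data, and its vtable is g1 g2 g3 g4]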
• One vtable per type
• Vtable has one entry per method
• Load vtable then load function address
Virtualish Dispatch
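[Figure: each object holds a tag followed by its data (Tag | T1’s data, Tag | T2’s data); the function entries f1 g1 h1 i1 and f2 g2 h2 i2 live in per-method tables indexed by the tag]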
• One vtable per method
• Vtable has one entry per type
• Load tag then use it as index in per function table
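A hand-written sketch of one such per-method table. `Kind`, `Container` and the capacity functions are invented for the example; a real implementation could generate the table with reflection rather than by hand.

```d
enum Kind : ubyte { Vector, Set }

struct Container {
	Kind kind; // the tag
	uint count;
}

uint vectorCapacity(ref const Container c) {
	return c.count * 2;
}

uint setCapacity(ref const Container c) {
	return c.count * 4;
}

// One table per method, one entry per leaf type, in tag order.
uint function(ref const Container)[2] capacityTable = [
	&vectorCapacity,
	&setCapacity,
];

/// Load the tag, then use it as an index in the method's table.
uint capacity(ref const Container c) {
	return capacityTable[c.kind](c);
}

unittest {
	auto v = Container(Kind.Vector, 3);
	auto s = Container(Kind.Set, 3);
	assert(capacity(v) == 6);
	assert(capacity(s) == 12);
}
```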
Virtualish Dispatch
• Usually better locality
– Calling the same method on objects of various
types is more common than calling various methods
on objects of the same type
• Often worked around by sorting by type
– Classless get most of the benefit without sorting
– Still helps branch prediction
• Tables can be generated using reflection in D
Classless visitors !
• A regular class hierarchy needs to know all
methods at compile time
– Can add types dynamically
• A classless hierarchy needs to know all types at
compile time
– Can add methods dynamically
• A visitor can create a table of visit methods
– And use the tag to dispatch
• Extensibility is closed one way and opened
the other way
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 

Bit packing like a mad man

  • 1. Bit packing like a mad man Amaury SECHET @deadalnix
  • 2. Memory is slow • About 300 cycles to hit memory • Bandwidth still increasing • Latency only marginally increasing
  • 3. Memory is slow - Caching • Add faster memory on CPU. • Various size and speed – Signal needs time to travel – L1: 3-4 cycles, 32kb • Instruction • Data – L2: 8-14 cycles, 256kb – L3: tens of cycles, few Mb, often shared – Cache line: 64 bytes
  • 4. But first a small story…
  • 5. The king is throwing a party He has 1000 bottles in his cellar
  • 6. An evil man poisoned a bottle with his secret recipe with 11 herbs and spices ! • The poison will kill anyone even in small doses. • It takes several hours for someone to die from poisoning. • The King has 1000 servants and 20 prisoners. • He would like to avoid killing servants if possible, but killing prisoners is fine. • What should the king do ?
  • 7. The answer • The king can use 10 prisoners. • Number each bottle in binary • Each prisoner will drink from multiple bottles – Prisoner n will drink from every bottle whose nth binary digit is 1 • The pattern of prisoners who die gives the poisoned bottle’s number in binary.
  • 8. The king’s party was a real success !
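The puzzle's answer can be checked with a small simulation. This is only a sketch: the names `bottles`, `prisoners` and `decodePoisoned` are made up for illustration, and bottles are numbered from 0.

```d
enum uint bottles = 1000;
enum uint prisoners = 10; // 2^10 = 1024 >= 1000, so 10 bits are enough

/// Returns the bottle number read back from the pattern of deaths.
uint decodePoisoned(uint poisonedBottle)
{
    uint deaths = 0;
    foreach (uint n; 0 .. prisoners)
    {
        bool dies = false;
        foreach (uint bottle; 0 .. bottles)
        {
            // Prisoner n sips from every bottle whose nth bit is 1.
            immutable drinks = ((bottle >> n) & 1) != 0;
            if (drinks && bottle == poisonedBottle)
                dies = true;
        }
        // A dead prisoner contributes a 1 at his bit position.
        if (dies)
            deaths |= 1u << n;
    }
    return deaths;
}

unittest
{
    // Whichever bottle is poisoned, the deaths spell out its number.
    foreach (uint b; 0 .. bottles)
        assert(decodePoisoned(b) == b);
}
```

Running it with `dmd -unittest -main -run` exercises every possible poisoned bottle.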
  • 9. Bit packing • Reduce memory waste • Increase cache utilization • Minimal CPU cost • Not a replacement for better algorithms – Instantiating fewer objects saves a lot of memory !
  • 10. Alignment • Ensure that load/store do not – Cross cache lines – Cross page boundaries • Unaligned access: severe penalties – Bad performance on some CPUs, loss of atomicity • Hardware is doing 2 accesses – Hard error on others (SIGBUS or similar) • Defined by the ABI
  • 11. Alignment – Rule of thumb • Integral types smaller than size_t – T.sizeof • Integral types bigger than size_t – size_t.sizeof – Compiler will decompose memory accesses • Structs – Max(alignment of each field) – Add padding to respect alignment
  • 12. Struct padding struct S { bool f1; uint f2; bool f3; } f1 f2pad f3 pad 12 bytes, 6 wasted
  • 13. Struct padding struct S { uint f2; bool f1; bool f3; } f3f2 f1 pad 8 bytes, 2 wasted
  • 14. Padding tips • Start with fields with high alignment • Know where pads are • Enforce assumptions using static assert – alignof – sizeof • Classes, like structs, but – Implicit fields • Vtable • Monitor – At least pointer size alignment
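The "enforce assumptions using static assert" tip can be sketched on the two structs from the padding slides. The struct names `Padded` and `Reordered` are illustrative, and the numbers assume a target where `uint` is 4 bytes with 4-byte alignment, as on common x86 and x86-64 ABIs.

```d
// Field order from the first padding slide: 12 bytes, 6 wasted.
struct Padded
{
    bool f1;
    uint f2;
    bool f3;
}

// Field order from the second padding slide: 8 bytes, 2 wasted.
struct Reordered
{
    uint f2;
    bool f1;
    bool f3;
}

// The struct alignment is the largest alignment among its fields.
static assert(Padded.alignof == uint.alignof);
static assert(Reordered.alignof == uint.alignof);

// Padding is inserted before f2 and after f3.
static assert(Padded.f2.offsetof == 4);
static assert(Padded.f3.offsetof == 8);
static assert(Padded.sizeof == 12);

// High-alignment field first: the two bools share one padded word.
static assert(Reordered.f1.offsetof == 4);
static assert(Reordered.f3.offsetof == 5);
static assert(Reordered.sizeof == 8);
```

If a later change reorders or adds a field, the build fails instead of silently growing the struct.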
  • 15. Information density • How much actual information ? • Bool – 1 bit of information – 8 bits of storage • Object – 45 bits of information – 64 bits of storage • Dump memory and zip it – Aim for that size
  • 16. Bit packing • Trade memory consumption for CPU – Usually a good deal • Use one integral as storage – Store several elements in that integral – Use bitwise operations to manipulate elements • std.bitmanip can help
  • 17. Struct packing import std.bitmanip; struct S { mixin(bitfields!( uint, "f1", 30, bool, "f2", 1, bool, "f3", 1, )); } Layout: f1 f2 f3, 4 bytes, 0 wasted • f1 is now 30 bits instead of 32 bits • Its maximum is now about 1 billion • Fields aren’t atomic anymore • bitfields does all the magic
  • 18. Bit packing integrals enum ReadMask = (1 << S) - 1; enum WriteMask = ReadMask << N; @property uint entry() { return (data >> N) & ReadMask; } @property void entry(uint val) in { assert((val & ReadMask) == val); } body { data = (data & ~WriteMask) | ((val << N) & WriteMask); } Data layout: entry occupies bits N to N + S of the 32 bits data word
  • 19. Bit packing bools enum Mask = 1 << N; @property bool entry() { return (data & Mask) != 0; } @property entry(bool val) { if (val) { data = data | Mask; } else { data = data & ~Mask; } } Data layout: entry occupies the single bit N of the 32 bits data word. Note: data ^ Mask will flip the bit. It is sometimes faster than setting it.
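The two accessor patterns above can be combined into one self-contained struct. This is a sketch under assumptions: the name `PackedWord`, the field names and the chosen widths (a flag in bit 0, a 29-bit integral starting at bit 1) are illustrative, not taken from the slides.

```d
struct PackedWord
{
    uint data; // single integral used as storage

    enum N = 1;                       // offset of the integral entry
    enum S = 29;                      // width of the integral entry
    enum uint ReadMask = (1u << S) - 1;
    enum uint WriteMask = ReadMask << N;
    enum uint FlagMask = 1u << 0;     // the bool lives in bit 0

    @property uint entry() const
    {
        return (data >> N) & ReadMask;
    }

    @property void entry(uint val)
    {
        assert((val & ReadMask) == val, "value does not fit in S bits");
        // Clear the old bits, then merge the shifted new value.
        data = (data & ~WriteMask) | ((val << N) & WriteMask);
    }

    @property bool flag() const
    {
        return (data & FlagMask) != 0;
    }

    @property void flag(bool val)
    {
        if (val)
            data |= FlagMask;
        else
            data &= ~FlagMask;
    }
}

unittest
{
    PackedWord p;
    p.entry = 12_345;
    p.flag = true;
    assert(p.entry == 12_345 && p.flag);

    // Writing one field leaves the other untouched.
    p.flag = false;
    assert(p.entry == 12_345 && !p.flag);
}
```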
  • 20. Bitfield layout • 2 special spots – Rightmost : mask only – Leftmost : shift only • Large elements require large masks – Put them in the leftmost spot • Bools always use masks – Can be checked in leftmost with signed < 0 – Don’t put them in special spots unless very hot
  • 21. Bitfield layout • We want : – One flag – One 2 bits enum E – A 29 bits integral • What is the best layout ?
  • 22. Bitfield layout enum E { E0, E1, E2, E3 } struct S { import std.bitmanip; mixin(bitfields!( E, "e", 2, bool, "flag", 1, uint, "integral", 29, )); } Codegen : e = cast(E) (data & 0x03); flag = (data & 0x04) != 0; integral = data >> 3;
  • 23. Unused bits • Sometimes, the whole bitfield is not needed – Create a nameless field • uint, "", 29 – Make it usable for out struct/subclasses • uint, "_derived", 29 • Ideally make it private/protected • Or use in private struct elements • Need to implement the remaining fields manually • Feature request: bitfield with explicit storage
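The layout from this slide can be checked with `std.bitmanip.bitfields`, which fills the storage starting from the least significant bit. The sketch below reads the generated storage back through a pointer cast purely for demonstration; that cast assumes the 32 bits of fields are stored in a single `uint`, which is how `bitfields` sizes its storage.

```d
import std.bitmanip : bitfields;

enum E { E0, E1, E2, E3 }

struct S
{
    mixin(bitfields!(
        E,    "e",        2,   // rightmost spot: mask only
        bool, "flag",     1,
        uint, "integral", 29,  // leftmost spot: shift only
    ));
}

// 2 + 1 + 29 = 32 bits, so the struct is exactly one uint.
static assert(S.sizeof == uint.sizeof);

unittest
{
    S s;
    s.e = E.E2;
    s.flag = true;
    s.integral = 123_456;

    assert(s.e == E.E2);
    assert(s.flag);
    assert(s.integral == 123_456);

    // Peek at the raw word to confirm the layout shown in the codegen.
    immutable data = *cast(uint*) &s;
    assert((data & 0x03) == E.E2);
    assert((data & 0x04) != 0);
    assert((data >> 3) == 123_456);
}
```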
  • 24. Unused bits - example class Symbol : Node { Name name; Name mangle; import std.bitmanip; mixin(bitfields!( Step, "step", 2, Linkage, "linkage", 3, Visibility, "visibility", 3, InTemplate, "inTemplate", 1, bool, "hasThis", 1, bool, "hasContext", 1, bool, "isPoisoned", 1, bool, "isAbstract", 1, bool, "isProperty", 1, uint, "derived", 18, )); } class Field : Symbol { // ... this(..., uint index, ... ) { // ... this.derived = index; // Always true for fields. this.hasThis = true; } @property index() const { // Only 262 143 fields possible ! return derived; } }
  • 25. Tagging pointers - @trusted • Least significant bits are known to be 0 – How many depends on alignment – Log2(T.alignof) – At least 3 bits on Objects (2 on 32 bits systems) • Once again, std.bitmanip can help – taggedPointer/taggedClassRef – Checks alignment constraints at compile time – Misaligned pointers are not safe
  • 26. Tagging pointers - @trusted enum Color { Black, Red } struct Link(T) { import std.bitmanip; mixin(taggedPointer!( T*, "child", Color, "color", 1, )); } struct Node(T) { Link!T left; Link!T right; } pointed child • Actual pointer points at the object • Tagged pointer points within the object • GC knows about interior pointers
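A minimal usage sketch of the `Link` struct from this slide, instantiated with `int` so the example stays self-contained. `int` is 4-byte aligned, which leaves 2 low bits free; only 1 is used for the color.

```d
import std.bitmanip : taggedPointer;

enum Color { Black, Red }

struct Link(T)
{
    // The color is stored in the low bit of the pointer.
    mixin(taggedPointer!(
        T*,    "child",
        Color, "color", 1,
    ));
}

// The tag costs no extra storage.
static assert(Link!int.sizeof == size_t.sizeof);

unittest
{
    auto value = new int;
    *value = 42;

    Link!int link;
    link.child = value;
    link.color = Color.Red;

    // Both pieces of information come back out intact.
    assert(link.child is value);
    assert(*link.child == 42);
    assert(link.color == Color.Red);

    // Changing the tag does not disturb the pointer.
    link.color = Color.Black;
    assert(link.child is value);
}
```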
  • 27. Tagging pointers - @system • Allocate in the lower 32bits of address space – Truncate pointer to 32 bits – Limited to 4Gb – Jemalloc can do that for you – Used by HHVM for codegen • On X86 most significant 16bits are zeros – Hijack them ! – Confuse the GC ! – Try to not SEGFAULT
  • 28. Intermission – Germany loves D ! They even put stickers on their cars !
  • 29. Let’s use a context • Useful for cold but often reused data • For instance, identifiers in a compiler – Usually don’t care about the actual value • Context stores identifiers, provides a unique id – 32 bits vs 128 bits – Equality can be tested with an int compare – Can be its own hash for hashtable lookups • Make the GC happy – Fewer pointers – More noscan !
  • 30. Let’s use a context struct Name { private: uint id; this(uint id) { this.id = id; } public: string toString(const Context c) const { return c.names[id]; } immutable(char)* toStringz(const Context c) const { auto s = toString(c); assert(s.ptr[s.length] == '\0', "Expected a zero terminated string"); return s.ptr; } }
  • 31. Let’s use a context class Context { private: string[] names; uint[string] lookups; public: auto getName(const(char)[] str) { if (auto id = str in lookups) { return Name(*id); } // As we are cloning, make sure it is 0 terminated as to pass to C. import std.string; auto s = str.toStringz()[0 .. str.length]; auto id = lookups[s] = cast(uint) names.length; names ~= s; return Name(id); } }
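The context idea from the last three slides, reduced to a self-contained sketch. It drops the zero-termination trick and uses a plain `idup`; `nameOf` is a hypothetical accessor standing in for the slides' `Name.toString(context)`.

```d
struct Name
{
    uint id; // 32 bits handle instead of a 128 bits slice
}

final class Context
{
    private string[] names;       // id -> identifier
    private uint[string] lookups; // identifier -> id

    Name getName(const(char)[] str)
    {
        // Already interned: hand back the existing id.
        if (auto known = str in lookups)
            return Name(*known);

        // First time: copy the text and assign the next id.
        string s = str.idup;
        immutable id = cast(uint) names.length;
        lookups[s] = id;
        names ~= s;
        return Name(id);
    }

    string nameOf(Name n) const
    {
        return names[n.id];
    }
}

unittest
{
    auto c = new Context;
    auto a = c.getName("foo");
    auto b = c.getName("bar");

    // Equality of identifiers is a plain integer compare.
    assert(a == c.getName("foo"));
    assert(a != b);
    assert(c.nameOf(b) == "bar");
    static assert(Name.sizeof == 4);
}
```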
  • 32. Context prefill • Useful to pin some id at compile time • Can be used without lookup in the context • Generated identifiers • object.d • Linkage/Version/Scope/Attribute
  • 33. Context prefill enum Reserved = [ "__ctor", "__dtor", "__postblit", "__vtbl", ]; enum Prefill = [ // Linkages "C", "D", "C++", "Windows", "System", // Generated "init", "length", "max", "min", "ptr", "sizeof", "alignof", // Scope "exit", "success", "failure", // Defined in object "object", "size_t", "ptrdiff_t", "string", "Object", "TypeInfo", "ClassInfo", "Throwable", "Exception", "Error", // Attribute "property", "safe", "trusted", "system", "nogc", // ... ]; auto getNames() { import d.lexer; auto identifiers = [""]; foreach(k, _; getOperatorsMap()) { identifiers ~= k; } foreach(k, _; getKeywordsMap()) { identifiers ~= k; } return identifiers ~ Reserved ~ Prefill; } enum Names = getNames();
  • 34. Context prefill auto getLookups() { uint[string] lookups; foreach(uint i, id; Names) { lookups[id] = i; } return lookups; } enum Lookups = getLookups(); template BuiltinName( string name, ) { private enum id = Lookups .get(name, uint.max); static assert( id < uint.max, name ~ " is not a builtin name.", ); enum BuiltinName = Name(id); }
  • 35. More context ! • Track locations in a compiler – They are everywhere • Register file in the context – Allocate a range of values from N to N + sizeof(file) – A position for each byte in the file ! • Add a flag for mixin (D) / macros (C++) – Register expansions in the context.
  • 36. More context ! • Use cases: – Emit debug infos – Error messages • Perfs do not matter for errors • Access pattern mostly predictable for debug • Find file/line from location using – One element cache – Linear search (8 elements) – Binary search
  • 37. More context ! [Diagram: File 1, File 2, File 3 and empty space laid out over positions 0 to 2B; Mixin 1, Mixin 2, Mixin 3 and empty space laid out over positions -2B to -1] • Context stores file boundaries and line positions within files
  • 38. More context ! • A position is a 31 bits number + a flag – Up to 2Gb of source code + 2 Gb of macros/mixin • A pair of positions is a location – Used for tokens/expressions/symbols/statements • Lexer only needs to bump the position value for each token by the length of the token • Strategy used by clang / SDC
  • 40. Tagged reference • Useful to encapsulate several reference types • Can provide methods forwarding to elements – Use reflection to do so – Avoid vtable lookups/cascaded loads – No common layout in the referenced object • Number of elements limited by alignment – Easy to get up to 8 on X64 • LLVM’s call/invoke
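A position as "31 bits number + a flag" could look like the sketch below. The names `Position`, `Location`, `mixinFlag` and `advance` are assumptions for illustration and are not SDC's or clang's actual API.

```d
struct Position
{
    private uint raw;

    enum uint mixinFlag = 1u << 31;  // top bit: mixin/macro expansion
    enum uint offsetMask = ~mixinFlag;

    this(uint offset, bool isMixin)
    {
        assert((offset & offsetMask) == offset, "offset must fit in 31 bits");
        raw = offset | (isMixin ? mixinFlag : 0);
    }

    @property uint offset() const { return raw & offsetMask; }
    @property bool isMixin() const { return (raw & mixinFlag) != 0; }

    /// The lexer only bumps the position by the token length.
    Position advance(uint tokenLength) const
    {
        return Position(offset + tokenLength, isMixin);
    }
}

/// A pair of positions, 8 bytes in total.
struct Location
{
    Position start;
    Position stop;
}

static assert(Position.sizeof == 4);
static assert(Location.sizeof == 8);

unittest
{
    auto start = Position(100, false);
    auto stop = start.advance(5); // a 5 bytes long token
    auto loc = Location(start, stop);

    assert(loc.stop.offset == 105);
    assert(!loc.stop.isMixin);
    assert(Position(7, true).advance(3).isMixin);
}
```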
  • 41. Tagged reference template TagFields(uint i, U...) { import std.conv; static if (U.length == 0) { enum TagFields = "\n\t" ~ T.stringof ~ " = " ~ to!string(i) ~ ","; } else { enum S = U[0].stringof; static assert( (S[0] & 0x80) == 0, S ~ " must not start with an unicode.", ); static assert( U[0].sizeof <= size_t.sizeof, "Elements must be of pointer size or smaller.", ); import std.ascii; enum Name = (S == "typeof(null)") ? "Undefined" : toUpper(S[0]) ~ S[1 .. $]; enum TagFields = "\n\t" ~ Name ~ " = " ~ to!string(i) ~ "," ~ TagFields!(i + 1, U[1 .. $]); } } mixin("enum Tag {" ~ TagFields!(0, U) ~ "\n}"); import std.traits; alias Tags = EnumMembers!Tag; import std.typetuple; alias TagTuple = TypeTuple!(uint, "tag", EnumSize!Tag);
  • 42. Tagged reference struct TaggedRef(U...) { private: import std.bitmanip; mixin(taggedPointer!( void*, "ptr", TagTuple)); public: auto get(Tag E)() in { assert(tag == E); } body { static union Helper { void* __ptr; U u; } return Helper(ptr).u[E]; } template opDispatch(string s, T...) { auto opDispatch(A...)(A args) { final switch(tag) { foreach(T; Tags) { case T: auto r = get!T(); return mixin("r." ~ s)(args); } } } } }
  • 43. Value Type Polymorphism • All subtypes fit under a given size budget • A tag is used to differentiate them • The whole thing is wrapped in a nice API • Being able to hide atrocities behind a nice façade, that’s the power of D • Example: Representing D types
  • 44. Value Type Polymorphism template SizeOfBitField(T...) { static if (T.length < 2) { enum SizeOfBitField = 0; } else { enum SizeOfBitField = T[2] + SizeOfBitField!(T[3 .. $]); } } enum EnumSize(E) = computeEnumSize!E(); size_t computeEnumSize(E)() { size_t size = 0; import std.traits; foreach (m; EnumMembers!E) { size_t ms = 0; while ((m >> ms) != 0) { ms++; } import std.algorithm; size = max(size, ms); } return size; }
  • 45. Value Type Polymorphism struct TypeDescriptor(K, T...) { enum DataSize = ulong.sizeof * 8 - 3 - EnumSize!K - SizeOfBitField!T; import std.bitmanip; mixin(bitfields!( K, "kind", EnumSize!K, TypeQualifier, "qualifier", 3, ulong, "data", DataSize, T, )); static assert(TypeDescriptor.sizeof == ulong.sizeof); this(K k, TypeQualifier q, ulong d = 0) { kind = k; qualifier = q; data = d; } }
  • 46. Value Type Polymorphism • A type is a TypeDescriptor + an indirection field • Data depends on the kind – If it doesn’t fit, use indirection field • There are many type kinds: – Builtin – Struct – Class – Alias – Function – … • Common API switches on kind to do the right thing
  • 47. Value Type Polymorphism data Qualifier Kind Indirection • 128 bits budget • Indirection is used when • The type needs extra space (Function) • The type needs to refer to a symbol (Aggregate, Alias) • Otherwise null • Replaced the type class hierarchy advantageously • Significant memory consumption reduction • Significantly faster runtime (about 20%)
  • 48. Value Type Polymorphism • You can nest, effectively creating hierarchies • For instance, Identifiable is – A type – An expression – A symbol • More packing !
  • 49. Value Type Polymorphism data Qualifier Kind Indirection/Expression/Symbol Tag • Tag is used to discriminate between • Type • Expression • Symbol • Tag is zeroed out to find the type • Saved 70 Mb (!) of template bloat in SDC
  • 50. Value Type Polymorphism import d.semantic.identifier; Identifiable i = ...; i.apply!(delegate Expression(identified) { alias T = typeof(identified); static if (is(T : Expression)) { return identified; } else { return getError( identified, location, t.name.toString(pass.context) ~ " isn't callable", ); } })();
  • 51. Value Type Polymorphism Identifiable Type Expression Symbol Builtin Class AliasStruct Pointer Function …
  • 52. Value Type - ABI • Struct up to 2 fields – Up to pointer sized – Slice ! – No float/integral mixing • Common anti pattern 2 pointers + a bool – std.bigint.BigInt is a slice + a bool – Passed in memory instead of registers • More than one pointer tends to use 2 – Use either 1 or 2 pointer sized struct
  • 54. Classless Polymorphism • Create a base struct • All substruct use it as first field • Contains a tag describing the type – The tag can be part of a bitfield • Use mixin in all substruct – Include static assert to check this is done right – Alias this the base
  • 55. Classless Polymorphism • Each leaf of the hierarchy has a tag value • Each non leaf has a range of tag values • The root matches all values • The hierarchy must be known at compile time • Use a bunch of mixin templates – Add the boilerplate – A ton of static asserts
  • 56. Classless Polymorphism struct Child { mixin Parent!Root; } struct Root { mixin Childs!(Child, SubStruct); } struct SubStruct { mixin GrandChilds!( Root, SubChild, ); } struct SubChild { mixin Parent!SubStruct; }
  • 57. Classless Polymorphism Root Root Child’s fields Root SubStruct’s fields Root SubStruct’s fields SubChild’s fields
  • 58. Classless Polymorphism • Child shares the parent’s part of the layout – It is safe to upcast – Done via alias this • Downcast to a leaf: check tag’s value – Cheap – Easy pattern matching • Downcast to substruct: check tag range – Cheap • No typeid pointer chasing
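The upcast and downcast rules can be sketched by hand, without the `Parent!`/`Childs!` mixin templates from the slides (whose implementation is not shown). The tag values and the helpers `asChild` and `asSubStruct` are assumptions made for this example; the hierarchy mirrors the Root/Child/SubStruct/SubChild one above.

```d
// SubStruct and its descendant get a contiguous range of tag values.
enum Tag : ubyte { Child, SubStruct, SubChild }

struct Root { Tag tag; }

struct Child     { Root base;      alias base this; int childField; }
struct SubStruct { Root base;      alias base this; int subField; }
struct SubChild  { SubStruct base; alias base this; int leafField; }

// The parent part must come first for the casts below to be valid.
static assert(Child.base.offsetof == 0);
static assert(SubStruct.base.offsetof == 0);
static assert(SubChild.base.offsetof == 0);

/// Downcast to a leaf: compare against a single tag value.
Child* asChild(Root* r)
{
    return r.tag == Tag.Child ? cast(Child*) r : null;
}

/// Downcast to a non leaf: check the tag range.
SubStruct* asSubStruct(Root* r)
{
    immutable inRange = r.tag >= Tag.SubStruct && r.tag <= Tag.SubChild;
    return inRange ? cast(SubStruct*) r : null;
}

unittest
{
    SubChild sc;
    sc.tag = Tag.SubChild; // reaches Root.tag through alias this
    sc.subField = 7;

    Root* r = &sc.base.base; // upcast: same address, parent layout
    assert(asChild(r) is null);
    assert(asSubStruct(r) !is null);
    assert(asSubStruct(r).subField == 7);
}
```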
  • 59. Virtualish Dispatch • No virtual table • Get function pointer in a table – One table per method – One entry per leaf type – Using the tag as an index • Used by HHVM for PHP arrays – Creative datastructure – Is a vector/hashmap/set/tuple/whatever…
  • 60. Regular Virtual Dispatch f1 f2 f3 f4 Vtable pointer T1’s data g1 g2 g3 g4 Vtable pointer T2’s data • One vtable per type • Vtable has one entry per method • Load vtable then load function address
  • 61. Virtualish Dispatch f1 g1 h1 i1 Tag T1’s data f2 g2 h2 i2 Tag T2’s data • One vtable per method • Vtable has one entry per type • Load tag then use it as index in per function table
  • 62. Virtualish Dispatch • Usually better locality – Calling the same method on objects of various types more common than calling various methods on objects of the same type • Often worked around by sorting by type – Classless get most of the benefit without sorting – Still helps branch prediction • Tables can be generated using reflection in D
  • 63. Classless visitors ! • Regular class hierarchies need to know all methods at compile time – Can add types dynamically • Classless hierarchies need to know all types at compile time – Can add methods dynamically • Visitors can create a visit method’s table – And use the tag to dispatch • Closed extensibility one way, opened it another way
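"One table per method, one entry per leaf type" can be sketched as follows. The tables are written by hand here, whereas the slides suggest generating them with reflection; the `Shape` type and all function names are illustrative assumptions.

```d
enum Tag : ubyte { Circle, Square }

struct Shape
{
    Tag tag;     // used as the index in every method table
    double size;
}

double circleArea(ref const Shape s) { return 3.0 * s.size * s.size; } // rough pi
double squareArea(ref const Shape s) { return s.size * s.size; }

string circleName(ref const Shape s) { return "circle"; }
string squareName(ref const Shape s) { return "square"; }

alias AreaFn = double function(ref const Shape);
alias NameFn = string function(ref const Shape);

// One table per method, one entry per leaf type, in Tag order.
immutable AreaFn[Tag.max + 1] areaTable = [&circleArea, &squareArea];
immutable NameFn[Tag.max + 1] nameTable = [&circleName, &squareName];

// Dispatch: load the tag, index the method's table, call.
double area(ref const Shape s) { return areaTable[s.tag](s); }
string name(ref const Shape s) { return nameTable[s.tag](s); }

unittest
{
    auto sq = Shape(Tag.Square, 2.0);
    auto ci = Shape(Tag.Circle, 1.0);

    assert(area(sq) == 4.0);
    assert(name(sq) == "square");
    assert(name(ci) == "circle");
}
```

Adding a new method means adding a new table; the `Shape` layout itself never changes, which matches the "can add method dynamically" point made for classless hierarchies.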