The document discusses techniques for optimizing memory usage through bit packing and value type polymorphism. It describes:
1. Bit packing techniques like storing multiple values in a single integer using bitwise operations to reduce memory usage. This includes examples of packing booleans and enums.
2. Using a "tagged union" approach to represent different value types polymorphically by storing a type tag and common data in a single value.
3. The concept of "value type polymorphism" where subtypes all fit within a size budget by using a tag to differentiate them while presenting a common API. This allows efficiently representing types in a compiler.
2. Memory is slow
• About 300 cycles to hit memory
• Bandwidth still increasing
• Latency only marginally increasing
3. Memory is slow - Caching
• Add faster memory on CPU.
• Various size and speed
– Signal needs time to travel
– L1: 3-4 cycles, 32 KB
• Instruction
• Data
– L2: 8-14 cycles, 256 KB
– L3: tens of cycles, a few MB, often shared
– Cache line: 64 bytes
5. The king is throwing a party
He has 1000 bottles
in his cellar
6. An evil man poisoned
a bottle with his
secret recipe with 11
herbs and spices !
• The poison will kill anyone
even in small doses.
• It takes several hours for
someone to die from
poisoning.
• The King has 1000 servants
and 20 prisoners.
• He would like to avoid killing
servants if possible, but
killing prisoners is fine.
• What should the king do ?
7. The answer
• The king can use 10 prisoners.
• Number each bottle in binary
• Each prisoner will drink from multiple bottles
– Prisoner n drinks from every bottle whose nth binary digit is 1
• The set of prisoners who die spells out the poisoned bottle's number in binary.
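The answer above can be checked with a short simulation. This is a minimal sketch, assuming bottles are numbered 0 to 999; the names `taste` and `decode` are made up for this illustration.

```d
enum prisoners = 10; // 2^10 = 1024 >= 1000 bottles

/// Simulates the tasting: returns which prisoners die for a given poisoned bottle.
bool[prisoners] taste(uint poisoned) {
    bool[prisoners] dead;
    foreach (bottle; 0 .. 1000) {
        foreach (n; 0 .. prisoners) {
            // Prisoner n drinks from every bottle whose nth bit is 1.
            immutable drinks = ((bottle >> n) & 1) != 0;
            if (drinks && bottle == poisoned) {
                dead[n] = true;
            }
        }
    }
    return dead;
}

/// Reads the dead prisoners back as a binary number.
uint decode(const bool[prisoners] dead) {
    uint bottle = 0;
    foreach (n, isDead; dead) {
        if (isDead) {
            bottle |= 1u << n;
        }
    }
    return bottle;
}

unittest {
    // Every possible poisoned bottle is identified correctly.
    foreach (poisoned; 0 .. 1000) {
        assert(decode(taste(poisoned)) == poisoned);
    }
}
```

Each prisoner carries exactly one bit of the answer, which is the same idea bit packing applies to memory.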
9. Bit packing
• Reduce memory waste
• Increase cache utilization
• Minimal CPU cost
• Not a replacement for better algorithms
– Instantiating fewer objects saves a lot of memory !
10. Alignment
• Ensure that load/store do not
– Cross cache line
– Cross pages boundaries
• Unaligned access: severe penalties
– Bad performance on some CPUs, loss of atomicity
• Hardware is doing 2 accesses
– Hard error on others (SIGBUS or similar)
• Defined by ABI
11. Alignment – Rule of thumb
• Integral types smaller than size_t
– T.sizeof
• Integral types bigger than size_t
– size_t.sizeof
– Compiler will decompose memory accesses
• Structs
– Max(alignment of each field)
– Add padding to respect alignment
14. Padding tips
• Start with fields with high alignment
• Know where pads are
• Enforce assumptions using static assert
– alignof
– sizeof
• Classes, like structs, but
– Implicit fields
• Vtable
• Monitor
– At least pointer size alignment
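The padding tips can be illustrated with two structs holding the same fields in a different order. This is a sketch only: the struct names are hypothetical, and the sizes assume an ABI where uint is 4-byte aligned, which is the common case.

```d
struct Padded {
    ubyte a; // offset 0, followed by 3 bytes of padding
    uint  b; // offset 4
    ubyte c; // offset 8, followed by 3 bytes of tail padding
}

struct Reordered {
    uint  b; // offset 0: highest alignment first
    ubyte a; // offset 4
    ubyte c; // offset 5, followed by 2 bytes of tail padding
}

// Enforce the layout assumptions at compile time.
static assert(Padded.alignof == 4);
static assert(Padded.b.offsetof == 4);
static assert(Padded.sizeof == 12);

static assert(Reordered.alignof == 4);
static assert(Reordered.sizeof == 8);
```

If a later change moves a field or adds one, the static asserts fail at compile time instead of silently growing the struct.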
15. Information density
• How much actual information ?
• Bool
– 1 bit of information
– 8 bits of storage
• Object
– 45 bits of information
– 64 bits of storage
• Dump memory and zip it
– Aim for that size
16. Bit packing
• Trade memory consumption for CPU
– Usually a good deal
• Use one integral as storage
– Store several elements in that integral
– Use bitwise operations to manipulate elements
• std.bitmanip can help
17. Struct packing
import std.bitmanip;
struct S {
    mixin(bitfields!(
        uint, "f1", 30,
        bool, "f2", 1,
        bool, "f3", 1,
    ));
}
Layout: f1 (30 bits), f2 (1 bit), f3 (1 bit): 4 bytes, 0 wasted
• f1 is now 30 bits instead of 32 bits
– Its maximum value is now about 1 billion
• Fields aren’t atomic anymore
• bitfields does all the magic
19. Bit packing bools
Data: [ 32 ... N + 1 | entry | N ... 0 ]
enum Mask = 1 << N;
@property bool entry() {
    return (data & Mask) != 0;
}
@property void entry(bool val) {
    if (val) {
        data = data | Mask;
    } else {
        data = data & ~Mask;
    }
}
Note: data ^ Mask will flip the bit.
It is sometimes faster than setting it.
20. Bitfield layout
• 2 special spots
– Rightmost : mask only
– Leftmost : shift only
• Large elements require large masks
– Put them in the leftmost spot
• Bools always use masks
– In the leftmost spot, can be checked with a signed comparison (< 0)
– Don’t put them in special spots unless very hot
21. Bitfield layout
• We want :
– One flag
– One 2-bit enum E
– A 29-bit integral
• What is the best layout ?
22. Bitfield layout
enum E { E0, E1, E2, E3 }
struct S {
    import std.bitmanip;
    mixin(bitfields!(
        E, "e", 2,
        bool, "flag", 1,
        uint, "integral", 29,
    ));
}
Codegen:
e = cast(E) (data & 0x03);
flag = (data & 0x04) != 0;
integral = data >> 3;
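To make the layout concrete, here is a hand-written equivalent of that packing. It is a sketch, not what std.bitmanip actually generates; the struct name `Packed` and the field name `data` are assumptions for the example.

```d
enum E { E0, E1, E2, E3 }

struct Packed {
    private uint data;

    // Rightmost spot: mask only.
    @property E e() const { return cast(E) (data & 0x03); }
    // Middle spot: mask and compare.
    @property bool flag() const { return (data & 0x04) != 0; }
    // Leftmost spot: shift only.
    @property uint integral() const { return data >> 3; }

    @property void e(E v) { data = (data & ~0x03u) | v; }
    @property void flag(bool v) { data = v ? (data | 0x04) : (data & ~0x04u); }
    @property void integral(uint v) {
        assert(v < (1u << 29), "integral must fit in 29 bits");
        data = (data & 0x07) | (v << 3);
    }
}

static assert(Packed.sizeof == 4);

unittest {
    Packed s;
    s.e = E.E2;
    s.flag = true;
    s.integral = 123_456;
    assert(s.e == E.E2 && s.flag && s.integral == 123_456);

    // Clearing one field leaves the others untouched.
    s.flag = false;
    assert(s.e == E.E2 && !s.flag && s.integral == 123_456);
}
```

The large field sits in the leftmost spot so reading it is a single shift, and the small enum sits in the rightmost spot so reading it is a single mask.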
23. Unused bits
• Sometimes the whole bitfield is not needed
– Create a nameless field
• uint, "", 29
– Make it usable for outer structs/subclasses
• uint, "_derived", 29
• Ideally make it private/protected
• Or use it in private struct elements
• Need to implement the remaining fields manually
• Feature request: bitfields with explicit storage
24. Unused bits - example
class Symbol : Node {
Name name;
Name mangle;
import std.bitmanip;
mixin(bitfields!(
Step, "step", 2,
Linkage, "linkage", 3,
Visibility, "visibility", 3,
InTemplate, "inTemplate", 1,
bool, "hasThis", 1,
bool, "hasContext", 1,
bool, "isPoisoned", 1,
bool, "isAbstract", 1,
bool, "isProperty", 1,
uint, "derived", 18,
));
}
class Field : Symbol {
// ...
this(..., uint index, ... ) {
// ...
this.derived = index;
// Always true for fields.
this.hasThis = true;
}
@property index() const {
// Only 262 143 fields possible !
return derived;
}
}
25. Tagging pointers - @trusted
• Least significant bits are known to be 0
– How many depends on alignment
– Log2(T.alignof)
– At least 3 bits on Objects (2 on 32 bits systems)
• Once again, std.bitmanip can help
– taggedPointer/taggedClassRef
– Checks alignment constraints at compile time
– Misaligned pointers are not safe
26. Tagging pointers - @trusted
enum Color { Black, Red }
struct Link(T) {
import std.bitmanip;
mixin(taggedPointer!(
T*, "child",
Color, "color", 1,
));
}
struct Node(T) {
Link!T left;
Link!T right;
}
[Diagram: the child field and the object it points to]
• Actual pointer points at the object
• Tagged pointer points within the object
• GC knows about interior pointers
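What happens underneath can be sketched by hand. The version below is an illustration, not the std.bitmanip implementation: it stores the tagged value in a plain size_t (an assumption made for simplicity), so the caller must keep another reference to the pointee alive for the GC. `TaggedLink` is a hypothetical name.

```d
enum Color : size_t { Black, Red }

struct TaggedLink(T) {
    // One free low bit requires at least 2-byte alignment.
    static assert(T.alignof >= 2, "need at least one free low bit");

    private size_t bits;

    this(T* child, Color color) @trusted {
        auto raw = cast(size_t) child;
        assert((raw & 1) == 0, "misaligned pointer");
        bits = raw | color;
    }

    // Mask the tag out to recover the real pointer.
    @property T* child() const @trusted {
        return cast(T*) (bits & ~cast(size_t) 1);
    }

    // The tag lives in the least significant bit.
    @property Color color() const {
        return cast(Color) (bits & 1);
    }
}

unittest {
    auto value = new long;
    *value = 42;

    auto link = TaggedLink!long(value, Color.Red);
    assert(link.child is value);
    assert(*link.child == 42);
    assert(link.color == Color.Red);

    // The color costs no extra storage.
    static assert(TaggedLink!long.sizeof == size_t.sizeof);
}
```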
27. Tagging pointers - @system
• Allocate in the lower 32 bits of the address space
– Truncate pointer to 32 bits
– Limited to 4 GB
– Jemalloc can do that for you
– Used by HHVM for codegen
• On x86-64 the most significant 16 bits are zeros
– Hijack them !
– Confuse the GC !
– Try not to SEGFAULT
29. Let’s use a context
• Useful for cold but often reused data
• For instance, identifiers in a compiler
– Usually don’t care about the actual value
• Context stores identifiers, provides a unique id
– 32 bits vs 128 bits
– Equality can be tested with an int compare
– Can be its own hash for hashtable lookups
• Make the GC happy
– Fewer pointers
– More noscan !
30. Let’s use a context
struct Name {
private:
uint id;
this(uint id) {
this.id = id;
}
public:
    string toString(const Context c) const {
        return c.names[id];
    }
    immutable(char)* toStringz(const Context c) const {
        auto s = toString(c);
        // Context.getName stores every name with a trailing zero.
        assert(s.ptr[s.length] == '\0', "Expected a zero terminated string");
        return s.ptr;
    }
}
31. Let’s use a context
class Context {
private:
    string[] names;
    uint[string] lookups;
public:
    auto getName(const(char)[] str) {
        if (auto id = str in lookups) {
            return Name(*id);
        }
        // As we are cloning, make sure it is 0 terminated as to pass to C.
        import std.string;
        auto s = str.toStringz()[0 .. str.length];
        auto id = lookups[s] = cast(uint) names.length;
        names ~= s;
        return Name(id);
    }
}
32. Context prefill
• Useful to pin some id at compile time
• Can be used without lookup in the context
• Generated identifiers
• object.d
• Linkage/Version/Scope/Attribute
34. Context prefill
auto getLookups() {
uint[string] lookups;
foreach(uint i, id; Names) {
lookups[id] = i;
}
return lookups;
}
enum Lookups = getLookups();
template BuiltinName(
string name,
) {
private enum id = Lookups
.get(name, uint.max);
static assert(
id < uint.max,
name ~ " is not a builtin name.",
);
enum BuiltinName = Name(id);
}
35. More context !
• Track locations in a compiler
– They are everywhere
• Register file in the context
– Allocate a range of values from N to N + sizeof(file)
– A position for each byte in the file !
• Add a flag for mixin (D) / macros (C++)
– Register expansions in the context.
36. More context !
• Use cases:
– Emit debug info
– Error messages
• Performance does not matter for errors
• Access pattern is mostly predictable for debug info
• Find file/line from location using
– One element cache
– Linear search (8 elements)
– Binary search
37. More context !
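[Figure: the position space. Files (Empty, File 1, File 2, File 3) are laid out from 0 to 2B; mixins (Empty, Mixin 1, Mixin 2, Mixin 3) are laid out from -2B to -1.]
Context stores file boundaries and line positions within files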
38. More context !
• A position is a 31-bit number + a flag
– Up to 2 GB of source code + 2 GB of macros/mixins
• A pair of positions is a location
– Used for tokens/expressions/symbols/statements
• The lexer only needs to bump the position value
for each token by the length of the token
• Strategy used by clang / SDC
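The scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not the clang or SDC implementation: `FileRegistry`, `register` and `fileOf` are hypothetical names, each file reserves one extra position so that empty files stay distinguishable, and only the binary search lookup is shown.

```d
import std.range : assumeSorted;

struct Position {
    enum uint MixinFlag = 1u << 31;
    private uint raw; // 31-bit offset plus the mixin flag in the top bit

    @property uint offset() const { return raw & ~MixinFlag; }
    @property bool isMixin() const { return (raw & MixinFlag) != 0; }
}

// A pair of positions: 64 bits for a whole source range.
struct Location {
    Position start;
    Position stop;
}

static assert(Location.sizeof == 8);

struct FileRegistry {
    private uint[] starts;   // first position of each file, ascending
    private string[] names;  // file name for each entry of starts
    private uint next;       // next free position

    /// Reserves a range of positions for a file and returns its first position.
    Position register(string name, uint size) {
        starts ~= next;
        names ~= name;
        auto first = Position(next);
        next += size + 1; // assumption: one extra slot per file
        return first;
    }

    /// Finds the file containing p with a binary search over file starts.
    string fileOf(Position p) {
        assert(!p.isMixin && p.offset < next, "unknown position");
        // Count the files starting at or before p; the last one contains p.
        auto before = starts.assumeSorted.lowerBound(p.offset + 1);
        return names[before.length - 1];
    }
}

unittest {
    FileRegistry registry;
    auto a = registry.register("a.d", 100);
    auto b = registry.register("b.d", 50);

    assert(registry.fileOf(a) == "a.d");
    assert(registry.fileOf(Position(99)) == "a.d");
    assert(registry.fileOf(b) == "b.d");
}
```

Tokens, expressions and symbols then carry only two 32-bit numbers, and the file and line are recovered on demand when an error or debug info is emitted.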
40. Tagged reference
• Useful to encapsulate several reference types
• Can provide methods forwarding to elements
– Use reflection to do so
– Avoid vtable lookups/cascaded loads
– No common layout in the referenced object
• Number of elements limited by alignment
– Easy to get up to 8 on x86-64
• LLVM’s call/invoke
41. Tagged reference
template TagFields(uint i, U...) {
import std.conv;
static if (U.length == 0) {
enum TagFields = "\n\t" ~ T.stringof ~ " = "
~ to!string(i) ~ ",";
} else {
enum S = U[0].stringof;
static assert(
(S[0] & 0x80) == 0,
S ~ " must not start with an unicode.",
);
static assert(
U[0].sizeof <= size_t.sizeof,
"Elements must be of pointer size or smaller.",
);
import std.ascii;
enum Name = (S == "typeof(null)")
? "Undefined"
: toUpper(S[0]) ~ S[1 .. $];
enum TagFields = "\n\t" ~ Name ~ " = "
~ to!string(i) ~ "," ~ TagFields!(i + 1, U[1 .. $]);
}
}
mixin("enum Tag {" ~ TagFields!(0, U) ~ "\n}");
import std.traits;
alias Tags = EnumMembers!Tag;
import std.typetuple;
alias TagTuple = TypeTuple!(uint, "tag", EnumSize!Tag);
42. Tagged reference
struct TaggedRef(U...) {
private:
import std.bitmanip;
mixin(taggedPointer!(
void*, "ptr", TagTuple));
public:
auto get(Tag E)() in {
assert(tag == E);
} body {
static union Helper {
void* __ptr;
U u;
}
return Helper(ptr).u[E];
}
template opDispatch(string s, T...) {
auto opDispatch(A...)(A args) {
final switch(tag) {
foreach(T; Tags) {
case T:
auto r = get!T();
return mixin("r." ~ s)(args);
}
}
}
}
}
43. Value Type Polymorphism
• All subtypes fit under a given size budget
• A tag is used to differentiate them
• The whole thing is wrapped in a nice API
• Being able to hide atrocities behind a nice
façade, that’s the power of D
• Example: Representing D types
46. Value Type Polymorphism
• A type is a TypeDescriptor + an indirection field
• Data depends on the kind
– If it doesn’t fit, use the indirection field
• There are many kinds of types:
– Builtin
– Struct
– Class
– Alias
– Function
– …
• The common API switches on the kind to do the right thing
47. Value Type Polymorphism
Layout: [ data | Qualifier | Kind ] [ Indirection ]
• 128 bits budget
• Indirection is used when
• The type needs extra space (Function)
• The type needs to refer to a symbol (Aggregate, Alias)
• Otherwise null
• Replaced the type class hierarchy advantageously
• Significant memory consumption reduction
• Significantly faster runtime (about 20%)
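A reduced version of the idea fits in a few lines. The sketch below is not SDC's actual type representation: the bit positions, the two kinds and all names (`Kind`, `BuiltinType`, `StructSymbol`, `Type`) are assumptions chosen for the example.

```d
enum Kind : ubyte { Builtin, Struct }
enum BuiltinType : ubyte { Void, Bool, Int, Long }

class StructSymbol {
    string name;
    this(string name) { this.name = name; }
}

struct Type {
private:
    ulong desc;         // bits 0-2: kind, bit 3: const qualifier, bits 4+: data
    Object indirection; // null unless the kind needs a symbol

    this(Kind kind, ulong data, Object indirection = null) {
        this.desc = kind | (data << 4);
        this.indirection = indirection;
    }

public:
    static Type builtin(BuiltinType b) { return Type(Kind.Builtin, b); }
    static Type aggregate(StructSymbol s) { return Type(Kind.Struct, 0, s); }

    @property Kind kind() const { return cast(Kind) (desc & 0x07); }
    @property bool isConst() const { return (desc & 0x08) != 0; }

    Type withConst() {
        Type t = this;
        t.desc |= 0x08;
        return t;
    }

    // The common API switches on the kind to do the right thing.
    string name() {
        static immutable string[4] builtins = ["void", "bool", "int", "long"];
        final switch (kind) {
            case Kind.Builtin:
                return builtins[cast(size_t) (desc >> 4)];
            case Kind.Struct:
                return (cast(StructSymbol) indirection).name;
        }
    }
}

// Every kind of type fits in the same 128 bits budget.
static assert(Type.sizeof <= 16);

unittest {
    auto i = Type.builtin(BuiltinType.Int);
    assert(i.kind == Kind.Builtin && i.name == "int");
    assert(!i.isConst && i.withConst.isConst);

    auto s = Type.aggregate(new StructSymbol("Foo"));
    assert(s.kind == Kind.Struct && s.name == "Foo");
}
```

Builtin types never touch the heap here; only the kinds that need a symbol pay for the indirection.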
48. Value Type Polymorphism
• You can nest, effectively creating hierarchies
• For instance, Identifiable is
– A type
– An expression
– A symbol
• More packing !
49. Value Type Polymorphism
Layout: [ data | Qualifier | Kind ] [ Indirection/Expression/Symbol | Tag ]
• Tag is used to discriminate between
• Type
• Expression
• Symbol
• Tag is zeroed out to find the type
• Saved 70 Mb (!) of template bloat in SDC
50. Value Type Polymorphism
import d.semantic.identifier;
Identifiable i = ...;
i.apply!(delegate Expression(identified) {
alias T = typeof(identified);
static if (is(T : Expression)) {
return identified;
} else {
return getError(
identified,
location,
t.name.toString(pass.context) ~ " isn't callable",
);
}
})();
52. Value Type - ABI
• Struct up to 2 fields
– Up to pointer sized
– Slice !
– No float/integral mixing
• Common anti-pattern: 2 pointers + a bool
– std.bigint.BigInt is a slice + a bool
– Passed in memory instead of registers
• More than one pointer tends to use 2
– Use either 1 or 2 pointer sized struct
54. Classless Polymorphism
• Create a base struct
• All substructs use it as their first field
• Contains a tag describing the type
– The tag can be part of a bitfield
• Use a mixin in all substructs
– Include static assert to check this is done right
– Alias this the base
55. Classless Polymorphism
• Each leaf of the hierarchy has a tag value
• Each non-leaf has a range of tag values
• The root matches all values
• The hierarchy must be known at compile time
• Use a bunch of mixin templates
– Add the boilerplate
– A ton of static asserts
58. Classless Polymorphism
• Children share the parent’s part of the layout
– It is safe to upcast
– Done via alias this
• Downcast to a leaf: check tag’s value
– Cheap
– Easy pattern matching
• Downcast to substruct: check tag range
– Cheap
• No typeid pointer chasing
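A small hierarchy shows how the tag checks work. This is a sketch with hypothetical names (`Node`, `Shape`, `Circle`, `Label`, `matches`, `downcast`); the boilerplate is written out by hand here rather than generated with mixin templates.

```d
// Leaves only; shapes occupy the contiguous range Circle .. Square.
enum Tag : ubyte { Circle, Square, Label }

// Root of the hierarchy: matches every tag value.
struct Node {
    Tag tag;
}

// Non-leaf: matches a range of tag values.
struct Shape {
    Node base;
    alias base this;
    double area;
    static bool matches(Tag t) { return t <= Tag.Square; }
}

// Leaves: match exactly one tag value.
struct Circle {
    Shape base;
    alias base this;
    double radius;
    static bool matches(Tag t) { return t == Tag.Circle; }
}

struct Label {
    Node base;
    alias base this;
    string text;
    static bool matches(Tag t) { return t == Tag.Label; }
}

// The parent must be the first field for the casts below to be valid.
static assert(Shape.base.offsetof == 0);
static assert(Circle.base.offsetof == 0);
static assert(Label.base.offsetof == 0);

/// Downcast by checking the tag: no typeid, no pointer chasing.
T* downcast(T)(Node* n) @trusted {
    return T.matches(n.tag) ? cast(T*) n : null;
}

unittest {
    Circle c;
    c.base.base.tag = Tag.Circle;
    c.radius = 2;

    Node* n = cast(Node*) &c;
    assert(downcast!Shape(n) !is null);      // range check
    assert(downcast!Circle(n).radius == 2);  // exact tag check
    assert(downcast!Label(n) is null);
}
```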
59. Virtualish Dispatch
• No virtual table
• Get function pointer in a table
– One table per method
– One entry per leaf type
– Using the tag as an index
• Used by HHVM for PHP arrays
– Creative datastructure
– Is a vector/hashmap/set/tuple/whatever…
60. Regular Virtual Dispatch
T1: [ Vtable pointer | T1’s data ], vtable = [ f1, f2, f3, f4 ]
T2: [ Vtable pointer | T2’s data ], vtable = [ g1, g2, g3, g4 ]
• One vtable per type
• Vtable has one entry per method
• Load vtable then load function address
61. Virtualish Dispatch
T1: [ Tag | T1’s data ]
T2: [ Tag | T2’s data ]
Tables: f = [ f1, f2 ], g = [ g1, g2 ], h = [ h1, h2 ], i = [ i1, i2 ]
• One vtable per method
• Vtable has one entry per type
• Load tag then use it as index in per function table
62. Virtualish Dispatch
• Usually better locality
– Calling the same method on objects of various
types is more common than calling various methods
on objects of the same type
• Often worked around by sorting by type
– Classless get most of the benefit without sorting
– Still helps branch prediction
• Tables can be generated using reflection in D
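Here is what one table per method looks like in its simplest form. It is a sketch with made-up shapes and function names, and the tables are written by hand rather than generated by reflection.

```d
enum Tag : ubyte { Square, RightTriangle }

struct Shape {
    Tag tag;
    double size; // side of the square, or leg of the isosceles right triangle
}

double squareArea(Shape s) { return s.size * s.size; }
double triangleArea(Shape s) { return s.size * s.size / 2; }

double squareCorners(Shape s) { return 4; }
double triangleCorners(Shape s) { return 3; }

alias Method = double function(Shape);

// One table per method, one entry per leaf type, in Tag order.
immutable Method[2] areaTable = [&squareArea, &triangleArea];
immutable Method[2] cornersTable = [&squareCorners, &triangleCorners];

// Dispatch: load the tag, then use it as an index in the method's table.
double area(Shape s) { return areaTable[s.tag](s); }
double corners(Shape s) { return cornersTable[s.tag](s); }

unittest {
    assert(area(Shape(Tag.Square, 3)) == 9);
    assert(area(Shape(Tag.RightTriangle, 4)) == 8);
    assert(corners(Shape(Tag.RightTriangle, 4)) == 3);
}
```

Calling area on a whole array of mixed shapes keeps hitting the same small table, which is where the locality benefit comes from.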
63. Classless visitors !
• A regular class hierarchy needs to know all
methods at compile time
– Can add types dynamically
• A classless hierarchy needs to know all types at
compile time
– Can add methods dynamically
• A visitor can create a table for its visit method
– And use the tag to dispatch
• Closed extensibility one way, opened it
another way