Optimizing array-based 
data structures 
to the limit 
Roman Leventov 
Higher Frequency Trading Ltd. 
leventov@ya.ru 
August 28, 2014
Overview 
Indexing 
Encoding of distinct entry states 
Object data 
Primitive data 
Layout of tuples of primitives
Benchmarking environments 
1. AMD K10 (2007), 
L1 cache: 128 KB, L2: 512 KB, L3: 6 MB 
2. Intel Sandy Bridge (2011), 
L1: 64 KB, L2: 256 KB, L3: 20 MB 
3. Intel Haswell (2013), 
L1: 64 KB, L2: 256 KB, L3: 3 MB 
64-bit Java 1.8.0-b129–8u20 
JMH ??–0.9.8 
If not specified, measurements are in CPU clock 
cycles per operation or loop iteration.
Section 1 
Indexing
Indexing 
Simple 
int e = a[i];
vs.
Unsafe
long off = ((long) i) << INT_SCALE_SHIFT;
int e = U.getInt(a, INT_BASE + off);
Why use unsafe indexing at all?
HotSpot JIT doesn’t eliminate bound checks 
as perfectly as you probably think.
Why ever use simple indexing
in performance-critical code?
Simple
; cmp r8d, ebx
; jae <IOOBE location>
mov r11, [r9 + r8*4 + 16]
Unsafe
mov r10, r8
shl r10, 2
mov r11, [r9 + r10 + 16]
r9 = a; r8 = i
16 = INT_BASE: object header (12 bytes) +
array length field (4 bytes)
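A minimal sketch of the index-to-offset arithmetic behind the unsafe variant. The constants are assumptions copied from the slide (INT_BASE of 16 bytes, shift of 2 for 4-byte ints); real code would obtain them from Unsafe.arrayBaseOffset(int[].class) and Unsafe.arrayIndexScale(int[].class) instead of hard-coding them. The class and method names are hypothetical, and no Unsafe call is made so that the sketch runs on any JVM.

```java
final class IntArrayOffsets {
    // Assumed layout from the slide: 12-byte object header + 4-byte length field.
    static final long INT_BASE = 16;
    // log2 of the element size: an int is 4 bytes wide.
    static final int INT_SCALE_SHIFT = 2;

    /** Byte offset of element i, counted from the start of the array object. */
    static long offsetOf(int i) {
        // Widen to long before shifting so that large indexes cannot overflow int.
        return INT_BASE + (((long) i) << INT_SCALE_SHIFT);
    }
}
```

The result of offsetOf(i) is what would be passed as the second argument of U.getInt(a, ...).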
Iteration over parallel arrays 
Indexing case #1 
@Benchmark
public int _2_simple(State st) {
    int[] xs = st.xs, ys = st.ys;
    int dummy = 0;
    for (int i = xs.length; i-- > 0;)
        dummy ^= xs[i] + ys[i];
    return dummy;
}
Bound checks are fully eliminated!
Iteration over parallel arrays 
Indexing case #1 
@Benchmark
public int _2_unsafe(State st) {
    int[] xs = st.xs, ys = st.ys;
    int dummy = 0;
    // Widen before multiplying so the byte offset can't overflow int.
    long off = (long) xs.length * INT_SCALE;
    while ((off -= INT_SCALE) >= 0)
        dummy ^=
            U.getInt(xs, INT_BASE + off) +
            U.getInt(ys, INT_BASE + off);
    return dummy;
}
Iteration over parallel arrays 
Indexing case #1 
# of arrays       1      2      3      4
SB   Simple    0.78    1.3    2.2    3.4
     Unsafe    1.6     1.8    2.5    3.2
HW   Simple    1.2     2.1    3.3    4.9
     Unsafe    2.1     2.6    3.2    4.3
K10  Simple    1.6     5.8   13.1   19.5
     Unsafe    2.9     6.4   11.8   17.1
Unsafe indexing is slower with a single array, and
on some CPUs with 2-3 parallel arrays, because of an extra
instruction in the tight loop. A JIT compiler fault?
Binary heap 
Indexing case #2
int leftChildI = parentI * 2 + 1;
int rightChildI = leftChildI + 1;
long leftChildOff =
    parentOff * 2 + INT_SCALE;
long rightChildOff =
    leftChildOff + INT_SCALE;
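The offset formulas can be checked against the index formulas with a small sketch. It assumes that offsets are counted in bytes from the first element (INT_BASE is added only at access time) and that INT_SCALE is 4; the class and method names are hypothetical.

```java
final class HeapChildOffsets {
    // Assumed size of one int element in bytes.
    static final long INT_SCALE = 4;

    static int leftChildIndex(int parentI) {
        return parentI * 2 + 1;
    }

    // Offsets are relative to element 0, so offset == index * INT_SCALE.
    static long leftChildOff(long parentOff) {
        return parentOff * 2 + INT_SCALE;
    }

    static long rightChildOff(long parentOff) {
        return leftChildOff(parentOff) + INT_SCALE;
    }
}
```

Working on offsets directly keeps the scaling out of the inner loop: a child offset costs one shift and one add, the same as a child index.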
Binary heap sort 
Indexing case #2 
Heapsort version with unsafe indexing is faster 
by 12–13% on 4 KB array and by 7–10% on 4 MB 
array. 
With simple indexing lower bound checks 
are eliminated, but upper mostly aren’t.
Linear hash 
Indexing case #3 
def any_lhash_op(key[, ...]):
    i = hash(key) % table_size
    while True:
        if is_empty_slot(i): ...
        if key_at(i) == key: ...
        i = (i + 1) % table_size
First access is random, then sequential. 
Table size is a power of 2, therefore bitwise 
masking & (table_size - 1) is used 
instead of modulo.
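A minimal sketch of linear probing with a power-of-2 table and bitwise masking. Several things here are assumptions made for illustration and are not from the slides: the class name, the mixing function, the use of 0 as the empty marker, and the absence of resizing (the caller must keep the table from filling up).

```java
final class LinearProbeIntSet {
    private static final int EMPTY = 0;   // assumption: 0 is never a valid key
    private final int[] table;
    private final int mask;

    LinearProbeIntSet(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of 2");
        table = new int[capacity];
        mask = capacity - 1;
    }

    /** Adds a non-zero key; returns false if it was already present. */
    boolean add(int key) {
        if (key == EMPTY) throw new IllegalArgumentException("0 is reserved");
        int i = hash(key) & mask;          // first access is random
        while (true) {
            int k = table[i];
            if (k == EMPTY) { table[i] = key; return true; }
            if (k == key) return false;
            i = (i + 1) & mask;            // then sequential, wrapping around
        }
    }

    boolean contains(int key) {
        if (key == EMPTY) return false;
        int i = hash(key) & mask;
        while (true) {
            int k = table[i];
            if (k == EMPTY) return false;
            if (k == key) return true;
            i = (i + 1) & mask;
        }
    }

    // Hypothetical mixing function; any reasonable hash would do.
    private static int hash(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }
}
```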
Quadratic hash 
Indexing case #3 
def any_qhash_op(key[, ...]):
    i = hash(key) % table_size
    step = 0
    while True:
        if is_empty_slot(i): ...
        if key_at(i) == key: ...
        step += 1
        i = (i + step) % table_size
Random, then local, then non-local access. 
The benchmark tests a two-way modification of this algorithm,
in which the table size isn’t a power of 2: one integer
division per op.
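The step pattern of the pseudocode (step grows by one on every probe) can be sketched as follows. This shows only the one-directional pattern on a power-of-2 table, where it is known to visit every slot exactly once; it is not the two-way, non-power-of-2 variant that was benchmarked. The class and method names are hypothetical.

```java
final class TriangularProbe {
    /** Slots visited by the first `size` probes; size is assumed to be a power of 2. */
    static int[] probeSequence(int start, int size) {
        int mask = size - 1;
        int[] seq = new int[size];
        int i = start & mask;
        int step = 0;
        for (int n = 0; n < size; n++) {
            seq[n] = i;
            step += 1;
            i = (i + step) & mask;   // local at first, then increasingly non-local
        }
        return seq;
    }

    /** True if the probe sequence touches every slot exactly once. */
    static boolean visitsAllSlots(int start, int size) {
        boolean[] seen = new boolean[size];
        for (int slot : probeSequence(start, size)) {
            if (seen[slot]) return false;   // a repeat means another slot is skipped
            seen[slot] = true;
        }
        return true;
    }
}
```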
Double hash 
Indexing case #3 
def any_dhash_op(key[, ...]):
    i = hash(key) % table_size
    step = hash2(key)
    while True:
        if is_empty_slot(i): ...
        if key_at(i) == key: ...
        i = (i + step) % table_size
Random access. 
Table size isn’t a power of 2: one or two
(on collisions) integer divisions per op.
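A sketch of the double hashing probe sequence, assuming a prime table size so that every step in [1, size - 1] is coprime with it. The second hash used here (derived from the same key) is a placeholder chosen for illustration, not the hash2 from the talk; all names are hypothetical.

```java
final class DoubleHashProbe {
    /** Placeholder second hash: a step in [1, size - 1], so it is never 0. */
    static int step(int key, int size) {
        return 1 + Math.floorMod(key, size - 1);   // second integer division
    }

    /** Slots visited by the first `size` probes for the given key. */
    static int[] probeSequence(int key, int size) {
        int[] seq = new int[size];
        int i = Math.floorMod(key, size);          // first integer division
        int step = step(key, size);
        for (int n = 0; n < size; n++) {
            seq[n] = i;
            i += step;
            if (i >= size) i -= size;              // wrap without another division
        }
        return seq;
    }

    /** True if the probe sequence touches every slot exactly once. */
    static boolean visitsAllSlots(int key, int size) {
        boolean[] seen = new boolean[size];
        for (int slot : probeSequence(key, size)) {
            if (seen[slot]) return false;
            seen[slot] = true;
        }
        return true;
    }
}
```

A real implementation would compute the step lazily, only after the first slot turns out to be occupied, which is where the "one or two divisions" count comes from.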
Composite hash benchmark 
Indexing case #3 
Load factor         0.3            0.6            0.9
Linear (L.)
  SB             1.9 ± 1.0      1.7 ± 1.0      2.1 ± 1.1
  HW             5.5 ± 1.3      4.9 ± 0.7      4.3 ± 0.9
  K10           10.3 ± 0.5      8.2 ± 0.2      1.6 ± 0.7
Quadratic (Q.)
  SB             0.2 ± 1.9      2.0 ± 1.8      0.9 ± 1.9
  HW             2.3 ± 2.3      2.7 ± 1.4      0.3 ± 1.5
  K10            1.6 ± 0.5      0.5 ± 0.2      5.6 ± 0.3
Double (D.)
  SB            11.5 ± 2.5     15.2 ± 1.1     23.7 ± 1.3
  HW             9.9 ± 2.3     13.5 ± 1.2     26.2 ± 1.0
  K10            4.3 ± 0.2      9.4 ± 0.1     17.6 ± 0.4
Relative difference of unsafe indexing time to simple,
in percent.
Indexing: bottom line 
Unsafe indexing is worth considering in the hottest
methods. I tried to avoid it, but: measure, don’t
guess.
Not investigated:
- Performance of unsafe indexing on 32-bit VMs
and CPUs; all results should be rechecked there.
- Interference of unsafe indexing with loop
unrolling and vectorization.
Section 2 
Encoding of distinct 
entry states
Use-cases of entry states 
Full state + data, or empty state:
- Open hash table implementations
(taken/empty slots)
- Nullable non-object data in the subject
domain
- Lists or queues with half-lazy in-place
filtering
Collections of tuples of primitive/object and
boolean (or binary state).
Object data 
Obvious: null in slots of empty state, domain 
objects in full slots. 
But what if domain objects are nullable 
themselves?
What if nullable Object data? 
Special empty object 
static final Object EMPTY_SLOT = new Object();
Domain nulls are stored as is.
Masking domain nulls
static final Object NULL_MASK = new Object();
...
Object maskedData = data != null ? data : NULL_MASK;
null in slots of empty state.
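A minimal sketch of the masking option: domain nulls are swapped for a sentinel on the way into the array and swapped back on the way out, so a null slot always means "empty". The class and the unmask helper are hypothetical names added for illustration.

```java
final class NullMasking {
    // Sentinel that stands in for a domain null inside the backing array.
    static final Object NULL_MASK = new Object();

    /** Applied before storing: the array never holds a domain null. */
    static Object mask(Object data) {
        return data != null ? data : NULL_MASK;
    }

    /** Applied after loading a non-empty slot: restores the domain null. */
    static Object unmask(Object stored) {
        return stored != NULL_MASK ? stored : null;
    }
}
```

With this convention a freshly allocated Object[] is already a valid all-empty table.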
What if nullable Object data? 
The rule: null should be stored in memory, or
compared with other objects, more often than the
special object. Often the same option is right
for both goals.
Why store nulls 
Nullable Object data + states 
Don’t forget the amortized costs of storing
objects rather than nulls: at least one extra
dereference and check per location during
garbage collection.
A new array doesn’t have to be filled with nulls
after initialization.
Why compare to null 
Nullable Object data + states 
Explicit null checks are almost always free:
they are merged with the VM-generated ones (that throw NPE).
In the remaining cases comparison with null is still
cheaper than with the special object, because
- null doesn’t have to be read from anywhere
in advance
- checks against zero are well supported on x86
And what if nullable Object data? 
In hash tables, the domain null (at most one!) should
be masked, and empty slots should be filled with
nulls. But this is harder to implement than the
special empty object.
Got it right: java.util.IdentityHashMap.
Got it wrong: almost all other open hash
implementations.
Primitive data 
No natural way to express nullability. Not even a
natural word :)
Arrays of boxed primitives
Separate byte state 
Primitive data + states 
boolean[] or byte[] and data arrays in parallel: 
if (used[i])
    doSomething(data[i]);
The easiest to implement.
Separate bit state 
Primitive data + states 
Hand-written bit set and data arrays in parallel: 
long word = bitWords[i >> 6];
if ((word & (1L << i)) != 0)
    doSomething(data[i]);
Advantages of separate bit state 
Primitive data + states 
Almost no additional memory is used.
Sequential state checks often don’t require
memory reads (until the word is exhausted).
Iteration can employ the very cheap
numberOfLeading(Trailing)Zeros intrinsics.
Intel: Haswell+ 
AMD: Leading—K10+, Trailing—Piledriver+
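A sketch of a hand-written bit set used as the state array, including iteration over full slots with Long.numberOfTrailingZeros. The class and method names are hypothetical; the summing loop is only a stand-in for whatever work is done per entry.

```java
final class BitStates {
    static boolean isUsed(long[] bitWords, int i) {
        // Java uses only the low 6 bits of a long shift count, so 1L << i == 1L << (i & 63).
        return (bitWords[i >> 6] & (1L << i)) != 0;
    }

    static void setUsed(long[] bitWords, int i) {
        bitWords[i >> 6] |= 1L << i;
    }

    /** Sums data[i] over all used slots, skipping empty ones via trailing-zero counts. */
    static long sumUsed(long[] bitWords, long[] data) {
        long sum = 0;
        for (int w = 0; w < bitWords.length; w++) {
            long word = bitWords[w];              // one memory read per 64 slots
            while (word != 0) {
                int bit = Long.numberOfTrailingZeros(word);
                sum += data[(w << 6) + bit];
                word &= word - 1;                 // clear the lowest set bit
            }
        }
        return sum;
    }
}
```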
Disadvantages of separate bit state 
Primitive data + states 
Only for binary state. 
On pure random access there is no advantage over byte
states except memory usage; extracting bits is just
extra work.
Relatively tricky to implement.
(java.util.BitSet: no way.)
Special value as a state 
Primitive data + states 
long d = data [i]; 
if (d != EMPTY ) 
doSomething (d); 
Suitable only when there is a full state and one or 
several empty states.
Advantages of special values 
Primitive data + states 
Zero memory overhead. 
All entry data can reside in a single memory
location:
- fewer memory reads are required
- cache-friendly
- atomic updates become possible
Special value management 
Primitive data + states 
When the data domain is bounded, special values
are a clear winner for encoding states: just pick
a constant outside the data domain, preferably 0,
as the special value.
Special value management 
Primitive data + states 
However, if the data domain is unbounded,
special values as states have a number
of disadvantages:
- The special value has to be stored within the data
structure and read on each query.
- Comparison with a non-constant is slower,
especially than comparison with zero.
- On collision the special value has to be
replaced, which is impossible without locking
if the data structure must be thread-safe,
or if it is offline in any sense.
- The implementation becomes more complicated.
Zero value as a state 
Primitive data + states 
An attempt to resolve one of the problems of dynamic
special values: data is compared with zero, and
when zero itself is passed as data, it is masked
with another value:
if (data == zeroMask) changeZeroMask();
data = data == 0 ? zeroMask : data;
...
long d = data[i];
if (d != 0)
    doSomething(d);
But now data has to be masked/unmasked all the
time, and the implementation gets even more
complicated.
Byte along state 
Primitive data + states 
Like separate byte state, but more memory-local.
On the other hand:
- Only unsafe access (see section 1)
- Tiring to implement
- Memory I/O that crosses a cache line boundary, which
1) has a penalty on many CPUs, 2) is not
atomic: out-of-thin-air values could appear unless
the data structure is synchronized or I/O is
performed only via CAS ops (Nitsan Wakart).
Benchmarking LHash queries, 
random queries 
Primitive data + states 
All the hash data is in L1: 
- Load factors 0.3-0.6: typically byte states win
- Load factor 0.9: bit states win, sometimes
special values
Big hashes (don’t fit caches):
- Successful queries: special values win
- Unsuccessful queries, including insertions:
bit states win
- Byte along states outperform simple byte
states
Zero states (with replacement) are never an option.
Benchmarking LHash queries, 
iteration 
Primitive data + states 
Internal iteration (forEach): special values win. 
External iteration (iterators): byte states win. 
But on Haswell and K10, of course, bit states beat
them all.
Byte along states and zero states with 
replacement always lose.
Encoding of distinct entry states:
bottom line 
Object[] arrays: more nulls. 
Primitive arrays: special values as states, when 
applicable. Bit states for iteration on Haswell+ and 
K10+.
Section 3 
Layout of tuples of primitives
Layout of tuples of primitives 
When random access is needed, we always strive 
for memory locality.
Two fields of the same length 
Layout of tuples of primitives 
byte+byte, char+short, long+double 
(longBitsToDouble() is a no-op). 
For up to 8 bytes, use arrays of the longer
primitive, e.g. long[] for int+int tuples.
- Guarantees that the tuple lies on a single cache
line.
- Lets you get closer to the Java array size
limits.
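A minimal sketch of storing an int+int tuple in one long, as would be done with a long[] backing array. Which field goes into the high half is an arbitrary choice made here; the class and method names are hypothetical.

```java
final class IntPair {
    /** Packs two ints into one long: `first` in the high half, `second` in the low half. */
    static long pack(int first, int second) {
        // Mask the low half so that a negative `second` doesn't sign-extend into `first`.
        return ((long) first << 32) | (second & 0xFFFFFFFFL);
    }

    static int first(long packed) {
        return (int) (packed >>> 32);
    }

    static int second(long packed) {
        return (int) packed;   // narrowing keeps the low 32 bits
    }
}
```

Both fields are then read or written with a single 8-byte access, which is what keeps the tuple on one cache line.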
One field is two times longer than 
another 
Layout of tuples of primitives 
byte+short, int+double, ... 
If cross cache line boundary IO is not an option, 
use the following layout: 
Requires accessing individual fields via Unsafe.
One field is 4-8 times longer than 
another 
Layout of tuples of primitives 
If cross cache line boundary IO is not an option, 
the only reasonable approach is: 
k1, long, 8 bytes
k2, long, 8 bytes
k3, long, 8 bytes
v1, short, 2 bytes
v2, short, 2 bytes
v3, short, 2 bytes
2 bytes gap
k4, long, 8 bytes
...
One field is 4-8 times longer than 
another 
Layout of tuples of primitives 
Fields of the same tuple will anyway lie on different 
cache lines with some probability. 
Indexing: 
long kOff = (i / 3) * 32L +
            (i % 3) * 8;
long vOff = (i / 3) * 32L + 24 +
            (i % 3) * 2;
Integer division :(
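A sketch of the offset arithmetic for the 32-byte block layout listed earlier (three 8-byte keys, three 2-byte values, a 2-byte gap). Offsets are in bytes from the start of the first block; the class and method names are hypothetical. Note that the value of tuple i sits 24 bytes after its key only when i % 3 == 0; in general it is at block start + 24 + (i % 3) * 2.

```java
final class KeyValueBlockLayout {
    // 3 * 8 (keys) + 3 * 2 (values) + 2 (gap)
    static final long BLOCK_BYTES = 32;

    /** Byte offset of the 8-byte key of tuple i. */
    static long keyOffset(int i) {
        return (i / 3) * BLOCK_BYTES + (i % 3) * 8L;
    }

    /** Byte offset of the 2-byte value of tuple i: values follow the three keys. */
    static long valueOffset(int i) {
        return (i / 3) * BLOCK_BYTES + 24 + (i % 3) * 2L;
    }
}
```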
Integral division by small constant 
- Maybe this will help? (see Hacker’s Delight)
long quot = (i * 0x55555556L) >> 32;
long rem = i - quot * 3;
long kOff = quot * 32 + rem * 8;
long vOff = quot * 32 + 24 + rem * 2;
- No, it won’t, because we need to obtain the
remainder as well as the quotient.
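The multiply-and-shift trick can be sketched and checked against the ordinary operators. This version is valid only for non-negative int arguments (negative ones need an extra correction step); the class and method names are hypothetical.

```java
final class DivideByThree {
    /** Quotient of a non-negative int by 3 via multiply-and-shift. */
    static int div3(int i) {
        // 0x55555556 == (2^32 + 2) / 3; the int operand is widened to long before multiplying.
        return (int) ((i * 0x55555556L) >> 32);
    }

    /** Remainder derived from the quotient: one more multiply and a subtract. */
    static int rem3(int i) {
        return i - div3(i) * 3;
    }
}
```

The remainder still costs a multiply and a subtract on top of the quotient, which matches the slide's conclusion that the trick doesn't pay off when both are needed.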
The End

Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

Optimizing array-based data structures to the limit

  • 1. Optimizing array-based data structures to the limit Roman Leventov Higher Frequency Trading Ltd. leventov@ya.ru August 28, 2014
  • 2. Overview Indexing Encoding of distinct entry states Object data Primitive data Layout of tuples of primitives
  • 3. Benchmarking environments 1. AMD K10 (2007), L1 cache: 128 KB, L2: 512 KB, L3: 6 MB 2. Intel Sandy Bridge (2011), L1: 64 KB, L2: 256 KB, L3: 20 MB 3. Intel Haswell (2013), L1: 64 KB, L2: 256 KB, L3: 3 MB 64-bit Java 1.8.0-b129–8u20 JMH ??–0.9.8 If not specified, measurements are in CPU clock cycles per operation or loop iteration.
  • 5. Indexing. Simple: int e = a[i]; vs. Unsafe: long off = ((long) i) << INT_SCALE_SHIFT; int e = U.getInt(a, INT_BASE + off);
  • 6. Why ever use unsafe indexing? The HotSpot JIT doesn’t eliminate bounds checks as perfectly as you probably think.
  • 7. Why ever use simple indexing? In performance-critical code:
      Simple:
          ; cmp r8d, ebx
          ; jae <IOOBE location>
          mov r11, [r9 + r8*4 + 16]
      Unsafe:
          mov r10, r8
          shl r10, 2
          mov r11, [r9 + r10 + 16]
      r9 holds a; r8 holds i; 16 is INT_BASE: object header (12 bytes) + array length field (4 bytes).
  • 8. Iteration over parallel arrays (indexing case #1), simple indexing:
      @Benchmark
      public int _2_simple(State st) {
          int[] xs = st.xs, ys = st.ys;
          int dummy = 0;
          for (int i = xs.length; i-- > 0;)
              dummy ^= xs[i] + ys[i];
          return dummy;
      }
      Bounds checks are fully eliminated!
  • 9. Iteration over parallel arrays (indexing case #1), unsafe indexing:
      @Benchmark
      public int _2_unsafe(State st) {
          int[] xs = st.xs, ys = st.ys;
          int dummy = 0;
          // Widen before multiplying so the byte offset cannot overflow int.
          long off = (long) xs.length * INT_SCALE;
          while ((off -= INT_SCALE) >= 0)
              dummy ^=
                  U.getInt(xs, INT_BASE + off) +
                  U.getInt(ys, INT_BASE + off);
          return dummy;
      }
  • 10. Iteration over parallel arrays (indexing case #1):
      # of arrays      1      2      3      4
      SB   Simple    0.78    1.3    2.2    3.4
           Unsafe    1.6     1.8    2.5    3.2
      HW   Simple    1.2     2.1    3.3    4.9
           Unsafe    2.1     2.6    3.2    4.3
      K10  Simple    1.6     5.8   13.1   19.5
           Unsafe    2.9     6.4   11.8   17.1
      Unsafe indexing is slower with a single array, or with 2–3 parallel arrays, because of an odd instruction in the tight loop. A JIT compiler fault?
  • 12. Binary heap (indexing case #2). Simple: int leftChildI = parentI * 2 + 1; int rightChildI = leftChildI + 1; Unsafe: long leftChildOff = parentOff * 2 + INT_SCALE; long rightChildOff = leftChildOff + INT_SCALE;
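The child-index arithmetic on slide 12 can be put in context with a minimal heapsort sketch using simple indexing. The class and method names here are illustrative assumptions, not the benchmark code from the talk, and index overflow for very large arrays is ignored.

```java
final class HeapSortSketch {
    // Sift a[parentI] down within a[0..size) to restore the max-heap property.
    static void siftDown(int[] a, int parentI, int size) {
        int v = a[parentI];
        while (true) {
            int leftChildI = parentI * 2 + 1;
            if (leftChildI >= size) break;            // no children left
            int rightChildI = leftChildI + 1;
            int childI = (rightChildI < size && a[rightChildI] > a[leftChildI])
                    ? rightChildI : leftChildI;       // pick the larger child
            if (a[childI] <= v) break;                // heap property holds
            a[parentI] = a[childI];
            parentI = childI;
        }
        a[parentI] = v;
    }

    static void sort(int[] a) {
        int n = a.length;
        // Build the max-heap bottom-up.
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n);
        // Repeatedly move the maximum to the end and shrink the heap.
        for (int end = n - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;
            siftDown(a, 0, end);
        }
    }
}
```

In the unsafe variant the same loop would carry byte offsets instead of indices, using the offset formulas from the slide.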
  • 13. Binary heap sort (indexing case #2). The heapsort version with unsafe indexing is faster by 12–13% on a 4 KB array and by 7–10% on a 4 MB array. With simple indexing, lower bounds checks are eliminated, but upper ones mostly aren’t.
  • 14. Linear hash (indexing case #3):
      def any_lhash_op(key[, ...]):
          i = hash(key) % table_size
          while True:
              if is_empty_slot(i): ...
              if key_at(i) == key: ...
              i = (i + 1) % table_size
      The first access is random, then sequential. The table size is a power of 2, so bitwise masking & (table_size - 1) is used instead of modulo.
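The linear probing loop on slide 14 could look roughly like this in Java with simple indexing and power-of-2 masking. This is a sketch under assumptions: the class name, the hash mixing function, and the use of a separate boolean[] for slot states are all illustrative choices, not taken from the talk.

```java
final class LinearProbeSet {
    private final int[] keys;
    private final boolean[] used;   // separate state array: true = taken slot
    private final int mask;         // table_size - 1, valid because size is a power of 2
    private int size;

    LinearProbeSet(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 2 || Integer.bitCount(capacityPowerOfTwo) != 1)
            throw new IllegalArgumentException("capacity must be a power of 2, at least 2");
        keys = new int[capacityPowerOfTwo];
        used = new boolean[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Illustrative hash mixing; any reasonable int hash would do here.
    private static int hash(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    boolean add(int key) {
        int i = hash(key) & mask;              // first access: random
        while (used[i]) {
            if (keys[i] == key) return false;  // already present
            i = (i + 1) & mask;                // then sequential, wrapping by masking
        }
        // Always keep one empty slot so that probing loops terminate.
        if (size == keys.length - 1) throw new IllegalStateException("table is full");
        used[i] = true;
        keys[i] = key;
        size++;
        return true;
    }

    boolean contains(int key) {
        int i = hash(key) & mask;
        while (used[i]) {
            if (keys[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }
}
```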
  • 15. Quadratic hash (indexing case #3):
      def any_qhash_op(key[, ...]):
          i = hash(key) % table_size
          step = 0
          while True:
              if is_empty_slot(i): ...
              if key_at(i) == key: ...
              step += 1
              i = (i + step) % table_size
      Random, then local, then non-local access. A two-way modification of this algorithm is tested, in which the table size isn’t a power of 2: one integer division per op.
  • 16. Double hash (indexing case #3):
      def any_dhash_op(key[, ...]):
          i = hash(key) % table_size
          step = hash2(key)
          while True:
              if is_empty_slot(i): ...
              if key_at(i) == key: ...
              i = (i + step) % table_size
      Random access. The table size isn’t a power of 2: one or two (on collisions) integer divisions per op.
  • 17. Composite hash benchmark (indexing case #3). Relative difference of unsafe indexing time to simple, in percent:
      load factor        0.3            0.6            0.9
      L.  SB          1.9   1.0      1.7   1.0      2.1   1.1
          HW          5.5   1.3      4.9   0.7      4.3   0.9
          K10        10.3   0.5      8.2   0.2      1.6   0.7
      Q.  SB          0.2   1.9      2.0   1.8      0.9   1.9
          HW          2.3   2.3      2.7   1.4      0.3   1.5
          K10         1.6   0.5      0.5   0.2      5.6   0.3
      D.  SB         11.5   2.5     15.2   1.1     23.7   1.3
          HW          9.9   2.3     13.5   1.2     26.2   1.0
          K10         4.3   0.2      9.4   0.1     17.6   0.4
  • 18. Indexing: bottom line. Unsafe indexing is worth considering in the hottest methods. Tried to avoid this, but: measure, don’t guess. Not investigated:
      • Performance of unsafe indexing on 32-bit VMs and CPUs; all results should be rechecked there.
      • Interference of unsafe indexing with loop unrolling and vectorization.
  • 19. Section 2 Encoding of distinct entry states
  • 20. Use cases of entry states. Full state + data, or empty state:
      • Open hash table implementations (taken/empty slots)
      • Nullable non-object data in the subject domain
      • Lists or queues with half-lazy in-place filtering
      Collections of tuples of a primitive/object and a boolean (or binary state).
  • 21. Object data Obvious: null in slots of empty state, domain objects in full slots. But what if domain objects are nullable themselves?
  • 22. What if nullable Object data? Option 1, special empty object: static final Object EMPTY_SLOT = new Object(); domain nulls are stored as is. Option 2, masking domain nulls: static final Object NULL_MASK = new Object(); ... Object maskedData = data != null ? data : NULL_MASK; null in slots of empty state.
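A minimal sketch of the null-masking option from slide 22: empty slots hold null, and a domain null is replaced by a sentinel on the way in and restored on the way out. The helper names (mask, unmask, isEmptySlot) are hypothetical and only illustrate the idea.

```java
final class NullMasking {
    // Sentinel standing in for a domain-level null inside the slots array.
    static final Object NULL_MASK = new Object();

    // Applied before storing: a domain null becomes the sentinel.
    static Object mask(Object data) {
        return data != null ? data : NULL_MASK;
    }

    // Applied after reading a full slot: the sentinel becomes a domain null again.
    static Object unmask(Object stored) {
        return stored != NULL_MASK ? stored : null;
    }

    // Empty slots are plain nulls, so the emptiness check is a comparison to null.
    static boolean isEmptySlot(Object[] slots, int i) {
        return slots[i] == null;
    }
}
```

With this scheme a freshly allocated Object[] already represents an all-empty table, because Java arrays start out null-filled.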
  • 23. What if nullable Object data? The rule: null should be stored in memory, or compared against other objects, more frequently than the special object. Often the right option for both goals is the same.
  • 24. Why store nulls (nullable Object data + states). Don’t forget the amortized costs of storing Objects rather than nulls: at least one extra dereference and check per location during garbage collection. Also, the array doesn’t need to be filled with nulls after initialization.
  • 25. Why compare to null (nullable Object data + states). Explicit null checks are almost always free: they are merged with the VM-generated ones (which throw NPE). In the remaining cases, comparison to null is still cheaper than comparison to the special object, because:
      • null doesn’t need to be read from anywhere in advance
      • Checks against zero are specially supported on x86
  • 26. And what if nullable Object data? In hash tables, the domain null (at most one!) should be masked, and empty slots should be filled with nulls. But the implementation is harder than with a special empty object. Got it right: java.util.IdentityHashMap. Got it wrong: almost all other open hash implementations.
  • 27. Primitive data. No natural way to express nullability. Not even a natural word :) Arrays of boxed primitives
  • 28. Separate byte state (primitive data + states). boolean[] or byte[] and data arrays in parallel: if (used[i]) doSomething(data[i]); The easiest to implement.
  • 29. Separate bit state (primitive data + states). Hand-written bit set and data arrays in parallel: long word = bitWords[i >> 6]; if ((word & (1L << i)) != 0) doSomething(data[i]);
  • 30. Advantages of separate bit state (primitive data + states). Almost no additional memory is used. Sequential state checks often don’t require memory reads (until the word is exhausted). Iteration can employ the very cheap numberOfLeading(Trailing)Zeros intrinsic. Intel: Haswell+. AMD: Leading on K10+, Trailing on Piledriver+.
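The bit-state approach, including iteration with the trailing-zeros intrinsic, might be sketched as follows. The class and its methods are illustrative assumptions; only the word/bit arithmetic follows the slides.

```java
final class BitStateArray {
    final long[] bitWords;  // one state bit per slot, 64 slots per word
    final long[] data;

    BitStateArray(int capacity) {
        bitWords = new long[(capacity + 63) >>> 6];
        data = new long[capacity];
    }

    void put(int i, long value) {
        data[i] = value;
        // For long shifts Java uses only the low 6 bits of the shift count,
        // so 1L << i selects bit (i mod 64) of the word.
        bitWords[i >> 6] |= 1L << i;
    }

    boolean isUsed(int i) {
        return (bitWords[i >> 6] & (1L << i)) != 0;
    }

    // Iterates over full slots only: one memory read per 64 states,
    // then the set bits are peeled off one by one.
    long sumUsed() {
        long sum = 0;
        for (int w = 0; w < bitWords.length; w++) {
            long word = bitWords[w];
            while (word != 0) {
                int bit = Long.numberOfTrailingZeros(word);
                sum += data[(w << 6) + bit];
                word &= word - 1;  // clear the lowest set bit
            }
        }
        return sum;
    }
}
```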
  • 31. Disadvantages of separate bit state (primitive data + states). Only for binary state. On purely random access there is no advantage over byte states except memory usage; extracting bits is just extra work. Relatively tricky to implement. (java.util.BitSet: no way.)
  • 32. Special value as a state (primitive data + states). long d = data[i]; if (d != EMPTY) doSomething(d); Suitable only when there is a full state and one or several empty states.
  • 33. Advantages of special values (primitive data + states). Zero memory overhead. All entry data can reside in a single memory location:
      • Fewer memory reads are required
      • Cache-friendly
      • Possibility of atomic updates
  • 34. Special value management (primitive data + states). When the data domain is bounded, special values are a clear winner for encoding states: just pick a constant outside the data domain, preferably 0, as the special value.
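For a bounded data domain, the special-value approach can be sketched as a set of strictly positive long keys that uses 0 as the empty marker. The class name, the restriction to positive keys, and the hash mixing are assumptions made for this illustration.

```java
final class PositiveLongSet {
    // 0 lies outside the data domain (keys must be > 0), so it can mark empty slots.
    private static final long EMPTY = 0L;
    private final long[] table;
    private final int mask;
    private int size;

    PositiveLongSet(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 2 || Integer.bitCount(capacityPowerOfTwo) != 1)
            throw new IllegalArgumentException("capacity must be a power of 2, at least 2");
        table = new long[capacityPowerOfTwo];  // zero-filled, i.e. every slot starts EMPTY
        mask = capacityPowerOfTwo - 1;
    }

    // Illustrative hash mixing, reduced to a slot index by masking.
    private int slot(long key) {
        int h = Long.hashCode(key) * 0x9E3779B9;
        return (h ^ (h >>> 16)) & mask;
    }

    boolean add(long key) {
        if (key <= 0) throw new IllegalArgumentException("key must be positive");
        int i = slot(key);
        long k;
        // State and data come from the same memory read.
        while ((k = table[i]) != EMPTY) {
            if (k == key) return false;
            i = (i + 1) & mask;
        }
        // Keep one slot empty so that probing always terminates.
        if (size == table.length - 1) throw new IllegalStateException("table is full");
        table[i] = key;
        size++;
        return true;
    }

    boolean contains(long key) {
        int i = slot(key);
        long k;
        while ((k = table[i]) != EMPTY) {
            if (k == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }
}
```

Choosing 0 as the special value means a newly allocated array needs no extra fill pass, and the emptiness check is a comparison to zero.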
  • 35. Special value management (primitive data + states). However, if the data domain is unbounded, special values as states have a number of disadvantages:
      • The special value must be stored within the data structure and read on each query.
      • Comparison to a non-constant is slower, especially compared to comparison to zero.
      • On collision, the special value must be replaced, which is impossible without locking if the data structure has to be thread-safe, or if it is offline in any sense.
      • The implementation becomes more complicated.
  • 36. Zero value as a state (primitive data + states). An attempt to resolve one of the problems of dynamic special values: data is compared to zero, and when zero is passed as data itself, it is masked with another value: if (data == zeroMask) changeZeroMask(); data = data == 0 ? zeroMask : data; ... long d = data[i]; if (d != 0) doSomething(d); But now data has to be masked/unmasked all the time, and the implementation gets even more complicated.
  • 37. Byte along state (primitive data + states). Like separate byte state, but more memory-local: the state byte is stored alongside its data. On the other hand:
      • Only unsafe access (see section 1)
      • Tiring to implement
      • Memory IO across cache line boundaries, which 1) has a penalty on many CPUs, 2) is not atomic: out-of-thin-air values could appear unless the data structure is synchronized or IO is performed only via CAS ops (Nitsan Wakart).
  • 38. Benchmarking LHash queries, random queries (primitive data + states).
      All the hash data is in L1:
        • Load factors 0.3-0.6: typically byte states win
        • Load factor 0.9: bit states win, sometimes special values
      Big hashes (don’t fit in caches):
        • Successful queries: special values win
        • Unsuccessful queries, including insertions: bit states win
        • Byte along states outperform simple byte states
      Zero states (with replacement) are never an option.
  • 39. Benchmarking LHash queries, iteration (primitive data + states). Internal iteration (forEach): special values win. External iteration (iterators): byte states win. But on Haswell and K10, of course, bit states beat them all. Byte along states and zero states with replacement always lose.
  • 40. Encoding of distinct entry states: bottom line. Object[] arrays: more nulls. Primitive arrays: special values as states, when applicable. Bit states for iteration on Haswell+ and K10+.
  • 41. Section 3 Layout of tuples of primitives
  • 42. Layout of tuples of primitives When random access is needed, we always strive for memory locality.
  • 43. Two fields of the same length (layout of tuples of primitives). byte+byte, char+short, long+double (longBitsToDouble() is a no-op). For tuples of up to 8 bytes, use arrays of a primitive twice as long, e.g. long[] for int+int tuples.
      • Guarantees that the tuple lies on a single cache line.
      • Allows approaching the Java array size limits more closely.
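Packing int+int tuples into a long[], as slide 43 suggests, could be sketched like this. The class name and the choice of which field goes into the high half are assumptions for the example.

```java
final class IntPairArray {
    private final long[] pairs;  // each long holds one (first, second) tuple

    IntPairArray(int length) {
        pairs = new long[length];
    }

    void set(int i, int first, int second) {
        // first goes to the high 32 bits; second is zero-extended into the low 32 bits
        // so that a negative second cannot smear sign bits over first.
        pairs[i] = ((long) first << 32) | (second & 0xFFFFFFFFL);
    }

    int first(int i) {
        return (int) (pairs[i] >>> 32);
    }

    int second(int i) {
        return (int) pairs[i];  // narrowing keeps the low 32 bits
    }
}
```

Both fields are read and written with a single 8-byte access, so a tuple never straddles a cache line as long as the array elements are 8-byte aligned.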
  • 44. One field is two times longer than the other (layout of tuples of primitives). byte+short, int+double, ... If IO across cache line boundaries is not an option, use the following layout: Requires accessing individual fields via Unsafe.
  • 45. One field is 4-8 times longer than the other (layout of tuples of primitives). If IO across cache line boundaries is not an option, the only reasonable approach is:
      k1, long, 8 bytes
      k2, long, 8 bytes
      k3, long, 8 bytes
      v1, short, 2 bytes
      v2, short, 2 bytes
      v3, short, 2 bytes
      2 bytes gap
      k4, long, 8 bytes
      ...
  • 46. One field is 4-8 times longer than the other (layout of tuples of primitives). Fields of the same tuple will anyway lie on different cache lines with some probability. Indexing: long kOff = (i / 3) * 32L + (i % 3) * 8; long vOff = kOff + 24 - (i % 3) * 6; (that is, block base + 24 + (i % 3) * 2, matching the layout above). Integer division :(
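The 32-byte block layout (three 8-byte keys, three 2-byte values, a 2-byte gap) can be exercised with a small sketch. It uses a heap java.nio.ByteBuffer with absolute offsets instead of Unsafe purely to keep the example self-contained; the class name is hypothetical, and the value offset is derived from the layout as block base + 24 + 2 * (position within the block).

```java
import java.nio.ByteBuffer;

final class LongShortTuples {
    private static final int BLOCK_BYTES = 32;  // 3 * 8 (keys) + 3 * 2 (values) + 2 (gap)
    private final ByteBuffer buf;

    LongShortTuples(int tupleCount) {
        // One block per three tuples, rounded up.
        buf = ByteBuffer.allocate(((tupleCount + 2) / 3) * BLOCK_BYTES);
    }

    static int keyOffset(int i) {
        return (i / 3) * BLOCK_BYTES + (i % 3) * 8;
    }

    static int valueOffset(int i) {
        return (i / 3) * BLOCK_BYTES + 24 + (i % 3) * 2;
    }

    void put(int i, long key, short value) {
        buf.putLong(keyOffset(i), key);
        buf.putShort(valueOffset(i), value);
    }

    long key(int i) {
        return buf.getLong(keyOffset(i));
    }

    short value(int i) {
        return buf.getShort(valueOffset(i));
    }
}
```

Every key stays 8-byte aligned and every value 2-byte aligned relative to the block start, so no single field crosses a cache line when blocks are 32-byte aligned.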
  • 47. Integer division by a small constant. Q: Maybe this will help? (see Hacker’s Delight) long quot = (i * 0x55555556L) >>> 32; long rem = i - quot * 3; long kOff = quot * 32 + rem * 8; long vOff = kOff + 24 - rem * 6; A: No, it won’t, because we need to obtain the remainder as well as the quotient.
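The multiply-and-shift trick from slide 47 can be checked in isolation. This sketch assumes a non-negative int index, for which the high 32 bits of i * 0x55555556 equal i / 3; the class and method names are illustrative.

```java
final class DivBy3 {
    // Quotient of i / 3 for 0 <= i <= Integer.MAX_VALUE, without a division instruction.
    // 0x55555556 is (2^32 + 2) / 3; the long literal forces a 64-bit product.
    static int quot3(int i) {
        return (int) ((i * 0x55555556L) >>> 32);
    }

    // The remainder still costs an extra multiply and subtract on top of the quotient.
    static int rem3(int i) {
        return i - quot3(i) * 3;
    }
}
```

Whether this beats the JIT-compiled i / 3 and i % 3 is a measurement question; the slide's own conclusion is that it does not help here.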