2. No more capacity limits!
Google had 147GB of data in 1998.
Now, ~$100 buys you a 128GB
microSD, and that is in your phone!
Storage is pervasive, abundant
and cheap.
With 64-bit multicore CPUs, even
phones may store and process lots
of data.
Network is the bottleneck now!
(1998 vs. 2014)
3. Users wait too long, too often,
for the data to load.
Web/mobile apps go data-heavy...
but RTT* does not improve.
Mobile devices rely on wireless...
which is unreliable by its nature.
A user has many devices...
so instant sync is expected.
The network is often slow and
unreliable! So?
* network round-trip time
4. Solution: cache everything,
sync it as needed
Once the data is delivered, caching is free.
Once data is prefetched and cached:
• there are no "loading" stalls;
• offline mode is OK;
• intermittent connection is also OK.
So, huge UX improvement!
But, total caching poses a challenge:
• the data is changed on both sides;
• invalidation no longer works;
• need versioning and synchronization!
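A minimal sketch of what "versioning and synchronization" could look like, assuming every change is stamped with a (logical clock, replica name) pair and each replica remembers the highest clock it has seen per peer. The `Replica` class, its methods and `sync` are hypothetical names for illustration only, not the Swarm API. The sketch also assumes pairwise syncs always exchange the complete set of unseen changes.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """A full local copy of the data plus a log of versioned changes."""
    name: str
    clock: int = 0
    log: dict = field(default_factory=dict)   # (clock, origin) -> (key, value)
    seen: dict = field(default_factory=dict)  # origin -> highest clock seen

    def put(self, key, value):
        # Local writes never wait for the network: stamp, store, move on.
        self.clock += 1
        self.log[(self.clock, self.name)] = (key, value)
        self.seen[self.name] = self.clock

    def changes_since(self, peer_seen):
        # Only the changes the peer has not seen yet (the incremental part).
        return {v: op for v, op in self.log.items()
                if v[0] > peer_seen.get(v[1], 0)}

    def receive(self, changes):
        for (clock, origin), op in changes.items():
            self.log[(clock, origin)] = op
            self.seen[origin] = max(self.seen.get(origin, 0), clock)
            self.clock = max(self.clock, clock)  # Lamport-style clock

    def state(self):
        # Replay the log in version order; the highest stamp wins per key.
        out = {}
        for version in sorted(self.log):
            key, value = self.log[version]
            out[key] = value
        return out

def sync(a, b):
    """Exchange unseen changes both ways; return how many were sent."""
    a_to_b = a.changes_since(b.seen)
    b_to_a = b.changes_since(a.seen)
    b.receive(a_to_b)
    a.receive(b_to_a)
    return len(a_to_b) + len(b_to_a)

phone, server = Replica("phone"), Replica("server")
phone.put("title", "Draft")      # edited offline on the phone
server.put("title", "Final")     # edited meanwhile on the server
server.put("owner", "ann")
sent_first = sync(phone, server)   # three changes cross the wire
sent_again = sync(phone, server)   # nothing new, nothing sent
```

Neither side invalidates anything: each keeps its full copy, and a sync only ships the changes the other side has not seen.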
5. CRDT enables total caching
and incremental sync
CRDT (conflict-free, a.k.a. commutative, replicated data types)
• real-time background sync
• versioned data (detects new and seen)
• offline work, caching, prefetching
• conflict-free merge for concurrent changes
• CRDTs are used by Cassandra, Riak
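To illustrate the "conflict-free merge" bullet, here is a small sketch of one textbook CRDT, a state-based PN-counter. It is a generic example, not code from Swarm, Cassandra or Riak; the class and method names are chosen for illustration. Each replica only ever increases its own slots, and merging takes the per-replica maximum, so merges may arrive in any order and any number of times.

```python
class PNCounter:
    """Counter that supports concurrent increments and decrements."""

    def __init__(self, replica_id):
        self.id = replica_id
        self.incs = {}  # replica id -> total increments made there
        self.decs = {}  # replica id -> total decrements made there

    def increment(self, n=1):
        self.incs[self.id] = self.incs.get(self.id, 0) + n

    def decrement(self, n=1):
        self.decs[self.id] = self.decs.get(self.id, 0) + n

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Per-slot maximum: commutative, associative and idempotent.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)

laptop, phone = PNCounter("laptop"), PNCounter("phone")
laptop.increment(3)      # concurrent changes on two devices
phone.increment(2)
phone.decrement(1)

laptop.merge(phone)
phone.merge(laptop)
laptop.merge(phone)      # repeating a merge changes nothing
```

After the merges both replicas report the same value, with no conflict to resolve and no central arbiter involved.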
Causal trees: collaborative real-time editing
• a CRDT replacement for OT*
• offline-first, perfectly cacheable
• in-browser (JavaScript, contentEditable)
• authorship attribution (who wrote what)
• change detection (what has been changed?)
• initially developed for letters.yandex.ru
* Operational Transformation
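The idea behind a causal tree can be sketched roughly as follows. This is a heavily simplified illustration, not the algorithm used in Swarm or Citrea. It assumes each typed character is an "atom" with a unique id (logical time, author) that points at the atom it was typed after, that a replica is just a set of atoms, and that deletions are kept as tombstones. `Atom`, `render` and `text` are hypothetical names.

```python
from dataclasses import dataclass

ROOT = (0, "")  # id of the invisible root every text starts from

@dataclass(frozen=True)
class Atom:
    id: tuple      # (logical time, author): unique and totally ordered
    parent: tuple  # id of the atom this one was typed right after
    char: str

def render(atoms, deleted=frozenset()):
    """Return visible (char, author) pairs; same result on every replica."""
    children = {}
    for atom in set(atoms):  # duplicates are harmless
        children.setdefault(atom.parent, []).append(atom)
    visible = []
    stack = sorted(children.get(ROOT, []), key=lambda a: a.id)
    while stack:
        atom = stack.pop()  # depth-first, newest sibling first
        if atom.id not in deleted:  # tombstones stay in the tree, unseen
            visible.append((atom.char, atom.id[1]))
        stack.extend(sorted(children.get(atom.id, []), key=lambda a: a.id))
    return visible

def text(atoms, deleted=frozenset()):
    return "".join(char for char, _ in render(atoms, deleted))

# Both users start from "Hi", then append concurrently while offline.
base = {Atom((1, "alice"), ROOT, "H"), Atom((2, "alice"), (1, "alice"), "i")}
alice = base | {Atom((3, "alice"), (2, "alice"), "!")}
bob = base | {Atom((3, "bob"), (2, "alice"), "?")}

merged = alice | bob  # merging is plain set union, in any order
merged_text = text(merged)
authors = render(merged)
after_delete = text(merged, deleted={(2, "alice")})
```

Because the rendered text is a pure function of the atom set, replicas converge once they hold the same atoms. The author part of each id gives attribution for free, and comparing atom sets answers "what has been changed?".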
6. Swarm: client-side CRDT
implementation
Swarm: real-time synchronized object cache
• a replicated model library, M of MVC
• think of "Dropbox for objects"
• client-side: JavaScript (ObjC and Java are planned)
• server-side: node.js (Java is planned)
• Backbone-like API, 2 KLoC
Citrea: collaborative real-time editor
• builds on regular contentEditable
• advanced versioning/authorship tracking
• think of "Google Docs, embedded"
7. Building a total cache system
from scratch takes man-years
• "There are only two hard things
in Computer Science: cache
invalidation and naming things"
-- attributed to Phil Karlton
• Data on the client turns a Web
system (simple) into an AP*
system (complex)
• That takes man-years of work.
* by the CAP theorem
8. Team: we implement CRDTs
faster than the theory is written! *
Victor Grishchenko, PhD, USU and Delft
University of Technology, Bank of Russia,
Yandex, does rocket science.
Alexei Balandin, USU, Beeline, AT
Consulting, gosuslugi.ru e-gov, does
enterprisey stuff.
* we actually do sometimes, as we found at PaPEC'14
9. Swarm
Mail us, call us:
victor.grishchenko@gmail.com
Victor +7 926 102 33 94