Session replay technology records and plays back user sessions on a website or application, much like a screen recording. It is vital for product and user-experience teams to analyze user behavior throughout their product and uncover potential problems.
Building it, however, presents a number of difficulties, particularly from the point of view of data and scale. Some of these challenges include:
1. Data Capture: Capturing all page (DOM) changes, mouse movements, and user activity in real time, with almost no scope for buffering on the client side. We discuss the impact of WebSockets vs. HTTP for data transfer, and queueing to protect against data loss.
2. Data Privacy: Recording and storing user sessions can raise privacy concerns. Session replay technology must be designed to protect user privacy and comply with data protection laws. This involves challenges both at the source and at rest, and requires geographical distribution of data.
3. Data Storage: Session replay generates large amounts of data, as it records every user interaction on a website or application. Storing this data can be a challenge, particularly for high-traffic websites or applications. We use ScyllaDB for our storage and experimented with different compaction strategies for our use case.
4. User Experience: Lastly, watching session replays should be simple and fast for an optimal user experience. This means every recording must be playable in near real time, which shapes our data sharding.
In this talk, we'll discuss how our team at Browsee approached these problems and what we discovered along the way.
4. Mutation Observers
● Starts with the initial page HTML
● Records all changes on a page
● Records all mouse movements, scrolls, and clicks with their elements and positions
https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver
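A minimal sketch of the capture approach described above. The function and field names are illustrative, not Browsee's actual code; real recorders also assign each DOM node an id during the initial snapshot so later mutations can reference it.

```javascript
// Serializing a mutation into a compact event is a pure function,
// so it also works outside the browser.
function serializeMutation(m) {
  return {
    type: m.type,                               // "childList" | "attributes" | "characterData"
    target: m.targetId,                         // node id assigned at snapshot time (assumed)
    added: m.addedNodes ? m.addedNodes.length : 0,
    removed: m.removedNodes ? m.removedNodes.length : 0,
    attr: m.attributeName || null,
    t: m.timestamp,
  };
}

const events = [];

// Browser-only wiring; Node has no MutationObserver or document.
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  // 1. Start with the initial page HTML.
  events.push({ type: "snapshot", html: document.documentElement.outerHTML, t: Date.now() });

  // 2. Record all subsequent DOM changes.
  const observer = new MutationObserver((mutations) => {
    for (const m of mutations) {
      events.push(serializeMutation({
        type: m.type,
        targetId: null, // a real recorder would look up the node's snapshot id here
        addedNodes: m.addedNodes,
        removedNodes: m.removedNodes,
        attributeName: m.attributeName,
        timestamp: Date.now(),
      }));
    }
  });
  observer.observe(document, { childList: true, attributes: true, characterData: true, subtree: true });

  // 3. Record mouse movements, scrolls and clicks with their positions.
  document.addEventListener("mousemove", (e) => events.push({ type: "mouse", x: e.clientX, y: e.clientY, t: Date.now() }));
  document.addEventListener("click", (e) => events.push({ type: "click", x: e.clientX, y: e.clientY, t: Date.now() }));
  document.addEventListener("scroll", () => events.push({ type: "scroll", y: window.scrollY, t: Date.now() }), true);
}
```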
5. WebSockets vs HTTP
● Even transmitting this information over HTTP would have significant overhead
● WebSockets allow fast transmission of this data over a persistent connection
Source:
http://blog.arungupta.me/rest-vs-websocket-comparison-benchmarks/
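The client-side send path could be sketched as below: events accumulate in an in-memory queue and are flushed in batches over a persistent WebSocket, so a brief disconnect does not lose data. The endpoint URL and flush interval are placeholders, not Browsee's actual values.

```javascript
// Buffers events and only drops them once a send has succeeded.
class EventQueue {
  constructor(send) {
    this.pending = [];
    this.send = send; // (batch) => boolean, true if transmitted
  }
  push(event) {
    this.pending.push(event);
  }
  flush() {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    if (this.send(batch)) {
      this.pending = []; // safe to drop: the batch was transmitted
      return batch.length;
    }
    return 0; // keep buffering; retry on the next flush
  }
}

// Browser-only wiring: one persistent connection instead of per-request HTTP overhead.
if (typeof WebSocket !== "undefined" && typeof window !== "undefined") {
  const ws = new WebSocket("wss://example.com/ingest"); // placeholder endpoint
  const queue = new EventQueue((batch) => {
    if (ws.readyState !== WebSocket.OPEN) return false;
    ws.send(JSON.stringify(batch));
    return true;
  });
  setInterval(() => queue.flush(), 1000); // flush once a second (assumed interval)
}
```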
6. Data for User Experience
● About 1TB of recordings data every day, persisted for up to a year
● For every heatmap, we aggregate every click and every scroll, along with the time spent by the user
● About 2TB of time series data
● About 1TB of full-page snapshots
7. Data Storage
● Initially we used Cassandra as our primary storage for replay data
● We soon moved to ScyllaDB and saw better throughput per node as well as better read latency
Source:
https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-scylla-4-4-comparing-performance/
8. Size Tiered Compaction
● The default compaction strategy is the Size-Tiered Compaction Strategy (STCS)
● This is the oldest strategy, and the default for both ScyllaDB and Cassandra
9. Size Tiered Compaction
● Its major drawback: it requires about 25-50% free disk space for compaction
● Storage is one of our largest cost heads
● 25-50% disk wastage directly impacts cost
10. Leveled Compaction
● Keeps small SSTables, each about 160MB in size
● Keeps adding data to the next level's SSTables; creates a new SSTable when the size limit is breached
11. Leveled Compaction
● In our experience, it significantly increased write-processing cost
● No improvement in read latency
● The cost we were saving on disk was being lost in compute cycles
12. Incremental Compaction Strategy
● A variant of Size-Tiered Compaction that addresses only the largest compaction step, the one requiring free space
● At that step, it breaks the compacting SSTables into runs and keeps compacting the runs while deleting completed ones, so much less free space is required
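As a sketch, the compaction strategy is chosen per table in CQL. The keyspace and table names below are hypothetical, and Incremental Compaction Strategy is, to our knowledge, a ScyllaDB Enterprise feature:

```sql
-- Hypothetical table; ICS replaces STCS's large final compaction step
-- with fixed-size SSTable runs, cutting the free-space requirement.
ALTER TABLE replays.events
  WITH compaction = { 'class': 'IncrementalCompactionStrategy' };

-- For comparison, the strategies discussed on the previous slides:
-- WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };
-- WITH compaction = { 'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160 };
```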
13. Data Security at Rest
● Users can choose which data center their data is saved and processed in
● We also have blacklist caches both at the queueing layer and at rest, so we can immediately respond to data cleanup requests
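One way to picture the blacklist cache at the queueing layer: a deny-set consulted before anything is enqueued, so new data for a cleaned-up site stops immediately while the slower delete-at-rest job catches up. This is an illustrative sketch, not Browsee's implementation.

```javascript
// In-memory deny-list checked on the ingestion path.
class DenyListCache {
  constructor() {
    this.denied = new Set();
  }
  // Called when a data cleanup request arrives.
  deny(siteId) {
    this.denied.add(siteId);
  }
  // Called before an event batch is enqueued.
  shouldEnqueue(event) {
    return !this.denied.has(event.siteId);
  }
}
```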
14. Session Tagging
● Browsee marks several user insights on session recordings depending on the user's behavior
● Temporal click events, like rage clicks
● Repeated events, or quickly browsing through several pages
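A rage-click tagger of the kind described above could look like this: several clicks on (nearly) the same spot within a short window. The thresholds are assumptions for illustration, not Browsee's actual values.

```javascript
// clicks: [{x, y, t}] sorted by time t (ms).
function detectRageClicks(clicks, { windowMs = 1000, radiusPx = 30, minClicks = 3 } = {}) {
  const tags = [];
  let run = []; // current streak of nearby, rapid clicks
  for (const c of clicks) {
    const last = run[run.length - 1];
    const close = last &&
      c.t - run[0].t <= windowMs &&
      Math.hypot(c.x - last.x, c.y - last.y) <= radiusPx;
    run = close ? [...run, c] : [c];
    // Emit one tag per streak, when it first reaches the threshold.
    if (run.length === minClicks) tags.push({ tag: "rage_click", t: run[0].t });
  }
  return tags;
}
```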
15. Live Sessions
● Kafka's real-time processors allow us to do session tagging in near real time
● Users need real-time session views
● We also need incremental session updates on the client side, to fetch new data as users are watching a live session
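The incremental-update idea on the client side can be sketched as follows: the player keeps the sequence number of the last event it has, each poll asks only for events after that point, and the response is merged into the session so it keeps extending during playback. The shape of the session object and the `seq` field are assumptions for illustration.

```javascript
// Merge a polled increment (events sorted by seq) into the player's session state.
function mergeIncrement(session, increment) {
  // Drop anything we already have, then append the genuinely new events.
  const fresh = increment.filter((e) => e.seq > session.lastSeq);
  if (fresh.length === 0) return session;
  return {
    events: session.events.concat(fresh),
    lastSeq: fresh[fresh.length - 1].seq,
  };
}
```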
16. Roadmap
● Navigation graph - graph-based time series data which scales exponentially with a site's pages
● Bot Detection Models
● More User Behavior Models, like Multi-User Analysis and User-Defined Tags