Session replay technology records and plays back user sessions on a website or application, much like a screen recording. It is vital for product and user-experience teams to analyze user behavior throughout their product and uncover potential problems.
Building it, however, presents a number of difficulties, particularly from the point of view of data and scale. Some of these challenges include:
1. Data Capture: Capturing all page (DOM) changes, mouse movements, and user activity in real time, with almost no scope for buffering on the client side. We discuss the impact of WebSockets vs. HTTP for data transfer, and queueing to protect against data loss.
2. Data Privacy: Recording and storing user sessions can raise privacy concerns. Session replay technology must be designed to protect user privacy and comply with data protection laws. This involves challenges both at the source and at rest, and requires geographical distribution of data.
3. Data Storage: Session replay generates large amounts of data, as it records every user interaction on a website or application. Storing this data can be a challenge, particularly for high-traffic websites or applications. We use ScyllaDB for our storage and experimented with different compaction strategies for our use case.
4. User Experience: Lastly, watching session replays should be simple and fast for an optimal user experience. This means every recording must be playable in near real time, which shapes our data sharding.
In this talk, we'll discuss how our team at Browsee approached these problems and what we discovered along the way.
4. Mutation Observers
● Starts with the initial page HTML
● Records all changes on a page
● Records all mouse movements, scrolls, and clicks with their elements and positions
https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver
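A minimal sketch of the capture approach described above. The function and field names are illustrative, not Browsee's actual code; real recorders also assign each DOM node an id during the initial snapshot so later mutations can reference it.

```javascript
// Serializing a mutation into a compact event is a pure function,
// so it also works outside the browser.
function serializeMutation(m) {
  return {
    type: m.type,                               // "childList" | "attributes" | "characterData"
    target: m.targetId,                         // node id assigned at snapshot time (assumed)
    added: m.addedNodes ? m.addedNodes.length : 0,
    removed: m.removedNodes ? m.removedNodes.length : 0,
    attr: m.attributeName || null,
    t: m.timestamp,
  };
}

const events = [];

// Browser-only wiring; Node has no MutationObserver or document.
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  // 1. Start with the initial page HTML.
  events.push({ type: "snapshot", html: document.documentElement.outerHTML, t: Date.now() });

  // 2. Record all subsequent DOM changes.
  const observer = new MutationObserver((mutations) => {
    for (const m of mutations) {
      events.push(serializeMutation({
        type: m.type,
        targetId: null, // a real recorder would look up the node's snapshot id here
        addedNodes: m.addedNodes,
        removedNodes: m.removedNodes,
        attributeName: m.attributeName,
        timestamp: Date.now(),
      }));
    }
  });
  observer.observe(document, { childList: true, attributes: true, characterData: true, subtree: true });

  // 3. Record mouse movements, scrolls and clicks with their positions.
  document.addEventListener("mousemove", (e) => events.push({ type: "mouse", x: e.clientX, y: e.clientY, t: Date.now() }));
  document.addEventListener("click", (e) => events.push({ type: "click", x: e.clientX, y: e.clientY, t: Date.now() }));
  document.addEventListener("scroll", () => events.push({ type: "scroll", y: window.scrollY, t: Date.now() }), true);
}
```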
5. WebSockets vs HTTP
● Even transmitting this information over HTTP would have significant overhead
● WebSockets allow fast transmission of this data over a persistent connection
Source:
http://blog.arungupta.me/rest-vs-websocket-comparison-benchmarks/
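The client-side send path could be sketched as below: events accumulate in an in-memory queue and are flushed in batches over a persistent WebSocket, so a brief disconnect does not lose data. The endpoint URL and flush interval are placeholders, not Browsee's actual values.

```javascript
// Buffers events and only drops them once a send has succeeded.
class EventQueue {
  constructor(send) {
    this.pending = [];
    this.send = send; // (batch) => boolean, true if transmitted
  }
  push(event) {
    this.pending.push(event);
  }
  flush() {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    if (this.send(batch)) {
      this.pending = []; // safe to drop: the batch was transmitted
      return batch.length;
    }
    return 0; // keep buffering; retry on the next flush
  }
}

// Browser-only wiring: one persistent connection instead of per-request HTTP overhead.
if (typeof WebSocket !== "undefined" && typeof window !== "undefined") {
  const ws = new WebSocket("wss://example.com/ingest"); // placeholder endpoint
  const queue = new EventQueue((batch) => {
    if (ws.readyState !== WebSocket.OPEN) return false;
    ws.send(JSON.stringify(batch));
    return true;
  });
  setInterval(() => queue.flush(), 1000); // flush once a second (assumed interval)
}
```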
6. Data for User Experience
● About 1TB of recordings data every day, persisted for up to a year
● For every heatmap, we aggregate every click and every scroll, along with the time spent by the user
● About 2TB of time series data
● About 1TB of full-page snapshots
7. Data Storage
● Initially we used Cassandra as our primary storage for replay data
● We soon moved to ScyllaDB and saw better throughput per node as well as better read latency
Source:
https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-scylla-4-4-comparing-performance/
8. Size Tiered Compaction
● The default compaction strategy is the Size-Tiered Compaction Strategy (STCS)
● This is the oldest strategy, and the default for both ScyllaDB and Cassandra
9. Size Tiered Compaction
● Its major drawback: it requires about 25-50% free disk space for compaction
● Storage is one of our largest cost heads
● 25-50% disk wastage directly impacts cost
10. Leveled Compaction
● Keeps small SSTables, each about 160MB in size
● Keeps adding data to the next level's SSTables; creates a new SSTable when the size limit is breached
11. Leveled Compaction
● In our experience, it significantly increased write-processing cost
● No improvement in read latency
● The cost we were saving on disk was being lost in compute cycles
12. Incremental Compaction Strategy
● A variant of Size-Tiered Compaction that addresses only the largest compaction step, the one requiring free space
● At that step, it breaks the compacting SSTables into runs and keeps compacting the runs while deleting completed ones, so much less free space is required
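As a sketch, the compaction strategy is chosen per table in CQL. The keyspace and table names below are hypothetical, and Incremental Compaction Strategy is, to our knowledge, a ScyllaDB Enterprise feature:

```sql
-- Hypothetical table; ICS replaces STCS's large final compaction step
-- with fixed-size SSTable runs, cutting the free-space requirement.
ALTER TABLE replays.events
  WITH compaction = { 'class': 'IncrementalCompactionStrategy' };

-- For comparison, the strategies discussed on the previous slides:
-- WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };
-- WITH compaction = { 'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160 };
```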
13. Data Security at Rest
● Users can choose which data center their data is saved and processed in
● We also have blacklist caches both at the queueing layer and at rest, so we can immediately respond to data cleanup requests
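One way to picture the blacklist cache at the queueing layer: a deny-set consulted before anything is enqueued, so new data for a cleaned-up site stops immediately while the slower delete-at-rest job catches up. This is an illustrative sketch, not Browsee's implementation.

```javascript
// In-memory deny-list checked on the ingestion path.
class DenyListCache {
  constructor() {
    this.denied = new Set();
  }
  // Called when a data cleanup request arrives.
  deny(siteId) {
    this.denied.add(siteId);
  }
  // Called before an event batch is enqueued.
  shouldEnqueue(event) {
    return !this.denied.has(event.siteId);
  }
}
```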
14. Session Tagging
● Browsee marks several user insights on session recordings depending on the user's behavior
● Temporal click events, like rage clicks
● Repeated events, or quickly browsing through several pages
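A rage-click tagger of the kind described above could look like this: several clicks on (nearly) the same spot within a short window. The thresholds are assumptions for illustration, not Browsee's actual values.

```javascript
// clicks: [{x, y, t}] sorted by time t (ms).
function detectRageClicks(clicks, { windowMs = 1000, radiusPx = 30, minClicks = 3 } = {}) {
  const tags = [];
  let run = []; // current streak of nearby, rapid clicks
  for (const c of clicks) {
    const last = run[run.length - 1];
    const close = last &&
      c.t - run[0].t <= windowMs &&
      Math.hypot(c.x - last.x, c.y - last.y) <= radiusPx;
    run = close ? [...run, c] : [c];
    // Emit one tag per streak, when it first reaches the threshold.
    if (run.length === minClicks) tags.push({ tag: "rage_click", t: run[0].t });
  }
  return tags;
}
```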
15. Live Sessions
● Kafka's real-time processors allow us to do session tagging in near real time
● Users need real-time session views
● We also need incremental session updates on the client side, to fetch new data as users are watching a live session
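The incremental-update idea on the client side can be sketched as follows: the player keeps the sequence number of the last event it has, each poll asks only for events after that point, and the response is merged into the session so it keeps extending during playback. The shape of the session object and the `seq` field are assumptions for illustration.

```javascript
// Merge a polled increment (events sorted by seq) into the player's session state.
function mergeIncrement(session, increment) {
  // Drop anything we already have, then append the genuinely new events.
  const fresh = increment.filter((e) => e.seq > session.lastSeq);
  if (fresh.length === 0) return session;
  return {
    events: session.events.concat(fresh),
    lastSeq: fresh[fresh.length - 1].seq,
  };
}
```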
16. Roadmap
● Navigation graph - graph-based time series data which scales exponentially with a site's pages
● Bot Detection Models
● More User Behavior Models, like Multi-User Analysis and User-Defined Tags