SlideShare a Scribd company logo
1 of 71
Download to read offline
From CRUD to Event Sourcing
an Investible Stock Universe
O'Reilly Software Architecture Conference March 18, 2015 #oreillysacon
Marc Siegel
Team Lead (@ms_yd)
Brian Roberts
Senior Developer / Team Lead (@flicken)
TIM Group
What's in it for you?
● Answer the Unanswerable
○ both now and future
Vocabulary
1. Domain-Driven Design (DDD)
○ Event
○ Command
○ Aggregate
2. Event Sourcing (ES)
○ Projection
○ Read Model
How does Event Sourcing Work?
(Image credit: Wikipedia)
How does Event Sourcing Work?
(Quotes from Greg Young)
"State transitions are an important part of our problem
space and should be modeled within our domain"
How does Event Sourcing Work?
(Quotes from Greg Young)
"State transitions are an important part of our problem
space and should be modeled within our domain"
"Event Sourcing says all state is transient and you only
store facts."
Vocabulary
1. Domain-Driven Design (DDD)
○ Event: something that happened in the past; a fact; a state transition
○ Command
○ Aggregate
2. Event Sourcing (ES)
○ Projection
○ Read Model
Lifecycle of a Listing (Simplified)
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
Q: How to represent the meaningful state
transitions of the domain as events?
FB (Facebook)
Listing Lifecycle Events
IPET (Pets.com)
TickerListed
ticker: MSFT
date: 1986
TickerListed
ticker: IPET
date: 2000
TickerDelisted
ticker: IPET
date: 2000
TickerListed
ticker: FB
date: 2012
Events:
FB (Facebook)
t
1995 2000 2005 2010 2015
MSFT (Microsoft)
Determine Current State
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
Q: How do we
determine the state as
of a given year? Or as
of today?
FB (Facebook)
Determine Current State
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
Q: How do we
determine the state as
of a given year? Or as
of today?
A: "When we talk about Event
Sourcing, current state is a
left-fold of previous behaviors"
-- Greg Young
FB (Facebook)
Current State is a Left Fold of Events
● FP: Left Fold aggregates a collection via a function and an initial value
○ Ruby: [1, 2, 3].inject(0, :+) == 6 # symbol fn name
○ Scala: List(1, 2, 3).foldLeft(0)(_ + _) == 6 // anon function
● Provide an initial state s0
and a function f : (S, E) => S
● Current State after event e3
is:
○ = leftFold( [e1
, e2
, e3
], s0
, f )
○ = f( f( f( s0
, e1
), e2
), e3
)
Replay to Earlier State: 1998
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
State in 1998 =
[].apply(
TickerListed
ticker: MSFT
date: 1986
)
FB (Facebook)
Replay to Earlier State: 1998
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
MSFT
Listed
State in 1998 =
[ ]
FB (Facebook)
Replay to Earlier State: 2000
IPET (Pets.com)
MSFT
Listed
State in 2000 =
[ ] TickerListed
ticker: IPET
date: 2000
.apply( )
FB (Facebook)
MSFT (Microsoft)
t
1995 2000 2005 2010 2015
Replay to Earlier State: 2000
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
MSFT
Listed
IPET
Listed
State in 2000 =
[ ]
FB (Facebook)
Replay to Earlier State: 2001
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
MSFT
Listed
IPET
Listed
State in 2001 =
[ ] TickerDelisted
ticker: IPET
date: 2000
.apply( )
FB (Facebook)
Replay to Earlier State: 2001
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
MSFT
Listed
IPET
Delisted
State in 2001 =
[ ]
FB (Facebook)
Replay to Earlier State: 2012
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
FB (Facebook)
MSFT
Listed
IPET
Delisted
State in 2012 =
[ ] TickerListed
ticker: FB
date: 2012
.apply( )
Replay to Earlier State: 2012
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
MSFT
Listed
IPET
Delisted
FB
Listed
State in 2012 =
[ ]
FB (Facebook)
Review - Only the Events are Stored
t
1995 2000 2005 2010 2015
IPET (Pets.com)
MSFT (Microsoft)
TickerListed
ticker: MSFT
date: 1986
TickerListed
ticker: IPET
date: 2000
TickerDelisted
ticker: IPET
date: 2000
TickerListed
ticker: FB
date: 2012
Events:
FB (Facebook)
Potential Benefits
● Answer the unanswerable (via history replay)
● Debugging of historical states deterministically (via history replay)
● Never Lose Information (write-only store)
● Edit the Past (via new events effective at older times)
● Optimize reads (purpose-built read models)
● Enhanced Analytics (analyze all history as it occurred)
Potential Drawbacks
● Eventual Consistency
● No built-in querying of domain models (SELECT name WHERE …)
● Risks of using a new architectural pattern
● Lack of agreement on tools and infrastructure
● Increased storage requirements
What is different from CRUD?
From CRUD to Event Sourcing an Investible Stock Universe
CRUD Micro-Service
REST API
Data Store
Controllers
ORM Model ORM Model
Background
Process
Create / Update
External
Market
Data
APIs
Client #3Client #2Client #1
Update / DeleteRead #1 Read #2
ORM Model
Event-sourced Micro-Service
Read Model #1
(files in S3)
Read Model #2
(specific database)
Write
Events
Projection #1 Projection #2
Command
Processor
Domain Models (pure) Background
Process
External
Market
Data
APIs
Client #1
Read #1
Event Store
Read
Events Read
Events
Client #2
Read #2
Client #3
Update / Delete
Commands
Create / Update
Commands
Vocabulary
1. Domain-Driven Design (DDD)
○ Event: something that happened in the past; a fact; a state transition
○ Command: a request for a state transition in the domain
○ Aggregate
2. Event Sourcing (ES)
○ Projection
○ Read Model
Aside: Domain Model is Pure?
● FP: A "pure" function doesn't cause any side effects
○ No reads or writes that modify the world
○ No altering a mutable data structure
○ Substitute f(x) for its result without changing meaning of program
● An event-sourced domain can be two pure functions
○ process(currentState, command) => e1, e2, e3
○ apply (currentState, event) => nextState
● Separates the logic of the model from interactions with any changing state
in the world
What is different from CRUD?
(Quotes from Greg Young)
"The model that a client needs for the data in a distributed
system is screen-based and different than the domain
model."
Trade-offs
● ACID vs. Eventual Consistency
○ CRUD w/ ACID database
■ Once a row is written, subsequent reads reflect it
■ But: no help with domain-level consistency!
○ Event Sourced
■ Once event is written, subsequent reads reflect it
■ Projections eventually consistent
● Up-front costs
○ Domain modeling is hard!
CRUD Models of Market Data
From CRUD to Event Sourcing an Investible Stock Universe
CRUD Models of Market Data
Listing
- ticker
- exchange
- trading status
- ....
*
Fundamentals
- earnings per share
- market cap
- ...
primary
listing
Org
- name
- ...
Price
- date
- value
- currency
*
Universe
- name
- ...
**
other
listings
merged
with
spin-off
from
Answerable?
Q: Was AOL in Investible Universe in 1995?
Known
Equities
Active
Also:
● Market Cap > X
● Price > Y
● Avg Turnover > Z
● ...
Investible
2001 - Merger of AOL/Time Warner
Listings
.findByTicker("AOL")
.setTicker("TWX")
t
1995 2000 2005 2010 2015
Time Warner
Listing
ticker: AOL
trading status: Listed
....
Listing
ticker: TWX
trading status: Listed
....
Org
name: Time Warner
Org
name: America Online
Orgs
.findByName("Time Warner")
.mergeWith("America Online")
Orgs
.findByName("America Online")
.setName("AOL Time Warner)
.addListing("TWX")
Listings
.findByTicker("TWX")
.setTradingStatus("Delisted")
America Online
AOL Time
Warner Time Warner
AOL
Time Warner
2001 - Merger of AOL/Time Warner
Listings
.findByTicker("AOL")
.setTicker("TWX")
t
1995 2000 2005 2010 2015
Listing
ticker: AOL
trading status: Listed
....
Listing
ticker: TWX
trading status: Delisted
...
Org
name: Time Warner
Org
name: AOL Time Warner
Orgs
.findByName("Time Warner")
.mergeWith("America Online")
Orgs
.findByName("America Online")
.setName("AOL Time Warner)
.addListing("TWX")
Listings
.findByTicker("TWX")
.setTradingStatus("Delisted")
merged
with
America Online
Time Warner
AOL Time
Warner Time Warner
AOL
Time Warner
2003 - Name / Ticker Change
Listings
.findByTicker("AOL")
.setTicker("TWX")
t
1995 2000 2005 2010 2015
Listing
ticker: TWX (old)
trading status: Delisted
...
Org
name: Time Warner
Org
name: AOL Time Warner
Listing
ticker: AOL
trading status: Listed
....
Org
.findByName("AOL Time Warner")
.setName("Time Warner")
Listings
.findByTicker("AOL")
.setTicker("TWX")
merged
with
Time Warner
America Online
AOL Time
Warner Time Warner
AOL
Time Warner
2003 - Name / Ticker Change
Listings
.findByTicker("AOL")
.setTicker("TWX")
Listing
ticker: TWX (old)
trading status: Delisted
...
Org
name: Time Warner
Org
name: Time Warner
Listing
ticker: TWX
old ticker: AOL
trading status: Listed
Org
.findByName("AOL Time Warner")
.setName("Time Warner")
Listings
.findByTicker("AOL")
.setTicker("TWX")
merged
with
t
1995 2000 2005 2010 2015
Time Warner
AOL Time
Warner Time WarnerAmerica Online
AOL
Time Warner
2009 - AOL Spinoff
Listing
ticker: TWX (old)
trading status:
Delisted
...
Org
name: Time Warner
Org
name: Time Warner
Listing
ticker: TWX
old ticker: AOL
trading status:
Listed
merged
with
Org
.findByName("Time Warner")
.spinOff("AOL")
.addListing("AOL")
Listings
.newListing("AOL")
t
1995 2000 2005 2010 2015
Time Warner
AOL Time
Warner Time WarnerAmerica Online
AOL
Time Warner
2009 - AOL Spinoff
Listing
ticker: TWX (old)
trading status:
Delisted
...
Org
name: Time Warner
Org
name: Time Warner
Listing
ticker: TWX
old ticker: AOL
trading status:
Listed
merged
with
Org
name: AOL
Listing
ticker: AOL
trading status:
Listed
...
spin-off
from
Org
.findByName("Time Warner")
.spinOff("AOL")
.addListing("AOL")
Listings
.newListing("AOL")
t
1995 2000 2005 2010 2015
Time Warner
AOL Time
Warner Time WarnerAmerica Online
AOL
Time Warner
Answerable with CRUD Models?
Q: Was AOL in Investible Universe in 1995?
A: No and Yes
○ No current AOL org (didn't exist at time)
○ Yes former America Online (now Time Warner)
Complexity in query, requires previous states
○ Query against version columns with date ranges?
○ Query against previous versions tables?
CRUD Models - How to Update?
● Update price of TWX on 1995-01-03
○ Original: $56.22 Correct: $52.62
● Complexity in query
○ Wrong: Listings.findByTicker("TWX") // this is AOL!
○ Right: Complex historical query...
● Complexity in update
○ Org Primary Listing - any change?
○ Org Universe membership - any changes?
○ Support Two-Dimensional Time aka as-of query?
Problems
From CRUD to Event Sourcing an Investible Stock Universe
Main Problem
Unanswerable questions
● Time travel intractable
● Past not always reproducible
More Problems
● Correctness
○ Divergent interpretations of data
● Availability
○ How often can data be unavailable to clients?
● Performance
○ How fast must operations complete?
● Determinism
○ Reproducing prior states for reporting, debugging, etc.
● Auditability
○ Who changed what when and why?
Problems - Correctness
● Need a new definition of e.g. adjusted price
○ Old: unadjusted * splits * spin-offs
○ New: unadjusted * splits * spin-offs * dividends
● But...
○ Some client systems still need the old definition
○ CRUD data store didn't store the individual factors
● Common Solutions
○ Add past to relational model? Reprocess?
Problems - Availability
● How long can data be unavailable?
○ Not long - End-user client
○ Hoursto Days - Reporting
● But...
○ Most stringent of client requirements applies to all
○ Cascading failures: unavailability propagates
● Common solutions
○ bulk-heading, circuit-breakers, more servers
○ more complex than necessary?
Problems - Performance
● How fast must operations complete?
○ Writes need to keep up with input
○ Reads have varying requirements
● But..
○ Due to contention on Shared Mutable State, badly
performing Reads can impact everything else
● Common Solutions
○ Caching, sharding, more server resources
○ Trade-off with ACID consistency
Problems - Determinism
● Reproducing prior states
○ Reporting Consistently on a Past period
■ Apply adjustments only to end of the period
○ Debugging
■ Reproduce state of data in past
● But..
○ Not easy with Shared Mutable State!
● Common Solutions
○ Versioned rows, audit tables, database snapshots
Problems - Auditability
● Why did data change?
○ Attribution (source of data)
○ Security (who did it)
○ History (what and when was previous value)
● But..
○ Not easy with Shared Mutable State!
● Common Solutions
○ Versioned rows, audit tables
Event Sourced Models of Market Data
From CRUD to Event Sourcing an Investible Stock Universe
Event Sourced Listings
Listing
Aggregate
Other Reference Data...
ListingListed
ListingDelisted
ListingDeleted
Ticker ISIN
Org Name
Reference Data
ListingInitialized
ListingChanged
ListingIsinChanged
ListingTickerChanged
Listing Reference
Data Processor
Our Listing Id
Vendor Listing Id
ListingUndeleted
ListingSourceOrgIdChanged
Exch
External
Market
Data
APIs
Background
Process
Vendor Org Id
Event Sourced Orgs
Org
Aggregate
ListingListed
ListingDelisted
ListingDeleted
ListingIsinChanged
ListingRicChanged
Org Lifecycle
Data Processor
Our Org Id
Reference
Data
Reference Data
Org Name
Vendor Org Id
ListingUndeleted
ListingSourceOrgIdChanged
Listing
Reference
Data
Processor
(state…)
OrgListed
OrgDelisted
OrgDeleted
OrgUndeleted
OrgListingAdded
OrgListingRemoved
OrgEnteredUniverse
OrgLeftUniverse
OrgPrimaryListingChanged
Listing
1..n
primary
Vocabulary
1. Domain-Driven Design (DDD)
○ Event: something that happened in the past; a fact; a state transition
○ Command: a request for a state transition in the domain
○ Aggregate: domain objects in a transactionally-consistent unit
2. Event Sourcing (ES)
○ Projection
○ Read Model
Resulting Read Models
From CRUD to Event Sourcing an Investible Stock Universe
Stock Universes - Read Model
● Generate as CSV files in S3, clients retrieve via REST API
GET /universes/1995-01-03
Org Id,Known Universe?,Active Universe?,Investible Universe?,Primary Listing Id, … Ticker
"49498",true,true,true,"0x00100b000b569402","US8873173038",..."AOL"
● Problems?
○ Correctness: Interpretation only for this use case
○ Availability: No impact on other use cases
○ Performance: No read-side calculations
○ Determinism: Can re-generate from event source data
○ Auditability: Available in event source data
● Consistency is at domain level -- entire history of universes in this case
○ Generate entire history to S3 bucket, API switches buckets atomically
Answerable via Event Sourcing?
Q: Was AOL in Investible Universe in 1995?
GET /universes/1995-01-03
Org Id,Known Universe?,Active Universe?,Investible Universe?,Primary Listing Id, …, Ticker
"49498",true,true,true,"0x00100b000b569402","US8873173038",...,"AOL"
A: Yes! former America Online (now Time Warner)
Trivial query against purpose-built Read Model
Org History - Read Model
● Build directly from indexed event stream, clients retrieve via REST API
GET /orgs?ticker=TRW
[ { eventType: "ListingListed", listing_id: "1", ticker: "AOL", processedAt: "...", effectiveAt: "...", …}
{ eventType: "OrgListed", … },
{ eventType: "ListingAdded", listing_id: "1", …}, …
{ eventType: "ListingTickerChanged", listing_id: "1", old_ticker: "AOL", ticker: "TWX" …}, ... ]
● Problems?
○ Correctness: Interpretation only for this use case
○ Availability: No impact on other use cases
○ Performance: No read-side calculations
○ Determinism: Reads directly from event source data
○ Auditability: Available in event source data
● Consistency is at domain level -- entire history of single org in this case
○ Generate entire history on-the-fly directly from source (indexed!)
Vocabulary
1. Domain-Driven Design (DDD)
○ Event: something that happened in the past; a fact; a state transition
○ Command: a request for a state transition in the domain
○ Aggregate: domain objects in a transactionally-consistent unit
2. Event Sourcing (ES)
○ Projection: to derive current state from the stream of events
○ Read Model: a model of current state designed to answer a query
Read Model vs Cache
A cache is a query intermediary. It holds a previous
response until invalidated, then queries again.
A read model does not query. Applying new events to it
changes the answer it returns.
Example: count of people in room
● both may return 10 when the answer is now 11 or 9 - staleness
● cache counts the people - slow and may require locking the doors
● read model applies the entrance/exit events - like a click counter - no
impact on people nor doors
From CRUD to Event Sourcing an Investible Stock Universe
Conclusions
● Answer the "unanswerable"
● Avoid impacts on other use cases
● Keep facts that may answer future questions
Marc Siegel
@ms_yd
Brian Roberts
@flicken
Questions?
Follow up later
From CRUD to Event Sourcing an Investible Stock Universe
Events vs Audit Tables or Versioned Rows
An audit table or row version column use shared mutable state as the source
of truth, and additionally store some history. Inconsistencies are hard to fix,
edits of the past are challenging, new use cases can be challenging.
An event store is an append-only list of immutable facts. What has occurred is
recorded, and can be replayed to interpret according to a future use case.
Example: count of people in room
● audit table is a logbook in the room - query may be complex, facts
may not be consistent, depends on keeping it up to date
● versioned rows is a logbook carried by each person - same issues as
audit table
● event store is "just the facts", only interpretation changes

More Related Content

Recently uploaded

Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsJaydeep Chhasatia
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024Mind IT Systems
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageDista
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfBrain Inventory
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorShane Coughlan
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptkinjal48
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxJoão Esperancinha
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Jaydeep Chhasatia
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Incrobinwilliams8624
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntelliSource Technologies
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilVICTOR MAESTRE RAMIREZ
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.Sharon Liu
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 

Recently uploaded (20)

Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human Beauty
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.ppt
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptx
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Inc
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptx
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
Salesforce AI Associate Certification.pptx
Salesforce AI Associate Certification.pptxSalesforce AI Associate Certification.pptx
Salesforce AI Associate Certification.pptx
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 

From CRUD to Event Sourcing an Investible Stock Universe

  • 1. From CRUD to Event Sourcing an Investible Stock Universe O'Reilly Software Architecture Conference March 18, 2015 #oreillysacon
  • 2. Marc Siegel Team Lead (@ms_yd) Brian Roberts Senior Developer / Team Lead (@flicken) TIM Group
  • 3. What's in it for you? ● Answer the Unanswerable ○ both now and future
  • 4. Vocabulary 1. Domain-Driven Design (DDD) ○ Event ○ Command ○ Aggregate 2. Event Sourcing (ES) ○ Projection ○ Read Model
  • 5. How does Event Sourcing Work?
  • 7. How does Event Sourcing Work? (Quotes from Greg Young) "State transitions are an important part of our problem space and should be modeled within our domain"
  • 8. How does Event Sourcing Work? (Quotes from Greg Young) "State transitions are an important part of our problem space and should be modeled within our domain" "Event Sourcing says all state is transient and you only store facts."
  • 9. Vocabulary 1. Domain-Driven Design (DDD) ○ Event: something that happened in the past; a fact; a state transition ○ Command ○ Aggregate 2. Event Sourcing (ES) ○ Projection ○ Read Model
  • 10. Lifecycle of a Listing (Simplified) t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) Q: How to represent the meaningful state transitions of the domain as events? FB (Facebook)
  • 11. Listing Lifecycle Events IPET (Pets.com) TickerListed ticker: MSFT date: 1986 TickerListed ticker: IPET date: 2000 TickerDelisted ticker: IPET date: 2000 TickerListed ticker: FB date: 2012 Events: FB (Facebook) t 1995 2000 2005 2010 2015 MSFT (Microsoft)
  • 12. Determine Current State t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) Q: How do we determine the state as of a given year? Or as of today? FB (Facebook)
  • 13. Determine Current State t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) Q: How do we determine the state as of a given year? Or as of today? A: "When we talk about Event Sourcing, current state is a left-fold of previous behaviors" -- Greg Young FB (Facebook)
  • 14. Current State is a Left Fold of Events ● FP: Left Fold aggregates a collection via a function and an initial value ○ Ruby: [1, 2, 3].inject(0, :+) == 6 # symbol fn name ○ Scala: List(1, 2, 3).foldLeft(0)(_ + _) == 6 // anon function ● Provide an initial state s0 and a function f : (S, E) => S ● Current State after event e3 is: ○ = leftFold( [e1 , e2 , e3 ], s0 , f ) ○ = f( f( f( s0 , e1 ), e2 ), e3 )
  • 15. Replay to Earlier State: 1998 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) State in 1998 = [].apply( TickerListed ticker: MSFT date: 1986 ) FB (Facebook)
  • 16. Replay to Earlier State: 1998 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) MSFT Listed State in 1998 = [ ] FB (Facebook)
  • 17. Replay to Earlier State: 2000 IPET (Pets.com) MSFT Listed State in 2000 = [ ] TickerListed ticker: IPET date: 2000 .apply( ) FB (Facebook) MSFT (Microsoft) t 1995 2000 2005 2010 2015
  • 18. Replay to Earlier State: 2000 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) MSFT Listed IPET Listed State in 2000 = [ ] FB (Facebook)
  • 19. Replay to Earlier State: 2001 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) MSFT Listed IPET Listed State in 2001 = [ ] TickerDelisted ticker: IPET date: 2000 .apply( ) FB (Facebook)
  • 20. Replay to Earlier State: 2001 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) MSFT Listed IPET Delisted State in 2001 = [ ] FB (Facebook)
  • 21. Replay to Earlier State: 2012 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) FB (Facebook) MSFT Listed IPET Delisted State in 2012 = [ ] TickerListed ticker: FB date: 2012 .apply( )
  • 22. Replay to Earlier State: 2012 t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) MSFT Listed IPET Delisted FB Listed State in 2012 = [ ] FB (Facebook)
  • 23. Review - Only the Events are Stored t 1995 2000 2005 2010 2015 IPET (Pets.com) MSFT (Microsoft) TickerListed ticker: MSFT date: 1986 TickerListed ticker: IPET date: 2000 TickerDelisted ticker: IPET date: 2000 TickerListed ticker: FB date: 2012 Events: FB (Facebook)
  • 24. Potential Benefits ● Answer the unanswerable (via history replay) ● Debugging of historical states deterministically (via history replay) ● Never Lose Information (write-only store) ● Edit the Past (via new events effective at older times) ● Optimize reads (purpose-built read models) ● Enhanced Analytics (analyze all history as it occurred)
  • 25. Potential Drawbacks ● Eventual Consistency ● No built-in querying of domain models (SELECT name WHERE …) ● Risks of using a new architectural pattern ● Lack of agreement on tools and infrastructure ● Increased storage requirements
  • 26. What is different from CRUD?
  • 28. CRUD Micro-Service REST API Data Store Controllers ORM Model ORM Model Background Process Create / Update External Market Data APIs Client #3Client #2Client #1 Update / DeleteRead #1 Read #2 ORM Model
  • 29. Event-sourced Micro-Service Read Model #1 (files in S3) Read Model #2 (specific database) Write Events Projection #1 Projection #2 Command Processor Domain Models (pure) Background Process External Market Data APIs Client #1 Read #1 Event Store Read Events Read Events Client #2 Read #2 Client #3 Update / Delete Commands Create / Update Commands
  • 30. Vocabulary 1. Domain-Driven Design (DDD) ○ Event: something that happened in the past; a fact; a state transition ○ Command: a request for a state transition in the domain ○ Aggregate 2. Event Sourcing (ES) ○ Projection ○ Read Model
  • 31. Aside: Domain Model is Pure? ● FP: A "pure" function doesn't cause any side effects ○ No reads or writes that modify the world ○ No altering a mutable data structure ○ Substitute f(x) for its result without changing meaning of program ● An event-sourced domain can be two pure functions ○ process(currentState, command) => e1, e2, e3 ○ apply (currentState, event) => nextState ● Separates the logic of the model from interactions with any changing state in the world
  • 32. What is different from CRUD? (Quotes from Greg Young) "The model that a client needs for the data in a distributed system is screen-based and different than the domain model."
  • 33. Trade-offs ● ACID vs. Eventual Consistency ○ CRUD w/ ACID database ■ Once a row is written, subsequent reads reflect it ■ But: no help with domain-level consistency! ○ Event Sourced ■ Once event is written, subsequent reads reflect it ■ Projections eventually consistent ● Up-front costs ○ Domain modeling is hard!
  • 34. CRUD Models of Market Data
  • 36. CRUD Models of Market Data Listing - ticker - exchange - trading status - .... * Fundamentals - earnings per share - market cap - ... primary listing Org - name - ... Price - date - value - currency * Universe - name - ... ** other listings merged with spin-off from
  • 37. Answerable? Q: Was AOL in Investible Universe in 1995? Known Equities Active Also: ● Market Cap > X ● Price > Y ● Avg Turnover > Z ● ... Investible
  • 38. 2001 - Merger of AOL/Time Warner Listings .findByTicker("AOL") .setTicker("TWX") t 1995 2000 2005 2010 2015 Time Warner Listing ticker: AOL trading status: Listed .... Listing ticker: TWX trading status: Listed .... Org name: Time Warner Org name: America Online Orgs .findByName("Time Warner") .mergeWith("America Online") Orgs .findByName("America Online") .setName("AOL Time Warner) .addListing("TWX") Listings .findByTicker("TWX") .setTradingStatus("Delisted") America Online AOL Time Warner Time Warner AOL Time Warner
  • 39. 2001 - Merger of AOL/Time Warner Listings .findByTicker("AOL") .setTicker("TWX") t 1995 2000 2005 2010 2015 Listing ticker: AOL trading status: Listed .... Listing ticker: TWX trading status: Delisted ... Org name: Time Warner Org name: AOL Time Warner Orgs .findByName("Time Warner") .mergeWith("America Online") Orgs .findByName("America Online") .setName("AOL Time Warner) .addListing("TWX") Listings .findByTicker("TWX") .setTradingStatus("Delisted") merged with America Online Time Warner AOL Time Warner Time Warner AOL Time Warner
  • 40. 2003 - Name / Ticker Change Listings .findByTicker("AOL") .setTicker("TWX") t 1995 2000 2005 2010 2015 Listing ticker: TWX (old) trading status: Delisted ... Org name: Time Warner Org name: AOL Time Warner Listing ticker: AOL trading status: Listed .... Org .findByName("AOL Time Warner") .setName("Time Warner") Listings .findByTicker("AOL") .setTicker("TWX") merged with Time Warner America Online AOL Time Warner Time Warner AOL Time Warner
  • 41. 2003 - Name / Ticker Change Listings .findByTicker("AOL") .setTicker("TWX") Listing ticker: TWX (old) trading status: Delisted ... Org name: Time Warner Org name: Time Warner Listing ticker: TWX old ticker: AOL trading status: Listed Org .findByName("AOL Time Warner") .setName("Time Warner") Listings .findByTicker("AOL") .setTicker("TWX") merged with t 1995 2000 2005 2010 2015 Time Warner AOL Time Warner Time WarnerAmerica Online AOL Time Warner
  • 42. 2009 - AOL Spinoff Listing ticker: TWX (old) trading status: Delisted ... Org name: Time Warner Org name: Time Warner Listing ticker: TWX old ticker: AOL trading status: Listed merged with Org .findByName("Time Warner") .spinOff("AOL") .addListing("AOL") Listings .newListing("AOL") t 1995 2000 2005 2010 2015 Time Warner AOL Time Warner Time WarnerAmerica Online AOL Time Warner
  • 43. 2009 - AOL Spinoff Listing ticker: TWX (old) trading status: Delisted ... Org name: Time Warner Org name: Time Warner Listing ticker: TWX old ticker: AOL trading status: Listed merged with Org name: AOL Listing ticker: AOL trading status: Listed ... spin-off from Org .findByName("Time Warner") .spinOff("AOL") .addListing("AOL") Listings .newListing("AOL") t 1995 2000 2005 2010 2015 Time Warner AOL Time Warner Time WarnerAmerica Online AOL Time Warner
  • 44. Answerable with CRUD Models? Q: Was AOL in Investible Universe in 1995? A: No and Yes ○ No current AOL org (didn't exist at time) ○ Yes former America Online (now Time Warner) Complexity in query, requires previous states ○ Query against version columns with date ranges? ○ Query against previous versions tables?
  • 45. CRUD Models - How to Update? ● Update price of TWX on 1995-01-03 ○ Original: $56.22 Correct: $52.62 ● Complexity in query ○ Wrong: Listings.findByTicker("TWX") // this is AOL! ○ Right: Complex historical query... ● Complexity in update ○ Org Primary Listing - any change? ○ Org Universe membership - any changes? ○ Support Two-Dimensional Time aka as-of query?
  • 48. Main Problem Unanswerable questions ● Time travel intractable ● Past not always reproducible
  • 49. More Problems ● Correctness ○ Divergent interpretations of data ● Availability ○ How often can data be unavailable to clients? ● Performance ○ How fast must operations complete? ● Determinism ○ Reproducing prior states for reporting, debugging, etc. ● Auditability ○ Who changed what when and why?
  • 50. Problems - Correctness ● Need a new definition of e.g. adjusted price ○ Old: unadjusted * splits * spin-offs ○ New: unadjusted * splits * spin-offs * dividends ● But... ○ Some client systems still need the old definition ○ CRUD data store didn't store the individual factors ● Common Solutions ○ Add past to relational model? Reprocess?
  • 51. Problems - Availability ● How long can data be unavailable? ○ Not long - End-user client ○ Hoursto Days - Reporting ● But... ○ Most stringent of client requirements applies to all ○ Cascading failures: unavailability propagates ● Common solutions ○ bulk-heading, circuit-breakers, more servers ○ more complex than necessary?
  • 52. Problems - Performance ● How fast must operations complete? ○ Writes need to keep up with input ○ Reads have varying requirements ● But.. ○ Due to contention on Shared Mutable State, badly performing Reads can impact everything else ● Common Solutions ○ Caching, sharding, more server resources ○ Trade-off with ACID consistency
  • 53. Problems - Determinism ● Reproducing prior states ○ Reporting Consistently on a Past period ■ Apply adjustments only to end of the period ○ Debugging ■ Reproduce state of data in past ● But.. ○ Not easy with Shared Mutable State! ● Common Solutions ○ Versioned rows, audit tables, database snapshots
  • 54. Problems - Auditability ● Why did data change? ○ Attribution (source of data) ○ Security (who did it) ○ History (what and when was previous value) ● But.. ○ Not easy with Shared Mutable State! ● Common Solutions ○ Versioned rows, audit tables
  • 55. Event Sourced Models of Market Data
  • 57. Event Sourced Listings Listing Aggregate Other Reference Data... ListingListed ListingDelisted ListingDeleted Ticker ISIN Org Name Reference Data ListingInitialized ListingChanged ListingIsinChanged ListingTickerChanged Listing Reference Data Processor Our Listing Id Vendor Listing Id ListingUndeleted ListingSourceOrgIdChanged Exch External Market Data APIs Background Process Vendor Org Id
  • 58. Event Sourced Orgs Org Aggregate ListingListed ListingDelisted ListingDeleted ListingIsinChanged ListingRicChanged Org Lifecycle Data Processor Our Org Id Reference Data Reference Data Org Name Vendor Org Id ListingUndeleted ListingSourceOrgIdChanged Listing Reference Data Processor (state…) OrgListed OrgDelisted OrgDeleted OrgUndeleted OrgListingAdded OrgListingRemoved OrgEnteredUniverse OrgLeftUniverse OrgPrimaryListingChanged Listing 1..n primary
  • 59. Vocabulary 1. Domain-Driven Design (DDD) ○ Event: something that happened in the past; a fact; a state transition ○ Command: a request for a state transition in the domain ○ Aggregate: domain objects in a transactionally-consistent unit 2. Event Sourcing (ES) ○ Projection ○ Read Model
  • 62. Stock Universes - Read Model ● Generate as CSV files in S3, clients retrieve via REST API GET /universes/1995-01-03 Org Id,Known Universe?,Active Universe?,Investible Universe?,Primary Listing Id, … Ticker "49498",true,true,true,"0x00100b000b569402","US8873173038",..."AOL" ● Problems? ○ Correctness: Interpretation only for this use case ○ Availability: No impact on other use cases ○ Performance: No read-side calculations ○ Determinism: Can re-generate from event source data ○ Auditability: Available in event source data ● Consistency is at domain level -- entire history of universes in this case ○ Generate entire history to S3 bucket, API switches buckets atomically
  • 63. Answerable via Event Sourcing? Q: Was AOL in Investible Universe in 1995? GET /universes/1995-01-03 Org Id,Known Universe?,Active Universe?,Investible Universe?,Primary Listing Id, …, Ticker "49498",true,true,true,"0x00100b000b569402","US8873173038",...,"AOL" A: Yes! former America Online (now Time Warner) Trivial query against purpose-built Read Model
  • 64. Org History - Read Model ● Build directly from indexed event stream, clients retrieve via REST API GET /orgs?ticker=TRW [ { eventType: "ListingListed", listing_id: "1", ticker: "AOL", processedAt: "...", effectiveAt: "...", …} { eventType: "OrgListed", … }, { eventType: "ListingAdded", listing_id: "1", …}, … { eventType: "ListingTickerChanged", listing_id: "1", old_ticker: "AOL", ticker: "TWX" …}, ... ] ● Problems? ○ Correctness: Interpretation only for this use case ○ Availability: No impact on other use cases ○ Performance: No read-side calculations ○ Determinism: Reads directly from event source data ○ Auditability: Available in event source data ● Consistency is at domain level -- entire history of single org in this case ○ Generate entire history on-the-fly directly from source (indexed!)
  • 65. Vocabulary 1. Domain-Driven Design (DDD) ○ Event: something that happened in the past; a fact; a state transition ○ Command: a request for a state transition in the domain ○ Aggregate: domain objects in a transactionally-consistent unit 2. Event Sourcing (ES) ○ Projection: to derive current state from the stream of events ○ Read Model: a model of current state designed to answer a query
  • 66. Read Model vs Cache A cache is a query intermediary. It holds a previous response until invalidated, then queries again. A read model does not query. Applying new events to it changes the answer it returns. Example: count of people in room ● both may return 10 when the answer is now 11 or 9 - staleness ● cache counts the people - slow and may require locking the doors ● read model applies the entrance/exit events - like a click counter - no impact on people nor doors
  • 68. Conclusions ● Answer the "unanswerable" ● Avoid impacts on other use cases ● Keep facts that may answer future questions
  • 71. Events vs Audit Tables or Versioned Rows An audit table or row version column use shared mutable state as the source of truth, and additionally store some history. Inconsistencies are hard to fix, edits of the past are challenging, new use cases can be challenging. An event store is an append-only list of immutable facts. What has occurred is recorded, and can be replayed to interpret according to a future use case. Example: count of people in room ● audit table is a logbook in the room - query may be complex, facts may not be consistent, depends on keeping it up to date ● versioned rows is a logbook carried by each person - same issues as audit table ● event store is "just the facts", only interpretation changes