SlideShare a Scribd company logo
1 of 51
Download to read offline
Large-Scale Architecture
The Unreasonable
Effectiveness of Simplicity
Randy Shoup
@randyshoup
Background
@randyshoup
@randyshoup
• 1995 Monolithic Perl
• 1996-2002 v2
o Monolithic C++ ISAPI DLL
o 3.4M lines of code
o Compiler limits on number of methods per class
• 2002-2006 v3 Migration
o Java “mini-applications”
o Shared databases
• 2012 v4 Microservices
• 2017 v5 Microservices
@randyshoup
eBay Architecture
• 1995-2001 Obidos
o Monolithic Perl / Mason frontend over C
backend
o ~4GB application in a 4GB address space
o Regularly breaking Gnu linker
o Restarting every 100-200 requests for memory
leaks
o Releasing once per quarter
• 2001-2005 Service Migration
o Services in C++, Java, etc.
o No shared databases
• 2006 AWS launches
@randyshoup
Amazon Architecture
No one starts with microservices
…
Past a certain scale, everyone ends
up with microservices
@randyshoup
Large-Scale Architecture
•Simple Components
•Simple Interactions
•Simple Changes
•Putting It All Together
Large-Scale Architecture
•Simple Components
•Simple Interactions
•Simple Changes
•Putting It All Together
“There are two methods in
software design: One is to make
the program so simple, there are
obviously no errors. The other is
to make it so complicated, there
are no obvious errors.”
-- Tony Hoare
@randyshoup
Modular Services
• Service boundaries match the problem
domain
• Service boundaries encapsulate
business logic and data
o All interactions through published service interface
o Interface hides internal implementation details
o No back doors
• Service boundaries encapsulate
architectural -ilities
o Fault isolation
o Performance optimization
o Security boundary
@randyshoup
Orthogonal Domain Logic
• Stateless domain logic
o Ideally stateless pure function
o Matches domain problem as directly as possible
o Deterministic and testable in isolation
o Robust to change over time
• “Straight-line processing”
o Straightforward, synchronous, minimal branching
• Separate domain logic from I/O
o Hexagonal architecture, Ports and Adapters
o Functional core, imperative shell
@randyshoup
Sharding
• Shards partition the service’s “data space”
o Units for distribution, replication, processing, storage
o Hidden as internal implementation detail
• Shards encapsulate architectural -ilities
o Resource isolation
o Fault isolation
o Availability
o Performance
• Shards are autoscaled
o Divide or scale out as processing or data needs increase
o E.g., DynamoDB partitions, Aurora segments, Bigtable
tablets
@randyshoup
Service Graph
• Common services provide and
abstract widely-used capabilities
• Service topology
o Services call others, which call others, etc.
o Graph, not a strict layering
• Simplifies concerns for service team
o Concentrate on business logic
o Abstract away details of dependencies
o Focus on services you need, capabilities you
provide
@randyshoup
Common Platform
• “Paved Road”
o Shared infrastructure
o Standard frameworks
o Developer experience
o E.g., Netflix, Google
• Separation of Concerns
o Reduce cognitive load on stream-aligned
teams
o Bound decisions through enabling constraints
@randyshoup
Large-scale organizations often
invest more than 50% of
engineering effort in platform
capabilities
@randyshoup
Service Ecosystem
@randyshoup
Evolving Services
• Variation and Natural Selection
o Create / extract new services when needed to
solve a problem
o Services justify their continued existence through
usage
o Deprecate services when they are no longer used
• Services grow and evolve over time
o Factor out common libraries and services as
needed
o Teams and services split like “cellular mitosis”
@randyshoup
“Every service at
Google is either
deprecated or not
ready yet.”
@randyshoup
Large-Scale Architecture
•Simple Components
•Simple Interactions
•Simple Changes
•Putting It All Together
“Everything should
be made as simple
as possible, but not
simpler.”
@randyshoup
Event-Driven
• Communicate state changes as
stream of events
o Statement that some interesting thing
occurred
o Ideally represents a semantic domain event
• Decouples domains and teams
o Abstracted through a well-defined
interface
o Asynchronous from one another
• Simplifies component
implementation
@randyshoup
Immutable Log
• Store state as immutable log of events
o Event Sourcing
• Often matches domain
o E.g., Stitch Fix order processing / delivery state
• Log encapsulates architectural –ilities
o Durable
o Traceable and auditable
o Replayable
o Explicit and comprehensible
• Compact snapshots for efficiency
@randyshoup
Immutable Log
• Stitch Fix order states
Request a fix fix_scheduled
Assign fix to warehouse fix_hizzy_assigned
Assign fix to a stylist fix_stylist_assigned
Style the fix fix_styled
Pick the items for the fix fix_picked
Pack the items into a box fix_packed
Ship the fix via a carrier fix_shipped
Fix travels to customer fix_delivered
Customer decides, pays fix_checked_out
Embrace Asynchrony
• Decouples operations in time
o Decoupled availability
o Independent scalability
o Allows more complex processing, more
processing in parallel
o Safer to make independent changes
• Simplifies component
implementation
@randyshoup
Embrace Asynchrony
• Invert from synchronous call
graph to async dataflow
o Exploit asymmetry between writes and
reads
o Can be orders of magnitude less
resource intensive
@randyshoup
Large-Scale Architecture
•Simple Components
•Simple Interactions
•Simple Changes
•Putting It All Together
“A complex system that works is
invariably found to have evolved
from a simple system that
worked.”
-- Gall’s Law
@randyshoup
Incremental Change
• Decompose every large change into small
incremental steps
• Each step maintains backward / forward
compatibility of data and interfaces
• Multiple service versions commonly coexist
o Every change is a rolling upgrade
o Transitional states are normal, not exceptional
Continuous Testing
• Tests help us go faster
o Tests are “solid ground”
o Tests are the safety net
• Tests make better code
o Confidence to break things
o Courage to refactor mercilessly
• Tests make better systems
o Catch bugs earlier, fail faster
@randyshoup
Developer Productivity
• 75% reading
existing code
• 20% modifying
existing code
• 5% writing new
code
https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/
@randyshoup
Developer Productivity
• 75% reading
existing code
• 20% modifying
existing code
• 5% writing new
code
https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/
@randyshoup
Continuous Testing
• Tests make better designs
o Modularity
o Separation of Concerns
o Encapsulation
@randyshoup
“There’s a deep synergy between
testability and good design. All of the
pain that we feel when writing unit tests
points at underlying design problems.”
@randyshoup
-- Michael Feathers
Test-Driven Development
• è Basically no bug tracking system (!)
o “Inbox Zero” for bugs
o Bugs are fixed as they come up
o Backlog contains features we want to build
o Backlog contains technical debt we want to repay
@randyshoup
Canary Deployments
• Staged rollout
o Go slowly at first; go faster when you gain confidence
• Automated rollout / rollback
o Automatically monitor changes to metrics
o If metrics look good, keep going; if metrics look bad, roll back
• Make deployments routine and boring
@randyshoup
Feature Flags
• Configuration “flag” to enable / disable a feature
for a particular set of users
o Independently discovered at eBay, Facebook, Google, etc.
• More solid systems
o Decouple feature delivery from code delivery
o Rapid on and off
o Separate experiment and control groups
o Develop / test / verify in production
@randyshoup
Continuous Delivery
• Deploy services multiple times per day
o Robust build, test, deploy pipeline
o SLO monitoring
o Synthetic monitoring
• More solid systems
o Release smaller, simpler units of work
o Smaller changes to roll back or roll forward
o Faster to repair, easier to understand, simpler to diagnose
o Increase rate of change and reduce risk of change
@randyshoup
• Cross-company Velocity Initiative to
improve software delivery
o Think Big, Start Small, Learn Fast
o Iteratively identify and remove bottlenecks for teams
o “What would it take to deploy your application every
day?”
• Doubled engineering productivity
o 5x faster deployment frequency
o 5x faster lead time
o 3x lower change failure rate
o 3x lower mean-time-to-restore
• Prerequisite for large-scale architecture
changes
@randyshoup
Continuous Delivery
Large-Scale Architecture
•Simple Components
•Simple Interactions
•Simple Changes
•Putting It All Together
System of Record
• Single System of Record
o Every piece of data is owned by a single service
o That service is the canonical system of record for that data
• Every other copy is a read-only, non-authoritative cache
customer-service
styling-service
customer-search
billing-service
@randyshoup
Shared Data
Option 1: Synchronous Lookup
o Customer service owns customer data
o Fulfillment service calls customer service in real time
fulfillment-service
customer-service
@randyshoup
Shared Data
Option 2: Async event + local cache
o Customer service owns customer data
o Customer service sends address-updated event when customer address
changes
o Fulfillment service caches current customer address
fulfillment-service
customer-service
@randyshoup
Joins
Option 1: Join in Client Service
o Get a single customer from customer-service
o Query matching orders for that customer from order-service
Customers
Orders
order-history-page
customer-service order-service
@randyshoup
Joins
Option 2: Service that “Materializes the View”
o Listen to events from customer-service, events from order-service
o Maintain denormalized join of customer data and orders together in local
storage
Customer Orders
customer-order-service
customer-service
order-service
@randyshoup
Netflix Viewing History
• Store and process member’s playback
data
o 1M requests per second
o Used for viewing history, personalization,
recommendations, analytics, etc.
• Original synchronous architecture
o Synchronously write to persistent storage and lookup
cache
o Availability and data loss from backpressure at high
load
• Asynchronous rearchitecture
o Write to durable queue
o Async pipeline to enrich, process, store, serve
o Materialize views to serve reads
@randyshoup Sharma Podila, 2021, Microservices to Async Processing Migration at Scale, QConPlus 2021.
Walmart Item Availability
• Is this item available to ship to this customer?
o Customer SLO 99.98% uptime in 300ms
• Complex logic involving many teams and
domains
o Inventory, reservations, backorders, eligibility, sales caps, etc.
• Original synchronous architecture
o Graph of 23 nested synchronous service calls in hot path
o Any component failure invalidates results
o Service SLOs 99.999% uptime with 50ms marginal latency
o Extremely expensive to build and operate
@randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
Walmart Item Availability
@randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
Walmart Item Availability
• Invert each service to use async events
o Event-driven “dataflow”
o Idempotent processing
o Event-sourced immutable log
o Materialized view of data from upstream dependencies
• Asynchronous rearchitecture
o 2 services in synchronous hot path
o Async service SLOs 99.9% uptime with latency in seconds
or minutes
o More resilient to delays and outages
o Orders of magnitude simpler to build and operate
@randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
Walmart Item Availability
@randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
Large-Scale Architecture
•Simple Components
•Simple Interactions
•Simple Changes
•Putting It All Together
Thank you!
@randyshoup
linkedin.com/in/randyshoup
medium.com/@randyshoup

More Related Content

What's hot

Microservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and KafkaMicroservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and KafkaAraf Karsh Hamid
 
Clean architecture
Clean architectureClean architecture
Clean architectureandbed
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain confluent
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
Building Event Streaming Architectures on Scylla and Kafka
Building Event Streaming Architectures on Scylla and KafkaBuilding Event Streaming Architectures on Scylla and Kafka
Building Event Streaming Architectures on Scylla and KafkaScyllaDB
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Single page applications
Single page applicationsSingle page applications
Single page applicationsDiego Cardozo
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
Systematic Migration of Monolith to Microservices
Systematic Migration of Monolith to MicroservicesSystematic Migration of Monolith to Microservices
Systematic Migration of Monolith to MicroservicesPradeep Dalvi
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkroutconfluent
 

What's hot (20)

Microservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and KafkaMicroservices Part 3 Service Mesh and Kafka
Microservices Part 3 Service Mesh and Kafka
 
Clean architecture
Clean architectureClean architecture
Clean architecture
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Architecture: Microservices
Architecture: MicroservicesArchitecture: Microservices
Architecture: Microservices
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
CQRS
CQRSCQRS
CQRS
 
Building Event Streaming Architectures on Scylla and Kafka
Building Event Streaming Architectures on Scylla and KafkaBuilding Event Streaming Architectures on Scylla and Kafka
Building Event Streaming Architectures on Scylla and Kafka
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
 
Single page applications
Single page applicationsSingle page applications
Single page applications
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
React workshop
React workshopReact workshop
React workshop
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Systematic Migration of Monolith to Microservices
Systematic Migration of Monolith to MicroservicesSystematic Migration of Monolith to Microservices
Systematic Migration of Monolith to Microservices
 
Introduction to redis
Introduction to redisIntroduction to redis
Introduction to redis
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 

Similar to Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity

Scaling Your Architecture for the Long Term
Scaling Your Architecture for the Long TermScaling Your Architecture for the Long Term
Scaling Your Architecture for the Long TermRandy Shoup
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldRandy Shoup
 
Service Architectures At Scale - QCon London 2015
Service Architectures At Scale - QCon London 2015Service Architectures At Scale - QCon London 2015
Service Architectures At Scale - QCon London 2015Randy Shoup
 
Service Architectures at Scale
Service Architectures at ScaleService Architectures at Scale
Service Architectures at ScaleRandy Shoup
 
Evolving Architecture and Organization - Lessons from Google and eBay
Evolving Architecture and Organization - Lessons from Google and eBayEvolving Architecture and Organization - Lessons from Google and eBay
Evolving Architecture and Organization - Lessons from Google and eBayRandy Shoup
 
Monoliths, Migrations, and Microservices
Monoliths, Migrations, and MicroservicesMonoliths, Migrations, and Microservices
Monoliths, Migrations, and MicroservicesRandy Shoup
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Moving Fast At Scale
Moving Fast At ScaleMoving Fast At Scale
Moving Fast At ScaleRandy Shoup
 
Converting Your Legacy Data to S1000D
Converting Your Legacy Data to S1000DConverting Your Legacy Data to S1000D
Converting Your Legacy Data to S1000Ddclsocialmedia
 
Art of refactoring - Code Smells and Microservices Antipatterns
Art of refactoring - Code Smells and Microservices AntipatternsArt of refactoring - Code Smells and Microservices Antipatterns
Art of refactoring - Code Smells and Microservices AntipatternsEl Mahdi Benzekri
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesIvo Andreev
 
Scaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsRandy Shoup
 
Minimal Viable Architecture - Silicon Slopes 2020
Minimal Viable Architecture - Silicon Slopes 2020Minimal Viable Architecture - Silicon Slopes 2020
Minimal Viable Architecture - Silicon Slopes 2020Randy Shoup
 
Developing and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000DDeveloping and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000Ddclsocialmedia
 
From the Monolith to Microservices - CraftConf 2015
From the Monolith to Microservices - CraftConf 2015From the Monolith to Microservices - CraftConf 2015
From the Monolith to Microservices - CraftConf 2015Randy Shoup
 
DevOps - It's About How We Work
DevOps - It's About How We WorkDevOps - It's About How We Work
DevOps - It's About How We WorkRandy Shoup
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...PerformanceVision (previously SecurActive)
 
Cloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a CacheCloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a Cachecornelia davis
 
5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous DeliveryXebiaLabs
 

Similar to Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity (20)

Scaling Your Architecture for the Long Term
Scaling Your Architecture for the Long TermScaling Your Architecture for the Long Term
Scaling Your Architecture for the Long Term
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric World
 
Service Architectures At Scale - QCon London 2015
Service Architectures At Scale - QCon London 2015Service Architectures At Scale - QCon London 2015
Service Architectures At Scale - QCon London 2015
 
Service Architectures at Scale
Service Architectures at ScaleService Architectures at Scale
Service Architectures at Scale
 
Evolving Architecture and Organization - Lessons from Google and eBay
Evolving Architecture and Organization - Lessons from Google and eBayEvolving Architecture and Organization - Lessons from Google and eBay
Evolving Architecture and Organization - Lessons from Google and eBay
 
Monoliths, Migrations, and Microservices
Monoliths, Migrations, and MicroservicesMonoliths, Migrations, and Microservices
Monoliths, Migrations, and Microservices
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Moving Fast At Scale
Moving Fast At ScaleMoving Fast At Scale
Moving Fast At Scale
 
Converting Your Legacy Data to S1000D
Converting Your Legacy Data to S1000DConverting Your Legacy Data to S1000D
Converting Your Legacy Data to S1000D
 
Art of refactoring - Code Smells and Microservices Antipatterns
Art of refactoring - Code Smells and Microservices AntipatternsArt of refactoring - Code Smells and Microservices Antipatterns
Art of refactoring - Code Smells and Microservices Antipatterns
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
 
Scaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and Events
 
Minimal Viable Architecture - Silicon Slopes 2020
Minimal Viable Architecture - Silicon Slopes 2020Minimal Viable Architecture - Silicon Slopes 2020
Minimal Viable Architecture - Silicon Slopes 2020
 
Developing and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000DDeveloping and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000D
 
From the Monolith to Microservices - CraftConf 2015
From the Monolith to Microservices - CraftConf 2015From the Monolith to Microservices - CraftConf 2015
From the Monolith to Microservices - CraftConf 2015
 
DevOps - It's About How We Work
DevOps - It's About How We WorkDevOps - It's About How We Work
DevOps - It's About How We Work
 
Beyond The Rails Way
Beyond The Rails WayBeyond The Rails Way
Beyond The Rails Way
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...
 
Cloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a CacheCloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a Cache
 
5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery
 

More from Randy Shoup

Anatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsAnatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsRandy Shoup
 
One Terrible Day at Google, and How It Made Us Better
One Terrible Day at Google, and How It Made Us BetterOne Terrible Day at Google, and How It Made Us Better
One Terrible Day at Google, and How It Made Us BetterRandy Shoup
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine LearningRandy Shoup
 
Moving Fast at Scale
Moving Fast at ScaleMoving Fast at Scale
Moving Fast at ScaleRandy Shoup
 
Breaking Codes, Designing Jets, and Building Teams
Breaking Codes, Designing Jets, and Building TeamsBreaking Codes, Designing Jets, and Building Teams
Breaking Codes, Designing Jets, and Building TeamsRandy Shoup
 
Learning from Learnings: Anatomy of Three Incidents
Learning from Learnings: Anatomy of Three IncidentsLearning from Learnings: Anatomy of Three Incidents
Learning from Learnings: Anatomy of Three IncidentsRandy Shoup
 
Minimum Viable Architecture - Good Enough is Good Enough
Minimum Viable Architecture - Good Enough is Good EnoughMinimum Viable Architecture - Good Enough is Good Enough
Minimum Viable Architecture - Good Enough is Good EnoughRandy Shoup
 
Managing Data at Scale - Microservices and Events
Managing Data at Scale - Microservices and EventsManaging Data at Scale - Microservices and Events
Managing Data at Scale - Microservices and EventsRandy Shoup
 
Ten Lessons of the DevOps Transition
Ten Lessons of the DevOps TransitionTen Lessons of the DevOps Transition
Ten Lessons of the DevOps TransitionRandy Shoup
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in MicroservicesRandy Shoup
 
Pragmatic Microservices
Pragmatic MicroservicesPragmatic Microservices
Pragmatic MicroservicesRandy Shoup
 
A CTO's Guide to Scaling Organizations
A CTO's Guide to Scaling OrganizationsA CTO's Guide to Scaling Organizations
A CTO's Guide to Scaling OrganizationsRandy Shoup
 
Concurrency at Scale: Evolution to Micro-Services
Concurrency at Scale:  Evolution to Micro-ServicesConcurrency at Scale:  Evolution to Micro-Services
Concurrency at Scale: Evolution to Micro-ServicesRandy Shoup
 
Minimum Viable Architecture -- Good Enough is Good Enough in a Startup
Minimum Viable Architecture -- Good Enough is Good Enough in a StartupMinimum Viable Architecture -- Good Enough is Good Enough in a Startup
Minimum Viable Architecture -- Good Enough is Good Enough in a StartupRandy Shoup
 
Why Enterprises Are Embracing the Cloud
Why Enterprises Are Embracing the CloudWhy Enterprises Are Embracing the Cloud
Why Enterprises Are Embracing the CloudRandy Shoup
 
DevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsDevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsRandy Shoup
 
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYEQCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYERandy Shoup
 
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...Randy Shoup
 
The Importance of Culture: Building and Sustaining Effective Engineering Org...
The Importance of Culture:  Building and Sustaining Effective Engineering Org...The Importance of Culture:  Building and Sustaining Effective Engineering Org...
The Importance of Culture: Building and Sustaining Effective Engineering Org...Randy Shoup
 

More from Randy Shoup (19)

Anatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsAnatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and Lessons
 
One Terrible Day at Google, and How It Made Us Better
One Terrible Day at Google, and How It Made Us BetterOne Terrible Day at Google, and How It Made Us Better
One Terrible Day at Google, and How It Made Us Better
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
 
Moving Fast at Scale
Moving Fast at ScaleMoving Fast at Scale
Moving Fast at Scale
 
Breaking Codes, Designing Jets, and Building Teams
Breaking Codes, Designing Jets, and Building TeamsBreaking Codes, Designing Jets, and Building Teams
Breaking Codes, Designing Jets, and Building Teams
 
Learning from Learnings: Anatomy of Three Incidents
Learning from Learnings: Anatomy of Three IncidentsLearning from Learnings: Anatomy of Three Incidents
Learning from Learnings: Anatomy of Three Incidents
 
Minimum Viable Architecture - Good Enough is Good Enough
Minimum Viable Architecture - Good Enough is Good EnoughMinimum Viable Architecture - Good Enough is Good Enough
Minimum Viable Architecture - Good Enough is Good Enough
 
Managing Data at Scale - Microservices and Events
Managing Data at Scale - Microservices and EventsManaging Data at Scale - Microservices and Events
Managing Data at Scale - Microservices and Events
 
Ten Lessons of the DevOps Transition
Ten Lessons of the DevOps TransitionTen Lessons of the DevOps Transition
Ten Lessons of the DevOps Transition
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
 
Pragmatic Microservices
Pragmatic MicroservicesPragmatic Microservices
Pragmatic Microservices
 
A CTO's Guide to Scaling Organizations
A CTO's Guide to Scaling OrganizationsA CTO's Guide to Scaling Organizations
A CTO's Guide to Scaling Organizations
 
Concurrency at Scale: Evolution to Micro-Services
Concurrency at Scale:  Evolution to Micro-ServicesConcurrency at Scale:  Evolution to Micro-Services
Concurrency at Scale: Evolution to Micro-Services
 
Minimum Viable Architecture -- Good Enough is Good Enough in a Startup
Minimum Viable Architecture -- Good Enough is Good Enough in a StartupMinimum Viable Architecture -- Good Enough is Good Enough in a Startup
Minimum Viable Architecture -- Good Enough is Good Enough in a Startup
 
Why Enterprises Are Embracing the Cloud
Why Enterprises Are Embracing the CloudWhy Enterprises Are Embracing the Cloud
Why Enterprises Are Embracing the Cloud
 
DevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsDevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of Operations
 
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYEQCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
 
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
 
The Importance of Culture: Building and Sustaining Effective Engineering Org...
The Importance of Culture:  Building and Sustaining Effective Engineering Org...The Importance of Culture:  Building and Sustaining Effective Engineering Org...
The Importance of Culture: Building and Sustaining Effective Engineering Org...
 

Recently uploaded

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 

Recently uploaded (20)

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 

Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity

  • 1. Large-Scale Architecture The Unreasonable Effectiveness of Simplicity Randy Shoup @randyshoup
  • 4. • 1995 Monolithic Perl • 1996-2002 v2 o Monolithic C++ ISAPI DLL o 3.4M lines of code o Compiler limits on number of methods per class • 2002-2006 v3 Migration o Java “mini-applications” o Shared databases • 2012 v4 Microservices • 2017 v5 Microservices @randyshoup eBay Architecture
  • 5. • 1995-2001 Obidos o Monolithic Perl / Mason frontend over C backend o ~4GB application in a 4GB address space o Regularly breaking Gnu linker o Restarting every 100-200 requests for memory leaks o Releasing once per quarter • 2001-2005 Service Migration o Services in C++, Java, etc. o No shared databases • 2006 AWS launches @randyshoup Amazon Architecture
  • 6. No one starts with microservices … Past a certain scale, everyone ends up with microservices @randyshoup
  • 7. Large-Scale Architecture •Simple Components •Simple Interactions •Simple Changes •Putting It All Together
  • 8. Large-Scale Architecture •Simple Components •Simple Interactions •Simple Changes •Putting It All Together
  • 9. “There are two methods in software design: One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.” -- Tony Hoare @randyshoup
  • 10. Modular Services • Service boundaries match the problem domain • Service boundaries encapsulate business logic and data o All interactions through published service interface o Interface hides internal implementation details o No back doors • Service boundaries encapsulate architectural -ilities o Fault isolation o Performance optimization o Security boundary @randyshoup
  • 11. Orthogonal Domain Logic • Stateless domain logic o Ideally stateless pure function o Matches domain problem as directly as possible o Deterministic and testable in isolation o Robust to change over time • “Straight-line processing” o Straightforward, synchronous, minimal branching • Separate domain logic from I/O o Hexagonal architecture, Ports and Adapters o Functional core, imperative shell @randyshoup
  • 12. Sharding • Shards partition the service’s “data space” o Units for distribution, replication, processing, storage o Hidden as internal implementation detail • Shards encapsulate architectural -ilities o Resource isolation o Fault isolation o Availability o Performance • Shards are autoscaled o Divide or scale out as processing or data needs increase o E.g., DynamoDB partitions, Aurora segments, Bigtable tablets @randyshoup
  • 13. Service Graph • Common services provide and abstract widely-used capabilities • Service topology o Services call others, which call others, etc. o Graph, not a strict layering • Simplifies concerns for service team o Concentrate on business logic o Abstract away details of dependencies o Focus on services you need, capabilities you provide @randyshoup
  • 14. Common Platform • “Paved Road” o Shared infrastructure o Standard frameworks o Developer experience o E.g., Netflix, Google • Separation of Concerns o Reduce cognitive load on stream-aligned teams o Bound decisions through enabling constraints @randyshoup
  • 15. Large-scale organizations often invest more than 50% of engineering effort in platform capabilities @randyshoup
  • 17. Evolving Services • Variation and Natural Selection o Create / extract new services when needed to solve a problem o Services justify their continued existence through usage o Deprecate services when they are no longer used • Services grow and evolve over time o Factor out common libraries and services as needed o Teams and services split like “cellular mitosis” @randyshoup
  • 18. “Every service at Google is either deprecated or not ready yet.” @randyshoup
  • 19. Large-Scale Architecture •Simple Components •Simple Interactions •Simple Changes •Putting It All Together
  • 20. “Everything should be made as simple as possible, but not simpler.” @randyshoup
  • 21. Event-Driven • Communicate state changes as stream of events o Statement that some interesting thing occurred o Ideally represents a semantic domain event • Decouples domains and teams o Abstracted through a well-defined interface o Asynchronous from one another • Simplifies component implementation @randyshoup
  • 22. Immutable Log • Store state as immutable log of events o Event Sourcing • Often matches domain o E.g., Stitch Fix order processing / delivery state • Log encapsulates architectural –ilities o Durable o Traceable and auditable o Replayable o Explicit and comprehensible • Compact snapshots for efficiency @randyshoup
  • 23. Immutable Log • Stitch Fix order states Request a fix fix_scheduled Assign fix to warehouse fix_hizzy_assigned Assign fix to a stylist fix_stylist_assigned Style the fix fix_styled Pick the items for the fix fix_picked Pack the items into a box fix_packed Ship the fix via a carrier fix_shipped Fix travels to customer fix_delivered Customer decides, pays fix_checked_out
  • 24. Embrace Asynchrony • Decouples operations in time o Decoupled availability o Independent scalability o Allows more complex processing, more processing in parallel o Safer to make independent changes • Simplifies component implementation @randyshoup
  • 25. Embrace Asynchrony • Invert from synchronous call graph to async dataflow o Exploit asymmetry between writes and reads o Can be orders of magnitude less resource intensive @randyshoup
  • 26. Large-Scale Architecture •Simple Components •Simple Interactions •Simple Changes •Putting It All Together
  • 27. “A complex system that works is invariably found to have evolved from a simple system that worked.” -- Gall’s Law @randyshoup
  • 28. Incremental Change • Decompose every large change into small incremental steps • Each step maintains backward / forward compatibility of data and interfaces • Multiple service versions commonly coexist o Every change is a rolling upgrade o Transitional states are normal, not exceptional
  • 29. Continuous Testing • Tests help us go faster o Tests are “solid ground” o Tests are the safety net • Tests make better code o Confidence to break things o Courage to refactor mercilessly • Tests make better systems o Catch bugs earlier, fail faster @randyshoup
  • 30. Developer Productivity • 75% reading existing code • 20% modifying existing code • 5% writing new code https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/ @randyshoup
  • 31. Developer Productivity • 75% reading existing code • 20% modifying existing code • 5% writing new code https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/ @randyshoup
  • 32. Continuous Testing • Tests make better designs o Modularity o Separation of Concerns o Encapsulation @randyshoup
  • 33. “There’s a deep synergy between testability and good design. All of the pain that we feel when writing unit tests points at underlying design problems.” @randyshoup -- Michael Feathers
  • 34. Test-Driven Development • è Basically no bug tracking system (!) o “Inbox Zero” for bugs o Bugs are fixed as they come up o Backlog contains features we want to build o Backlog contains technical debt we want to repay @randyshoup
  • 35. Canary Deployments • Staged rollout o Go slowly at first; go faster when you gain confidence • Automated rollout / rollback o Automatically monitor changes to metrics o If metrics look good, keep going; if metrics look bad, roll back • Make deployments routine and boring @randyshoup
  • 36. Feature Flags • Configuration “flag” to enable / disable a feature for a particular set of users o Independently discovered at eBay, Facebook, Google, etc. • More solid systems o Decouple feature delivery from code delivery o Rapid on and off o Separate experiment and control groups o Develop / test / verify in production @randyshoup
  • 37. Continuous Delivery • Deploy services multiple times per day o Robust build, test, deploy pipeline o SLO monitoring o Synthetic monitoring • More solid systems o Release smaller, simpler units of work o Smaller changes to roll back or roll forward o Faster to repair, easier to understand, simpler to diagnose o Increase rate of change and reduce risk of change @randyshoup
  • 38. • Cross-company Velocity Initiative to improve software delivery o Think Big, Start Small, Learn Fast o Iteratively identify and remove bottlenecks for teams o “What would it take to deploy your application every day?” • Doubled engineering productivity o 5x faster deployment frequency o 5x faster lead time o 3x lower change failure rate o 3x lower mean-time-to-restore • Prerequisite for large-scale architecture changes @randyshoup Continuous Delivery
  • 39. Large-Scale Architecture •Simple Components •Simple Interactions •Simple Changes •Putting It All Together
  • 40. System of Record • Single System of Record o Every piece of data is owned by a single service o That service is the canonical system of record for that data • Every other copy is a read-only, non-authoritative cache customer-service styling-service customer-search billing-service @randyshoup
  • 41. Shared Data Option 1: Synchronous Lookup o Customer service owns customer data o Fulfillment service calls customer service in real time fulfillment-service customer-service @randyshoup
  • 42. Shared Data Option 2: Async event + local cache o Customer service owns customer data o Customer service sends address-updated event when customer address changes o Fulfillment service caches current customer address fulfillment-service customer-service @randyshoup
  • 43. Joins Option 1: Join in Client Service o Get a single customer from customer-service o Query matching orders for that customer from order-service Customers Orders order-history-page customer-service order-service @randyshoup
  • 44. Joins Option 2: Service that “Materializes the View” o Listen to events from customer-service, events from order-service o Maintain denormalized join of customer data and orders together in local storage Customer Orders customer-order-service customer-service order-service @randyshoup
  • 45. Netflix Viewing History • Store and process member’s playback data o 1M requests per second o Used for viewing history, personalization, recommendations, analytics, etc. • Original synchronous architecture o Synchronously write to persistent storage and lookup cache o Availability and data loss from backpressure at high load • Asynchronous rearchitecture o Write to durable queue o Async pipeline to enrich, process, store, serve o Materialize views to serve reads @randyshoup Sharma Podila, 2021, Microservices to Async Processing Migration at Scale, QConPlus 2021.
  • 46. Walmart Item Availability • Is this item available to ship to this customer? o Customer SLO 99.98% uptime in 300ms • Complex logic involving many teams and domains o Inventory, reservations, backorders, eligibility, sales caps, etc. • Original synchronous architecture o Graph of 23 nested synchronous service calls in hot path o Any component failure invalidates results o Service SLOs 99.999% uptime with 50ms marginal latency o Extremely expensive to build and operate @randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
  • 47. Walmart Item Availability @randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
  • 48. Walmart Item Availability • Invert each service to use async events o Event-driven “dataflow” o Idempotent processing o Event-sourced immutable log o Materialized view of data from upstream dependencies • Asynchronous rearchitecture o 2 services in synchronous hot path o Async service SLOs 99.9% uptime with latency in seconds or minutes o More resilient to delays and outages o Orders of magnitude simpler to build and operate @randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
  • 49. Walmart Item Availability @randyshoup Scott Havens, 2019, Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals, DOES 2019.
  • 50. Large-Scale Architecture •Simple Components •Simple Interactions •Simple Changes •Putting It All Together