Genomic Computation at Scale
with Serverless, StackStorm, and Docker
SC17, 14 Nov 2017
Dmitri Zimine
Fellow @ Extreme Networks
@dzimine
Image by Miki Yoshihito, Creative Commons license
Genomic Sequencing and Annotation
ACGTGACCGGTACTGGTAACGTACA
CCTACGTGACCGGTACTGGTAACGT
ACGCCTACGTGACCGGTACTGGTAA
CGTATACACGTGACCGGTACTGGTA
ACGTACACCTACGTGACCGGTACTG
CTGGTAACGTATACCTCT...
Sequencer
Sequenced Genome
DNA Sample
Annotated Sequence
Compute
in silico
So that…
Source: http://www.yourgenome.org
Victor Solovyev
Partner,
Leading scientist in
computational
biology
Victor Solovyev is a leading scientist in computational biology. His
experience is a good mixture of academic positions, including Professor
at Royal Holloway and KAUST, and various industry roles. His research
on bioinformatics and genomic computation is published in Nature,
Science, and Genome Research, and is highly cited.
As Chief Sci. Officer at Softberry, he is leading software development
for biomedical data analysis and research in computational biology.
Softberry software products have been used in over 2000 research
publications in 2016 alone. The Fgenesh program has been cited in ~3200,
the Bprom program in ~800, and the Fgenesb pipeline in ~500 scientific
publications.
fgenesb pipeline: some [prev] results
Challenges:
• Offer annotation pipelines online
• Use cloud, for large elastic capacity
• Handle scale - spiky workload
• Economically
GAaaS – Genomic Annotation as a Service
Agenda
Problem &
Solution
Domain demands, technology selection
& serverless, toolchain, solution overview
Show & Tell Demo
Discussion
Lessons learned, what to keep & what to
refactor, the path forward
Typical genomic annotation pipeline
Prediction of genes and proteins: fgenesb
Search for similar proteins in databases: Blast (NR), KOALA (KEGG)
Compilation and presentation of results: GCView
Data sizes shown: 50-100Gb, 1Mb-3Gb
Highly parallelizable
Annotation Pipelines
A basic exome pipeline
delivering called variants from
raw sequence could consist of
as few as 12 steps, most of
which can be run in parallel,
but a real analysis will typically
involve several additional
downstream steps and
complex report generation.
Source: Brief Bioinform bbw020.
DOI: https://doi.org/10.1093/bib/bbw020
PROPERTIES:
• Steps:
• jobs/functions
• Run times – may be hours & days
• Diverse (a.k.a. “don’t run on the same box”)
• Workflow orchestration:
• Logical patterns: splits, parallels, joins
• Data flow:
Upstream results –> downstream inputs
• Scale dimensions: spiky load
• Low volume of requests,
• Very high compute demand per request
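The split / parallel / join pattern and the upstream-to-downstream data flow listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a thread pool stands in for the container scheduler, and counting G/C bases stands in for a real annotation step that would run for hours.

```python
from concurrent.futures import ThreadPoolExecutor


def split(sequence: str, chunk_size: int) -> list[str]:
    """Split: cut one input into independent chunks."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def gc_count(chunk: str) -> int:
    """Stand-in for a long-running step: count G and C bases in a chunk."""
    return sum(1 for base in chunk if base in "GC")


def join(partials: list[int], total_length: int) -> float:
    """Join: combine upstream results into one downstream value (GC fraction)."""
    return sum(partials) / total_length


def run_pipeline(sequence: str, chunk_size: int = 8) -> float:
    """Run split -> parallel steps -> join on a non-empty sequence."""
    chunks = split(sequence, chunk_size)
    # Parallel: each chunk is processed independently of the others.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(gc_count, chunks))
    # Data flow: results of the parallel steps become the input of the join.
    return join(partials, len(sequence))
```

In the solution described in this talk, each step would be a containerized function and the workflow engine would carry the split, join, and data-passing logic instead of local Python calls.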
Serverless
Authoritative: Mike Roberts on martinfowler.com:
My summary
• Function, not service: “down when done”
• Scale – elastic, infinite, transparent for developer
• Pay per use consumption model
https://goo.gl/bTfgfU
What is Serverless?
Serverless fits!
*) BYOC – Bring Your Own Code (see the serverless compute manifesto, https://goo.gl/q9HsXB)
Typical Serverless requirements:
• “Functions”, not “servers”,
down when done
• Elastic scale:
handle spiky workload pattern
• BYOC*:
package algorithms into containers
• Launch on a variety of events
Additional requirements:
• Long running times: hours
• Pipeline orchestration:
execution logic and data
passing
• Local Dev environment,
consistent and convenient
Serverless fits, but…
Typical Serverless requirements:
• Elastic scale:
handle spiky workload pattern
• “Functions”, not “servers”,
down when done
• BYOC*: package programs into
containers, run everywhere
• Launch on a variety of events
Why not <…>
AWS Lambda?
5 min limitation
- jobs run for hours and days
Azure?
No native support for Functions
in docker containers *
OpenWhisk?
Lacks powerful workflow to
orchestrate pipelines (only
sequences)
*) At the time of selection. I will cover "what has changed" in the Discussion.
D I Y
Terraform provisions infra on AWS (WIP);
Vagrant for local dev infra.
Ansible deploys & configures software on
infra.
Docker to containerize functions and
push to local Docker Registry.
StackStorm orchestrates pipeline
executions,
invokes Swarm to run functions,
dynamically scales Swarm on load.
Tool Chain
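One way a workflow step could hand a function to Swarm is by issuing a `docker service create` call that runs the container once and does not restart it. The sketch below only builds the command line and does not execute it. The service name, image name, registry address, and mount paths are hypothetical; the flags used (`--name`, `--restart-condition`, `--mount`) are standard Docker CLI options, but the talk does not show the exact invocation, so treat this as an assumption.

```python
def swarm_job_command(name, image, args, share_dir="/share", data_dir="/data"):
    """Build a `docker service create` command for a run-once function."""
    return [
        "docker", "service", "create",
        "--name", name,
        # Do not restart the container after it exits: "down when done".
        "--restart-condition", "none",
        # Read-write volume for passing results between pipeline steps.
        "--mount", f"type=bind,source={share_dir},target=/share",
        # Read-only volume for reference databases.
        "--mount", f"type=bind,source={data_dir},target=/data,readonly",
        image,
        *args,
    ]


# Hypothetical example: a BLAST step pulled from a local registry.
command = swarm_job_command(
    "blast-1", "localhost:5000/blast", ["--in", "/share/seq.fa"]
)
print(" ".join(command))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.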
StackStorm, in 1 minute
Actions, Sensors
Workflows, Rules
IT Domains
Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring, Ops Support
Triggers, Calls
©2017 Extreme Networks, Inc. All rights reserved
StackStorm is like …
Actions, Sensors
Workflows, Rules
Step Functions
AWS Lambda
OpenSource, for DIY Serverless
Three Sides to Serverless Story
DevOps
Developer
End User
Submits sequence,
Gets results,
fast and cheap.
Packs algorithms in
containers,
Defines pipelines
Provides
infrastructure
1. DevOps: deploys serverless solution
[Diagram: a Controller node runs StackStorm, the Registry, and other infra; each Worker node runs several functions f(x); nodes mount /share (rw) and /data (ro); the set of Workers scales.]
2. Developer:
creates functions, defines pipeline
(1) Create functions (BYOC), pack into Docker images, push to the local Registry
(2) Define pipelines as StackStorm workflows
[Diagram: Developer, StackStorm, Registry, functions f(x)]
3. User submits data, system runs pipeline & produces results
(1) User sends sequence data
(2) StackStorm runs the workflow, schedules functions as jobs on Swarm
(3) Swarm schedules services
(4) Docker pulls the functions' images
(5) Functions run in containers, produce data
(6) StackStorm sends results back to user
[Diagram: End User, StackStorm, Swarm controller, Swarm Worker running functions f(x), Registry]
Genomic annotation pipeline
with StackStorm, Docker,
and Docker Swarm
Show & Tell, PART 1
Scale: dynamically, on load
[Diagram: Controller node with StackStorm, the Registry, and other infra; Worker nodes running functions f(x), with Workers added or removed as load changes; shared volumes share (rw) and data (ro).]
Show & Tell, PART 2
Dynamically scaling
Swarm cluster on AWS,
on workload
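The scaling decision itself can be reduced to a small calculation. The sketch below is an assumption about how such a rule might look, not the rule used in the demo: it sizes the cluster from the number of pending and running jobs, a hypothetical per-worker capacity, and hypothetical lower and upper bounds.

```python
import math


def desired_workers(pending_jobs, running_jobs, jobs_per_worker,
                    min_workers=1, max_workers=10):
    """Return how many Swarm workers the current load calls for."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Enough workers to hold every job that is running or waiting.
    needed = math.ceil((pending_jobs + running_jobs) / jobs_per_worker)
    # Clamp to the allowed cluster size.
    return max(min_workers, min(max_workers, needed))
```

A workflow or rule could call such a function periodically and add or remove worker nodes until the cluster matches the returned size.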
Agenda
Problem &
Solution
Domain demands, technology selection
& serverless, toolchain, solution overview
Show & Tell Demo
Discussion
Lessons learned, what to keep & what to
refactor, the path forward
Serverless hype accelerates
25+ frameworks … but no turn-key fit yet
Kubernetes won the container arms race
now with built-in AWS autoscaler.
Azure introduced Container Instances
no messing with VMs, per-second billing.
We are outpaced by technology
So What?
Path Forward: Options
Option 1: Kubernetes
• Use Kubernetes pack from StackStorm Exchange
• Utilize k8s “run to completion” jobs
• Deploy on AWS, minikube for local development,
• Leverage AWS autoscaler for elastic capacity
StackStorm handles pipeline workflow, calls k8s Jobs.
Same app developer experience.
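A Kubernetes "run to completion" Job for one pipeline function could look roughly like the manifest built below. The field names follow the `batch/v1` Job API; the job name, image, and arguments are hypothetical, and the talk does not show how the StackStorm Kubernetes pack would submit it, so the sketch stops at producing the manifest.

```python
import json


def job_manifest(name, image, args):
    """Build a minimal Kubernetes Job manifest for a run-once function."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry a failed function automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Run to completion: never restart the finished container.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "args": args}
                    ],
                }
            },
        },
    }


# Hypothetical example: the same BLAST function packaged as a Job.
manifest = job_manifest("blast-1", "localhost:5000/blast", ["--in", "/share/seq.fa"])
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML manifests.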
Path Forward: Options
Option 2: Azure
• Use Azure's "Self-orchestration" option with StackStorm
• Azure provides containers on demand (no VMs!)
• Per container, per second billing
StackStorm handles pipeline workflow, calls Azure containers.
App developer experience stays the same.
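Why per-container, per-second billing suits a spiky workload can be shown with simple arithmetic. The rates and durations used with the functions below are made-up placeholders, not Azure or AWS prices; only the shape of the comparison matters.

```python
def per_second_cost(run_seconds, containers, rate_per_container_second):
    """Cost when paying only while containers are running."""
    return run_seconds * containers * rate_per_container_second


def always_on_cost(hours_provisioned, vms, rate_per_vm_hour):
    """Cost when VMs stay provisioned whether or not jobs are running."""
    return hours_provisioned * vms * rate_per_vm_hour


# Hypothetical day: one request keeps 10 containers busy for one hour,
# while an always-on cluster of 10 VMs is paid for all 24 hours.
burst = per_second_cost(run_seconds=3600, containers=10,
                        rate_per_container_second=0.0001)
idle_heavy = always_on_cost(hours_provisioned=24, vms=10, rate_per_vm_hour=0.5)
print(f"per-second: {burst:.2f}, always-on: {idle_heavy:.2f}")
```

With a low volume of requests and very high compute per request, the always-on cluster pays mostly for idle time, which is the gap the pay-per-use model closes.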
Path forward: change to Azure Container Instances
(1) User sends sequence data
(2) StackStorm runs the workflow, schedules functions as containers on Azure
(3) Azure schedules container instances
(4) Docker pulls the functions' images from the Registry
(5) Functions run in containers, produce data
(6) StackStorm sends results back to user
[Diagram: End User, StackStorm, Azure Container Service, Azure Container Instance running functions f(x), Registry]
STACKSTORM EVENT-DRIVEN AUTOMATION ALLOWS YOU TO GET YOUR
SOLUTION UP AND RUNNING QUICKLY SO YOU CAN DELIVER BUSINESS FAST,
EXPERIMENT AND INNOVATE. ONCE YOU HAVE IT JUST RIGHT, YOU CAN BUILD
A MORE PERMANENT VERSION WITH MICROSERVICES
Actions, Sensors
Workflows, Rules
StackStorm is an innovation platform
where we can build solutions,
experiment and learn,
while delivering business value,
before moving implementation to
dedicated services
StackStorm OpenSource
Platform
Brocade Workflow Composer
(StackStorm Enterprise Edition)
Network Automation
StackStorm Exchange
Community
Security Assisted
Networking
Come and see! SC17 Exhibition, Booth #519
Image by Miki Yoshihito, Creative Commons license
Dmitri Zimine
Extreme Networks
@dzimine
http://github.com/dzimine/serverless-swarm
@Stack_Storm
http://github.com/StackStorm/st2
Star 2,317
Thank You!

Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

  • 1. Genomic Computation at Scale with Serverless, StackStorm, and Docker SC17, 14 Nov 2017 Dmitri Zimine Fellow @ Extreme Networks @dzimine Image by Miki Yoshihito, Creative Commons license
  • 2. Genomic Sequencing and Annotation: DNA Sample -> Sequencer -> Sequenced Genome (ACGTGACCGGTACTGGTAACGTACA CCTACGTGACCGGTACTGGTAACGT ACGCCTACGTGACCGGTACTGGTAA CGTATACACGTGACCGGTACTGGTA ACGTACACCTACGTGACCGGTACTG CTGGTAACGTATACCTCT...) -> Compute in silico -> Annotated Sequence
  • 4. Victor Solovyev, Partner, leading scientist in computational biology. Victor Solovyev is a leading scientist in computational biology. His experience is a good mixture of academic positions, including Professor at Royal Holloway and KAUST, and various industry roles. His research on bioinformatics and genomic computation is published in Nature, Science, and Genome Research, and is highly cited. As Chief Scientific Officer at Softberry, he leads software development for biomedical data analysis and research in computational biology. Softberry software products were used in over 2000 research publications in 2016 alone. The Fgenesh program has been cited in ~3200, the Bprom program in ~800, and the Fgenesb pipeline in ~500 scientific publications.
  • 5. fgenesb pipeline: some [prev] results
  • 6.
  • 7. GAaaS – Genomic Annotation as a Service
      Challenges:
      • Offer annotation pipelines online
      • Use cloud, for large elastic capacity
      • Handle scale: spiky workload
      • Do it economically
  • 8. Agenda
      • Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
      • Show & Tell: demo
      • Discussion: lessons learned, what to keep & what to refactor, the path forward
  • 9. Typical genomic annotation pipeline
      • Prediction of genes and proteins: fgenesb
      • Search for similar proteins in databases: Blast (NR), KOALA (KEGG)
      • Compilation and presentation of results: GCView
      • Diagram labels: 50-100Gb, 1Mb-3Gb, highly parallelizable
  • 10. Annotation Pipelines A basic exome pipeline delivering called variants from raw sequence could consist of as few as 12 steps, most of which can be run in parallel, but a real analysis will typically involve several additional downstream steps and complex report generation. Source: Brief Bioinform bbw020. DOI: https://doi.org/10.1093/bib/bbw020
  • 11. Annotation Pipelines: properties
      • Steps:
          • Jobs/functions
          • Run times may be hours or days
          • Diverse (a.k.a. "don't run on the same box")
      • Workflow orchestration:
          • Logical patterns: splits, parallels, joins
          • Data flow: upstream results -> downstream inputs
      • Scale dimensions: spiky load
          • Low volume of requests
          • Very high compute demand per request
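
The split / parallel / join pattern and the upstream-to-downstream data flow listed on slide 11 can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names (`split`, `annotate`, `join`, `run_pipeline`) are hypothetical, the "annotation" is a trivial G/C count standing in for an hours-long job, and a local thread pool stands in for the cluster that would run real steps.

```python
from concurrent.futures import ThreadPoolExecutor


def split(sequence, chunk_size):
    """Split: cut the input sequence into independent chunks."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def annotate(chunk):
    """Parallel step: hypothetical stand-in for a long-running job (counts G and C)."""
    return sum(1 for base in chunk if base in "GC")


def join(partial_results):
    """Join: combine upstream results into the input of the downstream step."""
    return sum(partial_results)


def run_pipeline(sequence, chunk_size=8, workers=4):
    """Run split -> parallel annotate -> join and return the combined result."""
    chunks = split(sequence, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed independently; map() preserves chunk order.
        partials = list(pool.map(annotate, chunks))
    return join(partials)


if __name__ == "__main__":
    print(run_pipeline("ACGTGACCGGTACTGGTAACGTACA"))
```

In the solution described in the talk, a workflow engine plays the role of `run_pipeline`, and each step runs as a container rather than a local function call.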
  • 13. What is Serverless?
      Authoritative source: Mike Roberts on martinfowler.com, https://goo.gl/bTfgfU
      My summary:
      • Function, not service: "down when done"
      • Scale: elastic, infinite, transparent for the developer
      • Pay-per-use consumption model
  • 14. Serverless fits!
      Typical Serverless requirements:
      • "Functions", not "servers", down when done
      • Elastic scale: handle spiky workload pattern
      • BYOC*: package algorithms into containers
      • Launch on a variety of events
      *) BYOC: Bring Your Own Code (see the serverless compute manifesto, https://goo.gl/q9HsXB)
  • 15. Serverless fits, but…
      Typical Serverless requirements:
      • Elastic scale: handle spiky workload pattern
      • "Functions", not "servers", down when done
      • BYOC*: package programs into containers, run everywhere
      • Launch on a variety of events
      Additional requirements:
      • Long running times: hours
      • Pipeline orchestration: execution logic and data passing
      • Local dev environment, consistent and convenient
  • 16. Why not <…>
      • AWS Lambda? 5 min limitation; jobs run for hours and days
      • Azure? No native support for Functions in Docker containers *
      • OpenWhisk? Lacks powerful workflow to orchestrate pipelines (only sequences)
      *) At the time of selecting. I will cover "what has changed" in Discussion.
  • 17. DIY
  • 18.
  • 19. Tool Chain
      • Terraform provisions infra on AWS (WIP); Vagrant for local dev infra.
      • Ansible deploys & configures software on infra.
      • Docker containerizes functions and pushes them to the local Docker Registry.
      • StackStorm orchestrates pipeline executions, invokes Swarm to run functions, and dynamically scales Swarm on load.
  • 20. StackStorm, in 1 minute [diagram: Sensors, Triggers, Rules, Workflows, Actions, Calls; IT domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring; Ops Support]
  • 21. StackStorm is like … Step Functions plus AWS Lambda: OpenSource, for DIY Serverless [diagram: Sensors, Actions, Rules, Workflows] ©2017 Extreme Networks, Inc. All rights reserved
  • 22. Three Sides to Serverless Story
      • DevOps: provides infrastructure
      • Developer: packs algorithms in containers, defines pipelines
      • End User: submits sequence, gets results, fast and cheap
  • 23. 1. DevOps: deploys serverless solution [diagram: StackStorm and other infra…; Registry; Swarm Controller; Workers running functions f(x); mounts /share (share, :rw) and /data (data, :ro); Scale]
  • 24.
  • 25. 2. Developer: creates functions, defines pipeline
      1. Create functions (BYOC), pack into Docker image, push to local Registry
      2. Define pipelines as StackStorm workflows
  • 26. 3. User submits data, system runs pipeline & produces results
      1. User sends sequence data
      2. StackStorm runs workflow, schedules functions as jobs on Swarm
      3. Swarm schedules services
      4. Docker pulls function's images
      5. Functions run in containers, produce data
      6. StackStorm sends results back to user
      [diagram: End User, StackStorm, Swarm controller, Swarm Worker, Registry]
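
Step 2 of slide 26, scheduling a function as a run-to-completion job on Swarm, might look roughly like the sketch below. It only builds the argument list for `docker service create` and does not run anything. The helper name `swarm_job_command`, the image name, and the function arguments are hypothetical; the `/share` (read-write) and `/data` (read-only) mounts follow the diagram on slide 23. The exact command used in the talk's demo is not shown in the slides, so treat this as an assumption.

```python
def swarm_job_command(name, image, args, share="/share", data="/data"):
    """Build a `docker service create` command for a one-shot function.

    Hypothetical helper: returns the argv list, leaving execution to the caller.
    """
    return [
        "docker", "service", "create",
        "--name", name,
        # Do not restart the task when it exits: "down when done".
        "--restart-condition", "none",
        # Shared working directory, read-write.
        "--mount", f"type=bind,source={share},target={share}",
        # Reference databases, read-only.
        "--mount", f"type=bind,source={data},target={data},readonly",
        image,
        *args,
    ]


if __name__ == "__main__":
    # Image name and arguments are placeholders, not taken from the talk.
    cmd = swarm_job_command(
        "blast-job-1",
        "registry.example:5000/blast",
        ["--input", "/share/job-1/input.fa"],
    )
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes the call easy to wrap as a workflow action and to test without a running Swarm.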
  • 27. Show & Tell, PART 1: Genomic annotation pipeline with StackStorm, Docker, and Docker Swarm
  • 28.
  • 29. Scale: dynamically, on load [diagram: StackStorm and other infra…; Registry; Swarm Controller; Workers running functions f(x), with a new Worker added on load; mounts share (:rw) and data (:ro)]
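
The scaling decision behind slide 29 could be as simple as the sketch below: size the worker pool to the number of pending functions, within fixed bounds. The slides do not show the actual rule, so the function name `desired_workers` and all thresholds here are hypothetical.

```python
import math


def desired_workers(pending_jobs, jobs_per_worker=3, min_workers=1, max_workers=10):
    """Return how many Swarm workers to run for the current load.

    Hypothetical policy: one worker per `jobs_per_worker` pending jobs,
    clamped to the range [min_workers, max_workers].
    """
    needed = math.ceil(pending_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for load in (0, 4, 7, 100):
        print(load, "pending ->", desired_workers(load), "workers")
```

A rule or workflow would compare this target with the current worker count and add or remove nodes accordingly.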
  • 30. Show & Tell, PART 2: Dynamically scaling Swarm cluster on AWS, on workload
  • 31.
  • 32. Agenda
      • Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
      • Show & Tell: demo
      • Discussion: lessons learned, what to keep & what to refactor, the path forward
  • 33.
  • 34. Serverless hype accelerates: 25+ frameworks … but no turn-key fit yet
  • 35. Kubernetes won the container arms race, now with built-in AWS autoscaler.
  • 36. Azure introduced Container Instances: no messing with VMs, per-second billing.
  • 37. We are outpaced by technology
  • 38. We are outpaced by technology So What?
  • 39. Path Forward: Options
      Option 1: Kubernetes
      • Use the Kubernetes pack from StackStorm Exchange
      • Utilize k8s "run to completion" jobs
      • Deploy on AWS; minikube for local development
      • Leverage AWS autoscaler for elastic capacity
      StackStorm handles pipeline workflow, calls k8s Jobs. Same app developer experience.
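
For Option 1, a "run to completion" function maps onto a Kubernetes `batch/v1` Job. The sketch below builds such a manifest as a plain Python dict. The helper name `job_manifest`, the image, and the arguments are hypothetical, and how StackStorm's Kubernetes pack would submit the manifest is not shown in the slides.

```python
import json


def job_manifest(name, image, args):
    """Build a minimal Kubernetes Job manifest for a one-shot function."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry failed pods; let the workflow decide what to do.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image and arguments, not taken from the talk.
    manifest = job_manifest(
        "blast-job-1",
        "registry.example:5000/blast",
        ["--input", "/share/job-1/input.fa"],
    )
    print(json.dumps(manifest, indent=2))
```

The JSON output can be passed to `kubectl apply -f -`, since Kubernetes accepts manifests in JSON as well as YAML.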
  • 40. Path Forward: Options
      Option 2: Azure
      • Use Azure's "Self-orchestration" option with StackStorm
      • Azure provides containers on demand (no VMs!)
      • Per-container, per-second billing
      StackStorm handles pipeline workflow, calls Azure containers. App developer experience stays the same.
  • 41. Path forward: Change to Azure Container Instances
      1. User sends sequence data
      2. StackStorm runs workflow, schedules functions as containers on Azure
      3. Azure schedules container instances
      4. Docker pulls function's images from Registry
      5. Functions run in containers, produce data
      6. StackStorm sends results back to user
      [diagram: End User, StackStorm, Azure Container Service, Azure Container Instance, Registry]
  • 42.
  • 43.
  • 44. StackStorm event-driven automation allows you to get your solution up and running quickly so you can deliver business fast, experiment and innovate. Once you have it just right, you can build a more permanent version with microservices. [diagram: Sensors, Actions, Rules, Workflows]
  • 45. StackStorm is an innovation platform where we can build solutions, experiment, and learn while delivering business value, before moving the implementation to dedicated services.
  • 46. [diagram: StackStorm OpenSource Platform; Brocade Workflow Composer (StackStorm Enterprise Edition); Network Automation; StackStorm Exchange; Community; Security; Assisted Networking]
  • 47. Come and see! SC17 Exhibition, Booth #519
  • 48. Thank You! Dmitri Zimine, Extreme Networks, @dzimine, http://github.com/dzimine/serverless-swarm. StackStorm: @Stack_Storm, http://github.com/StackStorm/st2 (Star 2,317). Image by Miki Yoshihito, Creative Commons license