6. Alert workflow – previous
Ops: where’s the runbook for this?
Ops: app bug or system issue?
Ops: who’s the developer of this game? Phone #?
Ops: I can’t find the developer… who’s his manager?
[Diagram: alerts labeled Critical, Critical and NonCritical; teams Ops and Dev]
7. Alert workflow 2.0
Ops: where’s the runbook for this?
Ops: app bug or system issue?
Ops: who’s the developer of this game? Phone #?
Ops: I can’t find the developer… who’s his manager?
[Diagram: alerts labeled Critical; teams Ops and Dev]
9. Alert Workflow 3.0 - current
Ops
Dev, Project X, Server
Each alert goes directly to the team that can resolve it!
Dev, Project Y, Client, Android
Dev, …
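One way to picture the routing on this slide is as a lookup from an alert's owner (team, project, component, platform) to an on-call queue. The sketch below is only an illustration: the `Route` type, the queue names, and the fallback to Ops for unowned alerts are assumptions, not details given in the slides.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    """Owner of an alert, mirroring the labels on the slide."""
    team: str
    project: Optional[str] = None
    component: Optional[str] = None
    platform: Optional[str] = None


# Hypothetical on-call queue names; the slides do not name any queues.
ON_CALL_QUEUES = {
    Route("Ops"): "ops-oncall",
    Route("Dev", "Project X", "Server"): "dev-project-x-server",
    Route("Dev", "Project Y", "Client", "Android"): "dev-project-y-android",
}


def queue_for(route: Route) -> str:
    """Return the queue for an exact route match.

    Assumption: alerts with no registered owner fall back to Ops.
    """
    return ON_CALL_QUEUES.get(route, ON_CALL_QUEUES[Route("Ops")])


print(queue_for(Route("Dev", "Project X", "Server")))
```

Keying on the full tuple keeps each alert tied to exactly one owning team, which is the point the slide makes.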
10. Alerts go to the person who can resolve them
Type | Scope | Checked by | Who to page?
ELB | Load balancer health-check | ELB | No one – email alert only
System-level | Check cpu / disk / memory / network | Pingdom / Nagios | Ops team
App-level | Application issues / bugs | Pingdom | Dev and Ops teams
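The paging policy in this table can be sketched as a small lookup keyed by alert type. This is a minimal illustration under assumptions: the `Policy` class and the function names are hypothetical, and only the table's three rows are encoded.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Policy:
    checked_by: Tuple[str, ...]  # monitoring tools that run the check
    page: Tuple[str, ...]        # teams to page; empty means email only


# Rows taken from the slide's table.
POLICIES = {
    "ELB": Policy(checked_by=("ELB",), page=()),
    "System-level": Policy(checked_by=("Pingdom", "Nagios"), page=("Ops",)),
    "App-level": Policy(checked_by=("Pingdom",), page=("Dev", "Ops")),
}


def who_to_page(alert_type: str) -> Tuple[str, ...]:
    """Return the teams to page for an alert type (may be empty)."""
    return POLICIES[alert_type].page


def notification(alert_type: str) -> str:
    """Describe the action for an alert type: page teams, or email only."""
    teams = who_to_page(alert_type)
    if not teams:
        return "email alert only"
    return "page " + " and ".join(teams)


for alert_type in POLICIES:
    print(alert_type, "->", notification(alert_type))
```

An unknown alert type raises `KeyError` here; whether to fail loudly or default to a team is a design choice the slides do not address.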
13. Alerts go to the person who can resolve them
App-level alerts can be triggered by issues in:
• Server-side
• Client-side
  • iOS
  • Android
14. Dev and Ops are responsible
Team | On-call
Ops | 8
Dev | 32, from 20 games (server-side, or client-side Android or iOS)
Analytics | 5