1. XRM: An Event-based Resource Management Framework for XCP
Pradeep Padala
in collaboration with Ken Igarashi, Akshay I. Mehta, and Ulas C. Kozat
2. Typical scenario in shared infrastructures
[Diagram: Web search and Data analytics applications hosted on a shared infrastructure (cloud) in a data center]
Xen Summit AMD 2010
3. Application requirements
• Web search: fast searches, low response time
• Data analytics: analyze large data, high throughput
• QoS differentiation 3:1
4. How to host these applications?
[Diagram: app1, app2 and app3 (web and db tiers) placed on Node I to Node IV, either physically partitioned or sharing nodes through a virtualization layer]
Physical partitioning
× Wasteful
× Difficult to manage
Virtualized data center
• Improved utilization
• Reduced costs
• High flexibility (elastic!)
Virtualized shared data center = a new paradigm!
Challenge: How to allocate resources to meet goals?
5. Challenge #1: Developers don't want to manage resources
ProvisionVMs()
RunApplications()
While (true) {
    MonitorApplications()
    If (AppPerformance != GOAL) {
        FindReason()
        If (ScaleUp) {
            FindAvailableResources()
            MigrateVM()
        }
        If (ScaleOut) {
            ProvisionVMs()
            RunApplication()
        }
    }
    If (Consolidation == True) {
        FindSuitableVMs()
        Consolidate()
    }
}
Questions raised by this loop:
• Where to provision VMs?
• How to determine what to do? Migrate? Clone? Scale up? Scale out?
• How to consolidate VMs?
Holy Grail: DeployService(); AutoScale();
Cloud providers want to consolidate multiple services too!
6. Challenge #2: Resource Management Spans Multiple Layers
[Diagram: resource management at each layer of the stack]
• Services
• PaaS
• IaaS
• Hardware
How to pass information between the layers so that they don't make conflicting decisions?
7. Challenge #3: Complexity of Scaling Primitives
• Slicing: little overhead, efficient; × limited to a single machine
• Live Migration: handles overload, small downtime; × overhead
• Cloning: × overhead, × side-effects
• Live Replication: state-ful clone, maintains connections; × overhead
How to combine primitives to achieve goals?
8. What is a perfect Resource Manager?
An RM that can automatically re-arrange resources among multiple applications/VMs on multiple physical machines and provide optimal resource utilization and application performance
We are building the (ultimate) RM system
XRM = first incarnation on XCP!
[Diagram: Automation, Resource Allocation, High Utilization, High Application Performance]
9. Outline
• Motivation
• Challenges in RM
• XRM Feedback Control based Design
• XRM Implementation and Preliminary Results
• Summary and Feedback
10. How to achieve the automation?
“Almost any system that is considered automatic has some element of feedback control”
- Hellerstein et al.
XRM = A Feedback Control System
11. RM in multiple layers
• Services: issue high-level service requests
• PaaS RM: does app modeling and may request changes (slice requests, slice changes)
• IaaS RM: automated control loop; knows only about VMs and hardware resources
• Hardware
XRM = IaaS RM
12. XRM's feedback control loop
• Monitor: stats from XCP, network stats
• Model: can model applications, VMs, and underlying resources
• Control: driven by performance goals; sets control parameters
• Action: change resource shares, migrate, power off machines
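The monitor, control and action stages of such a loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `VmState`, `monitor`, `control`, `run_loop`, the utilization goal and the gain are all hypothetical names and values, not part of XRM or XCP, and the "sensor" is a toy formula rather than real XCP stats.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmState:
    """Hypothetical per-VM state; not an XCP or XRM type."""
    cpu_share: float  # fraction of a host's CPU currently allocated
    demand: float     # CPU fraction the workload would like to use


def monitor(vm: VmState) -> float:
    """Toy sensor: utilization of the allocated share, capped at 1.0."""
    return min(1.0, vm.demand / vm.cpu_share)


def control(utilization: float, goal: float, share: float, gain: float = 0.5) -> float:
    """Integral-style step: grow the share when utilization is above the goal,
    shrink it when below, and clamp the result to a sane range."""
    error = utilization - goal
    new_share = share + gain * error * share
    return max(0.05, min(1.0, new_share))


def run_loop(vm: VmState, goal: float, steps: int) -> List[float]:
    """Monitor -> control -> action, repeated for a fixed number of intervals."""
    history = []
    for _ in range(steps):
        utilization = monitor(vm)
        vm.cpu_share = control(utilization, goal, vm.cpu_share)  # the "action"
        history.append(vm.cpu_share)
    return history


vm = VmState(cpu_share=0.2, demand=0.4)
shares = run_loop(vm, goal=0.8, steps=30)
```

In this toy setup the share settles where the VM runs at the target utilization (demand / goal). A real controller would replace `monitor` with measured stats and could also choose migration or power-off as its action.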
13. Current incarnation
[Architecture diagram]
• Stats monitoring module: XCP stats, plus out-of-band stat updates from XCP nodes
• Stats analysis module
1. Thresholds
2. Rules
• RRD database
• Core algorithm module: takes filtered stats and stats analysis data, uses the algorithm bank, takes action
• Wrapper: low-level commands/XAPI commands to the XCP master node; Openflow
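A thresholds-style stats analysis step might look like the sketch below. The slide does not show how thresholds are configured, so the `THRESHOLDS` table, the metric names, the limits and the `analyze` function are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold table: metric name -> (event name, limit).
# The names and limits are illustrative, not taken from XRM.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "cpu": ("HighCpuUtilization", 0.9),
    "packet_drops": ("PacketDrops", 0),
}


def analyze(stats: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Turn raw per-host stats into (event, host) pairs when a limit is exceeded."""
    events = []
    for host, sample in sorted(stats.items()):
        for metric, (event, limit) in THRESHOLDS.items():
            if sample.get(metric, 0) > limit:
                events.append((event, host))
    return events


events = analyze({
    "node1": {"cpu": 0.95, "packet_drops": 0},
    "node2": {"cpu": 0.30, "packet_drops": 12},
})
```

Only the samples that cross a limit produce events, which is one plausible reading of "filtered stats" being handed to the core algorithm module.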
14. XRM is an event-based framework
• Many algorithms can be developed and plugged in
• The algorithms register for specific events
– High CPU utilization
– Packet drops
– PowerOff
– PowerOn
– …
• Different algorithms may take different actions
A Common Abstraction for ALL Algorithms
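The register-for-events idea can be sketched as a small dispatcher. XRM's actual registration API is not shown in the slides, so `EventBus`, `register`, `publish`, the event-name strings and both handlers are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class EventBus:
    """Hypothetical dispatcher; not XRM's real interface."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, event: str, handler: Handler) -> None:
        """An algorithm subscribes a handler to one named event."""
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> List[str]:
        """Call every handler registered for the event and collect proposed actions."""
        return [handler(payload) for handler in self._handlers.get(event, [])]


def rebalance_on_high_cpu(payload: dict) -> str:
    # Illustrative algorithm: reacts to load by proposing a migration.
    return f"migrate a VM off {payload['host']}"


def repack_on_power_off(payload: dict) -> str:
    # Illustrative algorithm: reacts to a host going away.
    return f"repack VMs away from {payload['host']}"


bus = EventBus()
bus.register("HighCpuUtilization", rebalance_on_high_cpu)
bus.register("PowerOff", repack_on_power_off)

actions = bus.publish("HighCpuUtilization", {"host": "node1", "cpu": 0.95})
```

Each algorithm sees only the events it asked for, so new algorithms can be plugged in without touching the dispatcher.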
15. What algorithms can you implement?
• AutoControl: automated control of multiple virtualized resources [PadalaEurosys09]
• Models application and sets VM shares based on application goals
[Diagram: three App Controllers and two Node Controllers; labels: Goals, Resource Shares]
[PadalaEurosys09] Pradeep Padala, Xiaoyun Zhu, Mustafa Uysal et al. Automated Control of Multiple Virtualized Resources. In Proceedings of EuroSys 2009.
16. Outline
• Motivation
• Challenges in RM
• XRM Feedback Control based Design
• XRM Implementation and Preliminary Results
• Summary and Feedback
17. XRM features
• Interface to upper layers
• Auto-* features
• External control
• Pluggable algorithms
• Extensibility
18. XRM Implementation
• Implemented on XCP 0.1.1
• Written in Python
• Pluggable algorithms have to be written in Python
• Currently implements four algorithms
– Bin packing
– Bin packing + Live migration
– Random host
– Round-robin
• We have also implemented a simulator (run 1 million VMs on 100,000 nodes!)
– Can capture data during a “real” run
– Run multiple algorithms on the exact same trace
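To make the difference between these placement policies concrete, here is a minimal sketch that runs three of them on the same list of VM demands. It assumes a first-fit heuristic as the bin-packing variant (the slides do not say which variant XRM uses), and `place`, `first_fit`, `make_round_robin`, `make_random_host`, `hosts_used` and the demand values are all hypothetical.

```python
import random
from typing import Callable, List

Chooser = Callable[[int, List[int]], int]


def place(demands: List[int], n_hosts: int, capacity: int, choose: Chooser) -> List[int]:
    """Assign each VM (by core demand) to a host that still has room."""
    free = [capacity] * n_hosts
    placement = []
    for i, demand in enumerate(demands):
        candidates = [h for h in range(n_hosts) if free[h] >= demand]
        if not candidates:
            raise ValueError(f"no host can fit VM {i} (demand {demand})")
        host = choose(i, candidates)
        free[host] -= demand
        placement.append(host)
    return placement


def first_fit(i: int, candidates: List[int]) -> int:
    """Bin-packing heuristic: always take the lowest-numbered host with room."""
    return candidates[0]


def make_round_robin(n_hosts: int) -> Chooser:
    """Prefer host i mod n_hosts, falling back to the next host with room."""
    def choose(i: int, candidates: List[int]) -> int:
        start = i % n_hosts
        return min(candidates, key=lambda h: (h - start) % n_hosts)
    return choose


def make_random_host(seed: int) -> Chooser:
    """Pick any host with room, using a seeded generator for repeatable runs."""
    rng = random.Random(seed)
    return lambda i, candidates: rng.choice(candidates)


def hosts_used(placement: List[int]) -> int:
    return len(set(placement))


demands = [1, 2, 1, 1, 2, 1, 1, 1]  # made-up core demands
packed = place(demands, 5, 4, first_fit)
spread = place(demands, 5, 4, make_round_robin(5))
scattered = place(demands, 5, 4, make_random_host(seed=0))
```

With these made-up demands on 5 hosts of 4 cores, first fit packs everything onto three hosts while round-robin touches all five. Feeding every policy the same `demands` list mirrors the simulator idea of replaying one trace through multiple algorithms.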
19. XRM Evaluation
• 5 hosts, 4 cores
• Random utilizations
• Random slice requests
• Three algorithms
– Bin-packing
– Round-robin
– Random-host
• Slicing algorithms evaluated in previous work: AutoControl [PadalaEurosys09]
20. Comparing three algorithms
[Figure: Host Utilization (0 to 1000) vs. Time Interval (1 to 9), one panel per algorithm]
• Round-Robin: uses all five hosts, wasting energy
• Random Host: uses <= five hosts, wasting energy
• Bin Packing: uses <= three hosts!
21. AutoControl experiments
• Experiments on Emulab
• 20 server nodes – 80 VMs
• 20 client nodes
• Mix of applications
• Load increased on ½ of the VMs chosen randomly
[Diagram: VM1 to VM4 labeled underloaded or overloaded; annotations: "No control needed", "AutoControl can readjust"]
22. SLO (performance goal) violations
[Figure: two panels, Default Xen and AutoControl; y-axis: Applications, x-axis: Time; scale: Bad, Target, Good]
23. Summary
• Resource management in cloud infrastructures is complex
– Multiple layers of RM
– Complex primitives
– Complex decisions
• We are developing feedback control theory based RM
• XRM is event-based, pluggable and extensible
• Complex algorithms like AutoControl can be developed
• Research in advanced algorithms in progress
24. Summary of our experiences with XCP 0.1.1
• We are trying to build a research cloud based on XCP
• Besides XRM, we are adding Fault Tolerance and a Web-based GUI to XCP
• Having to install a special distribution is difficult
– Why not have XCP as a set of packages in RHEL or other distributions?
– You are breaking toolstacks developed at various companies
• XCP docs are the same as the Citrix XenServer docs
– Some of the features don't work or are not supported
– Better documentation of the API
• XCP GUI needs to improve
– Bugs in OpenXenCenter
26. We want feedback from Xen community
• Comments on XRM architecture
• Should we incorporate XRM into XCP?
– OCaml
• Are you interested in open source XRM?
– Does the community want to be involved?
• Questions?
ppadala@docomolabs-usa.com