1. NUMA and Virtualization: the Case of Xen
Dario Faggioli, dario.faggioli@citrix.com
August 27-28, 2012, San Diego, CA, USA
2. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
3. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
4. What is NUMA
● Non-Uniform Memory Access: accessing some regions of memory takes longer than accessing others
● Designed to improve scalability on large SMP systems
● Bottleneck: contention on the shared memory bus
5. What is NUMA
● Groups of processors (NUMA nodes) have their own local memory
● Any processor can access any memory, including the memory not "owned" by its group (remote memory)
● Non-uniform: accessing local memory is faster than accessing remote memory
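A quick way to see this on a real host is to ask for the topology directly. A minimal sketch from dom0 (output shape varies with Xen and tool versions):

    # Ask Xen for its view of the host topology:
    xl info -n          # -n/--numa adds numa_info: per-node memory and node distances
    # Or, in a NUMA-aware Linux, use the standard tool:
    numactl --hardware  # lists each node's CPUs, memory size and distances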
6. What is NUMA
Since:
● (most of) memory is allocated at task startup, and
● tasks are (usually) free to run on any processor,
both local and remote memory accesses can happen during a task's life.
[Diagram: two NUMA nodes, each with its own CPUs and local memory; task A runs on one node while its memory may sit on either.]
7. NUMA and Virtualization
Short-lived tasks are just fine... but what about VMs?
[Diagram: VM1 runs on the first node while part of its memory sits on the second node, and it stays that way FOREVER!!]
8. NUMA and Xen
Xen allocates memory from all the nodes
where the VM is allowed to run (when created)
[Diagram: VM1 and VM2 are each allowed to run on both nodes, so each one's memory ends up split across both nodes' local memory.]
9. NUMA and Xen
Xen allocates memory from all the nodes
where the VM is allowed to run (when created)
[Diagram: the same situation on a four-node host; each VM's memory gets scattered across several nodes.]
10. NUMA and Xen
Xen allocates memory from all the nodes
where the VM is allowed to run (when created)
How do we specify that? Via vCPU pinning at VM creation time (e.g., in the VM config file), as in the sketch below.
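A minimal sketch of what that looks like, using a hypothetical domain config file vm1.cfg (cpus= is the standard xl/xm pinning key; the right pCPU range depends on the host's topology):

    # vm1.cfg: pin all of vm1's vCPUs to pCPUs 0-3, so that Xen
    # allocates its memory from the node(s) those pCPUs belong to
    name   = "vm1"
    memory = 2048
    vcpus  = 2
    cpus   = "0-3"

Then create the domain and check the result:

    xl create vm1.cfg
    xl vcpu-list vm1   # the "CPU Affinity" column shows the allowed pCPUs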
11. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
12. Automatic Placement
1. What happens on the left is better than what happens on the right.
[Diagram: left, VM1 confined to one node with all of its memory local; right, VM1's memory spread across two nodes, forcing remote accesses.]
13. Automatic Placement
2. What happens on the left can be avoided: look at what happens on the right.
[Diagram: left, VM1 and VM2 overlapping on the same nodes with their memory interleaved; right, VM1 and VM2 each confined to their own node with only local memory.]
14. Automatic Placement
1. VM1 creation time: pin VM1 to the first node.
2. VM2 creation time: pin VM2 to the second node, as the first one already has another VM pinned to it.
This now happens automatically within libxl (a sketch of the idea follows below).
[Diagram: VM1 on the first node, VM2 on the second, each with purely local memory.]
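The libxl placement logic compares candidate nodes by free memory and by how many vCPUs are already bound there. Below is a minimal greedy sketch of that idea in Python; it is illustrative only (all names and data structures are made up, this is not the actual libxl code):

    # Greedy placement sketch (illustrative, NOT libxl code): pick the
    # node that fits the VM's memory and has the fewest vCPUs already
    # placed on it; break ties in favour of more free memory.
    def place(vm_mem, vm_vcpus, nodes):
        fits = [n for n in nodes if n["free_mem"] >= vm_mem]
        if not fits:
            return None  # no single node fits: a set of nodes is needed
        best = min(fits, key=lambda n: (n["nr_vcpus"], -n["free_mem"]))
        best["free_mem"] -= vm_mem
        best["nr_vcpus"] += vm_vcpus
        return best["id"]

    nodes = [{"id": 0, "free_mem": 4096, "nr_vcpus": 2},
             {"id": 1, "free_mem": 8192, "nr_vcpus": 0}]
    print(place(2048, 2, nodes))  # -> 1: node 1 is the emptier one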
15. NUMA Aware Scheduling
Nice, but using pinning has a major drawback: when VM1 is back (runnable again), it has to wait, even if there are idle CPUs!
[Diagram: VM1 and VM2 pinned to the first node, VM3 on the second; VM1 queues for a CPU on its own node while the other node has idle CPUs.]
16. NUMA Aware Scheduling
Don't use pinning; instead, tell the scheduler where a VM prefers to run (node affinity). Now VM1 can run immediately: remote accesses are better than not running at all!
Patches almost ready (targeting Xen 4.3). A sketch of the idea follows below.
[Diagram: VM1 spills onto an idle CPU of the second node while its memory stays on the first node.]
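In scheduling terms, the change boils down to a two-step CPU pick when a vCPU wakes up: prefer an idle CPU on a node the VM has affinity with, but rather than making it wait, fall back to any idle CPU. A minimal sketch of that decision in Python (made-up data structures, not the actual credit scheduler code):

    # Two-step pick (illustrative, NOT the Xen credit scheduler):
    def pick_cpu(affine_nodes, cpus):
        idle = [c for c in cpus if c["idle"]]
        local = [c for c in idle if c["node"] in affine_nodes]
        if local:
            return local[0]["id"]  # best case: an idle CPU near the memory
        if idle:
            return idle[0]["id"]   # remote accesses beat not running at all
        return None                # nothing idle: wait in the runqueue

    cpus = [{"id": 0, "node": 0, "idle": False},
            {"id": 1, "node": 0, "idle": False},
            {"id": 2, "node": 1, "idle": True}]
    print(pick_cpu({0}, cpus))  # -> 2: run remotely rather than wait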
17. Experimental Results
Trying to assess the performance of:
● Automatic placement (pinning)
● NUMA aware scheduling (automatic placement + node affinity)
18. Testing Placement
(thanks to Andre Przywara from AMD)
● Host: AMD Opteron(TM) 6276,
64 cores, 128 GB RAM, 8 NUMA nodes
● VMs: 1 to 49, 2 vCPUs and 2 GB RAM each.
With all 49 VMs running (98 vCPUs on 64 cores), automatic placement spread the vCPUs across the nodes like this:
nr. vCPUs   Node
14          0
12          1
12          2
12          3
12          4
12          5
12          6
12          7
19. Some Traces about
NUMA Aware Scheduling
[Trace screenshots.]
20. Benchmarking Pinning and
NUMA Aware Scheduling
● Host: Intel Xeon(R) E5620, 16 cores, 12 GB RAM,
2 NUMA nodes
● VMs: 2, 4, 6, 8 and 10 of them, 960 MB RAM each.
Different experiments with 1, 2 and 4 vCPUs per VM
➔ SPECjbb2005 executed concurrently in all VMs
➔ 3 configurations: all-cpus (no placement: VMs run anywhere), auto-pinning, auto-affinity
➔ Experiment repeated 3 times for each configuration
21. Benchmarking Pinning and
NUMA Aware Scheduling
1 vCPU per VM: 20% to 16% improvement!
22. Benchmarking Pinning and
NUMA Aware Scheduling
2 vCPUs per VM: 17% to 13% improvement!
23. Benchmarking Pinning and
NUMA Aware Scheduling
4 vCPUs per VM:
● 0.2% to 8% gap from auto-affinity to auto-pinning
● 16% to 4% improvement from all-cpus to auto-affinity
24. Some Traces about
NUMA Aware Scheduling
[Trace screenshots.]
25. Some Traces about
NUMA Aware Scheduling
[Trace screenshots.]
26. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
27. Lots of Room for
Further Improvement!
Tracking ideas, people and progress on
http://wiki.xen.org/wiki/Xen_NUMA_Roadmap
● Dynamic memory migration
● IO NUMA
● Guest (or Virtual) NUMA
● Ballooning and memory sharing
● Inter-VM dependencies
● Benchmarking and performance evaluation
28. Dynamic Memory
Migration
If VM2 goes away, it would be nice if we could change VM1's (or VM3's) node affinity on-line, and move its memory accordingly! Also targeting Xen 4.3.
[Diagram: once VM2 leaves its node, one of the remaining VMs could be given affinity to that node and have its memory migrated there.]
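The execution side of this is already doable by hand; it is the memory that does not follow. A sketch from dom0 (the domain name and pCPU range are hypothetical):

    xl vcpu-pin vm1 all 8-15   # re-pin all of vm1's vCPUs onto the other node's pCPUs
    # ...but vm1's pages stay on the node where they were first
    # allocated; migrating them on-line is exactly the missing piece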
29. IO NUMA
Different devices can be attached to different nodes: this needs to be considered during placement and scheduling
30. Guest NUMA
If a VM is bigger than 1 node, should it know?
● Pros: performance (especially for HPC workloads)
● Cons: what if things change? (e.g., live migration)
[Diagram: VM1 spans two nodes while VM2 fits within one; guest NUMA would expose that split to VM1's OS.]
31. Ballooning and Sharing
● Ballooning should be NUMA aware
● Memory sharing: should we allow it cross-node? A page shared across nodes is a remote access for at least one of the VMs.
[Diagram: left, a ballooning scenario where node locality is at risk; right, VM1 and VM2 sharing memory (shmem) that is remote for one of them.]
32. Inter-VM Dependencies
Are we sure the situation on the right is always better? It might be workload-dependent (VM cooperation vs. competition).
[Diagram: left, VM1, VM2 and VM3 packed onto shared nodes; right, the same VMs spread out across nodes. If the VMs cooperate (e.g., they communicate a lot), keeping them apart may hurt.]
33. Benchmarking and
Performance Evaluation
How to verify we are actually improving:
● What kind of workload(s)?
● What VM configuration(s)?
34. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
35. Thanks!
Any Questions?