1. NUMA and Virtualization: the Case of Xen
Dario Faggioli, dario.faggioli@citrix.com
August 27-28, 2012, San Diego, CA, USA
2. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
3. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
4. What is NUMA
● Non-Uniform Memory Access: accessing some regions of memory takes longer than accessing others
● Designed to improve scalability on large SMP systems
● Bottleneck: contention on the shared memory bus
5. What is NUMA
● Groups of processors (NUMA nodes) have their own local memory
● Any processor can access any memory, including the memory not "owned" by its group (remote memory)
● Non-uniform: accessing local memory is faster than accessing remote memory
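A quick way to see this on a real host is to ask for the topology directly. A minimal sketch from dom0 (output shape varies with Xen and tool versions):

    # Ask Xen for its view of the host topology:
    xl info -n          # -n/--numa adds numa_info: per-node memory and node distances
    # Or, in a NUMA-aware Linux, use the standard tool:
    numactl --hardware  # lists each node's CPUs, memory size and distances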
6. What is NUMA
Since:
● (most of) memory is allocated at task startup, and
● tasks are (usually) free to run on any processor,
both local and remote memory accesses can happen during a task's life.
[Diagram: two NUMA nodes, each with its own CPUs and local memory; task A runs on one node while its memory may sit on either.]
7. NUMA and Virtualization
Short-lived tasks are just fine... but what about VMs?
[Diagram: VM1 runs on the first node while part of its memory sits on the second node, and it stays that way FOREVER!!]
8. NUMA and Xen
Xen allocates memory from all the nodes
where the VM is allowed to run (when created)
[Diagram: VM1 and VM2 are each allowed to run on both nodes, so each one's memory ends up split across both nodes' local memory.]
9. NUMA and Xen
Xen allocates memory from all the nodes
where the VM is allowed to run (when created)
[Diagram: the same situation on a four-node host; each VM's memory gets scattered across several nodes.]
10. NUMA and Xen
Xen allocates memory from all the nodes
where the VM is allowed to run (when created)
How do we specify that? Via vCPU pinning at VM creation time (e.g., in the VM config file), as in the sketch below.
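A minimal sketch of what that looks like, using a hypothetical domain config file vm1.cfg (cpus= is the standard xl/xm pinning key; the right pCPU range depends on the host's topology):

    # vm1.cfg: pin all of vm1's vCPUs to pCPUs 0-3, so that Xen
    # allocates its memory from the node(s) those pCPUs belong to
    name   = "vm1"
    memory = 2048
    vcpus  = 2
    cpus   = "0-3"

Then create the domain and check the result:

    xl create vm1.cfg
    xl vcpu-list vm1   # the "CPU Affinity" column shows the allowed pCPUs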
11. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
12. Automatic Placement
1. What happens on the left is better than what happens on the right.
[Diagram: left, VM1 confined to one node with all of its memory local; right, VM1's memory spread across two nodes, forcing remote accesses.]
13. Automatic Placement
2. What happens on the left can be avoided: look at what happens on the right.
[Diagram: left, VM1 and VM2 overlapping on the same nodes with their memory interleaved; right, VM1 and VM2 each confined to their own node with only local memory.]
14. Automatic Placement
1. VM1 creation time: pin VM1 to the first node.
2. VM2 creation time: pin VM2 to the second node, as the first one already has another VM pinned to it.
This now happens automatically within libxl (a sketch of the idea follows below).
[Diagram: VM1 on the first node, VM2 on the second, each with purely local memory.]
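The libxl placement logic compares candidate nodes by free memory and by how many vCPUs are already bound there. Below is a minimal greedy sketch of that idea in Python; it is illustrative only (all names and data structures are made up, this is not the actual libxl code):

    # Greedy placement sketch (illustrative, NOT libxl code): pick the
    # node that fits the VM's memory and has the fewest vCPUs already
    # placed on it; break ties in favour of more free memory.
    def place(vm_mem, vm_vcpus, nodes):
        fits = [n for n in nodes if n["free_mem"] >= vm_mem]
        if not fits:
            return None  # no single node fits: a set of nodes is needed
        best = min(fits, key=lambda n: (n["nr_vcpus"], -n["free_mem"]))
        best["free_mem"] -= vm_mem
        best["nr_vcpus"] += vm_vcpus
        return best["id"]

    nodes = [{"id": 0, "free_mem": 4096, "nr_vcpus": 2},
             {"id": 1, "free_mem": 8192, "nr_vcpus": 0}]
    print(place(2048, 2, nodes))  # -> 1: node 1 is the emptier one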
15. NUMA Aware Scheduling
Nice, but using pinning has a major drawback: when VM1 is back (runnable again), it has to wait, even if there are idle CPUs!
[Diagram: VM1 and VM2 pinned to the first node, VM3 on the second; VM1 queues for a CPU on its own node while the other node has idle CPUs.]
16. NUMA Aware Scheduling
Don't use pinning; instead, tell the scheduler where a VM prefers to run (node affinity). Now VM1 can run immediately: remote accesses are better than not running at all!
Patches almost ready (targeting Xen 4.3). A sketch of the idea follows below.
[Diagram: VM1 spills onto an idle CPU of the second node while its memory stays on the first node.]
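In scheduling terms, the change boils down to a two-step CPU pick when a vCPU wakes up: prefer an idle CPU on a node the VM has affinity with, but rather than making it wait, fall back to any idle CPU. A minimal sketch of that decision in Python (made-up data structures, not the actual credit scheduler code):

    # Two-step pick (illustrative, NOT the Xen credit scheduler):
    def pick_cpu(affine_nodes, cpus):
        idle = [c for c in cpus if c["idle"]]
        local = [c for c in idle if c["node"] in affine_nodes]
        if local:
            return local[0]["id"]  # best case: an idle CPU near the memory
        if idle:
            return idle[0]["id"]   # remote accesses beat not running at all
        return None                # nothing idle: wait in the runqueue

    cpus = [{"id": 0, "node": 0, "idle": False},
            {"id": 1, "node": 0, "idle": False},
            {"id": 2, "node": 1, "idle": True}]
    print(pick_cpu({0}, cpus))  # -> 2: run remotely rather than wait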
17. Experimental Results
Trying to assess the performance of:
● Automatic placement (pinning)
● NUMA aware scheduling (automatic placement + node affinity)
18. Testing Placement
(thanks to Andre Przywara from AMD)
● Host: AMD Opteron(TM) 6276,
64 cores, 128 GB RAM, 8 NUMA nodes
● VMs: 1 to 49, 2 vCPUs and 2 GB RAM each.
With all 49 VMs running (98 vCPUs on 64 cores), automatic placement spread the vCPUs across the nodes like this:
nr. vCPUs   Node
14          0
12          1
12          2
12          3
12          4
12          5
12          6
12          7
19. Some Traces about
NUMA Aware Scheduling
[Trace screenshots.]
20. Benchmarking Pinning and
NUMA Aware Scheduling
● Host: Intel Xeon(R) E5620, 16 cores, 12 GB RAM,
2 NUMA nodes
● VMs: 2, 4, 6, 8 and 10 of them, 960 MB RAM each.
Different experiments with 1, 2 and 4 vCPUs per VM
➔ SPECjbb2005 executed concurrently in all VMs
➔ 3 configurations: all-cpus (no placement: VMs run anywhere), auto-pinning, auto-affinity
➔ Experiment repeated 3 times for each configuration
21. Benchmarking Pinning and
NUMA Aware Scheduling
1 vCPU per VM: 20% to 16% improvement!
22. Benchmarking Pinning and
NUMA Aware Scheduling
2 vCPUs per VM: 17% to 13% improvement!
23. Benchmarking Pinning and
NUMA Aware Scheduling
4 vCPUs per VM:
● 0.2% to 8% gap from auto-affinity to auto-pinning
● 16% to 4% improvement from all-cpus to auto-affinity
24. Some Traces about
NUMA Aware Scheduling
[Trace screenshots.]
25. Some Traces about
NUMA Aware Scheduling
[Trace screenshots.]
26. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
27. Lots of Room for
Further Improvement!
Tracking ideas, people and progress on
http://wiki.xen.org/wiki/Xen_NUMA_Roadmap
● Dynamic memory migration
● IO NUMA
● Guest (or Virtual) NUMA
● Ballooning and memory sharing
● Inter-VM dependencies
● Benchmarking and performance evaluation
28. Dynamic Memory
Migration
If VM2 goes away, it would be nice if we could change VM1's (or VM3's) node affinity on-line, and move its memory accordingly! Also targeting Xen 4.3.
[Diagram: once VM2 leaves its node, one of the remaining VMs could be given affinity to that node and have its memory migrated there.]
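The execution side of this is already doable by hand; it is the memory that does not follow. A sketch from dom0 (the domain name and pCPU range are hypothetical):

    xl vcpu-pin vm1 all 8-15   # re-pin all of vm1's vCPUs onto the other node's pCPUs
    # ...but vm1's pages stay on the node where they were first
    # allocated; migrating them on-line is exactly the missing piece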
29. IO NUMA
Different devices can be attached to different nodes: this needs to be considered during placement and scheduling
30. Guest NUMA
If a VM is bigger than 1 node, should it know?
● Pros: performance (especially for HPC workloads)
● Cons: what if things change? (e.g., live migration)
[Diagram: VM1 spans two nodes while VM2 fits within one; guest NUMA would expose that split to VM1's OS.]
31. Ballooning and Sharing
● Ballooning should be NUMA aware
● Memory sharing: should we allow it cross-node? A page shared across nodes is a remote access for at least one of the VMs.
[Diagram: left, a ballooning scenario where node locality is at risk; right, VM1 and VM2 sharing memory (shmem) that is remote for one of them.]
32. Inter-VM Dependencies
Are we sure the situation on the right is always better? It might be workload-dependent (VM cooperation vs. competition).
[Diagram: left, VM1, VM2 and VM3 packed onto shared nodes; right, the same VMs spread out across nodes. If the VMs cooperate (e.g., they communicate a lot), keeping them apart may hurt.]
33. Benchmarking and
Performance Evaluation
How to verify we are actually improving:
● What kind of workload(s)?
● What VM configuration(s)?
34. NUMA and Virt. And Xen
Talk outline:
● What is NUMA
– NUMA and Virtualization
– NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
35. Thanks!
Any Questions?