1. Profiling the Nova Scheduler: A Case Study
Joe Gordon
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution.*
* All unlicensed or borrowed works retain their original licenses. 1
Wednesday, October 17, 12
2. About Me
• Engineer at Cloudscaling
• Contributor
• Deployer
• Folsom Contributions
o Top 10 developer (by commits)
o Mostly in Nova
http://bitergia.com/public/reports/openstack/2012_09_folsom/
3. By the Numbers: Nova Folsom
• 190+ Contributors
• Code size:
Release  Python Lines  Other Lines  Python Files  Other Files
Folsom   186,738       242,721      666           788
Essex    150,894       221,109      593           302
Diablo   110,581       110,393      427           389
• Code churn:
Release  Lines Inserted  Lines Deleted  Insertions/LoC %
Folsom   110,308         71,911         59.0%
Essex    182,298         138,346        120.8%
Code churn generated with: git log --numstat --pretty="%H" $A..$B | grep '\.py$' | awk 'NF==3 {plus+=$1; minus+=$2} END {printf("+%d, -%d\n", plus, minus)}'
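The same tally can be sketched in Python. This is an illustrative equivalent of the shell pipeline, not the script used for the slide; the function names `churn` and `insertions_per_loc` are made up for this example, and the input is assumed to be the text printed by `git log --numstat --pretty="%H" $A..$B`.

```python
def churn(numstat_text, suffix=".py"):
    """Sum inserted and deleted lines for files whose path ends in `suffix`.

    Each changed file appears as "<added>\t<deleted>\t<path>"; commit
    hashes and blank lines do not have three tab-separated fields.
    """
    plus = minus = 0
    for line in numstat_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3 or not fields[2].endswith(suffix):
            continue  # commit hash, blank line, or non-matching file
        added, deleted, _path = fields
        if added == "-" or deleted == "-":
            continue  # git prints "-" instead of counts for binary files
        plus += int(added)
        minus += int(deleted)
    return plus, minus


def insertions_per_loc(insertions, python_lines):
    """Insertions as a percentage of the release's Python line count."""
    return 100.0 * insertions / python_lines
```

Dividing the Folsom insertions (110,308) by the Folsom Python line count (186,738) gives roughly 59%, which is the figure in the last column.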
4. Who Wrote Nova Folsom?
1 138 Russell Bryant <rbryant@redhat.com>
2 112 Johannes Erdfelt <johannes.erdfelt@rackspace.com>
3 97 Dan Prince <dprince@redhat.com>
4 88 Vishvananda Ishaya <vishvananda@gmail.com>
5 81 Joe Gordon <jogo@cloudscaling.com>
6 63 Michael Still <mikal@stillhq.com>
7 59 Mark McLoughlin <markmc@redhat.com>
8 58 Rick Harris <rconradharris@gmail.com>
9 50 Yun Mao <yunmao@gmail.com>
10 45 Daniel P. Berrange <berrange@redhat.com>
11 36 Chris Behrens <cbehrens@codestud.com>
12 31 Eoghan Glynn <eglynn@redhat.com>
13 29 Brian Waldon <brian.waldon@rackspace.com>
14 26 Pádraig Brady <pbrady@redhat.com>
15 25 Chuck Short <zulcss@ubuntu.com>
16 23 Sean Dague <sdague@linux.vnet.ibm.com>
17 21 Alex Meade <alex.meade@rackspace.com>
18 18 Kevin L. Mitchell <kevin.mitchell@rackspace.com>
19 17 Brian Elliott <brian.elliott@rackspace.com>
20 17 Zhongyue Luo <zhongyue.nah@intel.com>
21 16 John Griffith <john.griffith@solidfire.com>
22 13 Dan Smith <danms@us.ibm.com>
23 13 Andrew Bogott <abogott@wikimedia.org>
24 12 Renuka Apte <renuka.apte@citrix.com>
25 12 Thierry Carrez <thierry@openstack.org>
26 12 Monty Taylor <mordred@inaugust.com>
27 10 MotoKen <motokentsai@gmail.com>
Generated with: git shortlog -sne --since="Tue Mar 20 08:17:40 2012 +0100" --no-merges | cat -n
5. Problems
6. Problems
euca-run-instances -n 1000
Environment: default nova essex scheduler
1. When trying to schedule 1000 VMs, more than 1000 VMs were scheduled. Why were too many VMs scheduled?
2. Scheduling 100 VMs took 24 seconds. Why is the scheduler so slow?
7. Outline
• How the nova essex scheduler works
• P1: Why were too many VMs scheduled?
• Nova performance
• P2: Why is the essex scheduler so slow?
• How the folsom scheduler performs
8. How the nova essex scheduler works
9. Scheduling
16. Default Scheduler
Filter Scheduler
http://docs.openstack.org/developer/nova/devref/filter_scheduler.html
17. Limitations
• Cannot run multiple schedulers
o Race condition
o Fixed in folsom
• O(nm)
o n = number of nodes
o m = number of VMs requested
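The O(nm) cost comes from looking at every node once for every requested VM. A minimal sketch of that shape, assuming a much simplified scheduler: the function `schedule`, the host dictionaries, and the single RAM-based filter and weight are hypothetical and are not Nova's filter scheduler code.

```python
def schedule(hosts, num_instances, ram_needed_mb):
    """Place each requested VM on the host with the most free RAM.

    Returns (placements, host_evaluations). Every VM triggers a full pass
    over the hosts, so host_evaluations grows as n * m.
    """
    free = {h["name"]: h["free_ram_mb"] for h in hosts}
    placements = []
    evaluations = 0
    for _ in range(num_instances):                # m requested VMs
        best = None
        for name, ram in free.items():            # n nodes
            evaluations += 1
            if ram < ram_needed_mb:               # filter: not enough RAM
                continue
            if best is None or ram > free[best]:  # weight: most free RAM wins
                best = name
        if best is None:
            break                                 # no host can take another VM
        free[best] -= ram_needed_mb
        placements.append(best)
    return placements, evaluations
```

With 2 hosts and 3 requested VMs the inner loop runs 6 times; doubling either the host count or the request size doubles the work.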
18. Problem 1: Why were too many VMs scheduled?
19. Problem #1: Too Many VMs
• scheduling is slow
• RPC call instead of cast
• boto retries upon API timeout
$ time euca-run-instances -t m1.tiny -n 100
real 0m25.250s
user 0m0.100s
sys 0m0.020s
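The three bullets combine into duplicate requests: the API blocks on the scheduler, whatever sits in front of the API gives up first, and the client resends the whole request. A toy model of that interaction, under loudly stated assumptions: every attempt that reaches the API is eventually scheduled in full, and the client retries a fixed number of times. The function `simulate_retries` and the `max_retries` parameter are illustrative and do not describe boto's actual retry policy.

```python
def simulate_retries(num_vms, request_seconds, proxy_timeout, max_retries):
    """Return (attempts, vms_scheduled) for a client that retries on timeout.

    Assumption: the API keeps processing a request after the timeout has
    fired, so every attempt schedules num_vms instances.
    """
    attempts = 0
    while True:
        attempts += 1
        if request_seconds <= proxy_timeout:
            break  # the response arrived in time, no retry needed
        if attempts > max_retries:
            break  # the client gives up
    return attempts, attempts * num_vms
```

For example, `simulate_retries(100, 25.25, 15, 2)` models the 25-second request above behind a 15-second timeout with two retries and returns `(3, 300)`: three times the requested number of VMs.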
20. nova/compute/api.py
rpc.call instead of rpc.cast
# We can create the DB entry for the instance here if we're
# only going to create 1 instance.
# This speeds up API responses for builds
# as we don't need to wait for the scheduler.
create_instance_here = max_count == 1

def _create_instance(...):
    """Verify all the input parameters regardless of the provisioning
    strategy being performed and schedule the instance(s) for
    creation."""
    ...
    # We need to wait for the scheduler to create the instance
    # DB entries, because the instance *could* be created in
    # a child zone.
    rpc_method = rpc.call
    # TODO(comstud): We should use rpc.multicall when we can
    # retrieve the full instance dictionary from the scheduler.
    # Otherwise, we could exceed the AMQP max message size limit.
    # This would require the schedulers' schedule_run_instances
    # methods to return an iterator vs a list.
    instances = self._schedule_run_instance(
            rpc_method,
            context, base_options,
            instance_type,
            availability_zone, injected_files,
            admin_password, image,
            num_instances, requested_networks,
            block_device_mapping, security_group,
            filter_properties)
21. nova/scheduler/filter_scheduler.py
each VM is scheduled individually
def schedule_run_instance(self, context, request_spec, *args, **kwargs):
    """This method is called from nova.compute.api to provision
    an instance. We first create a build plan (a list of WeightedHosts)
    and then provision.
    Returns a list of the instances created.
    """
    ...
    for num in xrange(num_instances):
        if not weighted_hosts:
            break
        weighted_host = weighted_hosts.pop(0)
        request_spec['instance_properties']['launch_index'] = num
        instance = self._provision_resource(elevated, weighted_host,
                                            request_spec, kwargs)
        if instance:
            instances.append(instance)
22. Nova
23. Retries
[Diagram, built up step by step over seven slides: EC2 User -> Pound (Timeout=15) -> Nova EC2 API. The user sends euca-run-instances -n 100, the Nova EC2 API is slow, the request times out, and the client retries.]
30. Nova Performance
31. Nova Performance
• How eventlet changes everything
o coroutines
• interplay between MySQL and eventlet
• RPC
• CPU frequency scaling
32. Nova Logs
logging format:
%(asctime)s %(levelname)s %(name)s [%(request_id)s %(user_id)s %(project_id)s]
Example:
2012-10-11 21:48:08 DEBUG nova.api.ec2 [req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4 d97c90fd1a4f49e7b7fb38f4d358f4dd a6f5079f10464a6f9aefd1b2e9d5aab3] action: RunInstances from (pid=30579) __call__ /usr/local/lib/python2.7/dist-packages/nova-2012.1.3-py2.7.egg/nova/api/ec2/__init__.py:435
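Because every line carries the request ID, one request can be followed across services and timed from the logs alone. A small sketch of that idea, assuming lines follow the format shown above with a free-form message after the bracketed IDs; the names `LOG_LINE`, `parse_line` and `request_duration` are hypothetical helpers, not part of Nova.

```python
import re
from datetime import datetime

# Mirrors: %(asctime)s %(levelname)s %(name)s [%(request_id)s %(user_id)s %(project_id)s]
LOG_LINE = re.compile(
    r"(?P<asctime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<levelname>\S+) "
    r"(?P<name>\S+) "
    r"\[(?P<request_id>\S+) (?P<user_id>\S+) (?P<project_id>[^\s\]]+)\] "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["asctime"], "%Y-%m-%d %H:%M:%S")
    return record


def request_duration(lines, request_id):
    """Seconds between the first and last line seen for request_id, or None."""
    times = [
        record["time"]
        for record in map(parse_line, lines)
        if record is not None and record["request_id"] == request_id
    ]
    if not times:
        return None
    return (max(times) - min(times)).total_seconds()
```

Timestamps in this format have one-second resolution, so durations computed this way are only accurate to about a second.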
33. Why is scheduling so slow?
24 seconds in the life of nova
34. Problem #2: Slow
24 seconds in the life of nova
nova-api.log
2012-10-11 21:48:08 - POST RunInstances Request
2012-10-11 21:48:08 - RPC RR CALL ‘scheduler’
2012-10-11 21:48:32 - POST Response (24 seconds)
nova-scheduler.log
2012-10-11 21:48:08 - Attempting to build 100 instance(s)
2012-10-11 21:48:09 - Last VM filtered and weighted
... - Write to DB, and Cast (19 seconds)
2012-10-11 21:48:28 - Respond to api CALL
mysql.log
filter and weighting: 733 DB calls, 0.07151 seconds
Write to DB, and Cast: 3563 DB Calls, 1.77971 seconds
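The numbers above already show where the time is not going. A short worked calculation, using only the figures on this slide; the phase wall-clock durations are read off one-second-resolution timestamps, so they are approximate, and the names `PHASES`, `breakdown` and `total_db_seconds` are made up for this sketch.

```python
API_WALL_SECONDS = 24.0  # nova-api.log: 21:48:08 request to 21:48:32 response

PHASES = {
    # phase: (scheduler wall-clock seconds, DB calls, seconds spent in MySQL)
    "filter and weighting": (1.0, 733, 0.07151),
    "write to DB and cast": (19.0, 3563, 1.77971),
}


def breakdown(phases):
    """Per phase: average MySQL time per call and MySQL's share of wall time."""
    rows = {}
    for name, (wall, calls, db_seconds) in phases.items():
        rows[name] = {
            "ms_per_db_call": 1000.0 * db_seconds / calls,
            "db_share_of_wall": db_seconds / wall,
        }
    return rows


def total_db_seconds(phases):
    """Total time MySQL itself spent across all phases."""
    return sum(db_seconds for _wall, _calls, db_seconds in phases.values())
```

MySQL accounts for about 1.85 of the 24 seconds, and each call in the slow phase averages about half a millisecond inside the database, so the remaining time is spent outside MySQL.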
35. Problem #2: Slow
24 seconds in the life of nova
• DB is fast
• filter and weight is fast
• Python code around DB is slow
o SQLAlchemy?
36. Folsom
37. Folsom
• RPC.Call to scheduler is now RPC.Cast to scheduler
o API server doesn’t wait for scheduling
• Improved DB access patterns
o Essex: 29 DB calls in nova/scheduler/driver.py
o Folsom: 12 DB calls in nova/scheduler/driver.py
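The difference between a call and a cast is whether the sender blocks for a reply. A minimal in-process sketch of the two styles, using a thread and a queue in place of a message bus; the class `ToyRpc` is a hypothetical stand-in for illustration and is not the nova.rpc API.

```python
import queue
import threading


class ToyRpc:
    """In-process stand-in for a message bus with a single consumer thread."""

    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        threading.Thread(target=self._consume, daemon=True).start()

    def _consume(self):
        while True:
            message, reply_box = self._inbox.get()
            result = self._handler(message)
            if reply_box is not None:
                reply_box.put(result)  # only a call has someone waiting

    def cast(self, message):
        """Fire and forget: enqueue the message and return immediately."""
        self._inbox.put((message, None))

    def call(self, message):
        """Request/response: enqueue the message and block until the reply."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((message, reply_box))
        return reply_box.get()
```

With `cast`, a slow handler no longer delays the sender, which is why the API server can respond before scheduling has finished; the trade-off is that the sender does not learn the result from the return value.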
38. Questions?
39. Raw Data
goal: show how much time spent where
setup: essex filter scheduler, 2 compute nodes
$ time euca-run-instances -t m1.tiny -n 50
real 0m12.183s
user 0m0.104s
sys 0m0.020s
$ time euca-run-instances -t m1.tiny -n 100
real 0m25.250s
user 0m0.100s
sys 0m0.020s
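With the number of compute nodes fixed at 2, an O(nm) scheduler should take time roughly proportional to the number of VMs requested. A quick check of that against the two runs above; the names `RUNS`, `seconds_per_vm` and `scaling_ratio` are made up for this sketch, and two data points only suggest a trend.

```python
RUNS = {50: 12.183, 100: 25.250}  # instances requested -> `real` seconds


def seconds_per_vm(runs):
    """Wall-clock seconds per requested VM for each run."""
    return {n: seconds / n for n, seconds in runs.items()}


def scaling_ratio(runs, small, large):
    """Observed time ratio divided by the request-size ratio (1.0 = linear)."""
    return (runs[large] / runs[small]) / (large / small)
```

Both runs come out at about a quarter of a second per VM, and doubling the request size slightly more than doubles the elapsed time.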
req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4
nova-api.log
2012-10-11 21:48:08 - POST RunInstances Request
2012-10-11 21:48:08 - RPC RR CALL ‘scheduler’
2012-10-11 21:48:32 - POST response
Note: euca-run-instances -n N becomes a loop in nova-scheduler (1 RPC to the scheduler, 100 separate schedule commands)
nova-scheduler.log
2012-10-11 21:48:08 - Attempting to build 100 instance(s)
2012-10-11 21:48:09 - Last VM filtered and weighted
... - Cast to compute, block_device_mapping etc.
2012-10-11 21:48:28 - respond to api CALL
mysql.log
filter and weighting: 733 DB calls, 0.071512 seconds
cast to compute, block_device_mapping, security groups: 3563 DB calls, 1.77971 seconds