Profiling the Nova Scheduler
A Case Study



                                                               Joe Gordon
              CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution.*
              * All unlicensed or borrowed works retain their original licenses.                     1

Wednesday, October 17, 12
About Me
      •    Engineer at Cloudscaling
      •                Contributor
      •                Deployer
      •    Folsom Contributions
           o Top 10 developer
              (by commits)
           o Mostly in Nova




                                                                                              http://bitergia.com/public/reports/openstack/2012_09_folsom/



By the Numbers: Nova Folsom
      •      190+ Contributors
      •      Release    Python Lines    Other Lines    Python Files    Other Files
             Folsom          186,738        242,721             666            788
             Essex           150,894        221,109             593            302
             Diablo          110,581        110,393             427            389


      •      Code churn:
              Release    Lines Insertions    Lines Deletions    Insertions/LoC %
              Folsom              110,308             71,911               59.0%
              Essex               182,298            138,346              120.8%

Code churn generated with:
git log --numstat --pretty="%H" $A..$B | grep .py$ | awk 'NF==3 {plus+=$1; minus+=$2} END {printf("+%d, -%d\n", plus, minus)}'


Who Wrote Nova Folsom?
        1      138      Russell Bryant <rbryant@redhat.com>
        2      112      Johannes Erdfelt <johannes.erdfelt@rackspace.com>
        3       97      Dan Prince <dprince@redhat.com>
        4       88      Vishvananda Ishaya <vishvananda@gmail.com>
        5       81      Joe Gordon <jogo@cloudscaling.com>
        6       63      Michael Still <mikal@stillhq.com>
        7       59      Mark McLoughlin <markmc@redhat.com>
        8       58      Rick Harris <rconradharris@gmail.com>
        9       50      Yun Mao <yunmao@gmail.com>
       10       45      Daniel P. Berrange <berrange@redhat.com>
       11       36      Chris Behrens <cbehrens@codestud.com>
       12       31      Eoghan Glynn <eglynn@redhat.com>
       13       29      Brian Waldon <brian.waldon@rackspace.com>
       14       26      Pádraig Brady <pbrady@redhat.com>
       15       25      Chuck Short <zulcss@ubuntu.com>
       16       23      Sean Dague <sdague@linux.vnet.ibm.com>
       17       21      Alex Meade <alex.meade@rackspace.com>
       18       18      Kevin L. Mitchell <kevin.mitchell@rackspace.com>
       19       17      Brian Elliott <brian.elliott@rackspace.com>
       20       17      Zhongyue Luo <zhongyue.nah@intel.com>
       21       16      John Griffith <john.griffith@solidfire.com>
       22       13      Dan Smith <danms@us.ibm.com>
       23       13      Andrew Bogott <abogott@wikimedia.org>
       24       12      Renuka Apte <renuka.apte@citrix.com>
       25       12      Thierry Carrez <thierry@openstack.org>
       26       12      Monty Taylor <mordred@inaugust.com>
       27       10      MotoKen <motokentsai@gmail.com>

Generated with: git shortlog -sne --since="Tue Mar 20 08:17:40 2012 +0100" --no-merges | cat -n

Problems
     euca-run-instances -n 1000
     Environment: default nova essex scheduler
      1. When trying to schedule 1000 VMs, more than 1000
         VMs were scheduled. Why were too many VMs
         scheduled?
      2. Scheduling 100 VMs took 24 seconds. Why is the
         scheduler so slow?




Outline
      •    How the nova essex scheduler works
      •    P1: Why were too many VMs scheduled?
      •    Nova performance
      •    P2: Why is the essex scheduler so slow?
      •    How the folsom scheduler performs




How the nova essex scheduler works




Scheduling




Default Scheduler
     Filter Scheduler




                                                              http://docs.openstack.org/developer/nova/devref/filter_scheduler.html
Limitations
      • Cannot run multiple schedulers
        • Race condition
        • Fixed in folsom
      • O(nm)
        • n = number of nodes
        • m = number of VMs requested
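
The O(nm) cost can be illustrated with a minimal sketch, assuming each of the m requested VMs triggers a fresh pass over all n nodes. The function name, the RAM-only filter, and the "most free RAM wins" weigher are hypothetical simplifications for illustration, not nova's code.

```python
def schedule_sketch(hosts, num_instances, ram_mb):
    """Place VMs one at a time; return (placements, host_checks).

    hosts maps a host name to its free RAM in MB (hypothetical model).
    """
    free = dict(hosts)
    placements = []
    host_checks = 0
    for _ in range(num_instances):            # m iterations
        candidates = []
        for name, free_ram in free.items():   # n host checks per iteration
            host_checks += 1
            if free_ram >= ram_mb:            # filter step
                candidates.append((free_ram, name))
        if not candidates:
            break                             # no host can take another VM
        _, best = max(candidates)             # weigh step: most free RAM
        free[best] -= ram_mb                  # consume resources on that host
        placements.append(best)
    return placements, host_checks


placements, host_checks = schedule_sketch({"node1": 2048, "node2": 1024}, 3, 512)
print(placements, host_checks)
```

With 2 nodes and 3 VMs the sketch performs 6 host checks, so the work grows with the product of the two counts.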


Problem 1: Why were too many VMs scheduled?




Problem #1: Too Many VMs
      •    scheduling is slow
      •    RPC call instead of cast
      •    boto retries upon API timeout




           $ time euca-run-instances -t m1.tiny -n 100
             real    0m25.250s
             user    0m0.100s
             sys     0m0.020s




nova/compute/api.py
     rpc.call instead of rpc.cast
                     # We can create the DB entry for the instance here if we're
                     # only going to create 1 instance.
                     # This speeds up API responses for builds
                     # as we don't need to wait for the scheduler.
                     create_instance_here = max_count == 1

                 def _create_instance(...):
                     """Verify all the input parameters regardless of the provisioning
                     strategy being performed and schedule the instance(s) for
                     creation."""
                     ...
                         # We need to wait for the scheduler to create the instance
                         # DB entries, because the instance *could* be created in
                         # a child zone.
                         rpc_method = rpc.call

                     # TODO(comstud): We should use rpc.multicall when we can
                     # retrieve the full instance dictionary from the scheduler.
                     # Otherwise, we could exceed the AMQP max message size limit.
                     # This would require the schedulers' schedule_run_instances
                     # methods to return an iterator vs a list.
                     instances = self._schedule_run_instance(
                             rpc_method,
                             context, base_options,
                             instance_type,
                             availability_zone, injected_files,
                             admin_password, image,
                             num_instances, requested_networks,
                             block_device_mapping, security_group,
                             filter_properties)
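
The difference between a call and a cast can be sketched with a toy in-process message queue. This is only an illustration under stated assumptions: `ToyRpc`, `slow_scheduler`, and the 0.2 second delay are hypothetical stand-ins and do not reproduce nova's rpc module or AMQP.

```python
import queue
import threading
import time


class ToyRpc:
    """Toy in-process stand-in for a message bus (hypothetical)."""

    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Worker loop: handle one message at a time, reply only if asked.
        while True:
            msg, reply_box = self._inbox.get()
            result = self._handler(msg)
            if reply_box is not None:
                reply_box.put(result)

    def cast(self, msg):
        """Fire and forget: enqueue the message and return immediately."""
        self._inbox.put((msg, None))

    def call(self, msg):
        """Request/response: block until the handler has replied."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply_box))
        return reply_box.get()


def slow_scheduler(msg):
    time.sleep(0.2)  # stands in for the scheduling work
    return "scheduled %d" % msg["num_instances"]


def timed(fn, *args):
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


rpc = ToyRpc(slow_scheduler)
print(timed(rpc.call, {"num_instances": 100}))  # caller waits for the reply
print(timed(rpc.cast, {"num_instances": 100}))  # caller returns at once
```

In this sketch the caller of `call` is held for as long as the handler runs, while the caller of `cast` gets control back immediately and never sees a result.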

nova/scheduler/filter_scheduler.py
     each VM is scheduled individually


                 def schedule_run_instance(self, context, request_spec, *args, **kwargs):
                     """This method is called from nova.compute.api to provision
                     an instance. We first create a build plan (a list of WeightedHosts)
                     and then provision.

                     Returns a list of the instances created.
                     """
                     ...
                     for num in xrange(num_instances):
                         if not weighted_hosts:
                             break
                         weighted_host = weighted_hosts.pop(0)

                         request_spec['instance_properties']['launch_index'] = num
                         instance = self._provision_resource(elevated, weighted_host,
                                                             request_spec, kwargs)

                         if instance:
                             instances.append(instance)




Retries

     EC2 User  -->  Pound (Timeout=15)  -->  Nova EC2 API

     1. EC2 User: euca-run-instances -n 100
     2. Nova EC2 API: Slow
     3. Pound: Send Timeout
     4. EC2 User: Retry
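
The retry sequence can be modeled in a few lines to show how it yields more VMs than requested. This is a sketch under assumptions: the function and parameter names are hypothetical, the backend is assumed to finish each request even after the proxy has timed out, and `max_attempts` is an illustrative value because the slides do not state how many times the client retries.

```python
def instances_created(requested, backend_seconds, proxy_timeout, max_attempts):
    """Count VMs created when a client retries a non-idempotent request."""
    created = 0
    for _ in range(max_attempts):
        # The backend completes the work even if the proxy gave up on it.
        created += requested
        if backend_seconds <= proxy_timeout:
            break  # the client saw a response, so it does not retry
    return created


# 100 VMs take about 25 seconds, longer than the proxy timeout of 15.
print(instances_created(100, 25, 15, max_attempts=3))
# A request that finishes within the timeout is sent only once.
print(instances_created(100, 5, 15, max_attempts=3))
```

Every timed-out attempt still schedules a full batch, so the total is a multiple of the requested count.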




Nova Performance
      •    How eventlet changes everything
           • coroutines
      •    interplay between MySQL and eventlet
      •    RPC
      •    CPU frequency scaling
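
Why coroutines matter for performance can be sketched with cooperative scheduling. The example uses asyncio from the standard library as a stand-in for eventlet, and it assumes the concern is that a call which never yields holds up every other coroutine. The worker names and the `blocking` flag are hypothetical.

```python
import asyncio
import time


async def worker(name, log, blocking):
    for step in range(2):
        log.append((name, step))
        if blocking:
            time.sleep(0.01)        # never yields: the whole loop is stuck
        else:
            await asyncio.sleep(0)  # yields so other coroutines can run


async def run_workers(blocking):
    log = []
    await asyncio.gather(worker("a", log, blocking),
                         worker("b", log, blocking))
    return log


print(asyncio.run(run_workers(blocking=False)))  # a and b interleave
print(asyncio.run(run_workers(blocking=True)))   # a finishes before b starts
```

When every step yields, the two workers alternate. When the steps block, the second worker cannot start until the first has finished.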




Nova Logs
logging format:

%(asctime)s %(levelname)s %(name)s [%(request_id)s %(user_id)s %(project_id)s]



Example:

2012-10-11 21:48:08 DEBUG nova.api.ec2 [req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4 d97c90fd1a4f49e7b7fb38f4d358f4dd a6f5079f10464a6f9aefd1b2e9d5aab3] action: RunInstances from (pid=30579) __call__ /usr/local/lib/python2.7/dist-packages/nova-2012.1.3-py2.7.egg/nova/api/ec2/__init__.py:435
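
Because every line carries the request ID, one request can be followed across log files. The sketch below parses the format shown above and measures the time between the first and last line of a request. The regular expression assumes the message text follows the bracketed IDs, and the second sample line is invented for illustration.

```python
import re
from datetime import datetime

LOG_RE = re.compile(
    r"^(?P<asctime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<levelname>\S+) (?P<name>\S+) "
    r"\[(?P<request_id>\S+) (?P<user_id>\S+) (?P<project_id>\S+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if no match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["asctime"] = datetime.strptime(record["asctime"],
                                          "%Y-%m-%d %H:%M:%S")
    return record


def elapsed_seconds(lines, request_id):
    """Seconds between the first and last line that carry request_id."""
    times = [rec["asctime"] for rec in map(parse_line, lines)
             if rec is not None and rec["request_id"] == request_id]
    if not times:
        return None
    return (max(times) - min(times)).total_seconds()


REQ = "req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4"
IDS = ("[" + REQ + " d97c90fd1a4f49e7b7fb38f4d358f4dd "
       "a6f5079f10464a6f9aefd1b2e9d5aab3]")
sample = [
    "2012-10-11 21:48:08 DEBUG nova.api.ec2 " + IDS + " action: RunInstances",
    # Hypothetical closing line, invented for this example.
    "2012-10-11 21:48:32 INFO nova.api.ec2 " + IDS + " response sent",
]
print(elapsed_seconds(sample, REQ))
```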




Why is scheduling so slow?
     24 seconds in the life of nova




Problem #2: Slow
     24 seconds in the life of nova
    nova-api.log
    2012-10-11 21:48:08 - POST RunInstances Request
    2012-10-11 21:48:08 - RPC RR CALL ‘scheduler’
    2012-10-11 21:48:32 - POST Response (24 seconds)

    nova-scheduler.log
    2012-10-11 21:48:08 - Attempting to build 100 instance(s)
    2012-10-11 21:48:09 - Last VM filtered and weighted
    ...                 - Write to DB, and Cast (19 seconds)
    2012-10-11 21:48:28 - Respond to api CALL

    mysql.log
    filter and weighting:   733 DB calls, 0.07151 seconds
    Write to DB, and Cast: 3563 DB Calls, 1.77971 seconds
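
The figures above can be put in proportion with a little arithmetic. The sketch uses only the numbers on this slide; the helper name `db_share` is hypothetical.

```python
def db_share(db_seconds, wall_seconds):
    """Percentage of wall-clock time spent inside the database."""
    return 100.0 * db_seconds / wall_seconds


filter_db = 0.07151   # seconds in the DB while filtering and weighting
write_db = 1.77971    # seconds in the DB during "Write to DB, and Cast"
write_wall = 19.0     # wall-clock seconds of the write-and-cast phase
total_wall = 24.0     # wall-clock seconds of the whole request

write_share = db_share(write_db, write_wall)
total_share = db_share(filter_db + write_db, total_wall)
avg_write_call_ms = 1000.0 * write_db / 3563

print("DB share of the write phase: %.2f%%" % write_share)
print("DB share of the whole request: %.2f%%" % total_share)
print("Average write-phase DB call: %.4f ms" % avg_write_call_ms)
```

The database accounts for under a tenth of the elapsed time, so most of the 24 seconds is spent outside MySQL.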



Problem #2: Slow
     24 seconds in the life of nova

      • DB is fast
      • filter and weight is fast
      • Python code around DB is slow
         • SQLAlchemy?
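
One way to test a hypothesis like this is to profile the Python side and see where cumulative time goes. The sketch below uses cProfile from the standard library. The slides do not say which profiler was used, and `build_rows`, `fake_db_write`, and `schedule_batch` are hypothetical stand-ins rather than nova functions.

```python
import cProfile
import io
import pstats


def build_rows(count):
    """Stand-in for per-row Python work around the DB (hypothetical)."""
    return [{"id": i, "host": "node%d" % (i % 2)} for i in range(count)]


def fake_db_write(rows):
    """Stand-in for a fast database round trip (hypothetical)."""
    return len(rows)


def schedule_batch(count):
    return fake_db_write(build_rows(count))


def profile_report(func, *args, top=5):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


result, report = profile_report(schedule_batch, 1000)
print(report)
```

Sorting by cumulative time puts the outermost expensive functions first, which helps separate time spent in Python from time spent waiting on the database.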




Folsom
      •    RPC.Call to scheduler is now RPC.Cast to scheduler
           • API server doesn’t wait for scheduling
      •    Improved DB access patterns
           • Essex: 29 DB calls in nova/scheduler/driver.py
           • Folsom: 12 DB calls in nova/scheduler/driver.py




Questions?




Raw Data
     goal: show how much time spent where
     setup: essex filter scheduler, 2 compute nodes

     $ time euca-run-instances -t m1.tiny -n 50
     real    0m12.183s
     user    0m0.104s
     sys     0m0.020s

     $ time euca-run-instances -t m1.tiny -n 100
     real    0m25.250s
     user    0m0.100s
     sys     0m0.020s

     req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4

     nova-api.log
     2012-10-11 21:48:08 - POST RunInstances Request
     2012-10-11 21:48:08 - RPC RR CALL ‘scheduler’
     2012-10-11 21:48:32 - POST response

     Note: euca-run-instances -n 100 becomes a loop in nova-scheduler (1 RPC to the scheduler, 100 separate schedule commands)

     nova-scheduler.log
     2012-10-11 21:48:08 - Attempting to build 100 instance(s)
     2012-10-11 21:48:09 - Last VM filtered and weighted
     ...                 - Cast to compute, block_device_mapping etc.
     2012-10-11 21:48:28 - Respond to api CALL

     mysql.log
     filter and weighting: 733 DB calls, 0.071512 seconds
     cast to compute, block_device_mapping, security groups: 3563 DB calls, 1.77971 seconds
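
The two timed runs above can be compared directly. The sketch below divides each wall-clock time by the number of VMs requested, using only the figures on this slide; the variable names are hypothetical.

```python
# Wall-clock seconds reported by `time euca-run-instances -n N`.
runs = {50: 12.183, 100: 25.250}

per_instance = {count: seconds / count for count, seconds in runs.items()}
ratio = runs[100] / runs[50]

for count in sorted(per_instance):
    print("%3d VMs: %.4f seconds per VM" % (count, per_instance[count]))
print("Doubling the request multiplied the time by %.2f" % ratio)
```

Each VM costs roughly a quarter of a second in both runs, which is consistent with the scheduler handling the VMs one at a time.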



              CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution.*
              * All unlicensed or borrowed works retain their original licenses.                     27

Wednesday, October 17, 12

More Related Content

What's hot

Installation Openstack Swift
Installation Openstack SwiftInstallation Openstack Swift
Installation Openstack Swiftymtech
 
Varnish @ Velocity Ignite
Varnish @ Velocity IgniteVarnish @ Velocity Ignite
Varnish @ Velocity IgniteArtur Bergman
 
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environmentCorpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environmentRaghavendra Prabhu
 
A git workflow for Drupal Core development
A git workflow for Drupal Core developmentA git workflow for Drupal Core development
A git workflow for Drupal Core developmentCameron Tod
 
Mens jan piet_dnssec-in-practice
Mens jan piet_dnssec-in-practiceMens jan piet_dnssec-in-practice
Mens jan piet_dnssec-in-practicekuchinskaya
 
BIND 9 logging best practices
BIND 9 logging best practicesBIND 9 logging best practices
BIND 9 logging best practicesMen and Mice
 
Understanding docker networking
Understanding docker networkingUnderstanding docker networking
Understanding docker networkingLorenzo Fontana
 
Oracle 10g Performance: chapter 05 waits intro
Oracle 10g Performance: chapter 05 waits introOracle 10g Performance: chapter 05 waits intro
Oracle 10g Performance: chapter 05 waits introKyle Hailey
 
青云CoreOS虚拟机部署kubernetes
青云CoreOS虚拟机部署kubernetes 青云CoreOS虚拟机部署kubernetes
青云CoreOS虚拟机部署kubernetes Zhichao Liang
 

What's hot (12)

Installation Openstack Swift
Installation Openstack SwiftInstallation Openstack Swift
Installation Openstack Swift
 
Docker practice
Docker practiceDocker practice
Docker practice
 
Varnish @ Velocity Ignite
Varnish @ Velocity IgniteVarnish @ Velocity Ignite
Varnish @ Velocity Ignite
 
Ceph issue 해결 사례
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례
 
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environmentCorpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
 
A git workflow for Drupal Core development
A git workflow for Drupal Core developmentA git workflow for Drupal Core development
A git workflow for Drupal Core development
 
Mens jan piet_dnssec-in-practice
Mens jan piet_dnssec-in-practiceMens jan piet_dnssec-in-practice
Mens jan piet_dnssec-in-practice
 
Kubernetes 1001
Kubernetes 1001Kubernetes 1001
Kubernetes 1001
 
BIND 9 logging best practices
BIND 9 logging best practicesBIND 9 logging best practices
BIND 9 logging best practices
 
Understanding docker networking
Understanding docker networkingUnderstanding docker networking
Understanding docker networking
 
Oracle 10g Performance: chapter 05 waits intro
Oracle 10g Performance: chapter 05 waits introOracle 10g Performance: chapter 05 waits intro
Oracle 10g Performance: chapter 05 waits intro
 
青云CoreOS虚拟机部署kubernetes
青云CoreOS虚拟机部署kubernetes 青云CoreOS虚拟机部署kubernetes
青云CoreOS虚拟机部署kubernetes
 

Viewers also liked

Policy-driven, Platform-aware Nova Scheduler
Policy-driven, Platform-aware Nova SchedulerPolicy-driven, Platform-aware Nova Scheduler
Policy-driven, Platform-aware Nova SchedulerRam (Ramki) Krishnan
 
The Lie of a Benevolent Dictator; the Truth of a Working Democratic Meritocracy
The Lie of a Benevolent Dictator; the Truth of a Working Democratic MeritocracyThe Lie of a Benevolent Dictator; the Truth of a Working Democratic Meritocracy
The Lie of a Benevolent Dictator; the Truth of a Working Democratic MeritocracyRandy Bias
 
Connect Expo 2015 - Australia - Bringing OpenStack into the Enterprise
Connect Expo 2015 - Australia - Bringing OpenStack into the EnterpriseConnect Expo 2015 - Australia - Bringing OpenStack into the Enterprise
Connect Expo 2015 - Australia - Bringing OpenStack into the EnterpriseRandy Bias
 
Networking is NOT Free: Lessons in Network Design
Networking is NOT Free: Lessons in Network DesignNetworking is NOT Free: Lessons in Network Design
Networking is NOT Free: Lessons in Network DesignRandy Bias
 
State of the Stack v4 - OpenStack in All It's Glory
State of the Stack v4 - OpenStack in All It's GloryState of the Stack v4 - OpenStack in All It's Glory
State of the Stack v4 - OpenStack in All It's GloryRandy Bias
 
Openstack Study Nova 1
Openstack Study Nova 1Openstack Study Nova 1
Openstack Study Nova 1Jinho Shin
 
Openstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability IssueOpenstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability IssueVigneshvar A.S
 
The Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitThe Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitRandy Bias
 
OpenStack Cinder Overview - Havana Release
OpenStack Cinder Overview - Havana ReleaseOpenStack Cinder Overview - Havana Release
OpenStack Cinder Overview - Havana ReleaseAvishay Traeger
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler Peeyush Gupta
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack ArchitectureMirantis
 
The History of Pets vs. Cattle ... And Using It Properly
The History of Pets vs. Cattle ... And Using It ProperlyThe History of Pets vs. Cattle ... And Using It Properly
The History of Pets vs. Cattle ... And Using It ProperlyRandy Bias
 

Viewers also liked (13)

Policy-driven, Platform-aware Nova Scheduler
Policy-driven, Platform-aware Nova SchedulerPolicy-driven, Platform-aware Nova Scheduler
Policy-driven, Platform-aware Nova Scheduler
 
The Lie of a Benevolent Dictator; the Truth of a Working Democratic Meritocracy
The Lie of a Benevolent Dictator; the Truth of a Working Democratic MeritocracyThe Lie of a Benevolent Dictator; the Truth of a Working Democratic Meritocracy
The Lie of a Benevolent Dictator; the Truth of a Working Democratic Meritocracy
 
Connect Expo 2015 - Australia - Bringing OpenStack into the Enterprise
Connect Expo 2015 - Australia - Bringing OpenStack into the EnterpriseConnect Expo 2015 - Australia - Bringing OpenStack into the Enterprise
Connect Expo 2015 - Australia - Bringing OpenStack into the Enterprise
 
Networking is NOT Free: Lessons in Network Design
Networking is NOT Free: Lessons in Network DesignNetworking is NOT Free: Lessons in Network Design
Networking is NOT Free: Lessons in Network Design
 
State of the Stack v4 - OpenStack in All It's Glory
State of the Stack v4 - OpenStack in All It's GloryState of the Stack v4 - OpenStack in All It's Glory
State of the Stack v4 - OpenStack in All It's Glory
 
Openstack Study Nova 1
Openstack Study Nova 1Openstack Study Nova 1
Openstack Study Nova 1
 
Openstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability IssueOpenstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability Issue
 
The Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitThe Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud Summit
 
OpenStack Cinder Overview - Havana Release
OpenStack Cinder Overview - Havana ReleaseOpenStack Cinder Overview - Havana Release
OpenStack Cinder Overview - Havana Release
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack Architecture
 
The History of Pets vs. Cattle ... And Using It Properly
The History of Pets vs. Cattle ... And Using It ProperlyThe History of Pets vs. Cattle ... And Using It Properly
The History of Pets vs. Cattle ... And Using It Properly
 
Openstack的研究与实践
Openstack的研究与实践Openstack的研究与实践
Openstack的研究与实践
 


OpenStack Summit :: Profiling the Nova Scheduler

  • 1. Profiling the Nova Scheduler: A Case Study
      Joe Gordon
      CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution.*
      * All unlicensed or borrowed works retain their original licenses.
      Wednesday, October 17, 2012
  • 2. About Me
      • Engineer at Cloudscaling
      • Contributor
      • Deployer
      • Folsom Contributions
          o Top 10 developer (by commits)
          o Mostly in Nova
      http://bitergia.com/public/reports/openstack/2012_09_folsom/
  • 3. By the Numbers: Nova Folsom
      • 190+ Contributors
      • Release   Python Lines   Other Lines   Python Files   Other Files
        Folsom    186,738        242,721       666            788
        Essex     150,894        221,109       593            302
        Diablo    110,581        110,393       427            389
      • Code churn:
        Release   Lines Insertions   Lines Deletions   Insertions/LoC %
        Folsom    110,308            71,911            59.0%
        Essex     182,298            138,346           120.8%
      Code churn generated with:
        git log --numstat --pretty="%H" $A..$B | grep .py$ | awk 'NF==3 {plus+=$1; minus+=$2} END {printf("+%d, -%d\n", plus, minus)}'
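The churn pipeline on slide 3 can be sketched in Python as follows. This is a minimal illustration, not the author's tooling: the function names `churn` and `insertions_per_loc` and the sample input are hypothetical. It also assumes that "Insertions/LoC %" means insertions divided by the release's Python line count, which gives about 59.07% for Folsom and 120.81% for Essex, close to the 59.0% and 120.8% on the slide.

```python
def churn(numstat_text, suffix=".py"):
    """Sum added and deleted lines for paths ending in `suffix`.

    Expects the output of `git log --numstat --pretty="%H"`: commit-hash
    lines with one field, and rows of "<added>\t<deleted>\t<path>".
    """
    plus = minus = 0
    for line in numstat_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3 or not fields[2].endswith(suffix):
            continue  # commit hash, blank line, or non-Python path
        added, deleted = fields[0], fields[1]
        if added == "-" or deleted == "-":
            continue  # git reports "-" for binary files
        plus += int(added)
        minus += int(deleted)
    return plus, minus


def insertions_per_loc(insertions, python_lines):
    """Insertions as a percentage of the release's Python line count (assumed definition)."""
    return 100.0 * insertions / python_lines


# Hypothetical sample input, not real Nova history.
sample = (
    "abc123\n"
    "10\t2\tnova/compute/api.py\n"
    "5\t1\tREADME.rst\n"
    "3\t7\tnova/scheduler/driver.py\n"
)
print("+%d, -%d" % churn(sample))
print("Folsom: %.2f%%" % insertions_per_loc(110308, 186738))
print("Essex:  %.2f%%" % insertions_per_loc(182298, 150894))
```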
  • 4. Who Wrote Nova Folsom?
       1  138  Russell Bryant <rbryant@redhat.com>
       2  112  Johannes Erdfelt <johannes.erdfelt@rackspace.com>
       3   97  Dan Prince <dprince@redhat.com>
       4   88  Vishvananda Ishaya <vishvananda@gmail.com>
       5   81  Joe Gordon <jogo@cloudscaling.com>
       6   63  Michael Still <mikal@stillhq.com>
       7   59  Mark McLoughlin <markmc@redhat.com>
       8   58  Rick Harris <rconradharris@gmail.com>
       9   50  Yun Mao <yunmao@gmail.com>
      10   45  Daniel P. Berrange <berrange@redhat.com>
      11   36  Chris Behrens <cbehrens@codestud.com>
      12   31  Eoghan Glynn <eglynn@redhat.com>
      13   29  Brian Waldon <brian.waldon@rackspace.com>
      14   26  Pádraig Brady <pbrady@redhat.com>
      15   25  Chuck Short <zulcss@ubuntu.com>
      16   23  Sean Dague <sdague@linux.vnet.ibm.com>
      17   21  Alex Meade <alex.meade@rackspace.com>
      18   18  Kevin L. Mitchell <kevin.mitchell@rackspace.com>
      19   17  Brian Elliott <brian.elliott@rackspace.com>
      20   17  Zhongyue Luo <zhongyue.nah@intel.com>
      21   16  John Griffith <john.griffith@solidfire.com>
      22   13  Dan Smith <danms@us.ibm.com>
      23   13  Andrew Bogott <abogott@wikimedia.org>
      24   12  Renuka Apte <renuka.apte@citrix.com>
      25   12  Thierry Carrez <thierry@openstack.org>
      26   12  Monty Taylor <mordred@inaugust.com>
      27   10  MotoKen <motokentsai@gmail.com>
      git shortlog -sne --since="Tue Mar 20 08:17:40 2012 +0100" --no-merges | cat -n
  • 5. Problems
  • 6. Problems
      euca-run-instances -n 1000
      Environment: default nova essex scheduler
      1. When trying to schedule 1000 VMs, more than 1000 VMs were scheduled. Why were too many VMs scheduled?
      2. Scheduling 100 VMs took 24 seconds. Why is the scheduler so slow?
  • 7. Outline
      • How the nova essex scheduler works
      • P1: Why were too many VMs scheduled?
      • Nova performance
      • P2: Why is the essex scheduler so slow?
      • How the folsom scheduler performs
  • 8. How the nova essex scheduler works
  • 9-15. Scheduling [diagram, built up across seven slides]
  • 16. Default Scheduler: Filter Scheduler
      http://docs.openstack.org/developer/nova/devref/filter_scheduler.html
  • 17. Limitations
      • Cannot run multiple schedulers
          • Race condition
          • Fixed in folsom
      • O(nm)
          • n = number of nodes
          • m = number of VMs requested
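The O(nm) cost on slide 17 can be illustrated with a toy scheduler that re-examines every node for every requested VM. This is a simplified sketch, not Nova's filter scheduler: the function `schedule`, the RAM-only filter, and the "most free RAM wins" weighting are assumptions made for the example.

```python
def schedule(hosts, num_instances, ram_mb):
    """Place instances one at a time, re-filtering and re-weighing all hosts each time.

    hosts: dict of host name -> free RAM in MB (hypothetical host state).
    Returns (placements, host_evaluations).
    """
    free_ram = dict(hosts)  # work on a copy so the caller's dict is untouched
    placements = []
    host_evaluations = 0
    for _ in range(num_instances):            # m requested VMs
        candidates = []
        for name, free in free_ram.items():   # n nodes examined per VM
            host_evaluations += 1
            if free >= ram_mb:                # filter step
                candidates.append((free, name))
        if not candidates:
            break                             # no host can take another VM
        _, best = max(candidates)             # weighting step: most free RAM
        free_ram[best] -= ram_mb
        placements.append(best)
    return placements, host_evaluations


placements, evaluations = schedule({"node1": 2048, "node2": 1024}, 3, 512)
print(placements, evaluations)
```

With n = 2 nodes and m = 3 VMs the inner loop runs 2 x 3 = 6 times, which is the n*m growth the slide refers to.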
  • 18. Problem 1: Why were too many VMs scheduled?
  • 19. Problem #1: Too Many VMs
      • scheduling is slow
      • RPC call instead of cast
      • boto retries upon API timeout
      $ time euca-run-instances -t m1.tiny -n 100
      real 0m25.250s
      user 0m0.100s
      sys  0m0.020s
  • 20. nova/compute/api.py: rpc.call instead of rpc.cast
              # We can create the DB entry for the instance here if we're
              # only going to create 1 instance.
              # This speeds up API responses for builds
              # as we don't need to wait for the scheduler.
              create_instance_here = max_count == 1

          def _create_instance(...):
              """Verify all the input parameters regardless of the provisioning
              strategy being performed and schedule the instance(s) for
              creation."""
      ...
                  # We need to wait for the scheduler to create the instance
                  # DB entries, because the instance *could* be # created in
                  # a child zone.
                  rpc_method = rpc.call

              # TODO(comstud): We should use rpc.multicall when we can
              # retrieve the full instance dictionary from the scheduler.
              # Otherwise, we could exceed the AMQP max message size limit.
              # This would require the schedulers' schedule_run_instances
              # methods to return an iterator vs a list.
              instances = self._schedule_run_instance(
                      rpc_method,
                      context, base_options,
                      instance_type,
                      availability_zone, injected_files,
                      admin_password, image,
                      num_instances, requested_networks,
                      block_device_mapping, security_group,
                      filter_properties)
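The difference between an RPC call and an RPC cast can be sketched with a queue and a worker thread. This is an illustration of the two semantics only: `call`, `cast`, and `scheduler_worker` are hypothetical stand-ins, not the nova.rpc API, and the in-process queue stands in for the AMQP broker.

```python
import queue
import threading

requests = queue.Queue()  # stands in for the message broker


def scheduler_worker():
    """Consume requests; reply only when the sender supplied a reply queue."""
    while True:
        msg, reply_q = requests.get()
        if msg is None:
            break  # shutdown sentinel
        result = "scheduled %d instance(s)" % msg["num_instances"]
        if reply_q is not None:
            reply_q.put(result)


def cast(msg):
    """Fire and forget: enqueue the request and return immediately."""
    requests.put((msg, None))


def call(msg):
    """Enqueue the request, then block until the worker replies."""
    reply_q = queue.Queue()
    requests.put((msg, reply_q))
    return reply_q.get()


worker = threading.Thread(target=scheduler_worker)
worker.start()
cast_result = cast({"num_instances": 1})    # returns None without waiting
call_result = call({"num_instances": 100})  # waits for the scheduler's answer
requests.put((None, None))
worker.join()
print(cast_result, call_result)
```

With a call, the API request stays open for as long as the scheduler takes, which is why slow scheduling shows up directly as a slow API response.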
  • 21. nova/scheduler/filter_scheduler.py: each VM is scheduled individually
          def schedule_run_instance(self, context, request_spec, *args, **kwargs):
              """This method is called from nova.compute.api to provision
              an instance. We first create a build plan (a list of WeightedHosts)
              and then provision.

              Returns a list of the instances created.
              """
      ...
              for num in xrange(num_instances):
                  if not weighted_hosts:
                      break
                  weighted_host = weighted_hosts.pop(0)
                  request_spec['instance_properties']['launch_index'] = num
                  instance = self._provision_resource(elevated, weighted_host,
                                                      request_spec, kwargs)
                  if instance:
                      instances.append(instance)
  • 22. Nova
  • 23-29. Retries [diagram, built up across seven slides; labels: EC2 User, Pound (Timeout=15), Nova EC2 API, euca-run-instances -n 100, Send, Slow, Timeout, Retry]
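The retry sequence above can be modelled in a few lines to show how a request for 100 VMs can produce more than 100. This is a toy model under stated assumptions: the backend keeps processing a request after the front end has timed out, every attempt takes the same time, and the request is not idempotent. The function name and the `max_attempts` value are hypothetical; the actual client retry count is not given in the slides.

```python
def run_instances_with_retries(count, api_seconds, timeout_seconds, max_attempts):
    """Return (instances_created, attempts_made) for a client that retries on timeout."""
    created = 0
    for attempt in range(1, max_attempts + 1):
        # The backend schedules the whole batch whether or not the client is still waiting.
        created += count
        if api_seconds <= timeout_seconds:
            return created, attempt  # the response arrived in time, no retry
    return created, max_attempts


# Roughly 25 s per request against a 15 s timeout, as on slides 19 and 23-29.
slow = run_instances_with_retries(100, api_seconds=25, timeout_seconds=15, max_attempts=3)
fast = run_instances_with_retries(100, api_seconds=10, timeout_seconds=15, max_attempts=3)
print(slow, fast)
```

In this model every timed-out attempt still creates a full batch, so the total grows with the number of retries.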
  • 30. Nova Performance
  • 31. Nova Performance
      • How eventlet changes everything
      • coroutines
      • interplay between MySQL and eventlet
      • RPC
      • CPU frequency scaling
  • 32. Nova Logs
      logging format:
          %(asctime)s %(levelname)s %(name)s [%(request_id)s %(user_id)s %(project_id)s]
      Example:
          2012-10-11 21:48:08 DEBUG nova.api.ec2 [req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4 d97c90fd1a4f49e7b7fb38f4d358f4dd a6f5079f10464a6f9aefd1b2e9d5aab3] action: RunInstances from (pid=30579) __call__ /usr/local/lib/python2.7/dist-packages/nova-2012.1.3-py2.7.egg/nova/api/ec2/__init__.py:435
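Because every log line carries the request ID, one request can be followed across nova-api.log and nova-scheduler.log. A minimal sketch of pulling the fields out of a line in the format above, assuming the message follows the bracketed context after a single space; the regular expression and the name `LOG_LINE` are illustrative, not part of Nova.

```python
import re

# Mirrors: %(asctime)s %(levelname)s %(name)s [%(request_id)s %(user_id)s %(project_id)s] <message>
LOG_LINE = re.compile(
    r"^(?P<asctime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<levelname>\S+) (?P<name>\S+) "
    r"\[(?P<request_id>\S+) (?P<user_id>\S+) (?P<project_id>\S+)\] "
    r"(?P<message>.*)$"
)

line = (
    "2012-10-11 21:48:08 DEBUG nova.api.ec2 "
    "[req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4 "
    "d97c90fd1a4f49e7b7fb38f4d358f4dd a6f5079f10464a6f9aefd1b2e9d5aab3] "
    "action: RunInstances"
)

fields = LOG_LINE.match(line).groupdict()
print(fields["request_id"], fields["name"], fields["message"])
```

Filtering each service's log on `fields["request_id"]` gives the per-request timeline used in the following slides.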
  • 33. Why is scheduling so slow? 24 seconds in the life of nova
  • 34. Problem #2: Slow
      24 seconds in the life of nova
      nova-api.log
          2012-10-11 21:48:08 - POST RunInstances Request
          2012-10-11 21:48:08 - RPC RR CALL ‘scheduler’
          2012-10-11 21:48:32 - POST Response (24 seconds)
      nova-scheduler.log
          2012-10-11 21:48:08 - Attempting to build 100 instance(s)
          2012-10-11 21:48:09 - Last VM filtered and weighted
          ...                 - Write to DB, and Cast (19 seconds)
          2012-10-11 21:48:28 - Respond to api CALL
      mysql.log
          filter and weighting: 733 DB calls, 0.07151 seconds
          Write to DB, and Cast: 3563 DB Calls, 1.77971 seconds
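The arithmetic behind the timeline can be worked through directly from the timestamps and MySQL totals on slide 34. A small sketch; the variable names are illustrative, and since the log timestamps have one-second resolution the results are approximate.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


api_total = seconds_between("2012-10-11 21:48:08", "2012-10-11 21:48:32")
filter_and_weight = seconds_between("2012-10-11 21:48:08", "2012-10-11 21:48:09")
write_and_cast = seconds_between("2012-10-11 21:48:09", "2012-10-11 21:48:28")

db_seconds_write_and_cast = 1.77971  # time spent inside MySQL, from mysql.log
db_calls_write_and_cast = 3563

outside_db = write_and_cast - db_seconds_write_and_cast
ms_per_db_call = 1000.0 * db_seconds_write_and_cast / db_calls_write_and_cast

print("API request:          %.0f s" % api_total)
print("filter and weight:    %.0f s" % filter_and_weight)
print("write to DB and cast: %.0f s" % write_and_cast)
print("  of which in MySQL:  %.2f s (%.2f ms per call)" % (db_seconds_write_and_cast, ms_per_db_call))
print("  outside MySQL:      %.2f s" % outside_db)
```

Roughly 17 of the 19 seconds in the "write to DB and cast" phase are spent outside MySQL itself, which is what points at the Python code around the database calls.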
  • 35. Problem #2: Slow
      24 seconds in the life of nova
      • DB is fast
      • filter and weight is fast
      • Python code around DB is slow
          • SQLAlchemy?
  • 36. Folsom
  • 37. Folsom
      • RPC.Call to scheduler is now RPC.Cast to scheduler
          • API server doesn’t wait for scheduling
      • Improved DB access patterns
          • Essex: 29 DB calls in nova/scheduler/driver.py
          • Folsom: 12 DB calls in nova/scheduler/driver.py
  • 38. Questions?
  • 39. Raw Data
      goal: show how much time spent where
      setup: essex filter scheduler, 2 compute nodes
      $ time euca-run-instances -t m1.tiny -n 50
      real 0m12.183s
      user 0m0.104s
      sys  0m0.020s
      $ time euca-run-instances -t m1.tiny -n 100
      real 0m25.250s
      user 0m0.100s
      sys  0m0.020s
      req-d54b7cfa-fd51-4a04-92f2-5509c8411ed4
      nova-api.log
          2012-10-11 21:48:08 - POST RunInstances Request
          2012-10-11 21:48:08 - RPC RR CALL ‘scheduler’
          2012-10-11 21:48:32 - POST response
      Note: show euca -n * loop in nova-scheduler (1 RPC to scheduler, 100 separate schedule commands)
      nova-scheduler.log
          2012-10-11 21:48:08 - Attempting to build 100 instance(s)
          2012-10-11 21:48:09 - Last VM filtered and weighted
          ...                 - Cast to compute, block_device_mapping etc.
          2012-10-11 21:48:28 - respond to api CALL
      mysql.log
          filter and weighting: 733 DB calls, 0.071512 seconds
          cast to compute, block_device_mapping, security groups: 3563 DB Calls, 1.77971 seconds