Accelerating Science with OpenStack
The CERN User Story

Tim Bell
Tim.Bell@cern.ch
@noggin143

EMEA OpenStack Day, London
5th December 2012
What is CERN?
• Conseil Européen pour la Recherche Nucléaire, also known as the European Laboratory for Particle Physics
• Between Geneva and the Jura mountains, straddling the Swiss-French border
• Founded in 1954 by an international treaty
• Our business is fundamental physics: what is the universe made of and how does it work?
Answering fundamental questions…
• How can we explain that particles have mass?
   We have theories and accumulating experimental evidence… we are getting close…

• What is 96% of the universe made of?
   We can only see 4% of its estimated mass!

• Why isn’t there anti-matter in the universe?
   Nature should be symmetric…

• What was the state of matter just after the “Big Bang”?
   Travelling back to the earliest instants of the universe would help…
Community collaboration on an international scale




The Large Hadron Collider




The Large Hadron Collider (LHC) tunnel




Accumulating events in 2009-2011




Heavy Ion Collisions




Data Acquisition and Trigger Farms




Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution

Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis

Tier-2 (~200 centres):
• Simulation
• End-user analysis

• Data is recorded at CERN and the Tier-1s, and analysed in the Worldwide LHC Computing Grid
• On a normal day, the grid provides 100,000 CPU days, executing over 2 million jobs
Data Centre by Numbers
• Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year

Racks                 828    Disks                     64,109   Tape Drives              160
Servers            11,728    Raw disk capacity (TiB)   63,289   Tape Cartridges       45,000
Processors         15,694    Memory modules            56,014   Tape slots            56,000
Cores              64,238    Memory capacity (TiB)        158   Tape Capacity (TiB)   73,000
HEPSpec06         482,507    RAID controllers           3,749

High Speed Routers (640 Mbps → 2.4 Tbps)        24
Ethernet Switches                              350
10 Gbps ports                                2,000
Switching Capacity                        4.8 Tbps
1 Gbps ports                                16,939
10 Gbps ports                                  558

IT Power Consumption                      2,456 KW
Total Power Consumption                   3,890 KW

[Pie chart, processor mix: Xeon L5520 33%, E5410 16%, E5345 14%, 5160 10%, L5420 8%, E5335 7%, E5405 6%, 3GHz 4%, 5150 2%]
[Pie chart, disk vendor mix: Western Digital 59%, Hitachi 23%, Seagate 15%, Fujitsu 3%, Maxtor/HP/Other ~0%]
Our Challenges - Data storage
• >20 years retention
• 6 GB/s average
• 25 GB/s peaks
• 35 PB/year recorded
45,000 tapes holding 80PB of physics data




New data centre to expand capacity
• Data centre in Geneva at the limit of electrical capacity at 3.5MW
• New centre chosen in Budapest, Hungary
• Additional 2.7MW of usable power
• Hands-off facility
• Deploying from 2013 with a 200Gbit/s network to CERN
Time to change strategy
• Rationale
      – Need to manage twice as many servers as today
      – No increase in staff numbers
      – Tools becoming increasingly brittle and will not scale as-is
• Approach
      – CERN is no longer a special case for compute
      – Adopt an open source tool chain model
      – Our engineers rapidly iterate
            • Evaluate solutions in the problem domain
            • Identify functional gaps and challenge them
            • Select a first choice but be prepared to change in future
      – Contribute new functionality back to the community
Building Blocks
[Toolchain diagram linking: Puppet, OpenStack Nova, mcollective, yum, Bamboo, AIMS/PXE, Foreman, JIRA, git, Koji, Mock, Yum repo (Pulp), Active Directory / LDAP, Lemon / Hadoop, hardware database, Puppet-DB]
Training and Support
•   Buy the book rather than guru mentoring
•   Follow the mailing lists to learn
•   Newcomers are rapidly productive (and often know more than us)
•   Community and Enterprise support means we’re not on our own




Staff Motivation
• Skills remain valuable outside of CERN when an engineer’s contract ends
Prepare the move to the clouds
• Improve operational efficiency
      – Machine ordering, reception and testing
      – Hardware interventions with long running programs
      – Multiple operating system demand
• Improve resource efficiency
      – Exploit idle resources, especially waiting for disk and tape I/O
      – Highly variable load such as interactive or build machines
• Enable cloud architectures
      – Gradual migration to cloud interfaces and workflows
      – Autoscaling and scheduling
• Improve responsiveness
      – Self-service with coffee-break response time (see the provisioning sketch below)
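To make "coffee break response time" concrete, here is a minimal self-service provisioning sketch using the Essex-era python-novaclient; the endpoint, credentials, image and flavor names are illustrative assumptions, not CERN's actual configuration.

```python
# Minimal self-service provisioning sketch (python-novaclient, v1.1 API).
# All names, credentials and the endpoint are illustrative assumptions.
from novaclient.v1_1 import client

nova = client.Client("tbell", "secret", "personal-tbell",
                     "https://cloud.example.cern.ch:5000/v2.0")

image = nova.images.find(name="SLC6-base")     # hypothetical base image
flavor = nova.flavors.find(name="m1.small")

server = nova.servers.create(name="vm0042", image=image, flavor=flavor)
print(server.id, server.status)                # poll until ACTIVE
```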
Public Procurement Purchase Model

Step                                    Time (Days)              Elapsed (Days)
User expresses requirement                                                    0
Market Survey prepared                           15                          15
Market Survey for possible vendors               30                          45
Specifications prepared                          15                          60
Vendor responses                                 30                          90
Test systems evaluated                           30                         120
Offers adjudicated                               10                         130
Finance committee                                30                         160
Hardware delivered                               90                         250
Burn in and acceptance          30 typical, 380 worst case                  280
Total                                                                 280+ Days
Service Model
• Pets are given names like pussinboots.cern.ch
• They are unique, lovingly hand-raised and cared for
• When they get ill, you nurse them back to health

• Cattle are given numbers like vm0042.cern.ch
• They are almost identical to other cattle
• When they get ill, you get another one

• Future application architectures should use Cattle, but Pets with strong configuration management are viable and still needed
Current Status of OpenStack at CERN
• Working with the Essex OpenStack release
      – Excellent experience with the Fedora/Red Hat team using EPEL packages
      – Started a Folsom test environment in November
• Focusing on the compute side to start with
      – Nova compute with KVM and Hyper-V
      – Keystone identity now integrated with Active Directory
      – Replaced the network layer with CERN code for our legacy network management system
• Current pre-production installation
      – 170 hypervisors
      – 2,700 VMs
      – 3 DevOps part time, running and enhancing the service
      – Running production ‘cattle’ workloads for stress testing
When communities combine…
• OpenStack’s many components and options make configuration complex out of the box
• The Puppet Forge module from PuppetLabs does our configuration
• The Foreman adds OpenStack provisioning, taking a user from kiosk request to a configured machine in 15 minutes (a sketch follows)
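As a hedged sketch of that kiosk-to-machine flow, the following drives host creation through Foreman's JSON REST API; the endpoint path, hostgroup and compute-resource IDs, and credentials are assumptions for illustration, not the actual CERN setup.

```python
# Hedged sketch: creating a Puppet-managed host via Foreman's REST API.
# URL, IDs and credentials below are illustrative assumptions.
import json
import requests

FOREMAN = "https://foreman.example.cern.ch"   # hypothetical URL

payload = {"host": {
    "name": "vm0042.cern.ch",
    "hostgroup_id": 7,                # hypothetical hostgroup carrying Puppet classes
    "compute_resource_id": 1,         # hypothetical OpenStack compute resource
}}

resp = requests.post(FOREMAN + "/api/hosts",
                     auth=("admin", "secret"),
                     headers={"Content-Type": "application/json"},
                     data=json.dumps(payload))
resp.raise_for_status()
print(resp.json()["name"])
```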
Foreman to manage Puppetized VM




Opportunistic Clouds in online experiment farms
• The CERN experiments have farms of thousands of Linux servers close to the detectors to filter the 1 PByte/s of detector data down to the 6 GByte/s to be recorded to tape
• When the accelerator is not running, these machines are currently idle
      – The accelerator has regular maintenance slots of several days
      – A Long Shutdown is due from March 2013 to November 2014
• One of the experiments has deployed OpenStack on its farm
      – Simulation (low I/O, high CPU)
      – Analysis (high I/O, high CPU, high network)
Federated European Clouds
• Two significant European projects around Federated Clouds
      – European Grid Initiative Federated Cloud, a federation of grid sites providing IaaS
      – HELiX Nebula, a European Union funded project to create a scientific cloud based on commercial providers

EGI Federated Cloud Sites: CESGA, CESNET, INFN, SARA, Cyfronet, FZ Jülich, SZTAKI, IPHC, GRIF, GRNET, KTH, Oxford, GWDG, IGI, TCD, IN2P3, STFC
Ongoing Projects
• Production Preparation
      – Following High Availability recommendations from the community
      – Integrate monitoring with CERN frameworks
• Quota management with Boson
      – CERN does not have infinite compute resources or budget
      – Experiments are allocated a quota to split between their projects
      – Collaborating with the community to develop a distributed quota
        manager
• Block Storage Evaluation
      – Investigating Gluster, NetApp and Ceph integration for EBS
        functionality



Going further forward
• Deployment to production
      – Planned for Q1 2013 based on Folsom
      – No legacy tools in the second data centre
• Exploit new functionality
      –   Ceilometer for metering
      –   Bare metal for non-virtualised use cases such as high I/O servers
      –   X.509 user certificate authentication
      –   Load balancing as a service
      –   Cells for scalability

Ramping to 15,000 hypervisors with
100,000 to 300,000 VMs by 2015
Final Thoughts
• A small project to share documents at CERN in the ‘90s created the massive phenomenon that is today’s world wide web
      • Open Source
      • Vibrant community and eco-system
• Working with the Puppet and OpenStack communities has shown the power of collaboration
      • We have built a toolchain in one year with part-time resources
      • Sharing with other organisations to achieve scale is the only economically feasible path
• CERN contributes to, and benefits from, the contributions of others
Questions?
References
CERN: http://public.web.cern.ch/public/
Scientific Linux: http://www.scientificlinux.org/
Worldwide LHC Computing Grid: http://wlcg.web.cern.ch/
Jobs: http://cern.ch/jobs
Detailed Report on Agile Infrastructure: http://cern.ch/go/N8wp
HELiX Nebula: http://helix-nebula.eu/
EGI Cloud Taskforce: https://wiki.egi.eu/wiki/Fedcloud-tf
Backup Slides




CERN’s tools
• The world’s most powerful accelerator: LHC
      – A 27 km long tunnel filled with high-tech instruments
      – Equipped with thousands of superconducting magnets
      – Accelerates particles to energies never before obtained
      – Produces particle collisions creating microscopic “big bangs”
• Very large sophisticated detectors
      – Four experiments, each the size of a cathedral
      – A hundred million measurement channels each
      – Data acquisition systems handling petabytes per second
• Top level computing to distribute and analyse the data
      – A computing grid linking ~200 computer centres around the globe
      – Sufficient computing power and storage to handle 25 petabytes per year, making them available to thousands of physicists for analysis
Our Infrastructure
• Hardware is generally based on commodity, white-box servers
      – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
      – Compute nodes typically dual processor, 2GB per core
      – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
• The vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux
      – Focus is on stability in view of the number of centres on the WLCG
New architecture data flows




Virtualisation on SCVMM/Hyper-V
[Chart: number of virtual machines from Mar 2010 to Oct 2012, split between Linux and Windows, scale 0 to 3,500]
Scaling up with Puppet and OpenStack
• Use LHC@Home, based on BOINC, for simulating the magnetic fields guiding particles around the LHC
• Naturally, there is a Puppet module: puppet-boinc
• 1,000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack (see the sketch below)
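A hedged sketch of what such a bulk spin-up can look like with the Essex-era python-novaclient; the image and flavor names, credentials and endpoint are illustrative assumptions.

```python
# Hedged sketch: booting 1,000 identical BOINC worker VMs for a
# stress test (python-novaclient v1.1 API; all names are assumptions).
from novaclient.v1_1 import client

nova = client.Client("tbell", "secret", "boinc-stress",
                     "https://cloud.example.cern.ch:5000/v2.0")

image = nova.images.find(name="SLC6-boinc")   # hypothetical worker image
flavor = nova.flavors.find(name="m1.small")

for i in range(1000):
    nova.servers.create(name="boinc-%04d" % i, image=image, flavor=flavor)
```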
Federated Cloud Commonalities
• Basic building blocks
      – Each site provides an IaaS endpoint with an API and a common security policy
            • OCCI? CDMI? Libcloud? JClouds?
      – Image stores available across the sites
      – Federated identity management based on X.509 certificates
      – Consolidation of accounting information to validate pledges and usage
• Multiple cloud technologies
      – OpenStack
      – OpenNebula
      – Proprietary
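Of the client options above, Apache Libcloud is Python-based; here is a hedged sketch of using it as one client API over heterogeneous IaaS endpoints, with the URL, tenant and credentials as illustrative assumptions.

```python
# Hedged sketch: Apache Libcloud as a common client across federated
# cloud sites. All endpoint details below are assumptions.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

OpenStack = get_driver(Provider.OPENSTACK)
conn = OpenStack("tbell", "secret",
                 ex_force_auth_url="https://site-a.example.org:5000/v2.0/tokens",
                 ex_force_auth_version="2.0_password",
                 ex_tenant_name="fedcloud")

for node in conn.list_nodes():
    print(node.name, node.state)
```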
Supporting the Pets with OpenStack
• Network
      – CERN specific component to interface to our legacy network
        environment
• Configuration
      – Using Puppet with Puppetlabs modules for rapid deployment
• External Block Storage
      – Currently using nova-volume with a Gluster backing store (see the sketch below)
• Live migration to maximise availability
      – KVM live migration using Gluster
      – KVM and Hyper-V block migration


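A hedged sketch of the nova-volume workflow for pet servers: create a volume and attach it to an instance. Essex-era python-novaclient calls are shown; names, sizes and the endpoint are illustrative assumptions, and the Gluster backing store is server-side configuration invisible to the client.

```python
# Hedged sketch: EBS-style volume create/attach with nova-volume
# (python-novaclient v1.1 API; all names are assumptions).
from novaclient.v1_1 import client

nova = client.Client("tbell", "secret", "personal-tbell",
                     "https://cloud.example.cern.ch:5000/v2.0")

vol = nova.volumes.create(size=10, display_name="pet-data")   # 10 GB
server = nova.servers.find(name="pussinboots")
nova.volumes.create_server_volume(server.id, vol.id, "/dev/vdb")
```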
Active Directory Integration
• CERN’s Active Directory
      – Unified identity management across the site
      – 44,000 users
      – 29,000 groups
      – 200 arrivals/departures per month
• Full integration with Active Directory via LDAP
      – Uses the OpenLDAP backend with some particular configuration settings
      – Aim for minimal changes to Active Directory
      – 7 patches submitted around hard-coded values and additional filtering
• Now in use in our pre-production instance
      – Map project roles (admins, members) to groups
      – Documentation in the OpenStack wiki
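From a user's point of view the AD integration is invisible: you simply authenticate against Keystone. A hedged sketch with the Essex-era python-keystoneclient follows; the endpoint and credentials are illustrative assumptions, and the LDAP wiring itself lives in server-side configuration.

```python
# Hedged sketch: authenticating against a Keystone whose identity
# backend is Active Directory over LDAP (client code is standard).
from keystoneclient.v2_0 import client

keystone = client.Client(username="tbell",
                         password="secret",
                         tenant_name="personal-tbell",
                         auth_url="https://cloud.example.cern.ch:5000/v2.0")

# Project roles (admins, members) are mapped to AD groups server-side.
print(keystone.auth_token is not None)
```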
Welcome Back Hyper-V!
• We currently use Hyper-V/System Centre for our server consolidation activities
      – But we need to scale to 100x the current installation size
• The choice of hypervisor should be tactical
      – Performance
      – Compatibility/support with integration components
      – Image migration from legacy environments
• CERN is working closely with the Hyper-V OpenStack team and Microsoft to arrive at parity
      – Puppet to configure hypervisors on Windows
      – Most functions work well, but further work is needed on Console, Ceilometer, …

More Related Content

What's hot

Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Steve Loughran
 
Petabye scale data challenge
Petabye scale data challengePetabye scale data challenge
Petabye scale data challengeJason Shih
 
20121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.220121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.2Tim Bell
 
CERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to ProductionCERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to ProductionSteve Traylen
 
Evolution of OSCARS
Evolution of OSCARSEvolution of OSCARS
Evolution of OSCARSEd Dodds
 
Exaflop In 2018 Hardware
Exaflop In 2018   HardwareExaflop In 2018   Hardware
Exaflop In 2018 HardwareJacob Wu
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeireimec
 
Tolly210137 Force10 Networks E1200i Energy
Tolly210137 Force10 Networks E1200i EnergyTolly210137 Force10 Networks E1200i Energy
Tolly210137 Force10 Networks E1200i EnergyChris O'Neal
 
Lug best practice_hpc_workflow
Lug best practice_hpc_workflowLug best practice_hpc_workflow
Lug best practice_hpc_workflowrjmurphyslideshare
 
How to Terminate the GLIF by Building a Campus Big Data Freeway System
How to Terminate the GLIF by Building a Campus Big Data Freeway SystemHow to Terminate the GLIF by Building a Campus Big Data Freeway System
How to Terminate the GLIF by Building a Campus Big Data Freeway SystemLarry Smarr
 
Puppet Camp CERN Geneva
Puppet Camp CERN GenevaPuppet Camp CERN Geneva
Puppet Camp CERN GenevaSteve Traylen
 

What's hot (14)

Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)
 
Sponge v2
Sponge v2Sponge v2
Sponge v2
 
Petabye scale data challenge
Petabye scale data challengePetabye scale data challenge
Petabye scale data challenge
 
20121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.220121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.2
 
CERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to ProductionCERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to Production
 
Evolution of OSCARS
Evolution of OSCARSEvolution of OSCARS
Evolution of OSCARS
 
Exaflop In 2018 Hardware
Exaflop In 2018   HardwareExaflop In 2018   Hardware
Exaflop In 2018 Hardware
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeire
 
Tolly210137 Force10 Networks E1200i Energy
Tolly210137 Force10 Networks E1200i EnergyTolly210137 Force10 Networks E1200i Energy
Tolly210137 Force10 Networks E1200i Energy
 
Lug best practice_hpc_workflow
Lug best practice_hpc_workflowLug best practice_hpc_workflow
Lug best practice_hpc_workflow
 
Again music
Again musicAgain music
Again music
 
Lxcloud
LxcloudLxcloud
Lxcloud
 
How to Terminate the GLIF by Building a Campus Big Data Freeway System
How to Terminate the GLIF by Building a Campus Big Data Freeway SystemHow to Terminate the GLIF by Building a Campus Big Data Freeway System
How to Terminate the GLIF by Building a Campus Big Data Freeway System
 
Puppet Camp CERN Geneva
Puppet Camp CERN GenevaPuppet Camp CERN Geneva
Puppet Camp CERN Geneva
 

Similar to 20121205 open stack_accelerating_science_v3

20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating Science20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating ScienceTim Bell
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NACLarry Smarr
 
Using Photonics to Prototype the Research Campus Infrastructure of the Future...
Using Photonics to Prototype the Research Campus Infrastructure of the Future...Using Photonics to Prototype the Research Campus Infrastructure of the Future...
Using Photonics to Prototype the Research Campus Infrastructure of the Future...Larry Smarr
 
How to Modernize Your Database Platform to Realize Consolidation Savings
How to Modernize Your Database Platform to Realize Consolidation SavingsHow to Modernize Your Database Platform to Realize Consolidation Savings
How to Modernize Your Database Platform to Realize Consolidation SavingsIsaac Christoffersen
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveTim Bell
 
Cern intro 2010-10-27-snw
Cern intro 2010-10-27-snwCern intro 2010-10-27-snw
Cern intro 2010-10-27-snwScott Adams
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdfhellobank1
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Community
 
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...Larry Smarr
 
HPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialHPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialJeff Larkin
 
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...IBM Research
 
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...Heiko Joerg Schick
 
More Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoMore Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoKota Tsuyuzaki
 
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...PROIDEA
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programminginside-BigData.com
 

Similar to 20121205 open stack_accelerating_science_v3 (20)

20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating Science20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating Science
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NAC
 
Using Photonics to Prototype the Research Campus Infrastructure of the Future...
Using Photonics to Prototype the Research Campus Infrastructure of the Future...Using Photonics to Prototype the Research Campus Infrastructure of the Future...
Using Photonics to Prototype the Research Campus Infrastructure of the Future...
 
Mateo valero p1
Mateo valero p1Mateo valero p1
Mateo valero p1
 
How to Modernize Your Database Platform to Realize Consolidation Savings
How to Modernize Your Database Platform to Realize Consolidation SavingsHow to Modernize Your Database Platform to Realize Consolidation Savings
How to Modernize Your Database Platform to Realize Consolidation Savings
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
 
Cern intro 2010-10-27-snw
Cern intro 2010-10-27-snwCern intro 2010-10-27-snw
Cern intro 2010-10-27-snw
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
 
LUG 2014
LUG 2014LUG 2014
LUG 2014
 
Chapter03
Chapter03Chapter03
Chapter03
 
HPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialHPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorial
 
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
 
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
 
More Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoMore Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit Juno
 
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programming
 

More from Tim Bell

CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring Tim Bell
 
CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019Tim Bell
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3Tim Bell
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4Tim Bell
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicTim Bell
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN BarcelonaTim Bell
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1Tim Bell
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?Tim Bell
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3Tim Bell
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014Tim Bell
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3Tim Bell
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Tim Bell
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013Tim Bell
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6Tim Bell
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4Tim Bell
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitTim Bell
 
Havana survey results-final-v2
Havana survey results-final-v2Havana survey results-final-v2
Havana survey results-final-v2Tim Bell
 

More from Tim Bell (20)

CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 
CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summit
 
Havana survey results-final-v2
Havana survey results-final-v2Havana survey results-final-v2
Havana survey results-final-v2
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

20121205 open stack_accelerating_science_v3

  • 1. Accelerating Science with OpenStack The CERN User Story Tim Bell Tim.Bell@cern.ch @noggin143 EMEA OpenStack Day, London 5th December 2012
  • 2. What is CERN ? • Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics • Between Geneva and the Jura mountains, straddling the Swiss-French border • Founded in 1954 with an international treaty • Our business is fundamental physics , what is the universe made of and how does it work OpenStack London December 2012 Tim Bell, CERN 2
  • 3. Answering fundamental questions… • How to explain particles have mass? We have theories and accumulating experimental evidence.. Getting close… • What is 96% of the universe made of ? We can only see 4% of its estimated mass! • Why isn’t there anti-matter in the universe? Nature should be symmetric… • What was the state of matter just after the « Big Bang » ? Travelling back to the earliest instants of the universe would help… OpenStack London December 2012 Tim Bell, CERN 3
  • 4. Community collaboration on an international scale OpenStack London December 2012 Tim Bell, CERN 4
  • 5. The Large Hadron Collider OpenStack London December 2012 Tim Bell, CERN 5
  • 6. The Large Hadron Collider (LHC) tunnel OpenStack London December 2012 Tim Bell, CERN 6
  • 7. OpenStack London December 2012 Tim Bell, CERN 7
  • 8. Accumulating events in 2009-2011 OpenStack London December 2012 Tim Bell, CERN 8
  • 9. OpenStack London December 2012 Tim Bell, CERN 9
  • 10. Heavy Ion Collisions OpenStack London December 2012 Tim Bell, CERN 10
  • 11. Data Acquisition and Trigger Farms OpenStack London December 2012 Tim Bell, CERN 11
  • 12. OpenStack London December 2012 Tim Bell, CERN 12
  • 13. Tier-0 (CERN): •Data recording •Initial data reconstruction •Data distribution Tier-1 (11 centres): •Permanent storage •Re-processing •Analysis Tier-2 (~200 centres): • Simulation • End-user analysis • Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid • In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs OpenStack London December 2012 Tim Bell, CERN 13
  • 14. Data Centre by Numbers – Hardware installation & retirement • ~7,000 hardware movements/year; ~1,800 disk failures/year Racks 828 Disks 64,109 Tape Drives 160 Servers 11,728 Raw disk capacity (TiB) 63,289 Tape Cartridges 45,000 Processors 15,694 Memory modules 56,014 Tape slots 56,000 Cores 64,238 Memory capacity (TiB) 158 Tape Capacity (TiB) 73,000 HEPSpec06 482,507 RAID controllers 3,749 High Speed Routers 24 Xeon Xeon Xeon Other Fujitsu (640 Mbps → 2.4 Tbps) 3GHz 5150 5160 Xeon 0% 3% Xeon 4% 2% 10% E5335 Ethernet Switches 350 L5520 7% Xeon Hitachi 33% 23% 10 Gbps ports 2,000 E5345 14% HP Switching Capacity 4.8 Tbps Seagate 0% 15% 1 Gbps ports 16,939 Maxtor Western 0% 10 Gbps ports 558 Xeon Xeon Digital E5405 Xeon 59% L5420 6% IT Power Consumption 2,456 KW 8% E5410 16% Total Power Consumption 3,890 KW OpenStack London December 2012 Tim Bell, CERN 14
  • 15. OpenStack London December 2012 Tim Bell, CERN 15
  • 16. Our Challenges - Data storage • >20 years retention • 6GB/s average • 25GB/s peaks • 35PB/year recorded OpenStack London December 2012 Tim Bell, CERN 16
  • 17. 45,000 tapes holding 80PB of physics data OpenStack London December 2012 Tim Bell, CERN 17
  • 18. New data centre to expand capacity • Data centre in Geneva at the limit of electrical capacity at 3.5MW • New centre chosen in Budapest, Hungary • Additional 2.7MW of usable power • Hands off facility • Deploying from 2013 with 200Gbit/s OpenStack London December 2012 Tim Bell, CERN network to CERN 18
  • 19. Time to change strategy • Rationale – Need to manage twice the servers as today – No increase in staff numbers – Tools becoming increasingly brittle and will not scale as-is • Approach – CERN is no longer a special case for compute – Adopt an open source tool chain model – Our engineers rapidly iterate • Evaluate solutions in the problem domain • Identify functional gaps and challenge them • Select first choice but be prepared to change in future – Contribute new function back to the community OpenStack London December 2012 Tim Bell, CERN 19
  • 20. Building Blocks mcollective, yum Bamboo Puppet AIMS/PXE Foreman JIRA OpenStack Nova git Koji, Mock Yum repo Active Directory / Pulp LDAP Lemon / Hardware Hadoop database Puppet-DB OpenStack London December 2012 Tim Bell, CERN 20
  • 21. Training and Support • Buy the book rather than guru mentoring • Follow the mailing lists to learn • Newcomers are rapidly productive (and often know more than us) • Community and Enterprise support means we’re not on our own OpenStack London December 2012 Tim Bell, CERN 21
  • 22. Staff Motivation • Skills valuable outside of CERN when an engineer’s contracts end OpenStack London December 2012 Tim Bell, CERN 22
  • 23. Prepare the move to the clouds • Improve operational efficiency – Machine ordering, reception and testing – Hardware interventions with long running programs – Multiple operating system demand • Improve resource efficiency – Exploit idle resources, especially waiting for disk and tape I/O – Highly variable load such as interactive or build machines • Enable cloud architectures – Gradual migration to cloud interfaces and workflows – Autoscaling and scheduling • Improve responsiveness – Self-Service with coffee break response time OpenStack London December 2012 Tim Bell, CERN 23
• 24. Public Procurement Purchase Model
  Step                                 Time (days)    Elapsed (days)
  User expresses requirement                          0
  Market survey prepared               15             15
  Market survey for possible vendors   30             45
  Specifications prepared              15             60
  Vendor responses                     30             90
  Test systems evaluated               30             120
  Offers adjudicated                   10             130
  Finance committee                    30             160
  Hardware delivered                   90             250
  Burn-in and acceptance               30 typical     280 (380 worst case)
  Total                                               280+ days
• 25. Service Model
  • Pets are given names like pussinboots.cern.ch
    – They are unique, lovingly hand-raised and cared for
    – When they get ill, you nurse them back to health
  • Cattle are given numbers like vm0042.cern.ch
    – They are almost identical to other cattle
    – When they get ill, you get another one
  • Future application architectures should use cattle, but pets with strong configuration management are viable and still needed (a sketch of the cattle approach follows below)
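To make the cattle idea concrete, a minimal sketch assuming the same era's python-novaclient, with invented credentials and names: a sick cow is not nursed, it is deleted and replaced.

```python
# Cattle management sketch: replace failed VMs instead of repairing them.
# Credentials, image and flavor names are illustrative, not CERN's.
from novaclient.v1_1 import client

nova = client.Client("ops", "secret", "cattle-project",
                     "http://keystone.example.cern.ch:5000/v2.0")

image = nova.images.find(name="SLC6-base")        # hypothetical image
flavor = nova.flavors.find(name="m1.small")

# Any VM stuck in ERROR is culled; a fresh, identical one takes its place.
for sick in nova.servers.list(search_opts={"status": "ERROR"}):
    nova.servers.delete(sick)
    nova.servers.create(name=sick.name, image=image, flavor=flavor)
```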
• 26. Current Status of OpenStack at CERN
  • Working with the Essex OpenStack release
    – Excellent experience with the Fedora/Red Hat team using EPEL packages
    – Started a Folsom test environment in November
  • Focusing on the compute side to start with
    – Nova compute with KVM and Hyper-V
    – Keystone identity now integrated with Active Directory
    – Replaced the network layer with CERN code for our legacy network management system
  • Current pre-production installation
    – 170 hypervisors, 2,700 VMs
    – 3 part-time DevOps engineers running and enhancing the service
    – Running production ‘cattle’ workloads for stress testing
• 28. When communities combine…
  • OpenStack’s many components and options make configuration complex out of the box
  • The PuppetLabs module from the Puppet Forge does our configuration
  • The Foreman adds OpenStack provisioning, taking a user from kiosk request to a configured machine in 15 minutes
• 29. Foreman to manage a Puppetized VM
• 30. Opportunistic clouds in online experiment farms
  • The CERN experiments have farms of 1000s of Linux servers close to the detectors, filtering the 1 PByte/s down to the 6 GByte/s that is recorded to tape
  • When the accelerator is not running, these machines are currently idle
    – The accelerator has regular maintenance slots of several days
    – A Long Shutdown is due from March 2013 to November 2014
  • One of the experiments has deployed OpenStack on its farm
    – Simulation (low I/O, high CPU)
    – Analysis (high I/O, high CPU, high network)
• 31. Federated European Clouds
  • Two significant European projects around federated clouds
    – European Grid Initiative Federated Cloud: a federation of grid sites providing IaaS
    – HELiX Nebula: a European Union funded project to create a scientific cloud based on commercial providers
  • EGI Federated Cloud sites: CESGA, CESNET, INFN, SARA, Cyfronet, FZ Jülich, SZTAKI, IPHC, GRIF, GRNET, KTH, Oxford, GWDG, IGI, TCD, IN2P3, STFC
• 32. Ongoing Projects
  • Production preparation
    – Following the High Availability recommendations from the community
    – Integrating monitoring with CERN frameworks
  • Quota management with Boson
    – CERN does not have infinite compute resources or budget
    – Experiments are allocated a quota to split between their projects
    – Collaborating with the community to develop a distributed quota manager (see the sketch after this list)
  • Block storage evaluation
    – Investigating Gluster, NetApp and Ceph integration for EBS-style functionality
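Nova already exposes the per-tenant quotas that a distributed manager like Boson would coordinate across an experiment's projects. A hedged sketch of reading and adjusting them with python-novaclient (tenant and limits invented):

```python
# Reading and adjusting per-tenant quotas via Nova (illustrative values).
from novaclient.v1_1 import client

nova = client.Client("admin", "secret", "admin",
                     "http://keystone.example.cern.ch:5000/v2.0")

tenant_id = "atlas-simulation"            # hypothetical tenant
q = nova.quotas.get(tenant_id)
print(q.instances, q.cores, q.ram)        # current limits

# An experiment splitting its allocation between projects would lower
# one project's quota before raising another's -- the coordination a
# distributed quota manager would add on top.
nova.quotas.update(tenant_id, instances=500, cores=2000, ram=4096000)
```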
• 33. Going further forward
  • Deployment to production
    – Planned for Q1 2013, based on Folsom
    – No legacy tools in the second data centre
  • Exploit new functionality
    – Ceilometer for metering
    – Bare metal for non-virtualised use cases, such as high-I/O servers
    – X.509 user certificate authentication
    – Load Balancing as a Service
    – Cells for scalability
  • Ramping to 15,000 hypervisors with 100,000 to 300,000 VMs by 2015
• 34. Final Thoughts
  • A small project to share documents at CERN in the ’90s created the massive phenomenon that is today’s World Wide Web
  • Open source: a vibrant community and ecosystem
  • Working with the Puppet and OpenStack communities has shown the power of collaboration
  • We have built a toolchain in one year with part-time resources
  • Sharing with other organisations to achieve scale is the only economically feasible path
  • CERN contributes to, and benefits from, the contributions of others
• 35. Questions?
• 36. References
  CERN                                       http://public.web.cern.ch/public/
  Scientific Linux                           http://www.scientificlinux.org/
  Worldwide LHC Computing Grid               http://wlcg.web.cern.ch/
  Jobs                                       http://cern.ch/jobs
  Detailed report on the Agile Infrastructure  http://cern.ch/go/N8wp
  HELiX Nebula                               http://helix-nebula.eu/
  EGI Cloud Taskforce                        https://wiki.egi.eu/wiki/Fedcloud-tf
• 37. Backup Slides
• 39. CERN’s tools
  • The world’s most powerful accelerator: the LHC
    – A 27 km long tunnel filled with high-tech instruments
    – Equipped with thousands of superconducting magnets
    – Accelerates particles to energies never before obtained
    – Produces particle collisions creating microscopic “big bangs”
  • Very large, sophisticated detectors
    – Four experiments, each the size of a cathedral
    – A hundred million measurement channels each
    – Data acquisition systems treating petabytes per second
  • Top-level computing to distribute and analyse the data
    – A computing grid linking ~200 computer centres around the globe
    – Sufficient computing power and storage to handle 25 petabytes per year, making them available to thousands of physicists for analysis
• 40. Our Infrastructure
  • Hardware is generally based on commodity, white-box servers
    – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
    – Compute nodes typically dual-processor, with 2 GB of memory per core
    – Bulk storage on 24 × 2 TB disk storage-in-a-box with a RAID card
  • The vast majority of servers run Scientific Linux, developed by Fermilab and CERN and based on Red Hat Enterprise Linux
    – The focus is on stability, in view of the number of centres on the WLCG
• 41. New architecture data flows
• 42. (Chart: virtualisation on SCVMM/Hyper-V — VM counts from March 2010 to October 2012, scale 0–3,500, split between Linux and Windows)
• 43. Scaling up with Puppet and OpenStack
  • Use LHC@Home, based on BOINC, for simulating the magnetics guiding particles around the LHC
  • Naturally, there is a Puppet module: puppet-boinc
  • 1,000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack (see the sketch below)
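In spirit, the stress test is a batch boot of identical Puppet-managed workers. A rough sketch, again with python-novaclient and invented names; the real deployment used the puppet-boinc module to attach each VM to BOINC:

```python
# Batch-boot 1,000 identical stress-test workers (illustrative sketch).
from novaclient.v1_1 import client

nova = client.Client("stress", "secret", "lhcathome",
                     "http://keystone.example.cern.ch:5000/v2.0")

image = nova.images.find(name="SLC6-puppet-agent")   # hypothetical image
flavor = nova.flavors.find(name="m1.small")

for i in range(1000):
    # On boot, each VM registers with the Puppet master, which applies
    # the BOINC profile and attaches it to the LHC@Home project.
    nova.servers.create(name="boinc-%04d" % i, image=image, flavor=flavor)
```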
• 44. Federated Cloud Commonalities
  • Basic building blocks
    – Each site offers an IaaS endpoint with an API and a common security policy
      • OCCI? CDMI? Libcloud? Jclouds?
    – Image stores available across the sites
    – Federated identity management based on X.509 certificates
    – Consolidation of accounting information to validate pledges and usage
  • Multiple cloud technologies
    – OpenStack
    – OpenNebula
    – Proprietary
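Of the candidate client layers, Apache Libcloud illustrates the federation idea: one API in front of OpenStack, OpenNebula and proprietary sites alike. A minimal sketch against an OpenStack endpoint (URL, tenant and credentials invented; keyword names follow Libcloud's OpenStack driver):

```python
# One client API across heterogeneous IaaS sites (Apache Libcloud sketch).
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

OpenStackDriver = get_driver(Provider.OPENSTACK)
site_a = OpenStackDriver(
    "alice", "secret",
    ex_force_auth_url="http://site-a.example.org:5000/v2.0/tokens",
    ex_force_auth_version="2.0_password",
    ex_tenant_name="physics",
)

# Federation in practice: the same calls work against every site's driver,
# whatever cloud stack sits behind it.
for node in site_a.list_nodes():
    print(node.name, node.state)
```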
• 45. Supporting the Pets with OpenStack
  • Network
    – A CERN-specific component to interface with our legacy network environment
  • Configuration
    – Puppet with PuppetLabs modules for rapid deployment
  • External block storage
    – Currently using nova-volume with a Gluster backing store
  • Live migration to maximise availability (see the sketch below)
    – KVM live migration using Gluster
    – KVM and Hyper-V block migration
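Maximising a pet's availability means migrating it away before a hardware intervention rather than rebuilding it. A hedged sketch of that admin action, assuming the era's python-novaclient exposed live_migrate as it does today (host name invented):

```python
# Evacuating a pet VM ahead of a hardware intervention (illustrative).
from novaclient.v1_1 import client

nova = client.Client("admin", "secret", "admin",
                     "http://keystone.example.cern.ch:5000/v2.0")

vm = nova.servers.find(name="pussinboots")
# Shared Gluster storage allows true live migration; without it, fall
# back to block migration (block_migration=True), as on Hyper-V nodes.
vm.live_migrate(host="hypervisor-042.example.cern.ch",
                block_migration=False, disk_over_commit=False)
```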
• 46. Active Directory Integration
  • CERN’s Active Directory
    – Unified identity management across the site
    – 44,000 users, 29,000 groups
    – 200 arrivals/departures per month
  • Full integration with Active Directory via LDAP
    – Uses the OpenLDAP backend with some particular configuration settings
    – The aim is minimal changes to Active Directory
    – 7 patches submitted around hard-coded values and additional filtering
  • Now in use in our pre-production instance
    – Project roles (admins, members) are mapped to groups
    – Documentation is in the OpenStack wiki
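Once Keystone fronts Active Directory through its LDAP backend, clients never speak LDAP themselves; ordinary Keystone calls simply return the AD-backed users and tenants. A minimal sketch with the v2.0 python-keystoneclient (credentials and endpoint invented):

```python
# After AD integration, normal Keystone calls return AD-backed identities.
from keystoneclient.v2_0 import client

keystone = client.Client(
    username="alice", password="secret", tenant_name="physics",
    auth_url="http://keystone.example.cern.ch:5000/v2.0",
)

# Users and groups live in Active Directory; Keystone's LDAP backend
# (its [ldap] configuration pointed at the AD server) maps them through.
for user in keystone.users.list():
    print(user.name)
for tenant in keystone.tenants.list():
    print(tenant.name)
```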
• 47. Welcome Back, Hyper-V!
  • We currently use Hyper-V/System Center for our server consolidation activities
    – But we need to scale to 100× the current installation size
  • The choice of hypervisor should be tactical
    – Performance
    – Compatibility/support of integration components
    – Image migration from legacy environments
  • CERN is working closely with the Hyper-V OpenStack team and Microsoft to arrive at parity
    – Puppet to configure hypervisors on Windows
    – Most functions work well, but further work is needed on the console, Ceilometer, …

Editor's Notes

1. Established by an international treaty after the Second World War as a place where scientists could work together on fundamental research. ‘Nuclear’ is part of the name, but our world is particle physics.
2. Our current understanding of the universe is incomplete. A theory called the Standard Model proposes particles and forces, many of which have been experimentally observed. However, there are open questions. Why do some particles have mass and others not? The Higgs boson is a theory, but we need experimental evidence. Our theory of forces does not explain how gravity works. Cosmologists can only find 4% of the matter in the universe; we have lost the other 96%. We should have 50% matter and 50% anti-matter, so why is there an asymmetry (although it is a good thing that there is, since the two annihilate each other)? When we go back through time 13 billion years towards the Big Bang, we move back through planets, stars, atoms and protons/electrons towards a soup-like quark-gluon plasma. What were its properties?
3. The biggest international scientific collaboration in the world: over 11,000 scientists from 100 countries. The annual budget is around 1.1 billion USD. Funding for CERN, the laboratory itself, comes from the 20 member states, in ratio to their gross domestic product; other countries contribute to the experiments, including a substantial US contribution towards the LHC experiments.
4. The LHC is CERN’s largest accelerator: a 17-mile ring 100 metres underground where two beams of particles are sent in opposite directions and collided at the four experiments — ATLAS, CMS, LHCb and ALICE. Lake Geneva and the airport are visible at the top to give a sense of scale.
5. The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the Moon, which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second, bent by superconducting magnets cooled to 2 K by liquid helium (-450 °F), colder than outer space. The beams themselves have a total energy similar to a high-speed train, so care needs to be taken to make sure they turn the corners correctly and don’t bump into the walls of the pipe.
6. At four points around the ring, the beams are made to cross where detectors, the size of cathedrals and weighing up to 12,500 tonnes, surround the pipe. These are like digital cameras, but they take 100-megapixel photos 40 million times a second. This produces up to 1 petabyte/s.
7. Collisions can be visualised by the tracks left in the various parts of the detectors. With many collisions, the statistics allow particle identification, such as mass and charge. This is a simple one…
8. To improve the statistics, we send round beams of multiple bunches; as they cross, there are multiple collisions as the 100 billion protons per bunch pass through each other. Software close to the detector, and later offline in the computer centre, then has to examine the tracks to understand the particles involved.
9. To get quark-gluon plasma, the material closest to the Big Bang, we also collide lead ions, which is much more intensive… the temperatures reach 100,000 times those in the sun.
10. We cannot record 1 PB/s, so there are hardware filters to remove uninteresting collisions, such as those whose physics we understand already. The data is then sent to the CERN computer centre for recording via 10 Gbit optical connections.
11. The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs over 2 million jobs/day; less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate in order to support physicists across the globe.
12. So, to the Tier-0 computer centre at CERN… we are unusual in that we are public with our environment, as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education, and the computer centre is a popular visit. The data centre has around 2.9 MW of usable power looking after 12,000 servers; in comparison, the accelerator uses 120 MW, like a small town. With 64,000 disks, we have around 1,800 failing each year… this is much higher than the manufacturers’ MTBF figures, which is consistent with results from Google. Servers mainly have Intel processors, some AMD, with dual-core Xeon being the most common configuration.
13. Upstairs in the computer centre: a high roof was the fashion in the 1980s for mainframes, but it is now very difficult to cool efficiently.
14. Our data storage system has to record and preserve 35 PB/year with an expected lifetime of 20 years. Keeping the old data is required to get the maximum statistics for discoveries. At times, physicists will want to skim this data looking for new physics. Data rates are around 6 GB/s on average, with peaks of 25 GB/s.
15. Tape robots from IBM and Oracle. Around 60,000 tape mounts/week, so the robots are kept busy. Data is copied every two years to keep up with the latest media densities.
16. We asked the member states for offers. 200 Gbit/s links connect the centres. We expect to double computing capacity compared to today by 2015.
17. Double the capacity, same manpower. We need to rethink how to solve the problem and look at how others approach it. We had our own tools in 2002, and as they became more sophisticated, it was not possible to take advantage of developments elsewhere without a major break. Our engineers are doing this while doing their ‘day’ jobs, which reinforces the approach of taking what we can from the community.
18. A model based on the Google toolchain; Puppet is key for many operations. We’ve only had to write one significant new custom CERN software component, which is in the certificate authority. Other parts, such as Lemon for monitoring, are from our previous implementation, as we did not want to change everything at once and they scale.
  19. We’ve been very pleased with our choices. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
20. Many staff at CERN are on short-term contracts… it is a good benefit for those staff to leave with skills that are in demand.
21. Standardise hardware… buy in bulk, pile it up, then work out what to use it for. Interventions cover memory, motherboards, cables or disks. Users waiting for I/O means wasted cycles. Build machines run at night and sit unused during the day; interactive machines are used mainly during the day. Move to cloud APIs… we need to support them but also maintain our existing applications. Details later on reception and testing.
22. The concept of pets and cattle came from Cloudscaling. Puppet applies well to the cattle model, but we’re also using it to handle the pet cases that can’t yet move over due to software limitations. So they get cloud provisioning plus flexible configuration management.
23. Communities integrating… when a new option is used at CERN in OpenStack, we contribute the changes back to the Puppet Forge, such as certificate handling. We are even looking at Hyper-V/Windows OpenStack configuration…
  24. The project’s success comes down to community. A vibrant community has momentum of its own. As the WWW showed, many contributors can change how we see the world.Looking forward, as we help improve Puppet, remember that you will also be helping achieve a clearer understanding of the universe and how it works.
25. CERN is more than just the LHC. CNGS sends neutrinos to Gran Sasso. CLOUD demonstrates the impact of cosmic rays on weather patterns. Anti-hydrogen atoms have been contained for minutes in a magnetic vessel. However, for those of you who have read Dan Brown’s Angels and Demons or seen the film, there are no maniacal monks with pounds of anti-matter running around the campus.
26. We purchase on an annual cycle, replacing around a quarter of the servers. This purchasing is based on performance metrics such as cost per SpecInt or cost per GB. Generally, we are seeing dual-core compute servers with Intel or AMD processors, and bulk storage servers with 24 or 36 2 TB disks. The operating system is a Red Hat-based Linux distribution called Scientific Linux; we share its development and maintenance with Fermilab in Chicago. The choice of a Red Hat-based distribution comes from the need for stability across the grid, keeping the ~200 centres running compatible Linux distributions.
27. LHC@Home is not an instruction manual for building your own accelerator, but a magnet simulation tool testing multiple passes around the ring. We wanted to use it as a stress-test tool, and in half a day it was running on 1,000 VMs.