GPU Systems
 Advanced Clustering’s offerings for GPGPU computing




advanced clustering technologies
  www.advancedclustering.com • 866.802.8222
what is GPU computing
• The use of a GPU (graphics processing unit) to
  do general purpose scientific and engineering
  computing
• Model is to use a CPU and GPU together in a
  heterogeneous computing model
  • CPU is used to run sequential portions of the
    application
  • Offload parallel computation onto the GPU
    (a minimal CUDA sketch follows below)

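The following is a minimal, hypothetical CUDA C sketch of that split (not from the original deck): the CPU handles the sequential setup and control flow, then offloads the data-parallel loop onto the GPU. It assumes a CUDA-capable GPU and NVIDIA's nvcc compiler; the kernel and array names are illustrative only.

// vector_add.cu - CPU runs the sequential portions, GPU runs the parallel loop
#include <stdio.h>
#include <cuda_runtime.h>

// Parallel portion: one GPU thread computes one element
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential portion on the CPU: allocate and fill host data
    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes), *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    // Offload: copy inputs to the GPU, launch the kernel, copy the result back
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[123] = %f\n", c[123]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}

Built with something like: nvcc vector_add.cu -o vector_add. Everything outside the kernel is ordinary C running on the CPU; only the marked loop is executed by the GPU.
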
history of GPUs
• GPUs designed with fixed function pipelines for
  real-time 3D graphics
• As GPU complexity increased, they were designed to
  be more programmable so that new features could be
  implemented easily
• Scientists and engineers discovered that these
  purpose-built GPUs could also be re-programmed for
  General-Purpose computing on a GPU (GPGPU)


history of GPUs - continued
• The nature of 3D graphics meant GPUs have
  very fast floating-point units, which are also great
  for scientific codes
• GPUs were originally very difficult to program, but
  vendors have recognized another market for their
  products and developed specially designed GPUs and
  programming environments for scientific computing
• Most prominent are NVIDIA's Tesla GPUs and their
  CUDA programming environment

GPUs vs. CPUs
• Traditional x86 CPUs are available today with 4
  cores; 6, 8, and 12 cores in the future
• NVIDIA's Tesla GPU is shipping with 240 cores

[Images: quad-core CPU vs. 240-core Tesla GPU]
GPUs vs. CPUs - continued




why use GPUs?
• Massively parallel design: 240 cores per GPU
• Nearly 1 teraflop of single-precision floating-
  point performance (see the arithmetic below)
• Designed as an accelerator card to add into your
  existing system - does not replace your current
  CPU
• Maximum of 4GB of fast dedicated RAM per
  GPU
• If your code is highly parallel it’s worth
  investigating
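As a rough sanity check on the "nearly 1 teraflop" figure (not from the deck, and assuming the Tesla C1060's published 1.296 GHz shader clock and up to 3 single-precision FLOPs per core per cycle from its dual-issue multiply-add plus multiply):

240 \;\text{cores} \times 1.296\;\text{GHz} \times 3\;\tfrac{\text{FLOPs}}{\text{cycle}} \approx 933\;\text{GFLOPS}

Double-precision peak on this generation is roughly an order of magnitude lower, which is why the figure is quoted for single precision.
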
why not use GPUs?
• Fixed RAM sizes on GPU - not upgradable or
  configurable
• Large power requirements of 188W
• Still requires a host server and CPU to operate
• Specialized development tools required, does not
  run standard x86 code
  • Current development tools are specific to
    NVIDIA cards - no support for other
    manufacturer’s GPUs
• Your code may be difficult to parallelize
developing for GPUs
• Current development model: CUDA parallel
  environment
 •   The CUDA parallel programming model guides programmers
     to partition the problem into coarse sub-problems that can be
     solved independently in parallel.

 •   Fine grain parallelism in the sub-problems is then expressed
     such that each sub-problem can be solved cooperatively in
     parallel.

• Currently an extension to the C programming
  language - other languages are in development
  (a minimal kernel sketch follows below)


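A minimal, hypothetical kernel sketch (not from the deck) of how that two-level decomposition looks in CUDA C: each thread block handles one coarse sub-problem independently, while the threads within a block cooperate through shared memory on the fine-grained work. Function and variable names are illustrative.

// Coarse-grained: each block independently reduces one chunk of the input.
// Fine-grained: threads within the block cooperate via shared memory.
__global__ void partial_sums(const float *in, float *block_sums, int n)
{
    extern __shared__ float cache[];          // one slot per thread in this block

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;  // global index of this thread's element
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction inside the block: cooperative, fine-grained parallelism
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    // One partial result per sub-problem (per block); the CPU sums the few leftovers
    if (tid == 0)
        block_sums[blockIdx.x] = cache[0];
}

// Host-side launch, one block per coarse sub-problem of 256 elements:
//   partial_sums<<<num_blocks, 256, 256 * sizeof(float)>>>(d_in, d_sums, n);
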
NVIDIA GPUs
• All of NVIDIA’s recent GPUs support CUDA
  development
• Tesla cards designed exclusively for CUDA and
  GPGPU code (no graphics support)
• GeForce cards designed for graphics can be used
  for CUDA code as well
 • Usually slower, with fewer cores or less RAM - but a
   great way to get started at low price points
• Development and testing can be done on almost
  any standard GeForce GPU and then run on a Tesla
  system (a device-query sketch follows below)
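One way to see what you are developing on - whether a desktop GeForce or a Tesla C1060 - is the CUDA runtime's device-query API. A small sketch, assuming the CUDA toolkit is installed (the fields printed are standard cudaDeviceProp members):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);   // e.g. a GeForce or Tesla board
        printf("  multiprocessors   : %d\n", prop.multiProcessorCount);
        printf("  global memory     : %lu MB\n", (unsigned long)(prop.totalGlobalMem >> 20));
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
    }
    return 0;
}

Code developed and tested this way on a GeForce card should run unchanged on a Tesla system, since both report through the same runtime.
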
GeForce vs. Tesla




GPU future
• More products coming: AMD Stream processor
  line of products, similar to NVIDIA’s Tesla
• Standard, portable programming via OpenCL
 •   OpenCL (Open Computing Language) is the first open, royalty-
     free standard for general-purpose parallel programming. It lets
     developers create portable code for a diverse mix of multi-core
     CPUs, GPUs, Cell-type architectures, and other parallel processors
     such as DSPs (a minimal device-enumeration sketch follows below).
 •   More info: http://www.khronos.org/opencl/




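For a taste of how OpenCL exposes that diverse hardware through one API, here is a minimal, hypothetical enumeration sketch in C (it assumes an OpenCL implementation and headers are installed; at the time of this deck those were only beginning to ship):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[4];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(4, platforms, &num_platforms);      // list installed vendor implementations

    for (cl_uint p = 0; p < num_platforms; p++) {
        char pname[128];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);
        printf("Platform: %s\n", pname);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);
        for (cl_uint d = 0; d < num_devices; d++) {      // CPUs, GPUs, accelerators...
            char dname[128];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            printf("  Device: %s\n", dname);
        }
    }
    return 0;
}
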
building GPU systems
• Building systems to house GPUs can be difficult:
 • Requires lots of engineering and design work
   to be able to power and cool them correctly
  • GPUs were originally designed for visualization
    and gaming; size and form-factor were not as
    important
  • When used for computation, data-center space
    is limited and expensive - we need a way to
    fit GPUs into existing infrastructure

traditional GPU servers

• Large tower-style cases
• Rackmount servers 4U or larger
• Either choice is not an efficient
  use of limited data center space
GPUs are large

[Image: Tesla GPU card - approximately 4.6” tall, 10.5” long, 1.5” deep]

The size of the GPU has limited its application
GPUs are power hungry



• GPU cards can use a lot of power -
  as much as 270W
• Lots of power equals lots of heat
• Difficult to put into a small space
  and cool effectively
GPU system options

Advanced Clustering has two solutions
to the power, heat, and density problems:

NVIDIA’s Tesla S1070


Advanced Clustering’s
15XGPU nodes



NVIDIA’s tesla S1070
• The S1070 is an external 1U box that contains
  4x Tesla C1060 GPUs
• The S1070 must be connected to one or two
  host servers to operate
• S1070 has one power supply and dedicated
  cooling for the 4x GPUs
• Only available with the C1060 GPU cards pre-
  installed



tesla S1070 - front view




tesla S1070 - rear view




tesla S1070 - inside view




host interface cards (HIC)




• The Host Interface Card (HIC) connects the Tesla S1070 to a server
• Every S1070 requires 2 HICs
• Each HIC bridges the server to two of the four GPUs inside the S1070
• HICs can be installed in 2 separate servers, or in 1 server
• HICs are available in PCI-e 8x and 16x widths
tesla S1070 block diagram

[Block diagram: Tesla S1070 with cables to HICs in host system(s)]
connecting S1070 to 2 servers
[Diagram: one Tesla S1070 connected to Server #1 and Server #2]

Most servers do not have enough PCI-e bandwidth, so the
S1070 is designed to allow connecting to 2 separate machines.
connecting S1070 to 1 server


[Diagram: one Tesla S1070 connected to a single server]

If the server has enough PCI-e lanes and expansion slots,
one Tesla S1070 can be connected to one server.
example cluster of S1070s
[Rack diagram: 1U compute nodes, each with HIC #1 and HIC #2, connected to Tesla S1070s]

• 10x 1U compute nodes with 2x CPUs each
• 5 Tesla S1070s with 4x GPUs each
• Balanced system of 20 CPUs and 20 GPUs
• All in 15U of rack space
S1070s pros and cons



• Pros
  • External enclosure to hold GPUs doesn’t require a special
    server design to hold the GPUs
  • Easy to add GPUs to any existing system
  • 4 GPUs in only 1U of space
  • Multiple HIC card configurations including PCI-e 8x or 16x
  • Thermally tested and validated by NVIDIA
• Cons
  • Two GPUs share one PCI-e slot in the host server, limiting
    bandwidth to the GPU card
  • Most 1U servers only have 1x PCI-e expansion slot, which is
    occupied by the HIC - this limits the ability to use
    interconnects like InfiniBand or 10 Gigabit Ethernet
  • Limited configuration options: only Tesla cards, no GeForce
    or Quadro options
S1070 - specifications




advanced clustering GPU nodes
• The 15XGPU line of systems is a complete two
  processor server and GPU in 1U
• Server fully configured with the latest quad-core Intel
  Xeon processors, RAM, hard drives, optical drive,
  networking, InfiniBand, and GPU card
• Flexible to support various GPUs, including:
 • Tesla C1060 card
 • GeForce series
 • Quadro series
GPU node - front




GPU node - rear




GPU node - inside




GPU node - block diagram

[Block diagram: Advanced Clustering 15XGPU node]

Simplified design - the host server is completely integrated
with the GPU, with no external components to connect.
example cluster of GPU nodes



                   • 15x 1U compute nodes
                     • 2x CPUs each
                     • 1x GPU integrated in
                         each node
                   •   Entire system contains
                       30x CPUs and 15x GPUs
                   •   All in 15U of rack space




GPU nodes - thermals

• System carefully engineered
  to ensure all components
  will fit in the small form
  factor

• Detailed modeling and
  testing to make sure the
  system components (CPU
  and memory) and the GPU
  are adequately cooled
GPU nodes pros and cons



• Pros
  • Entire server and GPU all enclosed in a 1U package
  • Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
  • Full PCI-e bandwidth to the GPU
  • Full-featured server with the latest quad-core Intel Xeon CPUs
  • Can be used for more than computation - use the GPU for video
    output as well
• Cons
  • Only 1x GPU per server
  • Requires purchase of new servers; not an upgrade or add-on
  • Not as dense a solution as the S1070 for 4x GPUs
GPU nodes
• The GPU node concept is unique to Advanced
  Clustering
• The only vendor shipping a 1U server with an integrated
  Tesla or high-end GeForce / Quadro card
• Available for order as the 15XGPU2
 • Dual Quad-Core Intel Xeon 5500 series
    processors
  • Choice of GPU

15XGPU2 - specifications
• Processor
  • Two Intel Xeon 5500 Series processors
  • Next generation "Nehalem" microarchitecture
  • Integrated memory controller and 2x QPI chipset interconnects per processor
  • 45nm process technology
• Chipset
  • Intel 5500 I/O controller hub
• Memory
  • 800MHz, 1066MHz, or 1333MHz DDR3 memory
  • Twelve DIMM sockets supporting up to 144GB of memory
• GPU
  • PCI-e 2.0 16x double-height expansion slot for GPU
  • Multiple options: Tesla, GeForce, or Quadro cards
• Storage
  • Two 3.5" SATA2 drive bays
  • Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)
  • DVD+RW slim-line optical drive
• Management
  • Integrated IPMI 2.0 module
  • Integrated management controller providing iKVM and remote disk emulation
  • Dedicated RJ45 LAN for management network
• I/O connections
  • Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
  • Two USB 2.0 ports
  • One DB-9 serial port (RS-232)
  • One VGA port
  • Optional ConnectX DDR or QDR InfiniBand connector
• Electrical Requirements
  • High-efficiency power supply (greater than 80%)
  • Output Power: 560W
  • Universal input voltage 100V to 240V
  • Frequency: 50Hz to 60Hz, single phase
availability
• Both the Tesla S1070 and 15XGPU GPU nodes
  are available and shipping now
• For pricing and custom configurations, contact your
  Account Representative
 • (866) 802-8222
 • sales@advancedclustering.com
 •   http://www.advancedclustering.com/go/gpu




