GPU Systems
 Advanced Clustering’s offerings for GPGPU computing




advanced clustering technologies
  www.advancedclustering.com • 866.802.8222
what is GPU computing
• The use of a GPU (graphics processing unit) to
  do general purpose scientific and engineering
  computing
• Model is to use a CPU and GPU together in a
  heterogeneous computing model
  • CPU is used to run sequential portions of the
    application
  • Offload parallel computation onto the GPU
    (a minimal CUDA sketch follows below)

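The following is a minimal, hypothetical CUDA C sketch of that split (not from the original deck): the CPU handles the sequential setup and control flow, then offloads the data-parallel loop onto the GPU. It assumes a CUDA-capable GPU and NVIDIA's nvcc compiler; the kernel and array names are illustrative only.

// vector_add.cu - CPU runs the sequential portions, GPU runs the parallel loop
#include <stdio.h>
#include <cuda_runtime.h>

// Parallel portion: one GPU thread computes one element
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential portion on the CPU: allocate and fill host data
    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes), *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    // Offload: copy inputs to the GPU, launch the kernel, copy the result back
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[123] = %f\n", c[123]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}

Built with something like: nvcc vector_add.cu -o vector_add. Everything outside the kernel is ordinary C running on the CPU; only the marked loop is executed by the GPU.
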
history of GPUs
• GPUs designed with fixed function pipelines for
  real-time 3D graphics
• As GPU complexity increased, they were designed to
  be more programmable so that new features could be
  implemented easily
• Scientists and engineers discovered that these
  purpose-built GPUs could also be re-programmed for
  General-Purpose computing on a GPU (GPGPU)


history of GPUs - continued
• The nature of 3D graphics meant GPUs have
  very fast floating-point units, which are also great
  for scientific codes
• GPUs were originally very difficult to program, but
  vendors have recognized another market for their
  products and developed specially designed GPUs and
  programming environments for scientific computing
• Most prominent are NVIDIA's Tesla GPUs and their
  CUDA programming environment

GPUs vs. CPUs
• Traditional x86 CPUs are available today with 4
  cores; 6, 8, and 12 cores in the future
• NVIDIA's Tesla GPU is shipping with 240 cores

[Images: quad-core CPU vs. 240-core Tesla GPU]
GPUs vs. CPUs - continued




why use GPUs?
• Massively parallel design: 240 cores per GPU
• Nearly 1 teraflop of single-precision floating-
  point performance (see the arithmetic below)
• Designed as an accelerator card to add into your
  existing system - does not replace your current
  CPU
• Maximum of 4GB of fast dedicated RAM per
  GPU
• If your code is highly parallel it’s worth
  investigating
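As a rough sanity check on the "nearly 1 teraflop" figure (not from the deck, and assuming the Tesla C1060's published 1.296 GHz shader clock and up to 3 single-precision FLOPs per core per cycle from its dual-issue multiply-add plus multiply):

240 \;\text{cores} \times 1.296\;\text{GHz} \times 3\;\tfrac{\text{FLOPs}}{\text{cycle}} \approx 933\;\text{GFLOPS}

Double-precision peak on this generation is roughly an order of magnitude lower, which is why the figure is quoted for single precision.
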
why not use GPUs?
• Fixed RAM sizes on GPU - not upgradable or
  configurable
• Large power requirements of 188W
• Still requires a host server and CPU to operate
• Specialized development tools required, does not
  run standard x86 code
  • Current development tools are specific to
    NVIDIA cards - no support for other
    manufacturer’s GPUs
• Your code may be difficult to parallelize
developing for GPUs
• Current development model: CUDA parallel
  environment
 •   The CUDA parallel programming model guides programmers
     to partition the problem into coarse sub-problems that can be
     solved independently in parallel.

 •   Fine grain parallelism in the sub-problems is then expressed
     such that each sub-problem can be solved cooperatively in
     parallel.

• Currently an extension to the C programming
  language - other languages are in development
  (a minimal kernel sketch follows below)


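A minimal, hypothetical kernel sketch (not from the deck) of how that two-level decomposition looks in CUDA C: each thread block handles one coarse sub-problem independently, while the threads within a block cooperate through shared memory on the fine-grained work. Function and variable names are illustrative.

// Coarse-grained: each block independently reduces one chunk of the input.
// Fine-grained: threads within the block cooperate via shared memory.
__global__ void partial_sums(const float *in, float *block_sums, int n)
{
    extern __shared__ float cache[];          // one slot per thread in this block

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;  // global index of this thread's element
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction inside the block: cooperative, fine-grained parallelism
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    // One partial result per sub-problem (per block); the CPU sums the few leftovers
    if (tid == 0)
        block_sums[blockIdx.x] = cache[0];
}

// Host-side launch, one block per coarse sub-problem of 256 elements:
//   partial_sums<<<num_blocks, 256, 256 * sizeof(float)>>>(d_in, d_sums, n);
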
NVIDIA GPUs
• All of NVIDIA’s recent GPUs support CUDA
  development
• Tesla cards designed exclusively for CUDA and
  GPGPU code (no graphics support)
• GeForce cards designed for graphics can be used
  for CUDA code as well
 • Usually slower, with fewer cores or less RAM - but a
   great way to get started at low price points
• Development and testing can be done on almost
  any standard GeForce GPU and then run on a Tesla
  system (a device-query sketch follows below)
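One way to see what you are developing on - whether a desktop GeForce or a Tesla C1060 - is the CUDA runtime's device-query API. A small sketch, assuming the CUDA toolkit is installed (the fields printed are standard cudaDeviceProp members):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);   // e.g. a GeForce or Tesla board
        printf("  multiprocessors   : %d\n", prop.multiProcessorCount);
        printf("  global memory     : %lu MB\n", (unsigned long)(prop.totalGlobalMem >> 20));
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
    }
    return 0;
}

Code developed and tested this way on a GeForce card should run unchanged on a Tesla system, since both report through the same runtime.
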
GeForce vs. Tesla




GPU future
• More products coming: AMD Stream processor
  line of products, similar to NVIDIA’s Tesla
• Standard, portable programming via OpenCL
 •   OpenCL (Open Computing Language) is the first open, royalty-
     free standard for general-purpose parallel programming. It lets
     developers create portable code for a diverse mix of multi-core
     CPUs, GPUs, Cell-type architectures, and other parallel processors
     such as DSPs (a minimal device-enumeration sketch follows below).
 •   More info: http://www.khronos.org/opencl/




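For a taste of how OpenCL exposes that diverse hardware through one API, here is a minimal, hypothetical enumeration sketch in C (it assumes an OpenCL implementation and headers are installed; at the time of this deck those were only beginning to ship):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[4];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(4, platforms, &num_platforms);      // list installed vendor implementations

    for (cl_uint p = 0; p < num_platforms; p++) {
        char pname[128];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);
        printf("Platform: %s\n", pname);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);
        for (cl_uint d = 0; d < num_devices; d++) {      // CPUs, GPUs, accelerators...
            char dname[128];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            printf("  Device: %s\n", dname);
        }
    }
    return 0;
}
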
building GPU systems
• Building systems to house GPUs can be difficult:
 • Requires lots of engineering and design work
   to be able to power and cool them correctly
  • GPUs were originally designed for visualization
    and gaming; size and form-factor were not as
    important
  • When used for computation, data-center space
    is limited and expensive - we need a way to
    fit GPUs into existing infrastructure

traditional GPU servers

• Large tower-style cases
• Rackmount servers 4U or larger
• Either choice is not an efficient
  use of limited data center space
GPUs are large

[Image: Tesla GPU card - approximately 4.6” tall, 10.5” long, 1.5” deep]

The size of the GPU has limited its application
GPUs are power hungry



• GPU cards can use a lot of power -
  as much as 270W
• Lots of power equals lots of heat
• Difficult to put into a small space
  and cool effectively
GPU system options

Advanced Clustering has two solutions
to the power, heat, and density problems:

NVIDIA’s Tesla S1070


Advanced Clustering’s
15XGPU nodes



NVIDIA’s tesla S1070
• The S1070 is an external 1U box that contains
  4x Tesla C1060 GPUs
• The S1070 must be connected to one or two
  host servers to operate
• S1070 has one power supply and dedicated
  cooling for the 4x GPUs
• Only available with the C1060 GPU cards pre-
  installed



tesla S1070 - front view




tesla S1070 - rear view




tesla S1070 - inside view




host interface cards (HIC)




• The Host Interface Card (HIC) connects the Tesla S1070 to a server
• Every S1070 requires 2 HICs
• Each HIC bridges the server to two of the four GPUs inside the S1070
• HICs can be installed in 2 separate servers, or in 1 server
• HICs are available in PCI-e 8x and 16x widths
tesla S1070 block diagram

[Block diagram: Tesla S1070 with cables to HICs in host system(s)]
connecting S1070 to 2 servers
[Diagram: one Tesla S1070 connected to Server #1 and Server #2]

Most servers do not have enough PCI-e bandwidth, so the
S1070 is designed to allow connecting to 2 separate machines.
connecting S1070 to 1 server


[Diagram: one Tesla S1070 connected to a single server]

If the server has enough PCI-e lanes and expansion slots,
one Tesla S1070 can be connected to one server.
example cluster of S1070s
[Rack diagram: 1U compute nodes, each with HIC #1 and HIC #2, connected to Tesla S1070s]

• 10x 1U compute nodes with 2x CPUs each
• 5 Tesla S1070s with 4x GPUs each
• Balanced system of 20 CPUs and 20 GPUs
• All in 15U of rack space
S1070s pros and cons



• Pros
  • External enclosure to hold GPUs doesn’t require a special
    server design to hold the GPUs
  • Easy to add GPUs to any existing system
  • 4 GPUs in only 1U of space
  • Multiple HIC card configurations including PCI-e 8x or 16x
  • Thermally tested and validated by NVIDIA
• Cons
  • Two GPUs share one PCI-e slot in the host server, limiting
    bandwidth to the GPU card
  • Most 1U servers only have 1x PCI-e expansion slot, which is
    occupied by the HIC - this limits the ability to use
    interconnects like InfiniBand or 10 Gigabit Ethernet
  • Limited configuration options: only Tesla cards, no GeForce
    or Quadro options
S1070 - specifications




advanced clustering GPU nodes
• The 15XGPU line of systems is a complete two
  processor server and GPU in 1U
• Server fully configured with the latest quad-core Intel
  Xeon processors, RAM, hard drives, optical drive,
  networking, InfiniBand, and GPU card
• Flexible to support various GPUs, including:
 • Tesla C1060 card
 • GeForce series
 • Quadro series
GPU node - front




GPU node - rear




GPU node - inside




GPU node - block diagram

[Block diagram: Advanced Clustering 15XGPU node]

Simplified design - the host server is completely integrated
with the GPU, with no external components to connect.
example cluster of GPU nodes



                   • 15x 1U compute nodes
                     • 2x CPUs each
                     • 1x GPU integrated in
                         each node
                   •   Entire system contains
                       30x CPUs and 15x GPUs
                   •   All in 15U of rack space




GPU nodes - thermals

• System carefully engineered
  to ensure all components
  will fit in the small form
  factor

• Detailed modeling and
  testing to make sure the
  system components (CPU
  and memory) and the GPU
  are adequately cooled
GPU nodes pros and cons



• Pros
  • Entire server and GPU all enclosed in a 1U package
  • Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
  • Full PCI-e bandwidth to the GPU
  • Full-featured server with the latest quad-core Intel Xeon CPUs
  • Can be used for more than computation - use the GPU for video
    output as well
• Cons
  • Only 1x GPU per server
  • Requires purchase of new servers; not an upgrade or add-on
  • Not as dense a solution as the S1070 for 4x GPUs
GPU nodes
• The GPU node concept is unique to Advanced
  Clustering
• The only vendor shipping a 1U server with an integrated
  Tesla or high-end GeForce / Quadro card
• Available for order as the 15XGPU2
 • Dual Quad-Core Intel Xeon 5500 series
    processors
  • Choice of GPU

15XGPU2 - specifications
• Processor
  • Two Intel Xeon 5500 Series processors
  • Next generation "Nehalem" microarchitecture
  • Integrated memory controller and 2x QPI chipset interconnects per processor
  • 45nm process technology
• Chipset
  • Intel 5500 I/O controller hub
• Memory
  • 800MHz, 1066MHz, or 1333MHz DDR3 memory
  • Twelve DIMM sockets supporting up to 144GB of memory
• GPU
  • PCI-e 2.0 16x double-height expansion slot for GPU
  • Multiple options: Tesla, GeForce, or Quadro cards
• Storage
  • Two 3.5" SATA2 drive bays
  • Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)
  • DVD+RW slim-line optical drive
• Management
  • Integrated IPMI 2.0 module
  • Integrated management controller providing iKVM and remote disk emulation
  • Dedicated RJ45 LAN for management network
• I/O connections
  • Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
  • Two USB 2.0 ports
  • One DB-9 serial port (RS-232)
  • One VGA port
  • Optional ConnectX DDR or QDR InfiniBand connector
• Electrical Requirements
  • High-efficiency power supply (greater than 80%)
  • Output Power: 560W
  • Universal input voltage 100V to 240V
  • Frequency: 50Hz to 60Hz, single phase
availability
• Both the Tesla S1070 and 15XGPU GPU nodes
  are available and shipping now
• For pricing and custom configurations, contact your
  Account Representative
 • (866) 802-8222
 • sales@advancedclustering.com
 •   http://www.advancedclustering.com/go/gpu




