Achieve near-bare-metal inference
throughput for image classification
workloads with the Dell PowerEdge
R7525 server using virtual GPUs
Plus, virtualizing servers with VMware vSphere 7.0
Update 3 and vGPUs could help simplify machine
learning workload management
Virtualization has changed the world of computing, but every hypervisor
adds some amount of computing overhead compared to a bare-metal
environment. While this overhead may not present an issue for many
workloads, businesses may well feel those effects on compute-heavy
workloads, such as machine learning (ML). With VMware® vSphere® 7.0 Update 3,
Dell™ PowerEdge™ R7525 servers powered by AMD EPYC™
processors offer a platform that can minimize performance loss with
virtualized GPUs (vGPUs) while delivering the ease-of-management,
redundancy, reliability, mobility, and security advantages that
virtualization can provide.
In our data center, we tested ML-based image classification inference
throughput on bare-metal and virtual configurations of a Dell PowerEdge
R7525 with two AMD EPYC 7543 processors and an NVIDIA® A100 Tensor Core
GPU. When running the MLPerf™ ResNet50 workload, the
bare-metal configuration processed 28,130 queries per second. The virtual
configuration, with VMware vSphere 7.0 Update 3 and a single guest VM,
processed 27,435 queries per second—97.5 percent of the bare-metal
configuration’s image classification performance. Either configuration of
the PowerEdge R7525 could serve as the backbone in image-classification
workloads, delivering strong performance as a bare-metal or virtual
solution. Virtualizing the PowerEdge R7525 with NVIDIA vGPU software
and vSphere 7 Update 3 could enable IT admins to run multiple virtual
machines (VMs) on the same physical GPU, making it easier to dial in
GPU-level performance as needed.
Minimize performance
degradation from
virtualization
Achieve 97.5% of the
image classification
performance of
a bare-metal configuration
Simplify ML workload
management
Admins can ensure uptime
and resource allocation
through virtual management
Achieve near-bare-metal inference throughput for image classification workloads with the Dell PowerEdge R7525 server using virtual GPUs July 2022
A Principled Technologies report: Hands-on testing. Real-world results.
How we tested
Artificial intelligence (AI) and ML workloads rely heavily
on computing speed. Organizations that want faster AI
or ML workload performance can use GPUs in addition
to CPUs because GPUs offer extreme parallelization and
matrix math capabilities. Organizations may run GPU-
based workloads on bare-metal configurations to avoid the
possibility of hypervisor or virtualization overheads. For those
compute-heavy workloads, IT staff configure optimization,
tuning, and power settings to provide the best GPU-level
performance possible.
For our bare-metal solution, we configured a Dell PowerEdge
R7525 server with Ubuntu 20.04. The PowerEdge server
contained 480 GB of storage and 512 GB of DDR4
3200 MT/s memory. We installed, tuned, and ran an MLPerf
AI benchmark on the bare-metal solution according to best
practices. The workload we chose classifies images within a
known dataset using the ResNet50 ML model.
We built our virtual environment on the same server hardware
as our bare-metal environment. After testing the bare-metal
solution, we kept the same BIOS settings, installed VMware
vSphere 7.0 Update 3 onto the server, and created a VM with
the same OS and tunings as our bare-metal configuration.
Then, we ran the same MLPerf ResNet50 workload on our VM
and compared the results of the two configurations.
Although our testing used only one VM, supporting more
VMs is possible. With NVIDIA vGPU software available in
vSphere 7.0 Update 3, many VMs can share one or more
GPUs. Prior to NVIDIA vGPU (and its predecessor, NVIDIA GRID
vGPU™), GPUs were available to VMs only in pass-through
mode, meaning a VM using a GPU would lock it down from
use by other VMs. Sharing GPU resources between many VMs
could allow your organization to use one GPU for multiple
tasks. For example, one Linux® VM could run an AI or ML
workload while a Windows VM runs a graphics-intensive
ANSYS application for computer-aided engineering (CAE).
Before vGPUs, running multiple workloads on the same
GPU was complex and inefficient, and doing so across
multiple VMs was harder still.
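To make the idea of sharing one GPU among several VMs more concrete, the following is a minimal sketch of the sizing arithmetic, assuming each VM receives a fixed, equal share of GPU memory. The 40 GB capacity and the slice sizes are illustrative assumptions; this report does not state the memory size of the tested A100 or the vGPU profile used, and the function name is hypothetical.

```python
def max_vgpus_per_gpu(gpu_memory_gb: int, profile_memory_gb: int) -> int:
    """Number of equal-sized vGPU slices that fit in one physical GPU's memory."""
    if gpu_memory_gb <= 0 or profile_memory_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return gpu_memory_gb // profile_memory_gb


# Assumption: a 40 GB GPU. The report does not state the A100 memory size.
GPU_MEMORY_GB = 40

# Assumption: illustrative per-VM slice sizes, not profiles taken from the report.
PROFILE_SIZES_GB = (40, 20, 10, 5)

for profile_gb in PROFILE_SIZES_GB:
    count = max_vgpus_per_gpu(GPU_MEMORY_GB, profile_gb)
    print(f"{profile_gb:>2} GB per VM -> up to {count} VM(s) on one GPU")
```

Smaller slices allow more VMs per GPU but leave each VM less memory for models and batches. The profile sizes actually available depend on the GPU model and the NVIDIA vGPU software release.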
Potential real-world benefits
Organizations using AI and ML may run
training and inference workloads on
separate servers because the workloads
run at different times and with different
power requirements, tunings, or access
restrictions. If a server goes down, its
workload could be offline until it is
brought up on a new server or IT fixes the
issue that brought the server down.
For manufacturing companies that run
image classification workloads as part of
quality control, that downtime could lead
to lost revenue and delays in production.
With Dell PowerEdge R7525 servers
supporting VMware vSphere 7.0 Update
3 virtual environments, such a company
could combine pairs or multiples of
servers into redundant clusters, with
workloads separated by VM. Then, if a
server goes down, IT staff can quickly
bring up the VMs on a failover server,
with little interruption or downtime.
Virtual environments need, at most, as
much hardware as equivalent bare-metal
environments. However, it’s possible
that a virtual environment with VMware
vSphere 7 and NVIDIA vGPU software
could handle the same amount of
work as a bare-metal environment with
less hardware or do more work with
more hardware.
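As a rough illustration of the cluster sizing discussed above, the sketch below estimates how many servers a target throughput would require, plus one spare host for failover. It assumes throughput scales linearly with server count, which this study did not test; the target of 100,000 queries per second and the function name are hypothetical. Only the per-server throughput figures come from this report.

```python
import math

# Per-server throughput measured in this study (average queries per second).
BARE_METAL_QPS = 28_130.7
VIRTUAL_QPS = 27_435.1


def servers_needed(target_qps: float, per_server_qps: float, spares: int = 0) -> int:
    """Servers required to reach target_qps, assuming linear scaling, plus spares."""
    if target_qps <= 0 or per_server_qps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_qps / per_server_qps) + spares


TARGET_QPS = 100_000  # hypothetical workload target, not from the report

bare_metal_hosts = servers_needed(TARGET_QPS, BARE_METAL_QPS)
virtual_hosts = servers_needed(TARGET_QPS, VIRTUAL_QPS)
virtual_hosts_with_failover = servers_needed(TARGET_QPS, VIRTUAL_QPS, spares=1)

print(f"Bare-metal hosts: {bare_metal_hosts}")
print(f"Virtualized hosts: {virtual_hosts}")
print(f"Virtualized hosts with one failover spare: {virtual_hosts_with_failover}")
```

For this hypothetical target, both configurations need the same number of hosts. Targets close to a multiple of the per-server rate could differ by one host.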
Figure 1: The average number of queries processed per second on the bare-metal Dell PowerEdge R7525 environment (28,130 average queries per second; higher is better).
Source: Principled Technologies.
Classify images quickly with a bare-metal
Dell PowerEdge R7525 environment
While this study highlights the benefits of GPU virtualization, it is worth noting that the Dell PowerEdge R7525
equipped with AMD EPYC processors and an NVIDIA A100 Tensor Core GPU delivered solid bare-metal
performance for the image classification workload we tested. To enable the performance that the NVIDIA A100
Tensor Core GPU provided, the Dell PowerEdge R7525 supported Gen 4 PCIe connectivity, with up to 16 GT/s,
to help maximize GPU throughput and offered 1400W dual PSUs and an optional GPU-enablement kit with high-
powered fans, heatsinks, and air shroud to help keep the GPU cool. In addition, the AMD EPYC CPUs provided
ample processing power for supporting tasks, such as copying between memory and GPU and I/O logic. Having
a powerful server as a host for the NVIDIA A100 Tensor Core GPU provided the necessary system resources,
power, and cooling to help deliver smooth performance during our tests. Our bare-metal solution processed an
average of 28,130.7 queries per second on the MLPerf ResNet50 workload.
About the Dell PowerEdge R7525
The Dell PowerEdge R7525 server, powered by 3rd Gen AMD EPYC 7003 series processors,
can provide PCIe™ Gen 4 speed with up to three double-width 300W or six single-width 75W
accelerators. Dual 2400W PSUs provide the necessary power to support GPUs, and up to 12
hot-plug fans can keep graphics cards cool.¹
To learn more about the PowerEdge R7525, visit https://www.delltechnologies.com/en-us/servers/poweredge-rack-servers.htm.
Figure 2: The average number of queries processed per second on the bare-metal and virtualized Dell PowerEdge R7525
environments (higher is better): bare-metal configuration, 28,130.7; virtualized configuration using vGPU, 27,435.1.
Source: Principled Technologies.
Virtualize a Dell PowerEdge R7525 server running VMware vSphere
7.0 Update 3 and get comparable image classification performance
On a VM in our vSphere 7.0 Update 3 environment using vGPUs, the MLPerf ResNet50 workload processed an
average of 27,435.1 queries per second, which was 97.5 percent of the throughput that the bare-metal
solution achieved (see Figure 2).
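The 97.5 percent figure follows directly from the two averages reported in this study. A minimal sketch of the arithmetic, with hypothetical function and variable names:

```python
# Throughput figures reported in this study (average queries per second).
BARE_METAL_QPS = 28_130.7
VIRTUAL_QPS = 27_435.1


def relative_throughput(virtual_qps: float, bare_metal_qps: float) -> float:
    """Return virtual throughput as a fraction of bare-metal throughput."""
    if bare_metal_qps <= 0:
        raise ValueError("bare-metal throughput must be positive")
    return virtual_qps / bare_metal_qps


ratio = relative_throughput(VIRTUAL_QPS, BARE_METAL_QPS)
overhead = 1.0 - ratio

print(f"Virtual throughput: {ratio:.1%} of bare metal")  # 97.5%
print(f"Virtualization overhead: {overhead:.1%}")        # 2.5%
```

The same calculation can be repeated with your own benchmark results to gauge virtualization overhead on other workloads.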
In addition to the comparable image classification performance in our testing, Dell PowerEdge R7525 servers
with VMware vSphere 7.0 Update 3 can allow admins to migrate, back up, and resize GPU-enabled VMs and to use
the other benefits of virtualization with them. These include multiple VMs running on a single host, shared
virtual GPUs, cloning, vMotion™, distributed resource scheduling, and suspending/resuming VMs.
About VMware vSphere 7.0 Update 3
Released in Q3 2021, vSphere 7.0 Update 3 aims to help organizations running and developing
AI and ML workloads. The release supports the latest generation of GPUs from NVIDIA, NVIDIA
Virtual GPU (vGPU) software,² NVIDIA GPUDirect RDMA for vGPUs,³ and the latest spatial
partitioning-based NVIDIA multi-instance GPUs.⁴ The software also offers many capabilities for
vSphere with Tanzu.⁵
To learn more about vSphere 7.0 Update 3, visit https://www.vmware.com/products/vsphere.html.
Conclusion
For Dell PowerEdge R7525 servers with AMD EPYC processors and NVIDIA A100 Tensor Core GPUs, running
VMware vSphere 7 Update 3 and NVIDIA vGPUs allows you to share GPU cores for individual VMs to run ML
workloads. In our image classification test, a virtual configuration of the PowerEdge R7525 server with AMD
EPYC processors and NVIDIA vGPUs achieved up to 97.5 percent of bare-metal performance on the same server.
This means that your organization could get the benefits of virtualization, such as flexibility, reliability, and ease
of management, without incurring a significant performance hit on image classification workloads.
1. Dell Technologies, “Dell EMC PowerEdge R7525 Technical Guide,” accessed May 27, 2022,
https://i.dell.com/sites/csdocuments/Product_Docs/en/PowerEdge-R7525-Spec-Sheet.pdf.
2. VMware, “vSphere 7 Update 3 - What’s New Technical Overview,” accessed June 1, 2022,
https://www.youtube.com/watch?v=yD3cvKpKwZ4.
3. NVIDIA, “Virtual GPU Software R384 for VMware vSphere Release Notes,” accessed June 1, 2022,
https://docs.nvidia.com/grid/5.0/grid-vgpu-release-notes-vmware-vsphere/index.html.
4. NVIDIA, “NVIDIA Multi-Instance GPU and NVIDIA Virtual Compute Server,” accessed June 1, 2022,
https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/solutions/resources/documents1/Technical-Brief-Multi-Instance-GPU-NVIDIA-Virtual-Compute-Server.pdf.
5. VMware, “Announcing vSphere 7 Update 3,” accessed June 1, 2022,
https://blogs.vmware.com/vsphere/2021/09/announcing-vsphere-7-update-3.html.
Principled Technologies is a registered trademark of Principled Technologies, Inc.
All other product names are the trademarks of their respective owners.
For additional information, review the science behind this report.
Principled
Technologies®
Facts matter.®
This project was commissioned by Dell Technologies.
Read the science behind this report at https://facts.pt/ImXpGA7

Achieve near-bare-metal inference throughput for image classification workloads with the Dell PowerEdge R7525 server using virtual GPUs

  • 1. Achieve near-bare-metal inference throughput for image classification workloads with the Dell PowerEdge R7525 server using virtual GPUs Plus, virtualizing servers with VMware vSphere 7.0 Update 3 and vGPUs could help simplify machine learning workload management Virtualization has changed the world of computing, but every hypervisor adds some amount of computing overhead compared to a bare-metal environment. While this overhead may not present an issue for many workloads, businesses may well feel those effects on compute-heavy workloads, such as machine learning (ML). With VMware® vSphere® 7.0 Update 3, Dell™ PowerEdge™ R7525 servers powered by AMD EPYC™ processors offer a platform that can minimize performance loss with virtualized GPUs (vGPUs) while delivering the ease-of-management, redundancy, reliability, mobility, and security advantages that virtualization can provide. In our data center, we tested ML-based image classification inference throughput on bare-metal and virtual configurations of a Dell PowerEdge R7525 with two AMD EPYC 7543 processors and an NVIDIA® A100 Tensor Core GPU. When running the MLPerf™ ResNet50 workload, the bare-metal configuration processed 28,130 queries per second. The virtual configuration, with VMware vSphere 7.0 Update 3 and a single guest VM, processed 27,435 queries per second—97.5 percent of the bare-metal configuration’s image classification performance. Either configuration of the PowerEdge R7525 could serve as the backbone in image-classification workloads, delivering strong performance as a bare-metal or virtual solution. Virtualizing the PowerEdge R7525 with NVIDIA vGPU software and vSphere 7 Update 3 could enable IT admins to run multiple virtual machines (VMs) on the same physical GPU, making it easier to dial in GPU-level performance as needed. 
Minimize performance degradation from virtualization Achieve 97.5% of the image classification performance of a bare-metal configuration Simplify ML workload management Admins can ensure uptime and resource allocation through virtual management Achieve near-bare-metal inference throughput for image classification workloads with the Dell PowerEdge R7525 server using virtual GPUs July 2022 A Principled Technologies report: Hands-on testing. Real-world results.
  • 2. How we tested Artificial intelligence (AI) and ML workloads rely heavily on computing speed. Organizations that want faster AI or ML workload performance can use GPUs in addition to CPUs because GPUs offer extreme parallelization and matrix math capabilities. Organizations may run GPU- based workloads on bare-metal configurations to avoid the possibility of hypervisor or virtualization overheads. For those compute-heavy workloads, IT staff configure optimization, tuning, and power settings to provide the best GPU-level performance possible. For our bare-metal solution, we configured a Dell PowerEdge R7525 server with Ubuntu 20.04. The PowerEdge server contained 480 GB of storage and 512 GB of DDR4 3200MT/S memory. We installed, tuned, and ran an MLPerf AI benchmark on the bare-metal solution according to best practices. The workload we chose classifies images within a known dataset using the ResNet50 ML model. We built our virtual environment on the same server hardware as our bare-metal environment. After testing the bare-metal solution, we kept the same BIOS settings, installed VMware vSphere 7.0 Update 3 onto the server, and created a VM with the same OS and tunings as our bare-metal configuration. Then, we ran the same MLPerf ResNet50 workload on our VM and compared the results of the two configurations. Although our testing used only one VM, supporting more VMs is possible. By making vGPU software available in vSphere 7.0 Update 3, many VMs can share one or more GPUs. Prior to NVIDIA vGPU (and predecessor NVIDIA GRID vGPU™ ), GPUs were available to VMs only in pass-through mode, meaning a VM using a GPU would lock it down from use by other VMs. Sharing GPU resources between many VMs could allow your organization to use one GPU for multiple tasks. For example, one Linux® VM could run an AI or ML workload while a Windows VM runs a graphic-intensive ANSYS application for computer-aided engineering (CAE). 
Before vGPUs, running multiple workloads on the same GPU was complex and inefficient, much less doing the same for multiple VMs. Potential real-world benefits Organizations using AI and ML may run training and inference workloads on separate servers because the workloads run at different times and with different power requirements, tunings, or access restrictions. If a server goes down, its workload could be offline until it is brought up on a new server or IT fixes the issue that brought the server down. For manufacturing companies that run image classification workloads as part of quality control, that downtime could lead to lost revenue and delays in production. With Dell PowerEdge R7525 servers supporting VMware vSphere 7.0 Update 3 virtual environments, such a company could combine pairs or multiples of servers into redundant clusters, with workloads separated by VM. Then, if a server goes down, IT staff can quickly bring up the VMs on a failover server, with little interruption or downtime. Virtual environments need, at most, as much hardware as equivalent bare-metal environments. However, it’s possible that a virtual environment with VMware vSphere 7 and NVIDIA vGPU software could handle the same amount of work as a bare-metal environment with less hardware or do more work with more hardware. Achieve near-bare-metal inference throughput for image classification workloads with the Dell PowerEdge R7525 server using virtual GPUs July 2022 | 2
  • 3. Bare-metal GPU performance on image classification workload 28,130 Average queries per second (higher is better) Figure 1: The average number of queries processed per second on the bare-metal Dell PowerEdge R7525 environment. Source: Principled Technologies. Classify images quickly with a bare-metal Dell PowerEdge R7525 environment While this study highlights the benefits of GPU virtualization, it is worth noting that the Dell PowerEdge R7525 equipped with AMD EPYC processors and an NVIDIA A100 Tensor Core GPU delivered solid bare-metal performance for the image classification workload we tested. To enable the performance that the NVIDIA A100 Tensor Core GPU provided, the Dell PowerEdge R7525 supported Gen 4 PCIe connectivity, with up to 16 GT/s, to help maximize GPU throughput and offered 1400W dual PSUs and an optional GPU-enablement kit with high- powered fans, heatsinks, and air shroud to help keep the GPU cool. In addition, the AMD EPYC CPUs provided ample processing power for supporting tasks, such as copying between memory and GPU and I/O logic. Having a powerful server as a host for the NVIDIA A100 Tensor Core GPU provided the necessary system resources, power, and cooling to help deliver smooth performance during our tests. Our bare-metal solution processed an average of 28,130.7 queries per second on the MLPerf ResNet50 workload. About the Dell PowerEdge R7525 The Dell PowerEdge R7525 server, powered by 3rd Gen AMD EPYC 7003 series processors, can provide PCIe™ Gen 4 speed with up to three double-width 300W or six single-width 75W accelerators. Dual 2400W PSUs provide the necessary power to support GPUs, and up to 12 hot-plug fans can keep graphics cards cool.1 To learn more about the PowerEdge R7525, visit https://www.delltechnologies.com/en-us/ servers/poweredge-rack-servers.htm. Achieve near-bare-metal inference throughput for image classification workloads with the Dell PowerEdge R7525 server using virtual GPUs July 2022 | 3
Figure 2: The average number of queries processed per second on the bare-metal and virtualized Dell PowerEdge R7525 environments. Chart title: Bare-metal and virtualized performance on image classification workload. Values shown, in average queries per second (higher is better): virtualized configuration using vGPU, 27,435.1; bare-metal configuration, 28,130.7. Source: Principled Technologies.

Virtualize a Dell PowerEdge R7525 server running VMware vSphere 7.0 Update 3 and get comparable image classification performance

On a VM in our vSphere 7.0 Update 3 environment using vGPUs, the MLPerf ResNet50 workload processed an average of 27,435.1 queries per second, which was 97.5 percent of the throughput that the bare-metal solution achieved (see Figure 2).

In addition to the comparable image classification performance in our testing, Dell PowerEdge R7525 servers with VMware vSphere 7.0 Update 3 can allow admins to migrate, back up, and resize GPU-enabled VMs and to take advantage of the other benefits of virtualization. These include multiple VMs running on a single host, shared virtual GPUs, cloning, vMotion™, distributed resource scheduling, and suspending/resuming VMs.

About VMware vSphere 7.0 Update 3

Released in Q3 2021, vSphere 7.0 Update 3 aims to help organizations running and developing AI and ML workloads. The release supports the latest generation of GPUs from NVIDIA, NVIDIA Virtual GPU (vGPU) software,² NVIDIA GPUDirect RDMA for vGPUs,³ and the latest spatial partitioning-based NVIDIA multi-instance GPUs.⁴ The software also offers many capabilities for vSphere with Tanzu.⁵

To learn more about vSphere 7.0 Update 3, visit https://www.vmware.com/products/vsphere.html.
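The 97.5 percent figure reported with Figure 2 follows directly from the two throughput averages. The short sketch below reproduces that arithmetic and derives the implied virtualization overhead. The two input values come from the report; the function and variable names are hypothetical and chosen only for illustration.

```python
# Reproduce the relative-throughput figure from the two reported averages.

BARE_METAL_QPS = 28_130.7   # bare-metal average queries per second (from the report)
VGPU_QPS = 27_435.1         # vGPU VM average queries per second (from the report)


def relative_throughput_pct(virtual_qps: float, bare_metal_qps: float) -> float:
    """Return virtual throughput as a percentage of bare-metal throughput."""
    if bare_metal_qps <= 0:
        raise ValueError("bare-metal throughput must be positive")
    return 100.0 * virtual_qps / bare_metal_qps


pct_of_bare_metal = relative_throughput_pct(VGPU_QPS, BARE_METAL_QPS)
overhead_pct = 100.0 - pct_of_bare_metal

print(f"vGPU throughput: {pct_of_bare_metal:.1f}% of bare metal")
print(f"Implied virtualization overhead: {overhead_pct:.1f}%")
```

Rounded to one decimal place, the ratio matches the reported 97.5 percent, which corresponds to an overhead of about 2.5 percent for this single-VM test.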
Conclusion

For Dell PowerEdge R7525 servers with AMD EPYC processors and NVIDIA A100 Tensor Core GPUs, running VMware vSphere 7 Update 3 and NVIDIA vGPUs allows you to share GPU cores among individual VMs running ML workloads. In our image classification test, a virtual configuration of the PowerEdge R7525 server with AMD EPYC processors and NVIDIA vGPUs achieved up to 97.5 percent of bare-metal performance on the same server. This means that your organization could get the benefits of virtualization, such as flexibility, reliability, and ease of management, without incurring a significant performance hit on image classification workloads.

1. Dell Technologies, “Dell EMC PowerEdge R7525 Technical Guide,” accessed May 27, 2022, https://i.dell.com/sites/csdocuments/Product_Docs/en/PowerEdge-R7525-Spec-Sheet.pdf.
2. VMware, “vSphere 7 Update 3 - What’s New Technical Overview,” accessed June 1, 2022, https://www.youtube.com/watch?v=yD3cvKpKwZ4.
3. NVIDIA, “Virtual GPU Software R384 for VMware vSphere Release Notes,” accessed June 1, 2022, https://docs.nvidia.com/grid/5.0/grid-vgpu-release-notes-vmware-vsphere/index.html.
4. NVIDIA, “NVIDIA Multi-Instance GPU and NVIDIA Virtual Compute Server,” accessed June 1, 2022, https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/solutions/resources/documents1/Technical-Brief-Multi-Instance-GPU-NVIDIA-Virtual-Compute-Server.pdf.
5. VMware, “Announcing vSphere 7 Update 3,” accessed June 1, 2022, https://blogs.vmware.com/vsphere/2021/09/announcing-vsphere-7-update-3.html.

Principled Technologies is a registered trademark of Principled Technologies, Inc. All other product names are the trademarks of their respective owners.

For additional information, review the science behind this report.

Principled Technologies® Facts matter.®

This project was commissioned by Dell Technologies.
Read the science behind this report at https://facts.pt/ImXpGA7