Version 7.0 Webinar Update
Florent.Lebeau@arm.com
What is new in 7.0?
• Full Intel KNL support
• HBM memory debugging
• PAPI and Lustre metrics
• Cross-platform hardware counter information
• Extend IO profiling capabilities
• Create your own metric!
• Export to JSON
• Integration to CI tools
As of December 2016, Allinea is part of ARM
Our objective: remain the trusted leader in cross-platform HPC tools
The same successful team…
• We will continue to work with our customers, partners and you!
… is stronger than ever…
• We can now respond more quickly and deliver our roadmap faster
… as committed as ever…
• We remain 100% committed to providing cross-platform tools for HPC
… and looking forward to the future.
• We are working with vendors to support the next generations of systems.
The Road to Performance
https://youtu.be/l55w3Fy_J0Y
Analyze on Different Platforms
Very simple start-up
Low overhead
Powerful data analysis
For x86_64 and KNL
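The speaker notes for this slide describe producing a report by prefixing the original launch command with “perf-report” in the job script. As a minimal sketch of that idea, the helper below builds the prefixed command line. The function name and the launch line (`mpirun -n 4 ./my_app`) are hypothetical and chosen only for illustration.

```python
import shlex


def with_perf_report(command):
    """Return the launch command, as a list of arguments, prefixed with perf-report.

    `command` is the original launch line as a string. The launcher and
    binary used in the example below are placeholders, not taken from
    the webinar.
    """
    return ["perf-report"] + shlex.split(command)


# Hypothetical job-script line: an MPI run on 4 processes.
prefixed = with_perf_report("mpirun -n 4 ./my_app")
print(" ".join(prefixed))
```

The rest of the job script stays unchanged; only the launch line gains the prefix.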
Understand IO/Lustre Efficiency
• The client requests information about the file from the MDS
• The client reads/writes the file on the OSTs in parallel
• Not designed to handle a large number of small files
[Diagram: a Lustre client sends Open requests to the Metadata Server (MDS) and Read/Write requests to four Object Storage Servers (OSS), each with an Object Storage Target (OST)]
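To make the last bullet concrete: each file open involves the MDS, so the same payload written as many small files generates far more metadata requests than a single large file. The toy sketch below counts open operations for both layouts. It runs on any local filesystem (nothing in it is Lustre-specific), and the file names, file count and chunk size are arbitrary assumptions.

```python
import os
import tempfile


def write_many_small(directory, n_files, chunk):
    """Write n_files files of len(chunk) bytes each; return the number of opens."""
    opens = 0
    for i in range(n_files):
        with open(os.path.join(directory, f"part_{i}.dat"), "wb") as f:
            opens += 1
            f.write(chunk)
    return opens


def write_one_large(directory, n_chunks, chunk):
    """Write the same payload into a single file; return the number of opens."""
    with open(os.path.join(directory, "all.dat"), "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
    return 1


chunk = b"x" * 1024  # 1 KiB per chunk (arbitrary)
with tempfile.TemporaryDirectory() as tmp:
    small_opens = write_many_small(tmp, 1000, chunk)
    large_opens = write_one_large(tmp, 1000, chunk)
    total_small = sum(
        os.path.getsize(os.path.join(tmp, f"part_{i}.dat")) for i in range(1000)
    )
    total_large = os.path.getsize(os.path.join(tmp, "all.dat"))

print(f"small files: {small_opens} opens, {total_small} bytes")
print(f"one file:    {large_opens} open,  {total_large} bytes")
```

Both layouts store the same number of bytes; only the number of open operations differs, which is the quantity the metadata and file-open metrics in the profiler are meant to expose.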
Improve Cache Usage Using PAPI
• PAPI collects hardware counter information
– FLOPS, vectorization, branch prediction, cache usage …
– http://icl.utk.edu/papi/
[Diagram: memory hierarchy, from fastest to slowest: Registers, L1 Cache, L2 Cache, L3 Cache, Main memory]
Improve Cache Usage Using PAPI
• Run “papi_install.sh” from the Allinea Forge installation folder
• Select the metrics to collect in a configuration file
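Raw hardware counter totals are usually turned into ratios before they are interpreted. The sketch below derives cycles per instruction and an L1 data-cache miss ratio from counter totals. The event names (`PAPI_TOT_CYC`, `PAPI_TOT_INS`, `PAPI_L1_DCM`, `PAPI_L1_DCA`) are standard PAPI presets, although not every CPU exposes all of them. The values are made up, and the dictionary layout is an assumption, not the format used by Allinea MAP.

```python
def cycles_per_instruction(counters):
    """Total cycles divided by total instructions completed."""
    return counters["PAPI_TOT_CYC"] / counters["PAPI_TOT_INS"]


def l1_data_miss_ratio(counters):
    """L1 data-cache misses divided by L1 data-cache accesses."""
    return counters["PAPI_L1_DCM"] / counters["PAPI_L1_DCA"]


# Made-up counter totals for one region of code.
sample = {
    "PAPI_TOT_CYC": 4_000_000,
    "PAPI_TOT_INS": 2_000_000,
    "PAPI_L1_DCM": 50_000,
    "PAPI_L1_DCA": 1_000_000,
}

cpi = cycles_per_instruction(sample)
miss_ratio = l1_data_miss_ratio(sample)
print(f"CPI: {cpi:.2f}, L1 data miss ratio: {miss_ratio:.1%}")
```

A high cycles-per-instruction value together with a high miss ratio is one common hint that a kernel is waiting on memory rather than computing.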
Code Optimization for All Platforms
• PAPI metrics are portable:
– x86_64, ARMv8 and OpenPower
• Allinea Forge 7.0 extends support to IBM Spectrum MPI
• Data from MAP files can be exported in JSON
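Once profiling data is available as JSON, a continuous integration job can compare a new run against a stored baseline. The sketch below shows one way to flag regressions. The metric names (`runtime_s`, `io_time_s`), the flat key/value layout and the 10% tolerance are hypothetical: the actual schema of the MAP JSON export is not described in this deck.

```python
import json


def find_regressions(baseline, current, tolerance=0.10):
    """Return metrics whose value grew by more than `tolerance` (a fraction).

    Both arguments are flat dicts mapping a metric name to a number where
    lower is better. This layout is an assumption made for the example.
    """
    regressions = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is None or old <= 0:
            continue  # metric missing or baseline unusable: skip it
        if (new - old) / old > tolerance:
            regressions[name] = (old, new)
    return regressions


# Hypothetical exports from a baseline run and from the current build.
baseline = json.loads('{"runtime_s": 120.0, "io_time_s": 30.0}')
current = json.loads('{"runtime_s": 150.0, "io_time_s": 31.0}')

regressions = find_regressions(baseline, current)
for name, (old, new) in regressions.items():
    print(f"regression in {name}: {old} -> {new}")
```

In a Jenkins or Bamboo job, a non-empty result would typically be turned into a failing exit status so that the build is marked as regressed.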
Make the Most of the HBM on KNL
• The high bandwidth memory (HBM) is on the same CPU chip, next to processing cores
• 3 modes: Cache, Flat, Hybrid
[Diagram: arrangement of Cores, HBM and DDR4 in each of the three modes]
Make the Most of the HBM on KNL
• Use the memory debugging feature of Allinea DDT to track the DDR4 / HBM usage
Create your Own Metrics
System introspection
• Specific counters / files
Application introspection
• Tracking application characteristics
[Diagram labels: Application, Library, MAP, XML, Profile, Application Profiler]
Update Your Tools Now!
Portability
• Debug
• Profile
• For x86_64, KNL, ARMv8, OpenPower
In-depth
• Hardware counters with PAPI metrics
• IO Lustre Metrics
Flexibility
• Application/system specific custom metrics
• Export data to integrate to your workflow
• Download our latest version 7.0.2
– https://www.allinea.com/products/forge/download
– https://www.allinea.com/products/reports/download
• Request a trial!
– https://www.allinea.com/trials
Products and Licensing
• Performance Report: Workstation / Supercomputing
• Forge: Workstation / Supercomputing
• Forge Professional: Workstation / Supercomputing
GPU, accelerators metrics, energy metrics, PAPI metrics, Lustre metrics
See our full menu of live or recorded webinars:
https://www.allinea.com/performance-webinars-menu
To take a trial visit:
https://www.allinea.com/get-your-free-allinea-forge-and-allinea-performance-reports-trial
To contact sales email:
sales@allinea.com

Editor's Notes

  1. Hello everyone, thank you very much for joining this webinar. I am Florent Lebeau, application engineer at Allinea, now part of ARM. During this webinar I will tell you more about the latest features of version 7 of our tools: Allinea Forge, which includes Allinea MAP for profiling and Allinea DDT for debugging, and Allinea Performance Reports for application behaviour analysis. If you have any questions during the presentation, please send them to my colleague Mark Clarke using the WebEx chat window. Mark is here to assist me today: he will collect questions and we will answer a few of them at the end of the presentation.
  2. Version 7 was released a few months ago and it is a major step forward in our support for the latest architectures, including Intel KNL and OpenPower. This version also extends the capabilities of the tools to provide more insight into hardware usage on different platforms and into IO bottlenecks. Not only are we providing the HPC community with ready-to-use solutions, but we are also opening our tools to customisation so that our users can create their own metrics! Finally, version 7.0 is also an evolution towards software development best practices, as it facilitates integration into your testing workflows.
  3. But before we start, I would like to go back in time… In December 2016, Allinea joined ARM. With 95 billion ARM-based chips shipped to date and more than 15 billion chips shipped every year, ARM is one of the largest semiconductor companies in the world. By joining ARM, Allinea is bringing 15 years of experience in HPC and leading-edge tools for high performance code development. Our tools are used every day, on the largest supercomputers, on all architectures, and this will remain the same. Being part of ARM means that we have stronger support than ever to deliver our roadmap for cross-platform tools. We are committed to helping the community overcome performance challenges and look forward to facing the challenges to come.
  4. On our YouTube channel, you will find our “Performance Roadmap”. This video illustrates how to optimise your applications step by step and gives you guidelines for doing so. Because the reason for inefficiency is sometimes hidden by the symptoms and concentrated in small portions of code, premature optimisation is never a good thing. A good method combined with the right tools is required in order to achieve your performance goals.
  5. The first step is to analyse before you optimise. By prefixing the original command with “perf-report” in your job script, Allinea Performance Reports lets you characterise the application behaviour quickly and easily. The tool has very little overhead, so the report that you get, as you can see on the left, is as close as possible to the application running in production. Version 7.0 supports the x86_64 and KNL architectures and provides a rich set of data and powerful analysis of the bottlenecks of applications running on these systems. For example, the report displayed on the slide clearly shows that this application is IO bound. How can we go further?
  6. On parallel Lustre filesystems, the data is split across different hard disk drives or solid state devices, referred to as OSTs in the model displayed on the screen. This is where the parallel IO performance comes from. However, in order to perform IO on the OSTs, information about the file first has to be retrieved from the metadata server, or MDS. As a result, the MDS can sometimes be responsible for the bottleneck, especially if a large number of small files are being accessed, for example. To detect and understand this kind of situation, version 7.0 of the Allinea MAP profiler provides Lustre metrics. Allinea MAP can display graphs of read and write transfer rates on the disks, but also of the number of metadata operations and the number of file open operations per second.
  7. There are many more steps in the performance roadmap after improving the IO, but at some point memory access patterns need to be investigated. Performance Reports can help you identify such inefficiencies thanks to the “Memory access” metric. If this value is high, this is a sign that the compute kernels are memory-bound. To optimise this, it is important to take advantage of the levels of cache memory available on the CPU before accessing the main memory. This is a difficult thing to understand and optimise, especially if the same application runs on different architectures or different generations of the same platform, but some tools can help. PAPI is one of them: this API, developed at the Innovative Computing Laboratory of the University of Tennessee, collects hardware counter data about floating point operations per second, vectorisation and cache usage. However, using PAPI directly requires significant code changes to collect and display this information.
  8. Instead of instrumenting your application with PAPI, Allinea Forge 7.0 allows you to rely on the profiler’s sampler to collect this information automatically with very little overhead. Furthermore, you can now rely on the profiler’s user-friendly interface to display the data over time. To install the additional metrics, run the “papi_install.sh” script from the installation folder and follow the instructions displayed to specify the metrics you are interested in, for example: Overview – with FLOPS and cycles per instruction; CachesMisses – with L1, L2 and L3 cache misses. PAPI metrics are cross-platform …
  9. … and make it possible to understand cache usage, for instance, on ARM and Power architectures as well as on x86. In addition, regarding our cross-platform support, Allinea Forge 7.0 extends support to the new IBM Spectrum MPI for Power and x86_64 platforms. Finally, Allinea Forge profiling data can be exported to JSON files in order to facilitate continuous integration. Thanks to this, performance regressions on multiple platforms can be tracked using Jenkins or Bamboo, for example. Optimising memory access patterns sometimes requires the use of a debugger to catch errors or to understand how and where the data is allocated in the source code of the application. This is particularly critical on the latest Intel KNL architecture …
  10. … whose performance comes from the high bandwidth memory located on the same chip as the compute cores (up to 72). The KNL can be configured to use the HBM as a cache between the cores and the system memory, or as a separate, distinct memory accessed explicitly by the programmer: this is the flat mode. The KNL can also be configured to use part of the HBM as a cache and part as a distinct memory: this is the hybrid mode. Giving programmers the key to understanding where allocations are performed on the system is necessary, and that’s why
  11. Allinea Forge 7.0 extends its support of the KNL architecture to HBM memory debugging. HPC developers are now able to detect memory errors on the HBM but also to track usage, as the screenshot shows, where 2 MB of data are allocated by process 0 on the main memory and 2 MB on the HBM. The Lustre and PAPI metrics presented earlier are just an example of the flexibility of Allinea tools…
  12. … Version 7.0 actually allows you to create your own metrics, whether you are interested in the evolution of very specific hardware metrics when running your application or in the evolution of the application’s internal parameters. The principle for writing your own metrics is the same in both cases: Allinea MAP, the Allinea Forge profiler, requires a library to collect the samples and an XML descriptor to specify how the metric should be displayed in the GUI. Here are a few examples that have been developed for some projects…
  13. This concludes this webinar. We have presented the latest features of 7.0, which illustrate the capabilities of Allinea tools on all HPC architectures: x86, KNL, ARM and OpenPower. Not only does this strengthen our position as the cross-platform tool of choice, but it also shows our commitment to providing a broad solution that covers all aspects of development, from debugging to I/O, MPI, fine-grain CPU and application-specific profiling. We also help enforce best practice by designing our tools to be easily integrated into development and testing workflows. You can download and update your version of our tools from our website. The current revision is 7.0.2. If you don’t have a licence, feel free to request a trial licence from our website to try the latest features!
  14. Thanks very much for attending this webinar. We now have time for a few questions, please