SlideShare a Scribd company logo
1 of 49
1
Distributed Memory
Programming with MPI
Slides extended from
An Introduction to Parallel Programming by Peter
Pacheco
Dilum Bandara
Dilum.Bandara@uom.lk
2
Distributed Memory Systems
 We discuss about developing programs for these
systems using MPI
 MPI – Message Passing Interface
 Are set of libraries that can be called from C, C++, &
Fortran
Copyright © 2010, Elsevier Inc. All rights Reserved
3
Why MPI?
 Standardized & portable message-passing
system
 One of the oldest libraries
 Wide-spread adoption
 Minimal requirements on underlying hardware
 Explicit parallelization
 Achieves high performance
 Scales to a large no of processors
 Intellectually demanding
Copyright © 2010, Elsevier Inc. All rights Reserved
4
Our First MPI Program
Copyright © 2010, Elsevier Inc. All rights Reserved
5
Compilation
Copyright © 2010, Elsevier Inc. All rights Reserved
mpicc -g -Wall -o mpi_hello mpi_hello.c
wrapper script to compile
turns on all warnings
source file
create this executable file name
(as opposed to default a.out)
produce
debugging
information
6
Execution
Copyright © 2010, Elsevier Inc. All rights Reserved
mpiexec -n <no of processes> <executable>
mpiexec -n 1 ./mpi_hello
mpiexec -n 4 ./mpi_hello
run with 1 process
run with 4 processes
7
Execution
Copyright © 2010, Elsevier Inc. All rights Reserved
mpiexec -n 1 ./mpi_hello
mpiexec -n 4 ./mpi_hello
Greetings from process 0 of 1 !
Greetings from process 0 of 4 !
Greetings from process 1 of 4 !
Greetings from process 2 of 4 !
Greetings from process 3 of 4 !
8
MPI Programs
 Need to add mpi.h header file
 Identifiers defined by MPI start with “MPI_”
 1st letter following underscore is uppercase
 For function names & MPI-defined types
 Helps to avoid confusion
Copyright © 2010, Elsevier Inc. All rights Reserved
9
6 Golden MPI Functions
Copyright © 2010, Elsevier Inc. All rights Reserved
10
MPI Components
 MPI_Init
 Tells MPI to do all necessary setup
 e.g., allocate storage for message buffers, decide rank of a
process
 argc_p & argv_p are pointers to argc & argv
arguments in main( )
 Function returns error codes
Copyright © 2010, Elsevier Inc. All rights Reserved
11
 MPI_Finalize
 Tells MPI we’re done, so clean up anything allocated
for this program
MPI Components (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
12
Communicators
 Collection of processes that can send messages
to each other
 Messages from others communicators are ignored
 MPI_Init defines a communicator that consists of
all processes created when the program is
started
 Called MPI_COMM_WORLD
Copyright © 2010, Elsevier Inc. All rights Reserved
13
Communicators (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
My rank
(process making this call)
No of processes in the communicator
14
Single-Program Multiple-Data (SPMD)
 We compile 1 program
 Process 0 does something different
 Receives messages & prints them while the
other processes do the work
 if-else construct makes our program
SPMD
 We can run this program on any no of
processors
 e.g., 4, 8, 32, 1000, …
Copyright © 2010, Elsevier Inc. All rights Reserved
15
Communication
 msg_buf_p, msg_size, msg_type
 Determines content of message
 dest – destination processor’s rank
 tag – use to distinguish messages that are identical in
content
Copyright © 2010, Elsevier Inc. All rights Reserved
16
Data Types
Copyright © 2010, Elsevier Inc. All rights Reserved
17
Communication (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
MPI_ANY_SOURCE to receive messages (from any
source) in order for arrival
18
Message Matching
Copyright © 2010, Elsevier Inc. All rights Reserved
MPI_Send
src = q
MPI_Recv
dest = r
r
q
19
Receiving Messages
 Receiver can get a message without
knowing
 Amount of data in message
 Sender of message
 Tag of message
 How can those be found out?
Copyright © 2010, Elsevier Inc. All rights Reserved
20
How Much Data am I Receiving?
Copyright © 2010, Elsevier Inc. All rights Reserved
21
 Exact behavior is determined by MPI
implementation
 MPI_Send may behave differently with regard to
buffer size, cutoffs, & blocking
 Cutoff
 if message size < cutoff  buffer
 if message size ≥ cutoff  MPI_Send will block
 MPI_Recv always blocks until a matching
message is received
 Preserve message ordering from a sender
 Know your implementation
 Don’t make assumptions!
Issues With Send & Receive
Copyright © 2010, Elsevier Inc. All rights Reserved
22
Trapezoidal Rule
Copyright © 2010, Elsevier Inc. All rights Reserved
23
Trapezoidal Rule (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
24
Serial Pseudo-code
Copyright © 2010, Elsevier Inc. All rights Reserved
25
Parallel Pseudo-Code
Copyright © 2010, Elsevier Inc. All rights Reserved
26
Tasks & Communications for
Trapezoidal Rule
Copyright © 2010, Elsevier Inc. All rights Reserved
27
First Version
Copyright © 2010, Elsevier Inc. All rights Reserved
28
First Version (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
29
First Version (Cont.)
Copyright © 2010, Elsevier Inc. All rights Reserved
30
COLLECTIVE
COMMUNICATION
Copyright © 2010, Elsevier Inc. All rights Reserved
31
Collective Communication
Copyright © 2010, Elsevier Inc. All rights Reserved
A tree-structured global sum
32
Alternative Tree-Structured Global Sum
Copyright © 2010, Elsevier Inc. All rights Reserved
Which is most optimum?
Can we do better?
33
MPI_Reduce
Copyright © 2010, Elsevier Inc. All rights Reserved
34
Predefined Reduction Operators
Copyright © 2010, Elsevier Inc. All rights Reserved
35
Collective vs. Point-to-Point Communications
 All processes in the communicator must call the
same collective function
 e.g., a program that attempts to match a call to
MPI_Reduce on 1 process with a call to MPI_Recv on
another process is erroneous
 Program will hang or crash
 Arguments passed by each process to an MPI
collective communication must be “compatible”
 e.g., if 1 process passes in 0 as dest_process &
another passes in 1, then the outcome of a call to
MPI_Reduce is erroneous
 Program is likely to hang or crash
Copyright © 2010, Elsevier Inc. All rights Reserved
36
Collective vs. P-to-P Communications (Cont.)
 output_data_p argument is only used on
dest_process
 However, all of the processes still need to pass in an
actual argument corresponding to output_data_p,
even if it’s just NULL
 Point-to-point communications are matched on
the basis of tags & communicators
 Collective communications don’t use tags
 Matched solely on the basis of communicator & order
in which they’re called
Copyright © 2010, Elsevier Inc. All rights Reserved
37
MPI_Allreduce
 Useful when all processes need result of a global
sum to complete some larger computation
Copyright © 2010, Elsevier Inc. All rights Reserved
38
Copyright © 2010, Elsevier Inc. All rights Reserved
Global sum followed
by distribution of result
MPI_Allreduce (Cont.)
39
Butterfly-Structured Global Sum
Copyright © 2010, Elsevier Inc. All rights Reserved
Processes exchange partial results
40
Broadcast
 Data belonging to a single process is sent to all
of the processes in communicator
Copyright © 2010, Elsevier Inc. All rights Reserved
41
Tree-Structured Broadcast
Copyright © 2010, Elsevier Inc. All rights Reserved
42
Data Distributions – Compute a Vector Sum
Copyright © 2010, Elsevier Inc. All rights Reserved
Serial implementation
43
Partitioning Options
Copyright © 2010, Elsevier Inc. All rights Reserved
 Block partitioning
 Assign blocks of consecutive components to each process
 Cyclic partitioning
 Assign components in a round robin fashion
 Block-cyclic partitioning
 Use a cyclic distribution of blocks of components
44
Parallel Implementation
Copyright © 2010, Elsevier Inc. All rights Reserved
45
MPI_Scatter
 Can be used in a function that reads in an entire
vector on process 0 but only sends needed
components to other processes
Copyright © 2010, Elsevier Inc. All rights Reserved
46
Reading & Distributing a Vector
Copyright © 2010, Elsevier Inc. All rights Reserved
47
MPI_Gather
 Collect all components of a vector onto process 0
 Then process 0 can process all of components
Copyright © 2010, Elsevier Inc. All rights Reserved
48
MPI_Allgather
 Concatenates contents of each process’
send_buf_p & stores this in each process’
recv_buf_p
 recv_count is the amount of data being received from
each process
Copyright © 2010, Elsevier Inc. All rights Reserved
49
Summary
Copyright © 2010, Elsevier Inc. All rights Reserved
Source: https://computing.llnl.gov/tutorials/mpi/

More Related Content

What's hot

Inter Process Communication
Inter Process CommunicationInter Process Communication
Inter Process CommunicationAdeel Rasheed
 
3D Transformation
3D Transformation3D Transformation
3D TransformationSwatiHans10
 
3D Transformation in Computer Graphics
3D Transformation in Computer Graphics3D Transformation in Computer Graphics
3D Transformation in Computer Graphicssabbirantor
 
Remote Procedure Call in Distributed System
Remote Procedure Call in Distributed SystemRemote Procedure Call in Distributed System
Remote Procedure Call in Distributed SystemPoojaBele1
 
Computer graphics iv unit
Computer graphics iv unitComputer graphics iv unit
Computer graphics iv unitaravindangc
 
Line drawing algo.
Line drawing algo.Line drawing algo.
Line drawing algo.Mohd Arif
 
Random scan displays and raster scan displays
Random scan displays and raster scan displaysRandom scan displays and raster scan displays
Random scan displays and raster scan displaysSomya Bagai
 
Chapter 3 Output Primitives
Chapter 3 Output PrimitivesChapter 3 Output Primitives
Chapter 3 Output PrimitivesPrathimaBaliga
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed SystemsDr Sandeep Kumar Poonia
 
Raster Scan and Raster Scan Displays
Raster Scan and Raster Scan DisplaysRaster Scan and Raster Scan Displays
Raster Scan and Raster Scan DisplaysSaravana Priya
 
Multiprogramming&timesharing
Multiprogramming&timesharingMultiprogramming&timesharing
Multiprogramming&timesharingTanuj Tyagi
 
Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...
Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...
Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...Saikrishna Tanguturu
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)Dinesh Modak
 
Multimedia synchronization
Multimedia synchronizationMultimedia synchronization
Multimedia synchronizationI World Tech
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memoryAshish Kumar
 
Computer graphics - bresenham line drawing algorithm
Computer graphics - bresenham line drawing algorithmComputer graphics - bresenham line drawing algorithm
Computer graphics - bresenham line drawing algorithmRuchi Maurya
 

What's hot (20)

Inter Process Communication
Inter Process CommunicationInter Process Communication
Inter Process Communication
 
3D Transformation
3D Transformation3D Transformation
3D Transformation
 
3D Transformation in Computer Graphics
3D Transformation in Computer Graphics3D Transformation in Computer Graphics
3D Transformation in Computer Graphics
 
Remote Procedure Call in Distributed System
Remote Procedure Call in Distributed SystemRemote Procedure Call in Distributed System
Remote Procedure Call in Distributed System
 
Distributed Operating System_1
Distributed Operating System_1Distributed Operating System_1
Distributed Operating System_1
 
BRESENHAM’S LINE DRAWING ALGORITHM
BRESENHAM’S  LINE DRAWING ALGORITHMBRESENHAM’S  LINE DRAWING ALGORITHM
BRESENHAM’S LINE DRAWING ALGORITHM
 
Computer graphics iv unit
Computer graphics iv unitComputer graphics iv unit
Computer graphics iv unit
 
JDBC Driver Types
JDBC Driver TypesJDBC Driver Types
JDBC Driver Types
 
Line drawing algo.
Line drawing algo.Line drawing algo.
Line drawing algo.
 
Random scan displays and raster scan displays
Random scan displays and raster scan displaysRandom scan displays and raster scan displays
Random scan displays and raster scan displays
 
Chapter 3 Output Primitives
Chapter 3 Output PrimitivesChapter 3 Output Primitives
Chapter 3 Output Primitives
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed Systems
 
Raster Scan and Raster Scan Displays
Raster Scan and Raster Scan DisplaysRaster Scan and Raster Scan Displays
Raster Scan and Raster Scan Displays
 
Multiprogramming&timesharing
Multiprogramming&timesharingMultiprogramming&timesharing
Multiprogramming&timesharing
 
Depth Buffer Method
Depth Buffer MethodDepth Buffer Method
Depth Buffer Method
 
Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...
Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...
Computer Graphics - Bresenham's line drawing algorithm & Mid Point Circle alg...
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)
 
Multimedia synchronization
Multimedia synchronizationMultimedia synchronization
Multimedia synchronization
 
distributed shared memory
 distributed shared memory distributed shared memory
distributed shared memory
 
Computer graphics - bresenham line drawing algorithm
Computer graphics - bresenham line drawing algorithmComputer graphics - bresenham line drawing algorithm
Computer graphics - bresenham line drawing algorithm
 

Similar to Distributed Memory Programming with MPI

What is [Open] MPI?
What is [Open] MPI?What is [Open] MPI?
What is [Open] MPI?Jeff Squyres
 
High Performance Computing using MPI
High Performance Computing using MPIHigh Performance Computing using MPI
High Performance Computing using MPIAnkit Mahato
 
Safetty systems intro_embedded_c
Safetty systems intro_embedded_cSafetty systems intro_embedded_c
Safetty systems intro_embedded_cMaria Cida Rosa
 
6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptx6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptxSimRelokasi2
 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interfaceMohit Raghuvanshi
 
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCAdvanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCIJSRD
 
Introduction to MPI
Introduction to MPI Introduction to MPI
Introduction to MPI Hanif Durad
 
Programming using MPI and OpenMP
Programming using MPI and OpenMPProgramming using MPI and OpenMP
Programming using MPI and OpenMPDivya Tiwari
 
Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2Marcirio Chaves
 
cs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdfcs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdfssuserada6a9
 
Rgk cluster computing project
Rgk cluster computing projectRgk cluster computing project
Rgk cluster computing projectOstopD
 
Intro to MPI
Intro to MPIIntro to MPI
Intro to MPIjbp4444
 
Pivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First LookPivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First LookVMware Tanzu
 

Similar to Distributed Memory Programming with MPI (20)

MPI - 2
MPI - 2MPI - 2
MPI - 2
 
My ppt hpc u4
My ppt hpc u4My ppt hpc u4
My ppt hpc u4
 
MPI - 3
MPI - 3MPI - 3
MPI - 3
 
What is [Open] MPI?
What is [Open] MPI?What is [Open] MPI?
What is [Open] MPI?
 
MPI - 1
MPI - 1MPI - 1
MPI - 1
 
High Performance Computing using MPI
High Performance Computing using MPIHigh Performance Computing using MPI
High Performance Computing using MPI
 
MPI - 4
MPI - 4MPI - 4
MPI - 4
 
Safetty systems intro_embedded_c
Safetty systems intro_embedded_cSafetty systems intro_embedded_c
Safetty systems intro_embedded_c
 
6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptx6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptx
 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interface
 
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCAdvanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPC
 
Introduction to MPI
Introduction to MPI Introduction to MPI
Introduction to MPI
 
Chapter 1.ppt
Chapter 1.pptChapter 1.ppt
Chapter 1.ppt
 
Programming using MPI and OpenMP
Programming using MPI and OpenMPProgramming using MPI and OpenMP
Programming using MPI and OpenMP
 
Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2Tutorial on Parallel Computing and Message Passing Model - C2
Tutorial on Parallel Computing and Message Passing Model - C2
 
Inter Process Communication
Inter Process CommunicationInter Process Communication
Inter Process Communication
 
cs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdfcs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdf
 
Rgk cluster computing project
Rgk cluster computing projectRgk cluster computing project
Rgk cluster computing project
 
Intro to MPI
Intro to MPIIntro to MPI
Intro to MPI
 
Pivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First LookPivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First Look
 

More from Dilum Bandara

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningDilum Bandara
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeDilum Bandara
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCADilum Bandara
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsDilum Bandara
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresDilum Bandara
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixDilum Bandara
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopDilum Bandara
 
Embarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsEmbarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsDilum Bandara
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersDilum Bandara
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level ParallelismDilum Bandara
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesDilum Bandara
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsDilum Bandara
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesDilum Bandara
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesDilum Bandara
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionDilum Bandara
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPDilum Bandara
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery NetworksDilum Bandara
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingDilum Bandara
 

More from Dilum Bandara (20)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in Practice
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCA
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive Analytics
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data Structures
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
Embarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsEmbarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel Problems
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale Computers
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level Parallelism
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in Microprocessors
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware Techniques
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler Techniques
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An Introduction
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCP
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery Networks
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and Streaming
 
Mobile Services
Mobile ServicesMobile Services
Mobile Services
 

Recently uploaded

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 

Recently uploaded (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 

Distributed Memory Programming with MPI

  • 1. 1 Distributed Memory Programming with MPI Slides extended from An Introduction to Parallel Programming by Peter Pacheco Dilum Bandara Dilum.Bandara@uom.lk
  • 2. 2 Distributed Memory Systems  We discuss about developing programs for these systems using MPI  MPI – Message Passing Interface  Are set of libraries that can be called from C, C++, & Fortran Copyright © 2010, Elsevier Inc. All rights Reserved
  • 3. 3 Why MPI?  Standardized & portable message-passing system  One of the oldest libraries  Wide-spread adoption  Minimal requirements on underlying hardware  Explicit parallelization  Achieves high performance  Scales to a large no of processors  Intellectually demanding Copyright © 2010, Elsevier Inc. All rights Reserved
  • 4. 4 Our First MPI Program Copyright © 2010, Elsevier Inc. All rights Reserved
  • 5. 5 Compilation Copyright © 2010, Elsevier Inc. All rights Reserved mpicc -g -Wall -o mpi_hello mpi_hello.c wrapper script to compile turns on all warnings source file create this executable file name (as opposed to default a.out) produce debugging information
  • 6. 6 Execution Copyright © 2010, Elsevier Inc. All rights Reserved mpiexec -n <no of processes> <executable> mpiexec -n 1 ./mpi_hello mpiexec -n 4 ./mpi_hello run with 1 process run with 4 processes
  • 7. 7 Execution Copyright © 2010, Elsevier Inc. All rights Reserved mpiexec -n 1 ./mpi_hello mpiexec -n 4 ./mpi_hello Greetings from process 0 of 1 ! Greetings from process 0 of 4 ! Greetings from process 1 of 4 ! Greetings from process 2 of 4 ! Greetings from process 3 of 4 !
  • 8. 8 MPI Programs  Need to add mpi.h header file  Identifiers defined by MPI start with “MPI_”  1st letter following underscore is uppercase  For function names & MPI-defined types  Helps to avoid confusion Copyright © 2010, Elsevier Inc. All rights Reserved
  • 9. 9 6 Golden MPI Functions Copyright © 2010, Elsevier Inc. All rights Reserved
  • 10. 10 MPI Components  MPI_Init  Tells MPI to do all necessary setup  e.g., allocate storage for message buffers, decide rank of a process  argc_p & argv_p are pointers to argc & argv arguments in main( )  Function returns error codes Copyright © 2010, Elsevier Inc. All rights Reserved
  • 11. 11  MPI_Finalize  Tells MPI we’re done, so clean up anything allocated for this program MPI Components (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 12. 12 Communicators  Collection of processes that can send messages to each other  Messages from others communicators are ignored  MPI_Init defines a communicator that consists of all processes created when the program is started  Called MPI_COMM_WORLD Copyright © 2010, Elsevier Inc. All rights Reserved
  • 13. 13 Communicators (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved My rank (process making this call) No of processes in the communicator
  • 14. 14 Single-Program Multiple-Data (SPMD)  We compile 1 program  Process 0 does something different  Receives messages & prints them while the other processes do the work  if-else construct makes our program SPMD  We can run this program on any no of processors  e.g., 4, 8, 32, 1000, … Copyright © 2010, Elsevier Inc. All rights Reserved
  • 15. 15 Communication  msg_buf_p, msg_size, msg_type  Determines content of message  dest – destination processor’s rank  tag – use to distinguish messages that are identical in content Copyright © 2010, Elsevier Inc. All rights Reserved
  • 16. 16 Data Types Copyright © 2010, Elsevier Inc. All rights Reserved
  • 17. 17 Communication (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved MPI_ANY_SOURCE to receive messages (from any source) in order for arrival
  • 18. 18 Message Matching Copyright © 2010, Elsevier Inc. All rights Reserved MPI_Send src = q MPI_Recv dest = r r q
  • 19. 19 Receiving Messages  Receiver can get a message without knowing  Amount of data in message  Sender of message  Tag of message  How can those be found out? Copyright © 2010, Elsevier Inc. All rights Reserved
  • 20. 20 How Much Data am I Receiving? Copyright © 2010, Elsevier Inc. All rights Reserved
  • 21. 21  Exact behavior is determined by MPI implementation  MPI_Send may behave differently with regard to buffer size, cutoffs, & blocking  Cutoff  if message size < cutoff  buffer  if message size ≥ cutoff  MPI_Send will block  MPI_Recv always blocks until a matching message is received  Preserve message ordering from a sender  Know your implementation  Don’t make assumptions! Issues With Send & Receive Copyright © 2010, Elsevier Inc. All rights Reserved
  • 22. 22 Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved
  • 23. 23 Trapezoidal Rule (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 24. 24 Serial Pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved
  • 25. 25 Parallel Pseudo-Code Copyright © 2010, Elsevier Inc. All rights Reserved
  • 26. 26 Tasks & Communications for Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved
  • 27. 27 First Version Copyright © 2010, Elsevier Inc. All rights Reserved
  • 28. 28 First Version (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 29. 29 First Version (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 30. 30 COLLECTIVE COMMUNICATION Copyright © 2010, Elsevier Inc. All rights Reserved
  • 31. 31 Collective Communication Copyright © 2010, Elsevier Inc. All rights Reserved A tree-structured global sum
  • 32. 32 Alternative Tree-Structured Global Sum Copyright © 2010, Elsevier Inc. All rights Reserved Which is most optimum? Can we do better?
  • 33. 33 MPI_Reduce Copyright © 2010, Elsevier Inc. All rights Reserved
  • 34. 34 Predefined Reduction Operators Copyright © 2010, Elsevier Inc. All rights Reserved
  • 35. 35 Collective vs. Point-to-Point Communications  All processes in the communicator must call the same collective function  e.g., a program that attempts to match a call to MPI_Reduce on 1 process with a call to MPI_Recv on another process is erroneous  Program will hang or crash  Arguments passed by each process to an MPI collective communication must be “compatible”  e.g., if 1 process passes in 0 as dest_process & another passes in 1, then the outcome of a call to MPI_Reduce is erroneous  Program is likely to hang or crash Copyright © 2010, Elsevier Inc. All rights Reserved
  • 36. 36 Collective vs. P-to-P Communications (Cont.)  output_data_p argument is only used on dest_process  However, all of the processes still need to pass in an actual argument corresponding to output_data_p, even if it’s just NULL  Point-to-point communications are matched on the basis of tags & communicators  Collective communications don’t use tags  Matched solely on the basis of communicator & order in which they’re called Copyright © 2010, Elsevier Inc. All rights Reserved
  • 37. 37 MPI_Allreduce  Useful when all processes need result of a global sum to complete some larger computation Copyright © 2010, Elsevier Inc. All rights Reserved
  • 38. 38 Copyright © 2010, Elsevier Inc. All rights Reserved Global sum followed by distribution of result MPI_Allreduce (Cont.)
  • 39. 39 Butterfly-Structured Global Sum Copyright © 2010, Elsevier Inc. All rights Reserved Processes exchange partial results
  • 40. 40 Broadcast  Data belonging to a single process is sent to all of the processes in communicator Copyright © 2010, Elsevier Inc. All rights Reserved
  • 41. 41 Tree-Structured Broadcast Copyright © 2010, Elsevier Inc. All rights Reserved
  • 42. 42 Data Distributions – Compute a Vector Sum Copyright © 2010, Elsevier Inc. All rights Reserved Serial implementation
  • 43. 43 Partitioning Options Copyright © 2010, Elsevier Inc. All rights Reserved  Block partitioning  Assign blocks of consecutive components to each process  Cyclic partitioning  Assign components in a round robin fashion  Block-cyclic partitioning  Use a cyclic distribution of blocks of components
  • 44. 44 Parallel Implementation Copyright © 2010, Elsevier Inc. All rights Reserved
  • 45. 45 MPI_Scatter  Can be used in a function that reads in an entire vector on process 0 but only sends needed components to other processes Copyright © 2010, Elsevier Inc. All rights Reserved
  • 46. 46 Reading & Distributing a Vector Copyright © 2010, Elsevier Inc. All rights Reserved
  • 47. 47 MPI_Gather  Collect all components of a vector onto process 0  Then process 0 can process all of components Copyright © 2010, Elsevier Inc. All rights Reserved
  • 48. 48 MPI_Allgather  Concatenates contents of each process’ send_buf_p & stores this in each process’ recv_buf_p  recv_count is the amount of data being received from each process Copyright © 2010, Elsevier Inc. All rights Reserved
  • 49. 49 Summary Copyright © 2010, Elsevier Inc. All rights Reserved Source: https://computing.llnl.gov/tutorials/mpi/

Editor's Notes

  1. 8 January 2024
  2. Began in Supercomputing ’92
  3. Tag – 2 messages – content in 1 should be printed while other should be used for calculation
  4. sendbuf address of send buffer (choice) count no of elements in send buffer (integer) datatype data type of elements of send buffer (handle) op reduce operation (handle) root rank of root process (integer) comm communicator (handle)