SlideShare a Scribd company logo
1 of 21
Embarrassingly Parallel
Problems
CS5225 Parallel and Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Some slides adapted from Dr. Srinath Perera
Embarrassingly Parallel Problems
 A.k.a. Delightfully Parallel Problems
 Can be easily parallelizable
 Usually use simple communication patterns
 Usually work without much communication
among each other
 Map-Reduce programming model provides a
powerful abstraction to handle embarrassingly
parallel problems
2
Map-Reduce
 Common pattern to solve parallel problems
 Based on 2 constructs from functional programming,
map & reduce
 Introduced by Google
 Dean et. al., “MapReduce: Simplified Data Processing
on Large Clusters,” OSDI, 2004
 Extensible for different applications
 Scale to very large number of nodes
 Hide details like failures from users
3
High-Order Functions
 Programming languages (e.g., Java) pass data
as parameters & results of functions
 Higher-order functions pass both data as well as
functions as parameters or results of functions
 E.g., Python, Ruby, JavaScript
 For example
def f(x):
return x + 3
def g(function, x):
return function(x) * function(x)
print g(f, 7) 4
Map-Reduce
 Accepts 2 functions as inputs
1. Map function
 Y fn1(X)
 Accepts input X & outputs another Y
2. Reduce function
 Z fn2(List<Y>)
 Accepts array of Y’s & returns another output Z
5
Map-Reduce (Contd.)
 Map-reduce support is provided by a function
like following
 Y map-reduce(mapfn, reducefn, List<X>)
 Map reduce implementation takes list of inputs
(list) & does following
 Apply map function to each entry in the list, which
emit (key, value) pairs
 Collect results, group them by keys, & then pass them
to reduce function as array
6
Map-Reduce (Contd.)
7
Source: www.datasciencecentral.com/profiles/blogs/practical-
illustration-of-map-reduce-hadoop-style-on-real-data
Map-Reduce for Word Counting
8
Source: http://xiaochongzhang.me/blog/?p=338
How to do this for a large dataset using a distributed system?
In Class Activity
1. Card sorting
2. Card sorting with 2 rounds
3. Identify missing cards
9
Inspired by Marcio Silva's “The MapReduce Card Game” at
http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
Why Map-Reduce?
 Implementing same pattern in a distributed
system isn’t that easy
 Need to worry about communication, failures,
initialization, etc.
 MapReduce frameworks worry about all those
 You write map & reduce functions & call
framework
 It forces you to think parallel in design time
 It gives you a higher-level of abstraction to think in
 It’s very generic, & covers lot of usecases
 See http://wiki.apache.org/hadoop/PoweredBy
10
Map-Reduce Implementations
 Can be implemented in many ways
 In-memory implementation
 Distributed implementation
 Communication by messages
 Communication by file system
 Communication by databases
 Communication Requirements
 Need broadcast & reduce operations only
11
Map-Reduce with Hadoop
 Apache Hadoop is an implementation of Map-
reduce
 Handles all details about distributed execution
 You just have to give Map & Reduce functions
12
Map-Reduce Data Model
13
Source: http://slides.com/bearrito/pittsburgh-nosql-_-mapreduce#/
Map-Reduce Data Model (Cont.)
 Hadoop breaks input data into multiple data items by
new lines & runs map function once for each data item
 When executed, map function outputs (key, value) pairs
 Hadoop collects all (key, value) pairs generated by map
function, sorts them by the key, & groups values with the
same key together into groups
 For each distinct key, Hadoop runs reduce function once
while passing key & list of values for that key as input
 Reduce function outputs (key, value) pairs, & Hadoop
writes them to a file as final result
14
Execution on a Cluster/Cloud
15
Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
MapReduce Execution
16
Source: Dean et. al.,
“MapReduce, OSDI, 2004
Designing Map-Reduce Applications
 You control task granularity by changing no of
map & reduce tasks
 How many map tasks?
 How many reduce tasks?
 Fine Grain  more parallelism  more
communication overhead and vise versa
 Usually frameworks handle load balancing &
failures
 If large number of maps are there, you need a
Combine Function as well
17
Examples
 Sorting
 How to sort an array of 1 million integers using
MapReduce?
 Inverted Index
 Normal index is a mapping from document to terms
 Inverted index is mapping from terms to documents
 If we have a million documents, how do we build a
inverted index using MapReduce?
 Frequency Distribution of Word Occurrences
 Count number of occurrences & build a histogram
18
Examples (Cont.)
 Stitch Imagery
 For Google maps, Google need to combine many
map data into a single set of data
 Business Intelligence
 A business want to create a graph of income
generated by each region & marketing money spend
on each region
19
Examples (Cont.)
 K-Means
 Assume you are given a list of earth quakes
coordinates happened in the world in last 50 years.
 You are asked to use K-Means Clustering algorithm
to find 10 locations around which those earth quakes
were located.
 K-Means starts with 10 random cluster locations.
 It proceeds iteratively, & at each iteration, it assigns each
data point (earth quake) to the closest cluster location
 At end of each iteration, it recalculates each cluster location
using mean of all data point coordinates assigned to that
location
 It stops when cluster locations doesn’t change after
recalculation 20
K-Means Algorithm
List kmeans(datapointsList , initialClustersList){
oldlocations = null;
newLocations = initialClustersList ;
while(oldlocations != newLocations){
for(d in datapointsList){
oldlocations = newLocations ;
newLocations = //recalculate locations
}
//assign d to closest location in newLocations
}
}
return newLocations ;
21

More Related Content

Similar to Embarrassingly/Delightfully Parallel Problems

Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Parallel Computing 2007: Overview
Parallel Computing 2007: OverviewParallel Computing 2007: Overview
Parallel Computing 2007: OverviewGeoffrey Fox
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingcoolmirza143
 
Map reduce
Map reduceMap reduce
Map reducexydii
 
MAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREMAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREdharanis15
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsRobert Grossman
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniquesijsrd.com
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Yahoo Developer Network
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersAbhishek Singh
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentationNoha Elprince
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comsoftwarequery
 

Similar to Embarrassingly/Delightfully Parallel Problems (20)

Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Parallel Computing 2007: Overview
Parallel Computing 2007: OverviewParallel Computing 2007: Overview
Parallel Computing 2007: Overview
 
E031201032036
E031201032036E031201032036
E031201032036
 
Big data
Big dataBig data
Big data
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
 
Map reduce
Map reduceMap reduce
Map reduce
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
MAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREMAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHARE
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Hadoop
HadoopHadoop
Hadoop
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
 

More from Dilum Bandara

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningDilum Bandara
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeDilum Bandara
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCADilum Bandara
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsDilum Bandara
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresDilum Bandara
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixDilum Bandara
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopDilum Bandara
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersDilum Bandara
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level ParallelismDilum Bandara
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesDilum Bandara
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsDilum Bandara
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesDilum Bandara
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesDilum Bandara
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionDilum Bandara
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPDilum Bandara
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery NetworksDilum Bandara
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingDilum Bandara
 
Wired Broadband Communication
Wired Broadband CommunicationWired Broadband Communication
Wired Broadband CommunicationDilum Bandara
 

More from Dilum Bandara (20)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in Practice
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCA
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive Analytics
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data Structures
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale Computers
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level Parallelism
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in Microprocessors
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware Techniques
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler Techniques
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An Introduction
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCP
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery Networks
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and Streaming
 
Mobile Services
Mobile ServicesMobile Services
Mobile Services
 
Wired Broadband Communication
Wired Broadband CommunicationWired Broadband Communication
Wired Broadband Communication
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Embarrassingly/Delightfully Parallel Problems

  • 1. Embarrassingly Parallel Problems CS5225 Parallel and Concurrent Programming Dilum Bandara Dilum.Bandara@uom.lk Some slides adapted from Dr. Srinath Perera
  • 2. Embarrassingly Parallel Problems  A.k.a. Delightfully Parallel Problems  Can be easily parallelizable  Usually use simple communication patterns  Usually work without much communication among each other  Map-Reduce programming model provides a powerful abstraction to handle embarrassingly parallel problems 2
  • 3. Map-Reduce  Common pattern to solve parallel problems  Based on 2 constructs from functional programming, map & reduce  Introduced by Google  Dean et. al., “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004  Extensible for different applications  Scale to very large number of nodes  Hide details like failures from users 3
  • 4. High-Order Functions  Programming languages (e.g., Java) pass data as parameters & results of functions  Higher-order functions pass both data as well as functions as parameters or results of functions  E.g., Python, Ruby, JavaScript  For example def f(x): return x + 3 def g(function, x): return function(x) * function(x) print g(f, 7) 4
  • 5. Map-Reduce  Accepts 2 functions as inputs 1. Map function  Y fn1(X)  Accepts input X & outputs another Y 2. Reduce function  Z fn2(List<Y>)  Accepts array of Y’s & returns another output Z 5
  • 6. Map-Reduce (Contd.)  Map-reduce support is provided by a function like following  Y map-reduce(mapfn, reducefn, List<X>)  Map reduce implementation takes list of inputs (list) & does following  Apply map function to each entry in the list, which emit (key, value) pairs  Collect results, group them by keys, & then pass them to reduce function as array 6
  • 8. Map-Reduce for Word Counting 8 Source: http://xiaochongzhang.me/blog/?p=338 How to do this for a large dataset using a distributed system?
  • 9. In Class Activity 1. Card sorting 2. Card sorting with 2 rounds 3. Identify missing cards 9 Inspired by Marcio Silva's “The MapReduce Card Game” at http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
  • 10. Why Map-Reduce?  Implementing same pattern in a distributed system isn’t that easy  Need to worry about communication, failures, initialization, etc.  MapReduce frameworks worry about all those  You write map & reduce functions & call framework  It forces you to think parallel in design time  It gives you a higher-level of abstraction to think in  It’s very generic, & covers lot of usecases  See http://wiki.apache.org/hadoop/PoweredBy 10
  • 11. Map-Reduce Implementations  Can be implemented in many ways  In-memory implementation  Distributed implementation  Communication by messages  Communication by file system  Communication by databases  Communication Requirements  Need broadcast & reduce operations only 11
  • 12. Map-Reduce with Hadoop  Apache Hadoop is an implementation of Map- reduce  Handles all details about distributed execution  You just have to give Map & Reduce functions 12
  • 13. Map-Reduce Data Model 13 Source: http://slides.com/bearrito/pittsburgh-nosql-_-mapreduce#/
  • 14. Map-Reduce Data Model (Cont.)  Hadoop breaks input data into multiple data items by new lines & runs map function once for each data item  When executed, map function outputs (key, value) pairs  Hadoop collects all (key, value) pairs generated by map function, sorts them by the key, & groups values with the same key together into groups  For each distinct key, Hadoop runs reduce function once while passing key & list of values for that key as input  Reduce function outputs (key, value) pairs, & Hadoop writes them to a file as final result 14
  • 15. Execution on a Cluster/Cloud 15 Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
  • 16. MapReduce Execution 16 Source: Dean et. al., “MapReduce, OSDI, 2004
  • 17. Designing Map-Reduce Applications  You control task granularity by changing no of map & reduce tasks  How many map tasks?  How many reduce tasks?  Fine Grain  more parallelism  more communication overhead and vise versa  Usually frameworks handle load balancing & failures  If large number of maps are there, you need a Combine Function as well 17
  • 18. Examples  Sorting  How to sort an array of 1 million integers using MapReduce?  Inverted Index  Normal index is a mapping from document to terms  Inverted index is mapping from terms to documents  If we have a million documents, how do we build a inverted index using MapReduce?  Frequency Distribution of Word Occurrences  Count number of occurrences & build a histogram 18
  • 19. Examples (Cont.)  Stitch Imagery  For Google maps, Google need to combine many map data into a single set of data  Business Intelligence  A business want to create a graph of income generated by each region & marketing money spend on each region 19
  • 20. Examples (Cont.)  K-Means  Assume you are given a list of earth quakes coordinates happened in the world in last 50 years.  You are asked to use K-Means Clustering algorithm to find 10 locations around which those earth quakes were located.  K-Means starts with 10 random cluster locations.  It proceeds iteratively, & at each iteration, it assigns each data point (earth quake) to the closest cluster location  At end of each iteration, it recalculates each cluster location using mean of all data point coordinates assigned to that location  It stops when cluster locations doesn’t change after recalculation 20
  • 21. K-Means Algorithm List kmeans(datapointsList , initialClustersList){ oldlocations = null; newLocations = initialClustersList ; while(oldlocations != newLocations){ for(d in datapointsList){ oldlocations = newLocations ; newLocations = //recalculate locations } //assign d to closest location in newLocations } } return newLocations ; 21

Editor's Notes

  1. Shovel example