MapReduce Programming Model
OUTLINE
● Introduction (01): Big Data, Parallel Processing, Motivation
● MapReduce (02): Map, Reduce, sales example, word count example
● MapReduce daemons in Hadoop (03): Job tracker, Task tracker
● Demo code (05): 1. Word count in Hadoop using Python, 2. Array sum demo using Java
● Conclusion (06): Summary
INTRODUCTION
01
Parallel Processing
A task is broken up into multiple parts by a software tool, and each part is distributed to a processor; each processor then performs its assigned part.
Finally, the parts are reassembled to deliver the final solution or complete the task.
Reminder!
Parallel Processing is not Multiprocessing
Motivation
NO !
What is the proposed solution to deal with ?
Motivation
• Motivations
● Large-scale data processing on clusters
● Massively parallel (hundreds or thousands of CPUs)
● Reliable execution with easy data access
• Functions
● Fault-tolerance
● Status and monitoring tools
● A clean abstraction for programmers
Inspired by LISP functional programming
Map and Reduce
Lisp map function
● Input parameters: a function and a set of values
● The function is applied to each of the values.
Lisp reduce function
● Input parameters: a binary function and a set of values
● It combines all the values together using the binary function.
Example
(map 'list #'length '(() (a) (a b) (a b c)))
; applies length to each element: (length '()) (length '(a)) (length '(a b)) (length '(a b c))
; => (0 1 2 3)
Use the + (add) function to reduce the list:
(reduce #'+ '(0 1 2 3))
; => 6
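The same two operations can be sketched in Python, whose built-in `map` and `functools.reduce` behave analogously to the Lisp functions. The sample values below are an illustrative stand-in for the Lisp list; this is a sketch, not part of the original slides.

```python
from functools import reduce
from operator import add

# Nested lists of length 0, 1, 2 and 3, mirroring the Lisp example.
values = [[], ["a"], ["a", "b"], ["a", "b", "c"]]

# map: apply one function (len) to every value independently.
lengths = list(map(len, values))

# reduce: combine all the values with a binary function (+).
total = reduce(add, lengths)

print(lengths)  # [0, 1, 2, 3]
print(total)    # 6
```

Because `map` treats each value independently, the calls can run in parallel; `reduce` is the step that brings the partial results back together.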
MapReduce
02
Example 1: Principle
Instead of reading the file sequentially, it is divided into chunks that are read in parallel.
Calculate the total sales for the current year?
Solution
Instead of having one person cover the whole book, we hire several!
Divide the book into several parts and give one to each mapper.
A first group is called mappers; the second is called reducers.
Figure: mappers emit intermediate (key, value) records; shuffle & sort groups them into (key, values) for the reducers, which produce the results.
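The sales example can be sketched as a single-process simulation of the three phases. The record layout (store, amount), the sample figures, and the function names `map_sales`, `shuffle` and `reduce_sales` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical sales records (store, amount); each inner list is one
# part of the book handed to one mapper.
chunks = [
    [("Paris", 100.0), ("Tunis", 50.0)],
    [("Paris", 25.0), ("Tunis", 75.0)],
]

def map_sales(chunk):
    """Mapper: emit one intermediate (key, value) pair per record."""
    return [(store, amount) for store, amount in chunk]

def shuffle(pairs):
    """Shuffle & sort: group the values by key, giving (key, values)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_sales(key, values):
    """Reducer: combine all the values of one key into one result."""
    return key, sum(values)

intermediate = [pair for chunk in chunks for pair in map_sales(chunk)]
results = [reduce_sales(key, values) for key, values in shuffle(intermediate)]
print(results)  # [('Paris', 125.0), ('Tunis', 125.0)]
```

In a real cluster each chunk would be mapped on a different machine; here the loop over `chunks` only imitates that split.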
The Famous Word Count Example
02
Example 2: More Details
Input/output specification of the word count (WC) MapReduce job
Input: a set of (key, value) pairs stored in files
key: document ID
value: the list of words that make up the content of the document
Output: a set of (key, value) pairs stored in files
key: word ID
value: frequency of the word across all documents
MapReduce function specification:
map(String input_key, String input_value):
reduce(String output_key, Iterator intermediate_values):
Pseudo-code
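A minimal sketch of the word count job, following the `map(input_key, input_value)` and `reduce(output_key, intermediate_values)` signatures above. The names `map_fn`, `reduce_fn` and `word_count`, the lower-casing of words, and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """input_key: document ID; input_value: document text. Emits (word, 1)."""
    for word in input_value.split():
        yield word.lower(), 1

def reduce_fn(output_key, intermediate_values):
    """output_key: a word; intermediate_values: iterator over its counts."""
    return output_key, sum(intermediate_values)

def word_count(documents):
    """Run map, group by key (shuffle), then reduce, all in one process."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_fn(doc_id, text):
            grouped[word].append(count)
    return dict(reduce_fn(word, iter(counts)) for word, counts in grouped.items())

# Hypothetical input documents.
docs = {"doc1": "map reduce map", "doc2": "reduce hadoop"}
print(word_count(docs))  # {'map': 2, 'reduce': 2, 'hadoop': 1}
```

The mapper never needs the document ID for counting; it is kept only to match the input specification.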
MapReduce Daemons in Hadoop
03
"MapReduce has been implemented in many programming languages and frameworks, such as Apache Hadoop, Pig, Hive, etc."
MapReduce daemons (brief introduction for later use)
● Job tracker: divides the work among mappers and reducers
● Task tracker: runs on each node to execute the actual MapReduce tasks
Demo Code 1
05
Sum array elements using MapReduce
Sum array elements using MapReduce with Java
Map: split the array of 1000 elements into 10 small chunks of 100 elements each. Each chunk is processed concurrently by a separate thread, so we have 10 threads, and each thread iterates over 100 elements to produce the sum of those 100 elements.
Reduce: take the outputs of these 10 threads and sum them again to produce the final output.
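The same chunk-and-sum scheme can be sketched in Python with a thread pool. This is not the Java code of the demo; the helper names `chunked` and `map_reduce_sum` and the sample array are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, size):
    """Split data into consecutive chunks of at most `size` elements."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_reduce_sum(data, chunk_size=100, workers=10):
    chunks = chunked(data, chunk_size)
    # Map: each chunk is summed by a worker thread from the pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Reduce: combine the partial sums into the final result.
    return sum(partial_sums)

array = list(range(1, 1001))  # 1000 elements: 1, 2, ..., 1000
print(map_reduce_sum(array))  # 500500
```

In standard CPython the global interpreter lock keeps these threads from speeding up pure-Python arithmetic, so the sketch shows the structure of the computation rather than a performance gain.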
Environment
Project structure
Main: calls the map task and the reduce task to perform the MapReduce function
Map task
● split the array of 1,000 elements into chunks of 100 elements each
● create a thread pool of 10 threads
● save the task for each chunk in a queue
● save the map result of each chunk into mapOutput
Reduce task
● get the output of the map and aggregate the results
● iterate over each element in mapOut (the result of the previous map)
source code link : https://github.com/HabibaAbderrahim/thread_mapReduce
Demo Code 2
Word count using the Hadoop framework
Environment
Pseudo-distributed environment
PS: this is a pseudo-distributed environment that simulates a fully distributed one, since we have only one server / one PC.
● Java should be installed
● create a hadoop sudo user
● install Hadoop from the official website
● check that Hadoop is installed (version: 3.2.1)
Environment
Pseudo-distributed environment: configuration files (version: 3.2.1)
● Java home and Hadoop home
● add the Java path
● HDFS: Hadoop Distributed File System
● HDFS configuration: namenode / datanode / replication
● MapReduce configuration: MapReduce runs on YARN
● verify the Hadoop daemons
Word count using MapReduce in Hadoop with Python
We decided to work with Python just to test the Hadoop Streaming feature.
Environment
version : 3.2.1
version : 3.5.1
Mapper
Reducer
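A minimal sketch of a Hadoop Streaming style mapper and reducer in Python, not the scripts used in the demo. The mapper writes tab-separated `word<TAB>1` lines, and the reducer relies on its input arriving sorted by key. The function names, the sample text, and the in-memory `sorted` call standing in for Hadoop's shuffle & sort are assumptions for illustration.

```python
from itertools import groupby

def mapper(lines):
    """Mapper: emit 'word<TAB>1' for every word of every input line."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)

def reducer(sorted_lines):
    """Reducer: input is sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "{}\t{}".format(word, sum(int(count) for _, count in group))

# Local simulation; sorted() stands in for Hadoop's shuffle & sort.
text = ["hello hadoop", "hello map reduce"]
for line in reducer(sorted(mapper(text))):
    print(line)
```

In a real job each function would live in its own script that reads `sys.stdin` and prints to standard output, since Hadoop Streaming exchanges lines with the scripts over standard input and output.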
● see what is inside our file data.txt
● count the words in data.txt with MapReduce
● sort the results alphabetically
Conclusion
06
[References]
The ideas, concepts and diagrams are taken from the following websites:
● http://www.metz.supelec.fr/metz/personnel/vialle/course/BigData-2A-CS/poly-pdf/Poly-chap6.pdf
● https://sites.cs.ucsb.edu/~tyang/class/240a17/slides/CS240TopicMapReduce.pdf
● https://fr.slideshare.net/LiliaSfaxi/bigdatachp2-hadoop-mapreduce
● https://algodaily.com/lessons/what-is-mapreduce-and-how-does-it-work
Thanks!
Do you have any questions?

Editor's Notes

  1. Parallel processing should not be confused with multiprocessing, in which multiple processors or cores work on different tasks instead of on parts of the same task, as in parallel processing.
  2. Before diving into detail, take a moment and ask yourself what does
  3. Functional programming meets distributed computing; a batch data processing system.
  4. Traditional approach: in this approach we iterate over each element of the array and add it to a running total to produce the final sum.