MapReduce
Programming Model
Adarsha Dhakal
Jaydeep Shah
Prakash Upadhyaya
Ritu Ratnam
Introduction
• MapReduce is a programming model introduced by Google for processing
and generating large data sets on clusters of computers.
• Google first formulated the framework to build its Web page index,
and the new framework replaced Google’s earlier indexing algorithms.
• Beginner developers find the MapReduce framework beneficial because
library routines can be used to create parallel programs without worrying
about intra-cluster communication, task monitoring, or failure handling.
• MapReduce runs on a large cluster of commodity machines and is highly
scalable.
• Implementations are available for several programming languages,
such as Java, C#, and C++.
• MapReduce is a general-purpose programming model for data-
intensive computing.
• It was introduced by Google in 2004 to construct its web index.
• It is also used at Yahoo, Facebook, and other companies. It uses a
parallel computing model that distributes computational tasks across a
large number of nodes (approximately 1,000 to 10,000).
• It is fault-tolerant. It can keep working even when 1,600 of 1,800
nodes fail.
• The Hadoop framework from the Apache Software Foundation is an
implementation of the MapReduce programming model.
Phases for MapReduce
1. Input Splits
2. Mapping
3. Shuffling
4. Sorting
5. Reducing
Steps for MapReduce
• Step 1: Transform raw data into key/value pairs in parallel.
• The mapper reads the data file and, for each review, emits the rating
as the key and the number 1 as the value.
• Step 2: Shuffle and sort, performed by the MapReduce framework.
• Shuffling is the process of transferring the mappers’ intermediate
output to the reducers. It collects all the values (the 1s) that share
a key, and the groups are sorted by key.
• Step 3: Process the data using Reduce.
• Reduce adds up the values (the 1s) for each key, giving the number of
reviews per rating.
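The three steps above can be sketched in a few lines of single-process Python. This is only an illustration under assumptions: the sample rows and the helper names (map_row, shuffle_and_sort, reduce_counts) are hypothetical, and a real framework would run the map and reduce calls on many machines.

```python
from collections import defaultdict

# Hypothetical sample rows of (review text, rating); not the real data set.
rows = [
    ("nice hotel, great location", 4),
    ("room was dirty", 2),
    ("excellent stay", 5),
    ("good value", 4),
    ("loved it", 5),
    ("fine for one night", 4),
]

def map_row(review, rating):
    """Step 1: emit the rating as key and 1 as value for one review."""
    return [(rating, 1)]

def shuffle_and_sort(pairs):
    """Step 2: collect the values of each key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_counts(rating, ones):
    """Step 3: add up the 1s collected for one rating."""
    return (rating, sum(ones))

intermediate = [pair for review, rating in rows for pair in map_row(review, rating)]
grouped = shuffle_and_sort(intermediate)
rating_counts = [reduce_counts(rating, ones) for rating, ones in grouped]
print(rating_counts)  # [(2, 1), (4, 3), (5, 2)]
```

The review text is ignored by the mapper here because only the rating is needed for the count.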
• The map and reduce functions in the MapReduce model are not exactly
the same as map and reduce in functional programming.
• Map and Reduce functions in the MapReduce model:
• Map: processes a (key, value) pair and returns a list of
(intermediate key, value) pairs
map(k1, v1) → list(k2, v2)
• Reduce: merges all intermediate values that share the same
intermediate key
reduce(k2, list(v2)) → list(v3)
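The two signatures above can be written down as Python type aliases, with a small single-process driver standing in for the framework. This is a sketch under assumptions: run_mapreduce, wc_map, and wc_reduce are hypothetical names, and word count is used only as a familiar example. Unlike a functional-programming map, the mapper may emit any number of pairs per input, and the reducer is called once per intermediate key rather than once over the whole list.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
V3 = TypeVar("V3")

# map(k1, v1) -> list(k2, v2)
Mapper = Callable[[K1, V1], List[Tuple[K2, V2]]]
# reduce(k2, list(v2)) -> list(v3)
Reducer = Callable[[K2, List[V2]], List[V3]]

def run_mapreduce(inputs: Iterable[Tuple[K1, V1]],
                  mapper: Mapper,
                  reducer: Reducer) -> Dict[K2, List[V3]]:
    """Single-process stand-in for the framework (hypothetical helper)."""
    groups = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in mapper(k1, v1):       # map phase
            groups[k2].append(v2)           # group values by intermediate key
    # Visit keys in sorted order and call the reducer once per key.
    return {k2: reducer(k2, groups[k2]) for k2 in sorted(groups)}

def wc_map(doc_name: str, text: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in the document."""
    return [(word, 1) for word in text.split()]

def wc_reduce(word: str, counts: List[int]) -> List[int]:
    """Merge all counts for one word into a single total."""
    return [sum(counts)]

docs = [("a.txt", "map reduce map"), ("b.txt", "reduce shuffle")]
word_counts = run_mapreduce(docs, wc_map, wc_reduce)
print(word_counts)  # {'map': [2], 'reduce': [2], 'shuffle': [1]}
```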
Basic Concept
• In the MapReduce model, the user has to write only two functions: map
and reduce.
• A few examples that can be easily expressed as MapReduce
computations:
• Distributed grep (an efficient way to use a Hadoop cluster to
find log messages hidden within terabytes of log data)
• Count of URL Access Frequency
• Inverted Index
• Mining
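As a rough illustration, three of the examples above can each be expressed as one map function and one reduce function. The sketch below runs them through a hypothetical single-process helper, run_job; the sample log lines, URLs, and documents are made up for the example.

```python
from collections import defaultdict

def run_job(inputs, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, reduce per key."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)
    return {k2: reducer(k2, groups[k2]) for k2 in sorted(groups)}

# Distributed grep: emit a line only if it contains the pattern.
PATTERN = "ERROR"
logs = [
    (("app.log", 1), "INFO service started"),
    (("app.log", 2), "ERROR disk full"),
    (("db.log", 1), "ERROR connection lost"),
]

def grep_map(location, line):
    return [(location, line)] if PATTERN in line else []

def grep_reduce(location, lines):
    return lines  # identity: pass matching lines through unchanged

# Count of URL access frequency: emit (url, 1), then sum per URL.
requests = [(0, "/home"), (1, "/about"), (2, "/home")]

def url_map(offset, url):
    return [(url, 1)]

def url_reduce(url, ones):
    return sum(ones)

# Inverted index: emit (word, document id), then list the documents per word.
pages = [("d1", "map reduce"), ("d2", "reduce sort")]

def index_map(doc_id, text):
    return [(word, doc_id) for word in text.split()]

def index_reduce(word, doc_ids):
    return sorted(set(doc_ids))

grep_hits = run_job(logs, grep_map, grep_reduce)
url_counts = run_job(requests, url_map, url_reduce)
index = run_job(pages, index_map, index_reduce)
print(grep_hits)
print(url_counts)   # {'/about': 1, '/home': 2}
print(index)        # {'map': ['d1'], 'reduce': ['d1', 'd2'], 'sort': ['d2']}
```

In each case only the two small functions change; the grouping in between stays the same.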
Advantages
• MapReduce provides automatic parallelization and distribution,
reducing the time required to run programs
• MapReduce provides fault tolerance by re-executing tasks, writing map
output to a distributed file system, and restarting failed map or reduce
tasks
• MapReduce is a cost-effective solution for data processing
• MapReduce processes large volumes of unprocessed data very quickly
• MapReduce uses a simple programming model that is easy to learn and
handles tasks efficiently and quickly
• MapReduce is flexible and works with several languages supported by
Hadoop to handle and store data
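The re-execution idea behind the fault tolerance point can be sketched as a simple retry loop. This is a simplification under assumptions: run_with_reexecution and FlakyMapTask are hypothetical names, the failure is simulated, and a real framework would typically reschedule the task on a different worker rather than retry in the same process. Re-running is safe here because the map task returns the same output for the same input split.

```python
def run_with_reexecution(task, split, max_attempts=3):
    """Run a task on one input split, re-executing it if it fails.

    Returns the task output and the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task(split), attempt
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt

class FlakyMapTask:
    """Map task that fails on its first call, standing in for a crashed worker."""

    def __init__(self):
        self.calls = 0

    def __call__(self, split):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("simulated worker failure")
        # Same input split always gives the same output.
        return [(rating, 1) for rating in split]

output, attempts = run_with_reexecution(FlakyMapTask(), [5, 4, 5])
print(output, attempts)  # [(5, 1), (4, 1), (5, 1)] 2
```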
Limitations
• MapReduce is a low-level programming model that requires writing a lot
of code
• The batch-based processing nature of MapReduce makes it unsuitable for
real-time processing
• It does not support data pipelining or overlapping of Map and Reduce
functions
• Task initialization, coordination, monitoring, and scheduling take up a large
chunk of MapReduce's execution time and reduce its performance
• MapReduce cannot cache the intermediate data in memory, thereby
diminishing Hadoop’s performance
The data set has 20,491 rows and 2 columns, and
our task is to count the number of reviews for each rating.
Mapping: each rating is emitted with a counter of 1.
The pairs are then shuffled and sorted by rating.
Reducing: the output is much smaller than the input.
Each rating is paired with its total count in the hotel review data.
Implementing MapReduce Programming Model
• Hadoop, developed by the Apache Software Foundation
• Spark, developed by AMPLab at UC Berkeley
• Phoenix++, developed at Stanford University
• MARISSA (MApReduce Implementation for Streaming Science Applications),
developed at SUNY Binghamton
• DRYAD and DRYADLINQ, developed by Microsoft
• MapReduce-MPI, developed by Steve Plimpton (Sandia)
• Disco, developed by Nokia
• Themis, developed by Rasmussen et al.
• MR4C, developed by Skybox Imaging
Bibliography
• MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, Google, Inc.
• MapReduce Tutorial, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
• Hadoop – MapReduce, https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
• MapReduce-Implementation-in-Python, https://github.com/rshah204/MapReduce-Implementation-in-Python/blob/master/MapReduce.ipynb
• Hotel Reviews, https://www.kaggle.com/datasets/yash10kundu/hotel-reviews?resource=download
• Map-Reduce Implementations: Survey and Performance Comparison, Zeba Khanam and Shafali Agarwal, Department of Computer Application, JSSATE, Noida, IJCSIT Vol 7, No 4, August 2015
