MAPREDUCE
Hadoop MapReduce paradigm
• Hadoop is an open-source software framework
for storing and processing large datasets ranging
in size from gigabytes to petabytes.
• It is developed at the Apache Software Foundation.
• Hadoop has two basic components:
1. Massive data storage
2. Faster data processing
Hadoop MapReduce paradigm (continued)
• Hadoop Distributed File System (HDFS):
• The storage layer. It allows you to store data of various formats
across a cluster.
• MapReduce:
• The processing layer. It allows
parallel processing over the data stored across
HDFS.
History of Hadoop
Why Hadoop?
• Cost Effective System
• Computing power
• Scalability
• Storage flexibility
• Inherent data protection
• Varied Data Sources
• Fault-Tolerant
• Highly Available
• Low Network Traffic
• High Throughput
• Multiple Languages Supported
Disadvantages of Hadoop
• Issues with small files
• Vulnerable by nature
• Processing overhead
• Supports only batch processing
• Inefficient for iterative processing
• Security
Traditional restaurant scenario
Traditional Scenario
Distributed Processing Scenario
Distributed Processing Scenario Failure
Solution of Restaurant problem
Hadoop in Restaurant Analogy
Map tasks
• Process independent chunks of the input in parallel
• The output of each map task is stored as intermediate data on
the local disk of that server
Reduce task
• The output of the mappers is automatically shuffled and stored
by the framework
• The framework sorts the output based on key
• Provides the reduced output by combining the output
of the various mappers
Map-reduce daemons
1. JobTracker
2. TaskTracker
JobTracker
• Master daemon
• Single JobTracker per Hadoop cluster
• Provides connectivity between Hadoop and the client
application
• Creates the execution plan (which task to assign to
which node)
• Monitors all running tasks
• Reschedules a task if it fails
TaskTracker
• Responsible for executing the individual tasks
assigned to it by the JobTracker
• Single TaskTracker per slave node
• Continuously sends heartbeat messages to the
JobTracker
• If the JobTracker receives no heartbeat, the tasks are
allocated to other TaskTrackers
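The heartbeat rule above can be sketched in plain Java. This is a toy model, not Hadoop's implementation: the class name `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of JobTracker-side heartbeat tracking (hypothetical names, not Hadoop code).
class HeartbeatMonitor {
    private final long timeoutMillis; // assumed timeout; the real value is a cluster setting
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, List<String>> assignedTasks = new HashMap<>();
    private final List<String> pending = new ArrayList<>(); // tasks with no live tracker to run them

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a TaskTracker reports in.
    void heartbeat(String tracker, long nowMillis) {
        lastHeartbeat.put(tracker, nowMillis);
        assignedTasks.putIfAbsent(tracker, new ArrayList<>());
    }

    void assign(String tracker, String task) {
        assignedTasks.computeIfAbsent(tracker, t -> new ArrayList<>()).add(task);
    }

    // Moves the tasks of every silent tracker to the live ones; returns the moved tasks.
    List<String> reassignFromDeadTrackers(long nowMillis) {
        List<String> dead = new ArrayList<>();
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            } else {
                live.add(e.getKey());
            }
        }
        List<String> moved = new ArrayList<>();
        for (String tracker : dead) {
            lastHeartbeat.remove(tracker);
            List<String> tasks = assignedTasks.remove(tracker);
            if (tasks != null) {
                moved.addAll(tasks);
            }
        }
        for (int i = 0; i < moved.size(); i++) {
            if (live.isEmpty()) {
                pending.add(moved.get(i));                       // nobody left to run it
            } else {
                assign(live.get(i % live.size()), moved.get(i)); // round-robin over live trackers
            }
        }
        return moved;
    }

    List<String> tasksOf(String tracker) {
        return assignedTasks.getOrDefault(tracker, new ArrayList<>());
    }
}
```

With a 1000 ms timeout, a tracker last heard from at t = 0 is treated as dead at t = 1500, and its tasks move to a tracker that reported at t = 900.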
Map-reduce execution pipeline
Mapper
• The mapper maps the input key-value pairs into a set of
intermediate key-value pairs
• Phases:
1. RecordReader:
• Converts the input split into key-value pairs
• <key, value> = <positional information, chunk of
data that constitutes the record>
2. Map:
• Generates zero or more intermediate key-value pairs
for each input pair
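The first two mapper phases can be illustrated without Hadoop on the classpath. The sketch below assumes line-oriented text input and a word-count map function; `MapperPhasesSketch`, `readRecords`, and `map` are hypothetical names, not part of the Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-alone illustration of the RecordReader and Map phases (hypothetical names).
class MapperPhasesSketch {

    // RecordReader phase: turn a chunk of text into <byte offset, line> records.
    static List<Map.Entry<Long, String>> readRecords(String chunk) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        if (chunk.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : chunk.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // Advance past the line's bytes plus the newline separator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Map phase: emit <word, 1> for every whitespace-separated token.
    // A blank line yields zero pairs, so "zero or more" outputs per input record.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }
}
```

Here the key is the positional information (the byte offset of the line) and the value is the record itself, which matches the <key, value> shape described above. The word-count map function ignores the key, as many map functions do.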
Mapper (continued)
3. Combiner
• Optimization technique for a MapReduce job:
applies a user-specified aggregate function to the output of only
that mapper
• Also known as a local reducer
4. Partitioner
• Splits the intermediate key-value pairs into partitions, one per reducer
• Usually the number of partitions is equal to the
number of reducers
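A combiner and a partitioner can be sketched in a few lines of plain Java. The class and method names below are hypothetical. The partition formula is the one used by Hadoop's default hash partitioner, applied here to `String.hashCode()` rather than to a Hadoop `Text` key, so the partition numbers will not match a real job.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone illustration of the Combiner and Partitioner phases (hypothetical names).
class CombinerPartitionerSketch {

    // Combiner: sum the counts per key within ONE mapper's output,
    // so fewer pairs have to cross the network to the reducers.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapperOutput) {
            local.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return local;
    }

    // Partitioner: choose the reducer for a key.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer, whichever mapper produced it.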
Reducer
1. Shuffle and sort:
• Consumes the output of the mapping phase
• Consolidates the relevant records from the mapping
phase output
• In a word count, for example, the same words are clubbed together along with
their respective frequencies
Reducer (continued)
2. Reduce:
• Takes the grouped data produced by the shuffle and sort phase
• Applies the reduce function
• Processes one group at a time
• The reduce function iterates over all the values associated with that key
• Typical operations: aggregation, filtering, combining
3. Output format:
• By default, separates the key and value of each pair with a tab
• Writes it out to a file using a record writer
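The three reducer-side steps can be sketched in plain Java as follows. The example assumes a word-count style reduce (summing values). `ReducerPhasesSketch` and its methods are hypothetical names, and the "file" is just a returned string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone illustration of shuffle/sort, reduce, and output formatting (hypothetical names).
class ReducerPhasesSketch {

    // Shuffle and sort: group all values by key, with keys in sorted order.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : pairs) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Reduce: iterate over every value of one key and aggregate them (here: sum).
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Output format: one "key<TAB>value" line per group, processed one group at a time.
    static String run(List<Map.Entry<String, Integer>> pairs) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<Integer>> group : shuffleAndSort(pairs).entrySet()) {
            int total = reduce(group.getKey(), group.getValue());
            out.append(group.getKey()).append('\t').append(total).append('\n');
        }
        return out.toString();
    }
}
```

Feeding the pairs (b, 1), (a, 1), (b, 1) through `run` produces two lines, `a` with 1 and `b` with 2, in key order.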
API
• Main Class file Packages
• Mapper Class Packages
• Reducer Class Packages
Main class file packages
• import org.apache.hadoop.conf.Configured; (base class for objects that carry a Hadoop Configuration)
• import org.apache.hadoop.fs.Path; (names a file or directory in a file system)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for int values)
• import org.apache.hadoop.io.Text; (Writable type for UTF-8 text)
• import org.apache.hadoop.mapred.FileInputFormat; (base class for file-based input formats; used to set the input paths)
• import org.apache.hadoop.mapred.FileOutputFormat; (base class for file-based output formats; used to set the output path)
• import org.apache.hadoop.mapred.JobClient; (submits the job to the cluster and tracks its progress)
• import org.apache.hadoop.mapred.JobConf; (job configuration: mapper, reducer, input/output formats and types)
• import org.apache.hadoop.util.Tool; (interface for
applications that accept the generic Hadoop
command-line options)
• import org.apache.hadoop.util.ToolRunner;
(utility class that parses those options and calls the Tool's run function)
Mapper File Packages
• import java.io.IOException; (exception handling)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for int values)
• import org.apache.hadoop.io.LongWritable; (Writable wrapper for long values, e.g. the byte offset used as the input key)
• import org.apache.hadoop.io.Text; (input and output text)
• import org.apache.hadoop.mapred.MapReduceBase; (base class for Mapper and Reducer implementations)
• import org.apache.hadoop.mapred.Mapper; (Mapper interface)
• import org.apache.hadoop.mapred.OutputCollector; (collects the key-value pairs emitted by the mapper)
• import org.apache.hadoop.mapred.Reporter; (reports progress and status, updates counters)
Reducer file Package
• import java.io.IOException; (exception handling)
• import java.util.Iterator; (iterates over the values of a key with hasNext and next)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for int values)
• import org.apache.hadoop.io.Text; (input and output text)
• import org.apache.hadoop.mapred.MapReduceBase; (base class for Mapper and Reducer
implementations)
• import org.apache.hadoop.mapred.OutputCollector; (collects the key-value pairs
emitted by the reducer)
• import org.apache.hadoop.mapred.Reducer; (Reducer interface)
• import org.apache.hadoop.mapred.Reporter; (reports progress and status,
updates counters)
Hadoop 2.0 features
• HDFS Federation – horizontal scalability of the
NameNode
• NameNode High Availability – the NameNode is no
longer a single point of failure
• YARN – ability to process terabytes and
petabytes of data available in HDFS using
non-MapReduce applications such as MPI and Giraph
• Resource Manager – YARN splits the two major
functions of the overburdened JobTracker
(resource management and job
scheduling/monitoring) into two separate
daemons: a global ResourceManager and a
per-application ApplicationMaster
• Capacity Scheduler
• Data Snapshot
• Support for Windows
NameNode high availability
• In Hadoop 1.x, the NameNode was a single point of failure
• Hadoop administrators had to manually recover
the NameNode using the Secondary NameNode
• The Hadoop 2.0 architecture supports multiple
NameNodes to remove this single point of failure
• It adds support for a passive Standby NameNode
• In case of Active NameNode failure, the passive
NameNode becomes the Active NameNode and
starts writing to the shared storage
YARN (Yet Another Resource Negotiator)
• The main idea is to split the JobTracker
responsibilities of resource management and job
scheduling into separate daemons.
YARN daemons
1. Global Resource Manager:
a) Scheduler (allocates resources among the
various running applications)
b) Applications Manager (accepts job
submissions, restarts the Application Master in
case of failure)
2. Node Manager:
• Per-machine slave daemon
• Launches the application containers for application
execution
• Reports resource usage to the global Resource
Manager
3. Application Master:
• Application-specific entity
• Negotiates the resources required for execution from
the Resource Manager
• Works with the Node Manager to execute and
monitor the component tasks
YARN
YARN workflow
1. The client submits an application
2. The Resource Manager allocates a container to start the
Application Master
3. The Application Master registers itself with the Resource
Manager
4. The Application Master negotiates containers from the Resource
Manager
5. The Application Master notifies the Node Manager to launch
containers
6. Application code is executed in the container
7. The client contacts the Resource Manager/Application Master to
monitor the application's status
8. Once the processing is complete, the Application Master
unregisters with the Resource Manager
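The workflow above can be walked through with a toy model. Nothing here is the YARN API: `YarnWorkflowSketch`, its nested `ResourceManager`, and the slot counts are hypothetical, and a "container" is reduced to the name of the node that hosts it. Step 7 (client monitoring) is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy walk-through of the YARN workflow steps (hypothetical names, not the YARN API).
class YarnWorkflowSketch {

    // Tracks free container slots per Node Manager.
    static class ResourceManager {
        final Map<String, Integer> freeSlots = new LinkedHashMap<>();
        final List<String> registeredMasters = new ArrayList<>();

        void addNode(String node, int slots) {
            freeSlots.put(node, slots);
        }

        // Grants up to 'count' containers; each is represented by its host node's name.
        List<String> allocate(int count) {
            List<String> granted = new ArrayList<>();
            for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
                while (e.getValue() > 0 && granted.size() < count) {
                    e.setValue(e.getValue() - 1);
                    granted.add(e.getKey());
                }
            }
            return granted;
        }

        void release(List<String> containers) {
            for (String node : containers) {
                freeSlots.merge(node, 1, Integer::sum);
            }
        }
    }

    // Runs one application through steps 1-6 and 8; returns a log of what happened.
    static List<String> runApplication(ResourceManager rm, String appId, int tasks) {
        List<String> log = new ArrayList<>();
        log.add("submit " + appId);                          // step 1
        List<String> amContainer = rm.allocate(1);           // step 2
        if (amContainer.isEmpty()) {
            log.add("rejected " + appId);
            return log;
        }
        rm.registeredMasters.add(appId);                     // step 3
        log.add("am-registered on " + amContainer.get(0));
        List<String> workers = rm.allocate(tasks);           // step 4
        for (String node : workers) {
            log.add("launch task on " + node);               // steps 5 and 6
        }
        rm.release(workers);
        rm.registeredMasters.remove(appId);                  // step 8
        rm.release(amContainer);
        log.add("am-unregistered");
        return log;
    }
}
```

The Application Master occupies a container of its own before it requests any for the application's tasks, which is why the first allocation in the sketch is a single container.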
Hadoop Map-Reduce from the subject: Big Data Analytics
