BIG DATA WITH HADOOP LAB MANUAL
1. Pre-requisites:
➢ Java Programming
➢ Database Knowledge
2. Course Educational Objectives:
This course provides practical, foundation-level training that enables immediate and
effective participation in Big Data and other analytics projects using Hadoop and R.
3. Course Outcomes:
After the completion of this course, the students will be able to:
CO1: Prepare for data summarization, querying, and analysis.
CO2: Apply data modelling techniques to large data sets.
CO3: Create applications for Big Data analytics.
CO4: Improve individual/teamwork skills, communication, and report-writing skills with
ethical values.
4. Course Articulation Matrix:
COs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 3 3 2 3 - - - - - - - 2 3 -
CO2 3 2 2 2 3 - - - - - - - 2 2 -
CO3 3 3 3 2 3 1 - - - - - - 2 3 -
CO4 - - - - - - - 2 2 2 - - - - -
1 - Low, 2 - Medium, 3 - High
LIST OF EXPERIMENTS
Week-1:
Refreshing Linux Commands and Installation of Hadoop
Week-2:
Implementation of a basic Word Count MapReduce program.
Week-3:
Implementation of Matrix Multiplication with Hadoop Map Reduce.
Week-4:
Implementation of Weather mining by taking weather dataset using Map Reduce.
Week-5:
Installation of Sqoop along with practice examples.
Week-6:
Installation of Hive along with practice examples.
WEEK 01: Refreshing Linux Commands and Installation of Hadoop
Hadoop/HDFS shell commands:
• ls: Lists the files and directories at the given HDFS path.
$hadoop fs -ls /dir
• mkdir: Creates a directory. HDFS has no home directory by default, so let’s first create one.
$hadoop fs -mkdir /directory_name
• touchz: Creates an empty file.
$hadoop fs -touchz /filename
• copyFromLocal (or) put: Copies files/folders from the local file system to HDFS. This is one of the most frequently used commands; “local file system” means the files present on the OS.
$hadoop fs -put <local_file> /path
$hadoop fs -copyFromLocal <local_file> /path
• copyToLocal (or) get: Copies files/folders from HDFS to the local file system.
$hadoop fs -get /file_path
$hadoop fs -copyToLocal /file_path
• cat: Prints the contents of a file.
$hadoop fs -cat /file_path
• moveFromLocal: Moves a file from the local file system to HDFS (the local copy is removed).
$hadoop fs -moveFromLocal <local_file> /path
• cp: Copies files within HDFS.
$hadoop fs -cp /path1/file /path2/file
• mv: Moves (cut-pastes) files within HDFS.
$hadoop fs -mv /path1/file /path2/file
• rmr: Deletes a file or directory from HDFS recursively; very useful when you want to delete a non-empty directory (in recent Hadoop releases, use -rm -r instead).
$hadoop fs -rmr /file_path
• du: Shows the size of each file in a directory.
$hadoop fs -du /file_path
• dus: Shows the total size of a directory/file.
$hadoop fs -dus /file_path
• stat: Shows the last-modified time of the directory or file; in short, it gives the stats of the directory or file.
$hadoop fs -stat /dir_or_file
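As a quick warm-up, the commands above can be strung together into a short session. This is only a sketch: the file and directory names (sample.txt, copy.txt, /lab) are placeholders, not part of the original manual.
echo "hello hadoop" > sample.txt          # create a small local file
hadoop fs -mkdir /lab                     # create a working directory in HDFS
hadoop fs -put sample.txt /lab            # copy the local file into HDFS
hadoop fs -ls /lab                        # list the directory contents
hadoop fs -cat /lab/sample.txt            # print the file stored in HDFS
hadoop fs -get /lab/sample.txt copy.txt   # copy it back to the local file system
hadoop fs -du /lab                        # show file sizes in the directory
hadoop fs -rm -r /lab                     # clean up (recursive delete)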
Installation of Hadoop:
Open a terminal in Ubuntu and execute the following commands one by one:
1) sudo apt update
2) sudo apt install openjdk-8-jdk -y
3) java -version; javac -version
4) sudo apt install openssh-server openssh-client -y
5) ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
6) cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
7) chmod 0600 ~/.ssh/authorized_keys
8) ssh localhost
9) wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
10) tar xzf hadoop-3.2.1.tar.gz
11) sudo nano .bashrc
(add the following lines at the end of the file)
#Hadoop Related Options
export HADOOP_HOME=/home/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
12) source ~/.bashrc
13) sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
To find the correct JDK path for this file, run:
which javac
readlink -f /usr/bin/javac
Then add the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
14) sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
15) sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
16) sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
17) sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
18) hdfs namenode -format
Navigate to the hadoop-3.2.1/sbin directory.
19) ./start-dfs.sh
20) ./start-yarn.sh
21) jps (NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager should all be listed)
22) http://localhost:9870 (NameNode web UI)
23) http://localhost:9864 (DataNode web UI)
24) http://localhost:8088 (YARN ResourceManager web UI)
WEEK 02: Implementation of a basic Word Count MapReduce program to understand the MapReduce paradigm.
Mapper Logic:
package com.lbrce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] words = line.split("\\s+");   // split the line on whitespace
        for (String s : words)
        {
            con.write(new Text(s), new IntWritable(1));   // emit (word, 1)
        }
    }
}
Reducer Logic:
package com.lbrce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable i : values)
        {
            sum = sum + i.get();   // add up the 1s emitted for this word
        }
        con.write(key, new IntWritable(sum));   // emit (word, total count)
    }
}
Driver Logic:
package com.lbrce.wordcount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver
{
    public static void main(String[] args) throws Exception
    {
        if (args.length != 2)
        {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(WordCountDriver.class);
        job.setJobName("Word Count");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
OUTPUT:
Local file
Step 1: Create a text file, place it in HDFS, and run the job with the exported jar (as sketched below).
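The screenshots for these steps are not reproduced here. As a rough sketch, the run might look like the commands below; the jar name wordcount.jar and the HDFS paths are assumptions, not taken from the manual. Note that the output directory must not exist before the job runs.
echo "deer bear river car car river deer car bear" > input.txt   # sample local file
hadoop fs -mkdir /wcinput
hadoop fs -put input.txt /wcinput
hadoop jar wordcount.jar com.lbrce.wordcount.WordCountDriver /wcinput /wcoutput
hadoop fs -cat /wcoutput/part-r-00000   # view the (word, count) pairs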
Step 2: Running of Mapper and Reducer.
Step 3: Output file generation and execution part is shown below.
Hadoop file
WEEK 03: Implementation of Matrix Multiplication with Hadoop Map Reduce.
Mapper Logic:
package com.lbrce.matrixmul;

import java.io.IOException;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text>
{
    // Input lines look like:  M,i,j,value  or  N,j,k,value
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m"));   // number of rows of M
        int p = Integer.parseInt(conf.get("p"));   // number of columns of N
        String line = value.toString();
        String[] indicesAndValue = line.split(",");
        Text outputKey = new Text();
        Text outputValue = new Text();
        if (indicesAndValue[0].equals("M"))
        {
            // M[i][j] contributes to every output cell (i,k), k = 0..p-1
            for (int k = 0; k < p; k++)
            {
                outputKey.set(indicesAndValue[1] + "," + k);
                outputValue.set("M," + indicesAndValue[2] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
        else
        {
            // N[j][k] contributes to every output cell (i,k), i = 0..m-1
            for (int i = 0; i < m; i++)
            {
                outputKey.set(i + "," + indicesAndValue[2]);
                outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
    }
}
Reducer Logic:
package com.lbrce.matrixmul;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class MatrixReducer extends Reducer<Text, Text, Text, Text>
{
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
    {
        String[] value;
        HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();   // j -> M[i][j]
        HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();   // j -> N[j][k]
        for (Text val : values)
        {
            value = val.toString().split(",");
            if (value[0].equals("M"))
            {
                hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
            else
            {
                hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
        }
        int n = Integer.parseInt(context.getConfiguration().get("n"));   // shared dimension
        float result = 0.0f;
        float a_ij;
        float b_jk;
        for (int j = 0; j < n; j++)
        {
            a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
            b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
            result += a_ij * b_jk;   // accumulate the dot product for cell (i,k)
        }
        if (result != 0.0f)
        {
            context.write(null, new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}
Driver Logic:
package com.lbrce.matrixmul;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MatrixDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // M is an m-by-n matrix; N is an n-by-p matrix.
        conf.set("m", "2");
        conf.set("n", "2");
        conf.set("p", "2");
        Job job = Job.getInstance(conf, "MatrixMultiplication");
        job.setJarByClass(MatrixDriver.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.submit();
    }
}
Execution Part in Termius:
Step 1: Check whether the jar file is present in the working directory.
Step 2: To run the matrix multiplication job we need an input file containing the M and N entries referred to in the logic above, one matrix element per line; a sketch of the expected format is shown below.
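The input file and exact commands were shown only as screenshots in the manual; the following is a hedged reconstruction for the driver settings m = 2, n = 2, p = 2. The file name MMInput.txt, the jar name, and the matrix values are assumptions.
Contents of MMInput.txt (each line is <matrix>,<row>,<col>,<value>):
M,0,0,1
M,0,1,2
M,1,0,3
M,1,1,4
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8
Commands to copy the file to HDFS and run the job:
hadoop fs -mkdir /MMInput
hadoop fs -put MMInput.txt /MMInput
hadoop jar matrixmul.jar com.lbrce.matrixmul.MatrixDriver /MMInput /MMOutput
hadoop fs -cat /MMOutput/part-r-00000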
Step 3: After successfully copying the input file to HDFS, execute the jar file. The output directory, here called "MMOutput", is created by the job and must not already exist.
Step 4: Verify that the job completed successfully by opening the output directory.
Step 5: Open the output file in read mode to print the output of the operation.
The product of the two matrices given in the input file is computed and displayed.
Week 04: Implementation of Weather mining by taking weather dataset using Map Reduce
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
public class MyMaxMin {
public static class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, Text>
{
public static final int MISSING=9999;
public void map(LongWritable arg0, Text Value, Context context) throws IOException, InterruptedException
{
String line = Value.toString();
if (!(line.length() == 0))
{
String date = line.substring(6, 14);
float temp_Min = Float.parseFloat(line.substring(39, 45).trim());
float temp_Max = Float.parseFloat(line.substring(47, 53).trim());
if (temp_Max > 35.0 && temp_Max!=MISSING)
{
context.write(new Text("Hot Day " + date),new Text(String.valueOf(temp_Max)));
}
if (temp_Min < 10 && temp_Min!=MISSING)
{
context.write(new Text("Cold Day " + date),new Text(String.valueOf(temp_Min)));
}
}
}
}
public static class MaxTemperatureReducer extends Reducer<Text, Text, Text, Text>
{
// Note: reduce must take Iterable<Text> (not Iterator<Text>) so that it actually overrides the framework method.
public void reduce(Text Key, Iterable<Text> Values, Context context) throws IOException, InterruptedException
{
// Emit the temperature value(s) recorded for this "Hot Day"/"Cold Day" key.
for (Text temperature : Values)
{
context.write(Key, temperature);
}
}
}
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
Job job = new Job(conf, "weather example");
job.setJarByClass(MyMaxMin.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path OutputPath = new Path(args[1]);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}}
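The output steps below were captured as screenshots; a hedged sketch of the run sequence follows. The jar name, dataset file name, and HDFS paths are assumptions, and the dataset is expected to be a fixed-width weather file matching the substring offsets used in the mapper.
hadoop fs -mkdir /weather
hadoop fs -put weather_data.txt /weather          # fixed-width daily weather records
hadoop jar weather.jar MyMaxMin /weather /weatheroutput
hadoop fs -cat /weatheroutput/part-r-00000        # lines like "Hot Day <date>  <temp>"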
Output:
Step 01: Check whether the jar file exists.
Step 02: Check whether the job completed successfully and open the output directory.
Step 03: Open the output file in read mode to print the output of the operation.
Week 05: Installation of Sqoop along with Practice examples.
Step 1: Connect to the MySQL database, then create a database (20761A5433) in which to create the required tables.
Step 2: Create two tables called employee and student, where the employee table has a primary key.
Step 3: Insert values into the created tables, and view the tables with the inserted values using the SELECT command.
Hadoop commands using Sqoop
Step 1: Display all Sqoop commands and their uses with "sqoop help".
Step 2: Hadoop directory with files.
Step 3: Connect to the database with Sqoop and import the created tables into HDFS as files; Sqoop launches the map tasks that perform the transfer. The commands are of the form sketched below.
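The exact commands appear only as screenshots in the manual. A hedged reconstruction is given here; the JDBC URL, credentials, and target directories are assumptions, while the database name 20761A5433 is taken from Step 1 above.
sqoop import --connect jdbc:mysql://localhost:3306/20761A5433 \
  --username root --password <password> \
  --table employee --target-dir /sqoop/employee -m 1
sqoop import --connect jdbc:mysql://localhost:3306/20761A5433 \
  --username root --password <password> \
  --table student --target-dir /sqoop/student -m 1
With -m 1, Sqoop runs a single map task, which also handles the student table that has no primary key.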
Step 4: Now check the HDFS directory, which contains the files for both tables.
Step 5: Display the contents of the imported files with the "cat" command and check the output.
Step 6: All tables can be imported at once with the import-all-tables command, as in the example below.
Step 7: Columns can also be retrieved selectively, based on specified criteria (see the second command in the sketch below).
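Hedged sketches for Steps 6 and 7; the warehouse directory, column names, and the filter condition are assumptions about the employee table's schema.
sqoop import-all-tables --connect jdbc:mysql://localhost:3306/20761A5433 \
  --username root --password <password> \
  --warehouse-dir /sqoop/all -m 1
sqoop import --connect jdbc:mysql://localhost:3306/20761A5433 \
  --username root --password <password> \
  --table employee --columns "id,name,salary" --where "salary > 30000" \
  --target-dir /sqoop/employee_filtered -m 1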
Step 8: The HDFS directory after performing all of the above commands.
Week 06: Installation of Hive along with Practice Examples.
This week gives a practice-oriented introduction to Apache Hive, covering both basic and more advanced concepts.
Apache Hive is a data warehouse system for Hadoop that runs SQL-like queries, called HQL (Hive Query Language), which are internally converted into MapReduce jobs. Hive was developed at Facebook. It supports a Data Definition Language, a Data Manipulation Language, and user-defined functions.
The exercises below touch on Hive installation, Hive data types, table partitioning, DDL and DML commands, and user-defined functions.
Step 1: Open the Hive workspace and create a directory and a database with a sample table.
Step 2: Insert values into a table in two ways: by creating a file and loading its data, and by inserting values one after another.
Step 3: Create an internal (managed) table in Hive.
Step 4: Load data into the internal table.
Step 5: Create an external table in Hive.
Step 6: An external table does not need a separate load command, so we can verify it directly. A sketch of Steps 3-6 is given below.
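The original screenshots are not reproduced; this is a rough sketch of what Steps 3-6 might look like from the shell. The table names, column names, and file paths (student.csv, /home/hdoop, /hive_ext) are assumptions.
# Internal (managed) table: Hive owns the data under its warehouse directory
hive -e "CREATE TABLE student_int (sno INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"
hive -e "LOAD DATA LOCAL INPATH '/home/hdoop/student.csv' INTO TABLE student_int;"
# External table: Hive only points at data that already sits in HDFS
hadoop fs -mkdir /hive_ext
hadoop fs -put student.csv /hive_ext
hive -e "CREATE EXTERNAL TABLE student_ext (sno INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/hive_ext';"
hive -e "SELECT * FROM student_ext;"   # no LOAD needed; the files in /hive_ext are read directly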
Step 7: Hive user-defined function (UDF):
package com.lbrce.hiveudf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Percentage extends UDF {
    // Returns the percentage (average of three marks, each out of 100)
    public float evaluate(int s1, int s2, int s3)
    {
        return (s1 + s2 + s3) / 3.0f;
    }
}
Step 8: Prepare a jar file from the UDF and add it to Hive for execution.
Step 9: Create a temporary function that points at the UDF class.
Step 10: Create another table and check the result (a sketch of Steps 8-10 follows).
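A hedged sketch of how the UDF might be registered and used; the jar path, the function name, and the table/column names (marks, sno, m1-m3) are assumptions, while the class name comes from the code above.
hive -e "
ADD JAR /home/hdoop/hiveudf.jar;
CREATE TEMPORARY FUNCTION percentage AS 'com.lbrce.hiveudf.Percentage';
SELECT sno, percentage(m1, m2, m3) FROM marks;
"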
Step 11: Another user-defined function, which places some words within the input text.
Step 12: Complex data types in Hive, with examples:
a. Array
b. Map
c. Struct
Step 13: Partitioning of tables in Hive (a sketch of Steps 12 and 13 follows):
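The screenshots for Steps 12 and 13 are omitted; the following is a sketch of table definitions that exercise these features. All table, column, and partition names and values are assumptions.
hive -e "
CREATE TABLE complex_demo (
  name      STRING,
  subjects  ARRAY<STRING>,                       -- Array type
  marks     MAP<STRING, INT>,                    -- Map type
  address   STRUCT<city:STRING, pincode:INT>     -- Struct type
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '#'
  MAP KEYS TERMINATED BY ':';

CREATE TABLE student_part (sno INT, name STRING)
PARTITIONED BY (branch STRING);                  -- Step 13: partitioned table

INSERT INTO TABLE student_part PARTITION (branch='CSE') VALUES (1, 'Ravi');
"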