1. BIG DATA ANALYSIS
USING HADOOP
SEMINAR: HIVE QL
BY:
SHREYA JAISWAL [ENG20CA0042]
NANDINI GARG [ENG20CA0023]
2022-2023
2. Content Overview
› Introduction
› Difference between HIVE and RDBMS
› Hive QL
› Difference between SQL and HIVE QL
› HIVE QL built in operators
3. INTRODUCTION
HIVE:
Apache Hive is a data warehouse software system, built on top of
Hadoop, that provides data query and analysis. Hive offers an SQL-like
interface for querying data stored in the various databases and file
systems that integrate with Hadoop. It makes it practical to query and
manage very large datasets, and it is commonly used as an ETL
(extract, transform, load) tool in the Hadoop ecosystem.
4. Difference between HIVE and RDBMS
RDBMS                                      HIVE
It is used to maintain a database.         It is used to maintain a data warehouse.
It uses SQL (Structured Query Language).   It uses HQL (Hive Query Language).
The schema is fixed.                       The schema varies.
Normalized data is stored.                 Both normalized and de-normalized data are stored.
Tables in an RDBMS are sparse.             Tables in Hive are dense.
5. HIVE QUERY LANGUAGE (HIVE QL):
HiveQL is the query language of Apache Hive for
processing and analyzing structured data. Its syntax is a mixture
of SQL-92, MySQL, and Oracle’s SQL. It is very similar to SQL
and scales to very large datasets. It reuses familiar
concepts from the relational database world, such as
tables, rows, columns and schema, to ease learning.
6. › Hive provides a command line interface (CLI) for writing queries
in Hive Query Language (HiveQL).
› Data Definition Language (DDL) statements create, alter
and drop databases, tables, views, functions and
indexes.
› DDL and Data Manipulation Language (DML) are both parts of HiveQL.
› Most interactions take place over the CLI. In general,
HiveQL syntax is similar to the SQL syntax that most data
analysts are already familiar with.
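The DDL/DML split described above can be sketched with a few SQL-style statements. This is only a rough illustration: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a Hadoop cluster, and the `employees` table, its columns and its rows are hypothetical. Hive's own syntax differs in places (for example, Hive adds columns with ADD COLUMNS and calls its text type STRING).

```python
import sqlite3

# SQLite stands in for Hive here (an assumption, so the sketch runs locally).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create and alter a hypothetical table.
cur.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
cur.execute("ALTER TABLE employees ADD COLUMN dept TEXT")

# DML: load rows and query them.
cur.executemany(
    "INSERT INTO employees (id, name, salary, dept) VALUES (?, ?, ?, ?)",
    [(1, "Asha", 52000.0, "ENG"), (2, "Ravi", 48000.0, "OPS")],
)
rows = cur.execute(
    "SELECT name, salary FROM employees WHERE salary > 50000 ORDER BY id"
).fetchall()

# DDL again: drop the table once it is no longer needed.
cur.execute("DROP TABLE employees")
remaining_tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()

print(rows)
print(remaining_tables)
```

The CREATE, ALTER and DROP statements change the structure of the database, while INSERT and SELECT work on the data held inside that structure.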
8. Difference between SQL and HIVE QL
On the basis of                      SQL                              HiveQL
Update commands in table structure   Update, delete, insert           Update, delete, insert
Manages                              Relational data                  Data structures
Transactions                         Supported                        Limited support
Indexes                              Supported                        Supported
Data types                           Five categories: integral,       Boolean, integral, floating-point,
                                     floating-point, fixed-point,     fixed-point, timestamp (nanosecond
                                     text and binary strings,         precision), date, text and binary
                                     temporal                         strings, temporal, array, map,
                                                                      struct, union
Functions                            Hundreds of built-in functions   Hundreds of built-in functions
MapReduce                            Not supported                    Supported
9. HiveQL Built-in Operators
› Hive provides built-in operators for data operations on the
tables stored in the Hive warehouse.
› These operators perform comparisons, arithmetic and logical
operations on their operands and return a value according to
the logic applied.
› Below are the main types of Built-in Operators in HiveQL:
• Relational Operators
• Arithmetic Operators
• Logical Operators
• Operators on Complex types
10. RELATIONAL OPERATORS IN HIVEQL
Relational operators compare two operands, for example
equals, not equals, less than and greater than.
The comparison operators accept operands of any primitive type;
IS NULL and IS NOT NULL accept all types, and REGEXP accepts
only strings.
11. The following table gives details about the relational operators and their usage in
HiveQL:
Operator        Description                                                Operand types
X = Y           TRUE if X is equivalent to Y, otherwise FALSE              All primitive types
X != Y          TRUE if X is not equivalent to Y, otherwise FALSE          All primitive types
X < Y           TRUE if X is less than Y, otherwise FALSE                  All primitive types
X <= Y          TRUE if X is less than or equal to Y, otherwise FALSE      All primitive types
X > Y           TRUE if X is greater than Y, otherwise FALSE               All primitive types
X >= Y          TRUE if X is greater than or equal to Y, otherwise FALSE   All primitive types
X IS NULL       TRUE if X evaluates to NULL, otherwise FALSE               All types
X IS NOT NULL   FALSE if X evaluates to NULL, otherwise TRUE               All types
X REGEXP Y      Same as RLIKE (regular-expression match)                   Strings only
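To try these comparisons without a Hive installation, the sketch below evaluates one expression per operator in SQLite through Python's built-in sqlite3 module. SQLite is an assumed stand-in, not Hive: it reports 1 and 0 where Hive reports true and false, and its REGEXP keyword has no built-in implementation, so a small helper based on Python's `re` module is registered here. The literal values are arbitrary examples.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

def regexp(pattern, value):
    """Back SQLite's REGEXP keyword; SQLite passes (pattern, value)."""
    if value is None:
        return None
    return int(re.search(pattern, value) is not None)

conn.create_function("regexp", 2, regexp)

# One expression per row of the relational operator table.
row = conn.execute(
    """
    SELECT 5 = 5,
           5 != 3,
           2 < 3,
           3 <= 3,
           4 > 9,
           4 >= 9,
           NULL IS NULL,
           'hive' IS NOT NULL,
           'hadoop' REGEXP '^ha'
    """
).fetchone()
conn.close()

print(row)
```

Each position in the returned tuple corresponds to one operator, in the same order as the table.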
12. HiveQL Arithmetic Operators
Arithmetic operators perform arithmetic operations such as
addition, subtraction, multiplication and division on their
operands.
The operands of these operators are all number types.
Sample example:
2 + 3 gives the result 5.
In this example, ‘+’ is the operator and 2 and 3 are the operands. The
return value is 5.
13. The following table gives details about the arithmetic operators in
Hive Query Language:
Operator  Description                                  Operand types
X + Y     Returns the result of adding X and Y         All number types
X - Y     Returns the result of subtracting Y from X   All number types
X * Y     Returns the result of multiplying X and Y    All number types
X / Y     Returns the result of dividing X by Y        All number types
X % Y     Returns the remainder from dividing X by Y   All number types
X & Y     Returns the bitwise AND of X and Y           All number types
X | Y     Returns the bitwise OR of X and Y            All number types
X ^ Y     Returns the bitwise XOR of X and Y           All number types
~X        Returns the bitwise NOT of X                 All number types
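Python happens to use the same symbols for these operators, so the table can be checked with a short sketch. This is an illustration in Python rather than Hive code; the operand values 12 and 5 are arbitrary, and the mapping is only meant for non-negative integer operands, since remainder behavior for negative numbers can differ between systems.

```python
import operator

# Operator symbol -> Python function with the same meaning for
# non-negative integer operands (an illustrative mapping, not Hive code).
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
}

def evaluate(symbol, x, y):
    """Apply the binary operator named by symbol to x and y."""
    return BINARY_OPS[symbol](x, y)

x, y = 12, 5
results = {symbol: evaluate(symbol, x, y) for symbol in BINARY_OPS}
bitwise_not = ~x  # unary operator: flips every bit of x

for symbol, value in results.items():
    print(f"{x} {symbol} {y} = {value}")
print(f"~{x} = {bitwise_not}")
```

In binary, 12 is 1100 and 5 is 0101, which makes the bitwise rows easy to verify by hand.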
14. HiveQL Logical Operators
Logical operators perform logical operations such as
AND, OR and NOT on their operands.
The operands of these operators are all of BOOLEAN
type.
15. The following table gives details about the logical
operators in HiveQL:
Operator  Description                                               Operand types
X AND Y   TRUE if both X and Y are TRUE, otherwise FALSE            Boolean only
X && Y    Same as X AND Y, written with the && symbol               Boolean only
X OR Y    TRUE if either X or Y or both are TRUE, otherwise FALSE   Boolean only
X || Y    Same as X OR Y, written with the || symbol                Boolean only
NOT X     TRUE if X is FALSE, otherwise FALSE                       Boolean only
!X        Same as NOT X, written with the ! symbol                  Boolean only
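The descriptions in the logical operator table amount to a truth table, which the following Python sketch enumerates. It uses plain Python booleans as a stand-in for Hive's BOOLEAN type and leaves NULL operands out entirely; the function name `truth_table` is made up for this example.

```python
from itertools import product

def truth_table():
    """Return one row per (X, Y) pair: X, Y, X AND Y, X OR Y, NOT X."""
    rows = []
    for x, y in product([True, False], repeat=2):
        rows.append((x, y, x and y, x or y, not x))
    return rows

table = truth_table()
for x, y, and_result, or_result, not_result in table:
    print(
        f"X={x!s:5} Y={y!s:5} "
        f"AND={and_result!s:5} OR={or_result!s:5} NOT X={not_result}"
    )
```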
16. OPERATORS ON COMPLEX TYPES
These operators provide a mechanism to access elements in
complex types.
The following table gives details about the complex type
operators.
Operator  Operands                              Description
A[n]      A is an array and n is an integer     Returns the nth element of array A; the first element has index 0
M[key]    M is a Map<K, V> and key has type K   Returns the value that belongs to key in the map
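The two access operators behave much like list and dictionary lookups, which the Python sketch below models. The helper names `array_element` and `map_value` and the sample values are hypothetical, and returning None (standing in for NULL) for an out-of-range index or a missing key is an assumption of this sketch rather than something the table above states.

```python
from typing import Any, Dict, List, Optional

def array_element(a: List[Any], n: int) -> Optional[Any]:
    """Model A[n]: zero-based lookup; None stands in for NULL if n is out of range."""
    if 0 <= n < len(a):
        return a[n]
    return None

def map_value(m: Dict[Any, Any], key: Any) -> Optional[Any]:
    """Model M[key]: the value stored under key; None stands in for NULL if absent."""
    return m.get(key)

skills = ["hive", "pig", "spark"]        # hypothetical array column value
marks = {"maths": 91, "physics": 84}     # hypothetical map column value

first_skill = array_element(skills, 0)   # index 0 is the first element
missing_skill = array_element(skills, 7)
maths_mark = map_value(marks, "maths")
missing_mark = map_value(marks, "biology")

print(first_skill, missing_skill, maths_mark, missing_mark)
```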
Nanu
RDBMS stands for Relational Database Management System. An RDBMS is a database management system designed specifically for relational databases, and it is a subset of DBMS. A relational database stores data in a structured format using rows and columns; that structured form is known as a table.
Shreya
SQL-92 was the third revision of the SQL database query language. MySQL is an open-source relational database management system. PL/SQL is Oracle's procedural language extension to SQL.
Shreya
HiveQL is the query language Hive uses to process and analyze structured data, with table metadata kept in the metastore.
SQL is a domain-specific language designed for managing data held in a relational database management system (RDBMS). It is also useful in handling structured data, i.e., data incorporating relations among entities and variables. SQL is a standard language for storing, manipulating, and retrieving data in databases.
MapReduce is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster.