SlideShare a Scribd company logo
1 of 14
Apache Kite 
MAKING LIFE EASIER IN HADOOP
Why Kite? 
1. Codify expert patterns and practices for building data-oriented systems and applications. 
2. Let developers focus on business logic, not plumbing or infrastructure. 
3. Provide smart defaults for platform choices. 
4. Support piecemeal adoption via loosely-coupled modules.
Kite Data Module 
Provides APIs to interact with data in Hadoop. 
The data module contains APIs and utilities for defining and performing actions on datasets. 
 entities 
 schemas 
 datasets 
 dataset repositories 
 loading data 
 dataset writers 
 viewing data
Entities, Schemas 
Entity: 
A single record in the dataset. (More like a plain java object, analogous to a row in relational 
RDBMS table) 
Schemas: 
A schema specify fieldname and the data type for a dataset. Kite relies on Apache AVRO for the 
same. 
◦ Using Java API (AVRO schema is inferred from a Java class/ AVRO data) 
◦ Using command line argument (AVRO schema is inferred from a Java class/ CSV data)
Datasets 
Datasets: 
A collection of zero or more entities. It is analogous to a RDBMS table. 
The HDFS implementation of dataset is stored as Snappy-compressed Avro data files by default. 
We can also store it in column oriented Parquet format. 
◦ Performance of a dataset can be increased by partition strategy 
◦ Based on one or more fields in the entity 
◦ Partitioning can be done using Hash, Identity or Date (year, month, day, hour) strategies 
◦ It provides coarse grained organization 
◦ Partition strategy is configured with a JSON-based format 
◦ Partition strategy can be applied only when dataset is created and cannot be altered later on. 
◦ We can work with a subset of dataset entities using Views API. 
◦ Datasets are identified using URIs.
Datasets 
◦ Dataset URIs: 
Depending on the data set scheme, we can specify dataset URI using one of the following pattern. 
Hive dataset:hive:<namespace>/<dataset> 
HDFS dataset:hdfs:/<path>/<namespace>/<dataset-name> 
Local FS dataset:file:/<path>/<namespace>/<dataset-name> 
HBase dataset:hbase:<zookeeper>/<dataset-name> 
◦ View URIs: 
A view URI is constructed by changing the prefix of a dataset URI from ‘dataset:’ to ‘view:’. The 
query arguments can be added as name/value pairs, similar to query arguments in HTTP URL.
Dataset Repositories, Loading, Dataset 
Writers, Viewing Data 
Dataset Repositories: 
• The physical storage location for datasets. It is equivalent to database in RDBMS model. 
• Required for logical grouping, security, access controls, backup policies, etc. 
• Each dataset belong to exactly one dataset repository. 
• Kite does not provide the functionality of copying/moving a dataset from one dataset repository to 
another. (However, it can be done via Map Reduce) 
Loading: 
• We can load comma separated values into dataset repository using CLI. 
Dataset Writers: 
• Used to add entities to datasets. 
Viewing Data: 
• We can query the data using Hive/Impala 
• We can also use CLI.
Kite Dataset Lifecycle
Generate Schema 
•A Kite dataset is defined using an Avro schema. 
•It can manually written or generated from Java object/CSV data file. 
• CLI command for: 
• Schema generation from Java class 
kite-dataset obj-schema org.kitesdk.cli.example.Movie -o movie.avsc 
• Schema generation from CSV file 
kite-dataset csv-schema movie.csv --class Movie -o movie.avsc
Example – Schema Generation 
package org.kitesdk.examples.data; 
/** Movie class */ 
class Movie { 
private int id; 
private String title; 
private String releaseDate; 
. . . 
public Movie() { 
// Empty constructor for serialization 
purposes 
} 
} 
{ 
"type":"record", 
"name":"Movie", 
"namespace":"org.kitesdk.examples.data", 
"fields":[ 
{"name":"id","type":"int"}, 
{"name":"title","type":"string"}, 
{"name":"releaseDate","type":"string"}, 
] 
}
Create Dataset 
Dataset is created using the Avro schema. 
kite-dataset create movie --schema movie.avsc 
Partition Strategy: 
• Logical partitions for improving performance 
• Specified using a JSON file 
Example: movie.json 
[ { 
"source" : "id", 
"type" : "int", 
"name" : "id" 
}] 
kite-dataset create movie --schema movie.avsc partition-by movie.json
Create Dataset 
Column Mapping: 
•Specifies how data should be stored in Hbase for maximum performance 
•Specified in JSON file 
• Each definition is a JSON object with following fields 
• SOURCE – The field in the entity 
• TYPE – Where the field data is stored (cells in Hbase) 
• FAMILY – the column family in Hbase table 
• QUALIFIER – the column name in Hbase table 
Example 
{"source" : "timestamp", "type" : "column", "family" : "m", "qualifier" : "ts"} 
• There are five mapping types: 
1. Column 2. Counter 3. keyAsColumn 4. Key 5. Version
Populate-Validate-Update-Annihilate 
Dataset 
Populate Dataset: 
There are various ways to populate data to dataset 
• Importing form csv files 
• Copying from another dataset 
• Using Flume ingestion, etc. 
Validate Dataset: 
‘SHOW’ command can be used to validate the data loaded. 
Update Dataset: 
Kite supports schema evolution as AVRO. 
Annihilate Dataset: 
Delete dataset when it is not required.
QUERIES?

More Related Content

What's hot

Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopGruter
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondGruter
 
Apache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special Event
Apache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special EventApache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special Event
Apache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special EventGruter
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataGruter
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Wes McKinney
 
HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalDataWorks Summit
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalogmarkgrover
 
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Hyunsik Choi
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Gruter
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization modelsThejas Nair
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...DataWorks Summit
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hiveReza Ameri
 

What's hot (20)

Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its Beyond
 
Apache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special Event
Apache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special EventApache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special Event
Apache Tajo - Bay Area HUG Nov. 2013 LinkedIn Special Event
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Hive Hadoop
Hive HadoopHive Hadoop
Hive Hadoop
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposal
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalog
 
Hadoop
HadoopHadoop
Hadoop
 
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization models
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hive
 

Similar to Apache Kite

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)Data Finder
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesThanigai Vellore
 
Solid fire cloudstack storage overview - CloudStack European User Group
Solid fire cloudstack storage overview - CloudStack European User GroupSolid fire cloudstack storage overview - CloudStack European User Group
Solid fire cloudstack storage overview - CloudStack European User GroupShapeBlue
 
CloudStack Meetup London - Primary Storage Presentation by SolidFire
CloudStack Meetup London - Primary Storage Presentation by SolidFire CloudStack Meetup London - Primary Storage Presentation by SolidFire
CloudStack Meetup London - Primary Storage Presentation by SolidFire NetApp
 
Deep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech TalksDeep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech TalksAmazon Web Services
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkMartin Toshev
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementAndreas Schreiber
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...Peter Keane
 
Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Sameer Tiwari
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsAndreas Schreiber
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfan
 
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...Ivanti
 
Building Hadoop Data Applications with Kite
Building Hadoop Data Applications with KiteBuilding Hadoop Data Applications with Kite
Building Hadoop Data Applications with Kitehuguk
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataJohn Beresniewicz
 

Similar to Apache Kite (20)

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
 
Apache hive
Apache hiveApache hive
Apache hive
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot Architectures
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Solid fire cloudstack storage overview - CloudStack European User Group
Solid fire cloudstack storage overview - CloudStack European User GroupSolid fire cloudstack storage overview - CloudStack European User Group
Solid fire cloudstack storage overview - CloudStack European User Group
 
CloudStack Meetup London - Primary Storage Presentation by SolidFire
CloudStack Meetup London - Primary Storage Presentation by SolidFire CloudStack Meetup London - Primary Storage Presentation by SolidFire
CloudStack Meetup London - Primary Storage Presentation by SolidFire
 
Deep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech TalksDeep Dive on Amazon Athena - AWS Online Tech Talks
Deep Dive on Amazon Athena - AWS Online Tech Talks
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache Spark
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
 
Php frameworks
Php frameworksPhp frameworks
Php frameworks
 
Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
 
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
 
Practical OData
Practical ODataPractical OData
Practical OData
 
Building Hadoop Data Applications with Kite
Building Hadoop Data Applications with KiteBuilding Hadoop Data Applications with Kite
Building Hadoop Data Applications with Kite
 
ASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH dataASHviz - Dats visualization research experiments using ASH data
ASHviz - Dats visualization research experiments using ASH data
 

Recently uploaded

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

Apache Kite

  • 1. Apache Kite MAKING LIFE EASIER IN HADOOP
  • 2. Why Kite? 1. Codify expert patterns and practices for building data-oriented systems and applications. 2. Let developers focus on business logic, not plumbing or infrastructure. 3. Provide smart defaults for platform choices. 4. Support piecemeal adoption via loosely-coupled modules.
  • 3. Kite Data Module Provides APIs to interact with data in Hadoop. The data module contains APIs and utilities for defining and performing actions on datasets.  entities  schemas  datasets  dataset repositories  loading data  dataset writers  viewing data
  • 4. Entities, Schemas Entity: A single record in the dataset. (More like a plain java object, analogous to a row in relational RDBMS table) Schemas: A schema specify fieldname and the data type for a dataset. Kite relies on Apache AVRO for the same. ◦ Using Java API (AVRO schema is inferred from a Java class/ AVRO data) ◦ Using command line argument (AVRO schema is inferred from a Java class/ CSV data)
  • 5. Datasets Datasets: A collection of zero or more entities. It is analogous to a RDBMS table. The HDFS implementation of dataset is stored as Snappy-compressed Avro data files by default. We can also store it in column oriented Parquet format. ◦ Performance of a dataset can be increased by partition strategy ◦ Based on one or more fields in the entity ◦ Partitioning can be done using Hash, Identity or Date (year, month, day, hour) strategies ◦ It provides coarse grained organization ◦ Partition strategy is configured with a JSON-based format ◦ Partition strategy can be applied only when dataset is created and cannot be altered later on. ◦ We can work with a subset of dataset entities using Views API. ◦ Datasets are identified using URIs.
  • 6. Datasets ◦ Dataset URIs: Depending on the data set scheme, we can specify dataset URI using one of the following pattern. Hive dataset:hive:<namespace>/<dataset> HDFS dataset:hdfs:/<path>/<namespace>/<dataset-name> Local FS dataset:file:/<path>/<namespace>/<dataset-name> HBase dataset:hbase:<zookeeper>/<dataset-name> ◦ View URIs: A view URI is constructed by changing the prefix of a dataset URI from ‘dataset:’ to ‘view:’. The query arguments can be added as name/value pairs, similar to query arguments in HTTP URL.
  • 7. Dataset Repositories, Loading, Dataset Writers, Viewing Data Dataset Repositories: • The physical storage location for datasets. It is equivalent to database in RDBMS model. • Required for logical grouping, security, access controls, backup policies, etc. • Each dataset belong to exactly one dataset repository. • Kite does not provide the functionality of copying/moving a dataset from one dataset repository to another. (However, it can be done via Map Reduce) Loading: • We can load comma separated values into dataset repository using CLI. Dataset Writers: • Used to add entities to datasets. Viewing Data: • We can query the data using Hive/Impala • We can also use CLI.
  • 9. Generate Schema •A Kite dataset is defined using an Avro schema. •It can manually written or generated from Java object/CSV data file. • CLI command for: • Schema generation from Java class kite-dataset obj-schema org.kitesdk.cli.example.Movie -o movie.avsc • Schema generation from CSV file kite-dataset csv-schema movie.csv --class Movie -o movie.avsc
  • 10. Example – Schema Generation package org.kitesdk.examples.data; /** Movie class */ class Movie { private int id; private String title; private String releaseDate; . . . public Movie() { // Empty constructor for serialization purposes } } { "type":"record", "name":"Movie", "namespace":"org.kitesdk.examples.data", "fields":[ {"name":"id","type":"int"}, {"name":"title","type":"string"}, {"name":"releaseDate","type":"string"}, ] }
  • 11. Create Dataset Dataset is created using the Avro schema. kite-dataset create movie --schema movie.avsc Partition Strategy: • Logical partitions for improving performance • Specified using a JSON file Example: movie.json [ { "source" : "id", "type" : "int", "name" : "id" }] kite-dataset create movie --schema movie.avsc partition-by movie.json
  • 12. Create Dataset Column Mapping: •Specifies how data should be stored in Hbase for maximum performance •Specified in JSON file • Each definition is a JSON object with following fields • SOURCE – The field in the entity • TYPE – Where the field data is stored (cells in Hbase) • FAMILY – the column family in Hbase table • QUALIFIER – the column name in Hbase table Example {"source" : "timestamp", "type" : "column", "family" : "m", "qualifier" : "ts"} • There are five mapping types: 1. Column 2. Counter 3. keyAsColumn 4. Key 5. Version
  • 13. Populate-Validate-Update-Annihilate Dataset Populate Dataset: There are various ways to populate data to dataset • Importing form csv files • Copying from another dataset • Using Flume ingestion, etc. Validate Dataset: ‘SHOW’ command can be used to validate the data loaded. Update Dataset: Kite supports schema evolution as AVRO. Annihilate Dataset: Delete dataset when it is not required.