BIG DATA
BY: ZEESHAN ALAM KHAN (MCA, AMU)
Big Data: A definition
• Big data is a collection of data sets so large and complex
that it becomes difficult to process using on-hand
database management tools. The challenges include
capture, curation, storage, search, sharing, analysis, and
visualization. The trend to larger data sets is due to the
additional information derivable from analysis of a single
large set of related data, as compared to separate smaller
sets with the same total amount of data, allowing
correlations to be found to "spot business trends,
determine quality of research, prevent diseases, link legal
citations, combat crime, and determine real-time roadway
traffic conditions." (Wikipedia)
Big Data: A definition
• Put another way, big data is the realization of greater
business intelligence by storing, processing, and
analyzing data that was previously ignored due to the
limitations of traditional data management
technologies.
Source: Harness the Power of Big Data: The IBM Big Data Platform
Lots of data
• 2.5 quintillion bytes of data are generated every day!
– A quintillion is 10^18
• Data come from many sources:
– Social media sites
– Sensors
– Digital photos
– Business transactions
– Location-based data
Source: IBM http://www-01.ibm.com/software/data/bigdata/
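To put the headline figure in more familiar units, take the slide's 2.5 quintillion bytes per day at face value and use decimal SI prefixes (1 EB = 10^18 bytes, 1 TB = 10^12 bytes). The rate then works out roughly as follows:

```latex
\[
2.5 \times 10^{18}\ \text{bytes/day} = 2.5\ \text{EB/day},
\qquad
\frac{2.5 \times 10^{18}\ \text{bytes}}{86\,400\ \text{s}}
\approx 2.9 \times 10^{13}\ \text{bytes/s}
\approx 29\ \text{TB/s}.
\]
```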
The four dimensions of Big Data
• Volume: Large volumes of data
• Velocity: Quickly moving data
• Variety: structured and unstructured data, images, etc.
• Veracity: trust and integrity are both a challenge and a
must, as important for big data as for traditional
relational DBs
Source: IBM http://www-01.ibm.com/software/data/bigdata/
The four dimensions of use
• Aspects of the way in which users want to interact
with their data…
– Totality: Users have an increased desire to process and
analyze all available data
– Exploration: Users apply analytic approaches where the
schema is defined in response to the nature of the query
– Frequency: Users have a desire to increase the rate of
analysis in order to generate more accurate and timely
business intelligence
– Dependency: Users need to balance investment in existing
technologies and skills with the adoption of new techniques
Source: IBM http://www-01.ibm.com/software/data/bigdata/
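The Exploration dimension describes what is often called schema-on-read: raw data is stored as-is, and each query imposes only the structure it needs. A minimal Python sketch, assuming the raw data is JSON-lines text; the records and the helper `read_with_schema` are hypothetical illustrations, not part of any IBM product:

```python
import json

# Raw records are stored as-is (one JSON document per line); no schema is
# enforced when the data is written. These records are made up.
RAW_LINES = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"device": "sensor-7", "temp_c": 21.5}',
]


def read_with_schema(lines, fields):
    """Apply a schema at query time: project each record onto `fields`.

    A record that lacks a field gets None for it, so heterogeneous data can
    be queried without having been modelled up front.
    """
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


# One query wants user activity...
activity = list(read_with_schema(RAW_LINES, ["user", "action"]))
# ...another, over the same raw data, wants sensor readings only.
readings = [r for r in read_with_schema(RAW_LINES, ["device", "temp_c"])
            if r["device"] is not None]
print(activity)
print(readings)
```

The design choice is that the cost of interpretation moves from load time to query time: nothing is rejected on write, but every query must cope with missing fields.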
So, in a nutshell
• Big Data is about better analytics!
Why Big Data and BI
Source: Business Intelligence Strategy: A Framework for
Achieving BI Excellence
Big Data Conundrum
• Problems:
– Although there is a massive spike in available data, the
percentage of the data that an enterprise can understand is
declining
– The data that the enterprise is trying to understand is
saturated with both useful signals and lots of noise
Source: IBM http://www-01.ibm.com/software/data/bigdata/
The Big Data Platform Manifesto: imperatives and underlying technologies
IBM’s Big Data Platform
Some concepts
• NoSQL (Not Only SQL): Databases that “move
beyond” relational data models (i.e., no tables, limited
or no use of SQL)
– Focus on retrieving data and appending new data (not
necessarily in tables)
– Focus on key-value data stores that can be used to locate
data objects
– Focus on supporting storage of large quantities of
unstructured data
– SQL is typically not the primary interface for storing or
retrieving data
– Full ACID (atomicity, consistency, isolation, durability)
guarantees are often relaxed, e.g., in favor of eventual consistency
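The key-value, append-oriented focus described above can be illustrated with a toy in-memory store. This is a minimal sketch, not the API of any real NoSQL product; the class `AppendOnlyKVStore` and its methods are hypothetical:

```python
from collections import defaultdict


class AppendOnlyKVStore:
    """Toy key-value store: values are opaque objects located by key.

    Writes append a new version instead of updating in place; reads return
    the latest version. There are no tables, joins, or SQL.
    """

    def __init__(self):
        self._versions = defaultdict(list)

    def put(self, key, value):
        # Append a new version; earlier versions are kept.
        self._versions[key].append(value)

    def get(self, key, default=None):
        # dict.get does not trigger the defaultdict factory.
        versions = self._versions.get(key)
        return versions[-1] if versions else default

    def history(self, key):
        return list(self._versions.get(key, []))


store = AppendOnlyKVStore()
store.put("user:42", {"name": "Ada"})
store.put("user:42", {"name": "Ada", "city": "London"})
store.put("img:7", b"\x89PNG...")  # unstructured bytes are accepted too
print(store.get("user:42"))
print(len(store.history("user:42")))
```

Because the store never interprets values, it can hold structured records and binary blobs side by side; the trade-off is that any query beyond "look up by key" must be built by the application.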
NoSQL
• NoSQL focuses on a schema-less architecture (i.e.,
the data structure is not predefined)
• In contrast, traditional relational DBs require the
schema to be defined before the database is built and
populated.
– Data are structured
– Limited in scope
– Designed around ACID principles.
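The contrast between a predefined schema and a schema-less collection can be shown with Python's built-in `sqlite3` module standing in for a relational DB and a plain list of dicts standing in for a document store. A minimal sketch; the `customer` table and its fields are hypothetical:

```python
import sqlite3

# Relational: the schema must exist before any row is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # A column that is not in the predefined schema is rejected.
    conn.execute("INSERT INTO customer (id, name, twitter) VALUES (?, ?, ?)",
                 (2, "Grace", "@grace"))
    relational_accepted_new_field = True
except sqlite3.OperationalError:
    relational_accepted_new_field = False

# Schema-less: each document carries its own structure.
documents = []
documents.append({"id": 1, "name": "Ada"})
documents.append({"id": 2, "name": "Grace", "twitter": "@grace"})

print(relational_accepted_new_field)               # False
print(sorted({k for d in documents for k in d}))   # ['id', 'name', 'twitter']
```

In the relational case, adding the new field means changing the schema first (e.g., `ALTER TABLE`); in the schema-less case the new field simply appears in the next document.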
Hadoop
• Hadoop is an open-source framework for distributed storage
and data processing, designed to handle extremely high
volumes of data in any structure.
• Hadoop has two core components (later versions add YARN
for cluster resource management):
– The Hadoop Distributed File System (HDFS), which stores data
in structured relational form, in unstructured form, and in any
form in between
– The MapReduce programming model for processing data in
parallel across many distributed servers
• The focus is on supporting redundancy, distributed
architectures, and parallel processing
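The MapReduce model is easiest to see in the classic word-count job. The sketch below runs the map, shuffle and reduce phases in a single Python process; it is not the Hadoop Java API. In a real cluster the framework runs many mappers and reducers on different nodes and performs the shuffle itself, and the function names here are illustrative only:

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(splits):
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data is big", "data is everywhere"]
print(word_count(splits))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each mapper sees only its own split and each reducer sees only one key's values, both phases can be parallelized across servers without shared state.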
Some Hadoop-Related Names to Know
• Apache Avro: a data serialization system, used for
communication between Hadoop nodes
• Cassandra and HBase: non-relational databases; HBase runs
on top of HDFS, and Cassandra can be integrated with Hadoop
• Hive: a data warehouse layer on Hadoop that provides an
SQL-like query language (HiveQL)
• Mahout: a library of scalable machine-learning algorithms
(e.g., clustering, classification, collaborative filtering)
to assist with filtering data for analysis and exploration
• Pig: a data-flow language (Pig Latin) and execution
framework for parallel computation
• ZooKeeper: a coordination service that keeps all the parts
coordinated and working together
What to do with the data
Parallels with Data Warehousing
• Data warehouses
– Extraction
– Transformation
– Load
• Big data platform
– Connector
– Processing
– User Management
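For comparison with the data-warehouse side, the classic Extract, Transform, Load sequence fits in a few lines of Python. A minimal sketch under stated assumptions: the CSV contents, the `orders` table and the validation rule are hypothetical, and an in-memory SQLite database stands in for the warehouse:

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a source system (an in-memory CSV here).
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,not-a-number,usd
3,7.25,eur
"""


def extract(text):
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean and conform rows; drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject malformed amounts
        clean.append((int(row["order_id"]), amount, row["currency"].upper()))
    return clean


def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 10.5, 'USD'), (3, 7.25, 'EUR')]
```

Note that ETL copies cleaned data into the warehouse, whereas the connector approach on the big data side indexes data where it already lives.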
Connector Framework
• Provides access to data by creating indexes that point
to the data in its native repository (i.e., it does not
manage or move the data; it keeps track of where the data
is located)
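The idea of indexing data in place can be sketched as an inverted index whose entries are locations rather than copies of the data. This is a minimal illustration, not the IBM connector API: the repository URIs and their text are made up, and the text is assumed to have already been fetched from each repository:

```python
from collections import defaultdict

# Documents stay in their native repositories; the connector only records
# where each one lives (a URI) and which terms it contains.
# These URIs and texts are hypothetical.
REPOSITORIES = {
    "file:///share/reports/q1.txt": "quarterly sales report",
    "imap://mail.example.com/INBOX/123": "sales meeting notes",
    "http://wiki.example.com/ops": "operations runbook",
}


def build_index(repos):
    """Inverted index: term -> set of locations (not copies of the data)."""
    index = defaultdict(set)
    for location, text in repos.items():
        for term in text.lower().split():
            index[term].add(location)
    return index


def lookup(index, term):
    """Return the locations of documents containing `term`, sorted."""
    return sorted(index.get(term.lower(), set()))


index = build_index(REPOSITORIES)
print(lookup(index, "sales"))
```

A search result is therefore a pointer back into the original system, which is why the framework can avoid managing the data itself.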
Processing Layer
• Two primary functions:
– Indexing content: data are crawled, parsed, and analyzed,
and the resulting contents are indexed and located
– Processing queries: managing access to the various servers
hosting the indexed, searchable content
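One common way to implement the query-processing function over several index servers is scatter-gather: send the query to every server, then merge the partial results. The sketch below assumes that approach; the shard layout and server names are hypothetical, and threads stand in for network calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Each "server" hosts one shard of the index: term -> list of document ids.
SHARDS = {
    "server-a": {"hadoop": ["doc1", "doc4"], "nosql": ["doc2"]},
    "server-b": {"hadoop": ["doc7"], "hive": ["doc8"]},
    "server-c": {"nosql": ["doc9"]},
}


def search_shard(server, term):
    """Stand-in for a remote call to one index server."""
    return SHARDS[server].get(term, [])


def process_query(term):
    """Scatter the query to every server, then gather and merge the hits."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: search_shard(s, term), SHARDS)
        return sorted(hit for hits in partials for hit in hits)


print(process_query("hadoop"))
```

A production system would also rank, deduplicate and time out slow servers; the sketch keeps only the fan-out and merge steps.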
Annotation Query Language (AQL)
• AQL is an SQL-like declarative language for
performing text analysis and extraction. The view below
pairs a person's name with a phone number that follows it
within 0 to 30 characters in the same sentence, provided
the word "phone" or "at" appears between them:
    create view PersonPhone as
    select P.name as person, PN.number as phone
    from Person P, Phone PN, Sentence S
    where Follows(P.name, PN.number, 0, 30)
      and Contains(S.sentence, P.name)
      and Contains(S.sentence, PN.number)
      and ContainsRegex(/\b(phone|at)\b/, SpanBetween(P.name, PN.number));
The provenance viewer
Machine data analysis
Some resources
• BigInsights Wiki
• Information Management Bookstore
• BigData University