1. Are you Ready for Big Data?
Dr. Putchong Uthayopas
Department of Computer Engineering, Faculty of Engineering, Kasetsart University.
pu@ku.ac.th
2. We are living in the world of Data
Video
Surveillance
Social Media
Mobile Sensors
Gene Sequencing
Smart Grids
Medical Imaging
Geophysical Exploration
3. Big Data
“Big data is data that exceeds the processing capacity of
conventional database systems. The data is too
big, moves too fast, or doesn’t fit the strictures of your
database architectures. To gain value from this data, you
must choose an alternative way to process it.”
Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
4. The Value of Big Data
• Analytical use
– Big data analytics can reveal insights hidden
previously by data too costly to process.
• Peer influence among customers, revealed by analyzing shoppers’ transactions and social and geographical data.
– Being able to process every item of data in reasonable
time removes the troublesome need for sampling and
promotes an investigative approach to data.
• Enabling new products.
– Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.
5. 3 Characteristics of Big Data
Volume
• Volumes of data are larger than those conventional relational database infrastructures can cope with.
Velocity
• The rate at which data flows in is much faster.
• Mobile events and interactions by users.
• Video, image, and audio from users.
Variety
• The source data is diverse and doesn’t fall into neat relational structures, e.g. text from social networks, image data, or a raw feed directly from a sensor source.
6. Big Data Challenge
• Volume
– How to process data that is too big to move or store.
• Velocity
– Data arrives too fast to be stored in full, e.g. web usage logs, Internet traffic, and mobile messages. Stream processing is needed to filter out unused data or extract knowledge in real time (see the sketch after this list).
• Variety
– Many types of unstructured data formats make conventional databases useless.
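As a rough illustration of the velocity challenge above, here is a minimal stream-filtering sketch in Python (the tab-separated log format and the "purchase" event type are hypothetical, not from this deck): instead of storing every incoming record, it keeps only the events of interest as they stream past.

import sys

def filter_stream(lines):
    # Hypothetical web-usage log: tab-separated fields, event type in column 3.
    # Records that are not of interest are dropped immediately instead of stored.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == "purchase":
            yield line

if __name__ == "__main__":
    # Read the raw event stream from stdin and write only the kept records.
    for kept in filter_stream(sys.stdin):
        sys.stdout.write(kept)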
7. How to deal with big data
• Integration of
– Storage
– Processing
– Analysis Algorithm
– Visualization
(Diagram: a massive data stream flows through stream processing into storage, processing, analysis, and visualization.)
8. A New Approach for Distributed Big Data
(Diagram: storage islands in L.A., Boston, and London versus a single storage pool spanning the same locations.)
Storage Islands:
• Disparate Systems
• Manual Administration
• One Tenant, Many Systems
• IT-Provisioned Storage
Single Storage Pool:
• Single System Across Locations
• Automated Policies
• Many Tenants, One System
• Self-Service Access
9. Hadoop
• Hadoop is a platform for distributing computing problems across a number of servers. It was first developed and released as open source by Yahoo.
– It implements the MapReduce approach pioneered by Google in compiling its search indexes.
– A dataset is distributed among multiple servers, each of which operates on its portion of the data: the “map” stage. The partial results are then recombined: the “reduce” stage.
• Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes.
• The Hadoop usage pattern involves three stages (a sketch follows the list):
– loading data into HDFS,
– MapReduce operations, and
– retrieving results from HDFS.
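To make the map and reduce stages concrete, here is a minimal word-count sketch in the style of Hadoop Streaming, written in Python. It is an illustration only; the file names and the streaming-jar path in the run comment are assumptions, not part of this deck. The mapper emits a word/count pair per word; Hadoop sorts mapper output by key, and the reducer then sums the counts for each word.

# mapper.py -- the "map" stage: read raw text lines, emit "<word>\t1" per word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

# reducer.py -- the "reduce" stage: input arrives sorted by word, so counts
# for the same word are adjacent and can be summed in a single pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")

# Typical run (paths and jar name are assumptions):
#   hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
#       -mapper mapper.py -reducer reducer.py

This mirrors the three-stage usage pattern above: the input is loaded into HDFS, the MapReduce job runs over it, and the word counts are read back out of HDFS.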
10. WHAT FACEBOOK KNOWS
Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.
http://www.facebook.com/data
11. Study of Human Society
• Facebook, in collaboration with the University of Milan, conducted an experiment that involved
– the entire social network as of May 2011
– more than 10 percent of the world's population.
• Analyzing the 69 billion friend connections
among those 721 million people showed that
– four intermediary friends are usually enough to
introduce anyone to a random stranger.
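As a toy illustration of what “four intermediary friends” means in graph terms, here is a small Python sketch (the friendship graph below is made up, and the real study used approximate algorithms over billions of edges rather than exhaustive search): it measures the average number of friendship hops between users.

from collections import deque

# Hypothetical, tiny friendship graph; each user maps to a set of friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}

def hops_from(start):
    # Breadth-first search: shortest number of friendship hops from `start`.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for friend in friends[user]:
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist

# Average hop count over all ordered pairs of distinct users
# (hops minus one gives the number of intermediary friends).
total, count = 0, 0
for user in friends:
    for other, hops in hops_from(user).items():
        if other != user:
            total += hops
            count += 1
print(total / count)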
12. The links of Love
• Often young women specify that
they are “in a relationship” with
their “best friend forever”.
– Roughly 20% of all relationships for
the 15-and-under crowd are
between girls.
– This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
• Among anonymous US users who were over 18 at the start of the relationship:
– The average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7.
– This is much higher than the 4.74 steps you’d need to go from any Facebook user to another through friendship, as opposed to romantic, ties.
(Graph: relationships of anonymous US users who were over 18 at the start of the relationship.)
http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
13. Why?
• Facebook can improve the user experience
– make useful predictions about users' behavior
– make better guesses about which ads you might
be more or less open to at any given time
• Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship.
14. How does Facebook handle Big Data?
• Facebook built its data storage system using open-
source software called Hadoop.
– Hadoop spreads the data across many machines inside a data center.
– Facebook also uses Hive, open-source software that acts as a translation service, making it possible to query vast Hadoop data stores using relatively simple code (see the sketch after this slide's bullets).
• Much of Facebook's data resides in one Hadoop store more than 100 petabytes (a petabyte is a million gigabytes) in size, says Sameet Agarwal, a director of engineering at Facebook who works on data infrastructure, and the quantity is growing exponentially: "Over the last few years we have more than doubled in size every year."
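As a rough illustration of how Hive keeps such queries simple (the table, columns, and connection details below are hypothetical and are not Facebook's actual schema), a minimal sketch using the open-source PyHive client might look like this; Hive translates the SQL-like query into jobs that run over the data stored in Hadoop.

from pyhive import hive  # assumes the PyHive package and a reachable HiveServer2

# Connect to a Hive server (host and port are placeholders).
conn = hive.connect(host="hive.example.com", port=10000)
cursor = conn.cursor()

# The query reads like ordinary SQL even though the data lives in HDFS.
cursor.execute(
    """
    SELECT country, COUNT(*) AS num_events
    FROM page_views              -- hypothetical table
    WHERE dt = '2012-02-14'      -- hypothetical partition column
    GROUP BY country
    ORDER BY num_events DESC
    LIMIT 10
    """
)

for country, num_events in cursor.fetchall():
    print(country, num_events)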
15. The Journey To Big Data
1. Big Data Infrastructure (Technology Focus)
• All Data
• Faster Answers
• Elastic & Scalable
• Cloud Infrastructure
2. Agile Analytics (People & Productivity Focus)
• Data Science
• Collaboration
• Self-Service
• Agile Process & Tools
• Analytic Engines
• Analytic Productivity Platform
3. Predictive Enterprise (Application Focus)
• Real-Time Decisions
• New Applications
• Data Monetization
• Big Data Enabled Apps
16. Data Tsunami
• The data flood is coming; there is nowhere to run now!
– Data is being generated anytime, anywhere, by anyone.
– Data is moving in fast.
– Data is too big to move, too big to store.
• Better be prepared.
– Use this to enhance your business and offer better services to customers.
The sources of information are expanding, and many of the new sources are machine generated. They include both big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). Leading companies have for decades sought to leverage new sources of data, and the insights that can be gleaned from them, as sources of competitive advantage: more detailed structured data, new unstructured data, and device-generated data. But big data isn't only about data; a comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as scale-out storage, MPP database architectures, Hadoop and the Hadoop ecosystem, in-database analytics, in-memory computing, data virtualization, and data visualization.
Content and service providers as well as global organizations that need to distribute large content files are challenged with managing and ensuring performance of these distributed systems. Thus a new approach using a single storage pool in the cloud that provides policies for content placement, multi-tenancy and self service can be beneficial to their business.
We've found that our early adopter customers use a common approach in their journey to big data. First, they build on an infrastructure foundation that consists of elastic and scalable storage as well as analytics that can access all types of data. Next, they focus on improving the analytics process. Lastly, they embed big data into their applications and enable actionable insight. We found that customers who have used this approach have been able to transform into a more predictive enterprise.