Are you Ready for Big Data?

         Dr. Putchong Uthayopas
        Department of Computer
         Engineering, Faculty of
    Engineering, Kasetsart University.
               pu@ku.ac.th
We are living in the world of Data

• Video Surveillance
• Social Media
• Mobile Sensors
• Gene Sequencing
• Smart Grids
• Geophysical Exploration
• Medical Imaging
Big Data
“Big data is data that exceeds the processing capacity of
conventional database systems. The data is too
big, moves too fast, or doesn’t fit the strictures of your
database architectures. To gain value from this data, you
must choose an alternative way to process it.”




         Reference: Edd Dumbill, “What is big data? An introduction to the big data
         landscape.” http://radar.oreilly.com/2012/01/what-is-big-data.html
The Value of Big Data
• Analytical use
  – Big data analytics can reveal insights hidden
    previously by data too costly to process.
     • peer influence among customers, revealed by analyzing
       shoppers’ transactions, social and geographical data.
  – Being able to process every item of data in reasonable
    time removes the troublesome need for sampling and
    promotes an investigative approach to data.
• Enabling new products.
  – Facebook has been able to craft a highly personalized
    user experience and create a new kind of advertising
    business
3 Characteristics of Big Data

Volume
  • Volumes of data are larger than conventional relational database
    infrastructures can cope with.

Velocity
  • The rate at which data flows in is much faster.
    • Mobile events and interactions by users.
    • Video, images, and audio from users.

Variety
  • The source data is diverse and doesn’t fall into neat relational
    structures, e.g. text from social networks, image data, a raw feed
    directly from a sensor source.
Big Data Challenge
• Volume
  – How to process data so big that it cannot be moved or
    stored.
• Velocity
  – Data such as web usage logs, Internet traffic, and mobile
    messages arrives too fast to be stored in full. Stream
    processing is needed to filter out unneeded data or to
    extract knowledge in real time.
• Variety
  – The many types of unstructured data formats are a poor fit
    for conventional databases.
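
The velocity point can be illustrated with a small sketch. It is not taken from the slides: the event fields (`path`, `status`) and the function names are hypothetical, and a real deployment would read from a message queue or log feed rather than a list. The idea it shows is that a generator handles one event at a time, discards what is not needed, and keeps only a small running summary instead of storing the whole stream.

```python
from typing import Dict, Iterable, Iterator


def filter_stream(events: Iterable[dict], wanted_status: int = 500) -> Iterator[dict]:
    """Yield only events with the wanted status; nothing else is kept in memory."""
    for event in events:
        if event.get("status") == wanted_status:
            yield event


def count_by_path(events: Iterable[dict]) -> Dict[str, int]:
    """Keep a running count per path, which is far smaller than the stream itself."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event["path"]] = counts.get(event["path"], 0) + 1
    return counts


# Hypothetical web usage log; in practice this would be an unbounded feed.
sample_log = [
    {"path": "/home", "status": 200},
    {"path": "/cart", "status": 500},
    {"path": "/home", "status": 200},
    {"path": "/cart", "status": 500},
    {"path": "/pay", "status": 500},
]

errors_per_path = count_by_path(filter_stream(iter(sample_log)))
print(errors_per_path)
```

Because `filter_stream` is a generator, the two steps run as a pipeline: each event is filtered and counted before the next one is read.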
How to deal with big data
    • Integration of
          –   Storage
          –   Processing
          –   Analysis Algorithm
          –   Visualization

[Figure: pipeline diagram with the labels Massive Data Stream, Stream processing, Storage, Processing, Analysis, Visualize]
A New Approach For Distributed Big Data

       Storage Islands               Single Storage Pool
   (L.A., BOSTON, LONDON)          (L.A., BOSTON, LONDON)

•   Disparate Systems          •   Single System Across Locations
•   Manual Administration      •   Automated Policies
•   One Tenant, Many Systems   •   Many Tenants One System
•   IT Provisioned Storage     •   Self-Service Access
Hadoop
• Hadoop is a platform for distributing computing problems across a
  number of servers. It is an open-source Apache project; much of its
  early development took place at Yahoo.
   – Implements the MapReduce approach pioneered by Google in
     compiling its search indexes.
   – The “map” stage distributes a dataset among multiple servers and
     operates on the data. The “reduce” stage then recombines the
     partial results.
• Hadoop uses its own distributed filesystem, HDFS, which makes
  data available to multiple computing nodes.
• A typical Hadoop usage pattern involves three stages:
   – loading data into HDFS,
   – MapReduce operations, and
   – retrieving results from HDFS.
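
The map and reduce stages described above can be sketched in plain Python. This is a single-process illustration of the idea only, not Hadoop's API: the function names and the two text chunks are made up for the example, and the list of chunks stands in for data blocks that Hadoop would read from HDFS on different servers.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key so each key can be reduced independently."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: recombine the partial results for each key into one total."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a block of data held on a different server.
chunks = ["big data is big", "data moves fast"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

The map step needs no knowledge of other chunks, which is what lets a framework run it on many machines at once; only the grouped results have to be brought together for the reduce step.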
WHAT FACEBOOK KNOWS

Cameron Marlow calls himself Facebook's "in-house sociologist." He and his
team can analyze essentially all the information the site gathers.

http://www.facebook.com/data
Study of Human Society
• Facebook, in collaboration with the University
  of Milan, conducted an experiment that involved
  – the entire social network as of May 2011
  – more than 10 percent of the world's population.
• Analyzing the 69 billion friend connections
  among those 721 million people showed that
  – four intermediary friends are usually enough to
    introduce anyone to a random stranger.
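
A measurement of this kind can be sketched on a toy graph. The code below is an illustration under stated assumptions, not the method used in the study: it runs an exact breadth-first search from every person, which is only practical for small graphs, and the four-person friendship graph is invented. It reports the average number of hops between pairs; the number of intermediary friends on a path is the hop count minus one.

```python
from collections import deque


def hop_counts(graph, start):
    """Breadth-first search: fewest friendship hops from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def average_separation(graph):
    """Average hop count over all ordered pairs of distinct, connected people."""
    total = 0
    pairs = 0
    for person in graph:
        for other, hops in hop_counts(graph, person).items():
            if other != person:
                total += hops
                pairs += 1
    return total / pairs


# Hypothetical friendship graph: a simple chain ann - bob - cat - dan.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}

print(hop_counts(friends, "ann")["dan"])
print(average_separation(friends))
```

On a graph with hundreds of millions of nodes, running a full search from every node is far too expensive, so large-scale studies rely on sampling or approximate distance estimation instead.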
The links of Love
•   Often young women specify that
    they are “in a relationship” with
    their “best friend forever”.
     – Roughly 20% of all relationships for
       the 15-and-under crowd are
       between girls.
     – This number dips to 15% for 18-
       year-olds and is just 7% for 25-year-
       olds.
•   Among anonymous US users who were
    over 18 at the start of the
    relationship
     – the average of the shortest number
       of steps to get from any one U.S.
       user to any other individual is 16.7.
     – This is much higher than the 4.74
       steps you’d need to go from any
       Facebook user to another through
       friendship, as opposed to
       romantic, ties.

[Graph: relationships among anonymous US users who were over 18 at the start of the relationship.]

http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
Why?
• Facebook can improve the user experience
  – make useful predictions about users' behavior
  – make better guesses about which ads you might
    be more or less open to at any given time
• Right before Valentine's Day this year a blog
  post from the Data Science Team listed the
  songs most popular with people who had
  recently signaled on Facebook that they had
  entered or left a relationship
How does Facebook handle Big Data?
• Facebook built its data storage system using open-source
  software called Hadoop.
   – Hadoop spreads the data across many machines inside a
     data center.
   – Hive, also open source, acts as a translation
     service, making it possible to query vast Hadoop data
     stores using relatively simple code.
• Much of Facebook's data resides in one Hadoop store
  more than 100 petabytes in size (a petabyte is a million
  gigabytes), says Sameet Agarwal, a director of engineering at
  Facebook who works on data infrastructure, and the
  quantity is growing exponentially. "Over the last few
  years we have more than doubled in size every year."
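
To make the growth figure concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that the store starts at exactly 100 petabytes and exactly doubles each year; the quote says "more than doubled", so these numbers are a lower bound under that assumption, not a forecast. The function name is hypothetical.

```python
GB_PER_PB = 1_000_000  # one petabyte is a million gigabytes (decimal units)


def projected_size_pb(start_pb, years, yearly_factor=2.0):
    """Size after a number of years if the store grows by yearly_factor each year."""
    return start_pb * yearly_factor ** years


start_pb = 100  # starting point taken from the slide
projection = {year: projected_size_pb(start_pb, year) for year in range(4)}

for year, size_pb in projection.items():
    print(f"year {year}: {size_pb:.0f} PB = {size_pb * GB_PER_PB:.0f} GB")
```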
The Journey To Big Data

1   Big Data Infrastructure (Technology Focus)
    • All Data
    • Faster Answers
    • Elastic & Scalable
    Components: Cloud Infrastructure, Analytic Engines

2   Agile Analytics (People & Productivity Focus)
    • Data Science
    • Collaboration
    • Self-Service
    Components: Analytic Productivity Platform, Agile Process & Tools

3   Predictive Enterprise (Application Focus)
    • Real Time Decisions
    • New Applications
    • Data Monetization
    Components: Big Data Enabled Apps
Data Tsunami
• The data flood is coming, with
  nowhere to run!
  – Data is being generated
    anytime, anywhere, by anyone
  – Data is moving in fast
  – Data is too big to move, too
    big to store
• Better be prepared
  – Use this to enhance your
    business and offer better
    services to customers
Thank you

Are you ready for BIG DATA?

  • 1. Are you Ready for Big Data? Dr. PutchongUthayopas Department of Computer Engineering, Faculty of Engineering, Kasetsart University. pu@ku.ac.th
  • 2. We are living in the world of Data Video Surveillance Social Media Mobile Sensors Gene Sequencing Smart Grids Geophysical Medical Imaging Exploration
  • 3. Big Data “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.” Reference: “What is big data? An introduction to the big data landscape.”, EddDumbill, http://radar.oreilly.com/2012/01/what-is-big- data.html
  • 4. The Value of Big Data • Analytical use – Big data analytics can reveal insights hidden previously by data too costly to process. • peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. – Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data. • Enabling new products. – Facebookhas been able to craft a highly personalized user experience and create a new kind of advertising business
  • 5. 3 Characteristics of Big Data Volume • Volumes of data are larger than those conventional relational database infrastructures can cope with • Rate at which data flows in is much faster. Velocity • Mobile event and interaction by users. • Video, image , audio from users • the source data is diverse, and doesn’t fall into neat Variety relational structures eg. text from social networks, image data, a raw feed directly from a sensor source.
  • 6. Big Data Challenge • Volume – How to process data so big that can not be move, or store. • Velocity – A lot of data coming very fast so it can not be stored such as Web usage log , Internet, mobile messages. Stream processing is needed to filter unused data or extract some knowledge real-time. • Variety – So many type of unstructured data format making conventional database useless.
  • 7. How to deal with big data • Integration of – Storage – Processing – Analysis Algorithm – Visualization Processing Massive Data Stream Processing Visualize Stream processing Storage Processing Analysis
  • 8. A New Approach For Distributed Big L.A. Data BOSTON LONDON L.A. BOSTON LONDON Storage Islands Single Storage Pool • Disparate Systems • Single System Across Locations • Manual Administration • Automated Policies • One Tenant, Many Systems • Many Tenants One System • IT Provisioned Storage • Self-Service Access
  • 9. Hadoop • Hadoopis a platform for distributing computing problems across a number of servers. First developed and released as open source by Yahoo. – Implements the MapReduce approach pioneered by Google in compiling its search indexes. – Distributing a dataset among multiple servers and operating on the data: the “map” stage. The partial results are then recombined: the “reduce” stage. • Hadooputilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes • Hadoopusage pattern involves three stages: – loading data into HDFS, – MapReduce operations, and – retrieving results from HDFS.
  • 10. WHAT FACEBOOK KNOWS Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers. http://www.facebook.com/data
  • 11. Study of Human Society • Facebook, in collaboration with the University of Milan, conducted an experiment that involved – the entire social network as of May 2011 – more than 10 percent of the world's population. • Analyzing the 69 billion friend connections among those 721 million people showed that – four intermediary friends are usually enough to introduce anyone to a random stranger.
  • 12. The links of Love • Often young women specify that they are “in a relationship” with their “best friend forever”. – Roughly 20% of all relationships for the 15-and-under crowd are between girls. – This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds. • Anonymous US users who were over 18 at the start of the relationship – the average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7. – This is much higher than the 4.74 steps you’d need to go from any Facebook user to another through friendship, as opposed to romantic, ties. • [Graph showing the relationships of anonymous US users who were over 18 at the start of the relationship.] http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
  • 13. Why? • Facebook can improve users' experience – make useful predictions about users' behavior – make better guesses about which ads you might be more or less open to at any given time • Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship.
  • 14. How does Facebook handle Big Data? • Facebook built its data storage system using open-source software called Hadoop. – Hadoop spreads the data across many machines inside a data center. – Facebook uses Hive, open-source software that acts as a translation service, making it possible to query vast Hadoop data stores using relatively simple code. • Much of Facebook's data resides in one Hadoop store more than 100 petabytes in size (a petabyte is a million gigabytes), says Sameet Agarwal, a director of engineering at Facebook who works on data infrastructure, and the quantity is growing exponentially. "Over the last few years we have more than doubled in size every year."
  • 15. The Journey To Big Data • 1. Big Data Infrastructure (Technology Focus) – All Data – Faster Answers – Elastic & Scalable • 2. Agile Analytics (People & Productivity Focus) – Data Science – Collaboration – Self-Service • 3. Predictive Enterprise (Application Focus) – Real Time Decisions – New Applications – Data Monetization • [Diagram layers: Big Data Enabled Apps, Agile Process & Tools, Analytic Engines, Analytic Productivity Platform, Cloud Infrastructure]
  • 16. Data Tsunami • The data flood is coming, and there is nowhere to run now! – Data is being generated anytime, anywhere, by anyone – Data is moving in fast – Data is too big to move, too big to store • Better be prepared – Use this to enhance your business and offer better services to customers

Editor's Notes

  1. The sources of information are expanding. Many new sources are machine generated. It’s also big files (seismic scans can be 5TB per file) and massive numbers of small files (email, social media). Leading companies have for decades sought to leverage new sources of data, and the insights that can be gleaned from those data sources, as new sources of competitive advantage: more detailed structured data, new unstructured data, and device-generated data. But big data isn’t only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as scale-out storage, MPP database architectures, Hadoop and the Hadoop ecosystem, in-database analytics, in-memory computing, data virtualization, and data visualization.
  2. Content and service providers, as well as global organizations that need to distribute large content files, are challenged with managing these distributed systems and ensuring their performance. Thus a new approach using a single storage pool in the cloud that provides policies for content placement, multi-tenancy, and self-service can be beneficial to their business.
  3. We’ve found that our early-adopter customers use a common approach in their journey to big data. First, they build on an infrastructure foundation that consists of elastic and scalable storage as well as analytics that can access all types of data. Next, they focus on improving the analytics process. Lastly, they embed Big Data into their applications and enable actionable insight. We found that customers who’ve used this approach have been able to transform into a more predictive enterprise.