Transforming Mobile Marketing & Advertising™




                        Harnessing Hadoop for Big Data
                        Analytics

                                                                   Jobin Wilson
                                                                   jobin.wilson@flytxt.com




                                                                                             Confidential
               Copyright © 2010 Flytxt B.V. All rights reserved.
Who am I?

   • Architect @ Flytxt (Big Data Analytics & Automation)

   • Passionate about data, distributed computing, machine learning

   • Previously

        • Virtualization & Cloud Lifecycle Management (BMC)

               • Designed and Implemented Cloud Life Cycle Management Interface for BMC

        • Large Scale Data Centre Automation (AOL)

               • Implemented Centralized Data Center Management Framework for AOL

        • Workflow Systems & Automation (Accenture)

               • Implemented Service Management Suite for various customers




Session Agenda!

• Data – What's the big deal?

• What is Hadoop (& what it is not)

• Map-Reduce Model & HDFS

• Hadoop Ecosystem & Tools

• Let's get started!

• Q&A




Five computers & a 640k ;-)

      "I think there is a world market for about five computers"
            Thomas Watson, 1943, Chairman of the board of IBM

      "640k ought to be enough for anybody"
            Attributed to Bill Gates in 1981

      Moore’s Law




Data Explosion!




Do I also know what you might do next summer?


                                        •     Does your travel company know you visited Goa &
                                              Cochin twice in the last two years?

                                        •     Collaborative Filtering




                                        •     Lots of Data + Statistics = WOW!!!

                                        •     BTW, don’t worry about the eqn :-)
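
As a rough illustration of the collaborative filtering idea above, here is a minimal user-based sketch in Python. The traveller names, destinations and visit counts are made up for the example, and cosine similarity is only one of several similarity measures that could be used; this is a toy, not the method of any particular travel company.

```python
from math import sqrt

# Hypothetical data: traveller -> {destination: number of visits}
visits = {
    "asha":  {"Goa": 2, "Cochin": 2, "Munnar": 1},
    "ravi":  {"Goa": 2, "Cochin": 1, "Jaipur": 1},
    "meera": {"Jaipur": 3, "Agra": 2},
    "you":   {"Goa": 2, "Cochin": 2},
}

def cosine(a, b):
    """Cosine similarity between two sparse visit vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, data, top_n=1):
    """Rank places the user has not visited, weighted by how similar
    the travellers who did visit them are to the user."""
    scores = {}
    for other, trips in data.items():
        if other == user:
            continue
        sim = cosine(data[user], trips)
        for place, count in trips.items():
            if place not in data[user]:
                scores[place] = scores.get(place, 0.0) + sim * count
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [place for place, _ in ranked[:top_n]]

print(recommend("you", visits))  # ['Munnar']
```

Travellers whose history looks like yours (Goa and Cochin) pull their other destinations to the top of the list; travellers with nothing in common contribute a weight of zero.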




Don't throw away data just because it doesn't 'fit'


 •   Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures,
     videos

 •   User generated data & System generated data

 •   Applications need more than structured data

 •   My application is not “Dumb” any more!!

 •   “I keep saying that the sexy job in the next 10 years will be
      statisticians, and I’m not kidding.” - Hal Varian (Google’s chief economist)




Let's get to business!!

What is Apache Hadoop ?

•   Apache Hadoop is an open-source system to
    reliably store and process extremely large data sets
    across many commodity computers.

•   Originally developed to support the Nutch search engine
    project.

•   Scales linearly with data size or analysis complexity

•   Scale-out, shared-nothing architecture

•   Inspired by Google's MapReduce and Google File
    System (GFS) papers




Basics of Hadoop


 •   Two Core Components – HDFS & Map-Reduce

 •   Machines are unreliable

 •   Separates distributed fault-tolerant computing code from application
     logic.

 •   No need to worry about identity of a machine

 •   Lets you interact with a cluster, not a bunch of machines.

 •   Analysis workloads span across multiple machines

 •   Runs as a cloud (cluster) & possibly on a cloud (EC2)




Lead Actors


•   Name Node – Bookkeeping (metadata) server for HDFS

•   Secondary Name Node – Checkpointing assistant to the Name Node (not a hot standby)

•   Job Tracker – Scheduler

•   Task Tracker - Task execution

•   Data Node - Block storage




HDFS Write Model
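
In outline, a file written to HDFS is cut into fixed-size blocks, and each block is streamed through a pipeline of Data Nodes so that several replicas exist. The toy planner below sketches only that idea in Python. The node names, the tiny block size and the round-robin placement are assumptions made for illustration; real HDFS uses much larger blocks and a rack-aware placement policy decided by the Name Node. The replication factor of 3 matches the usual HDFS default.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the payload into consecutive fixed-size blocks (last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def plan_write(data: bytes, data_nodes, block_size=4, replication=3):
    """Return (block index, block length, replica pipeline) for each block.

    Placement here is plain round-robin, a stand-in for the Name Node's
    real (rack-aware) placement decision.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    plan = []
    for index, block in enumerate(split_into_blocks(data, block_size)):
        start = index % len(data_nodes)
        pipeline = [data_nodes[(start + r) % len(data_nodes)]
                    for r in range(replication)]
        plan.append((index, len(block), pipeline))
    return plan

# Hypothetical four-node cluster
for entry in plan_write(b"0123456789", ["dn1", "dn2", "dn3", "dn4"]):
    print(entry)
```

Each tuple reads as "block N of this many bytes goes to the first node, which forwards it to the second, which forwards it to the third".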




Map-Reduce Model
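
The model can be sketched with the classic word count, here in plain Python rather than the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative only; on a real cluster the framework runs many map and reduce tasks in parallel on different machines and performs the shuffle itself.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Keys are processed in sorted order, as reducers see them sorted
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["big data", "big cluster"]))
# {'big': 2, 'cluster': 1, 'data': 1}
```

The application author writes only the map and reduce functions; splitting the input, moving intermediate pairs between machines and retrying failed tasks are left to the framework.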




Map-Reduce Execution Flow




Hadoop Ecosystem
•   Oozie – Open-source workflow/coordination
    service to manage data processing jobs for Apache
    Hadoop™ - Developed at Yahoo!

•   HBase – Column-store database based on
    Google’s Bigtable. Holds extremely large data sets
    (petabytes)

•   Hive – SQL-based data warehousing app with
    features for analyzing very large data sets -
    Developed at Facebook

•   ZooKeeper – Distributed coordination service
    providing leader election, service
    discovery, distributed locking / mutual exclusion

•   Pig – Platform for analyzing large data sets that
    consists of a high-level language for expressing
    data analysis steps

•   Ganglia – A scalable distributed monitoring system
    for high-performance computing systems such as
    clusters and grids
Hadoop is not a “Holy Grail”

•   Not a substitute for a database

•   MapReduce is not always the best algorithm

•   HDFS is not a substitute for a
    High Availability SAN-hosted FS

•   HDFS is not a POSIX file system

•   Not a place to learn Java programming

•   Not a place to learn Unix/Linux system administration

•   Not a place to learn basics of networking




Notable Users of Hadoop
(Source: http://en.wikipedia.org/wiki/Hadoop)



     • A9.com                               • Meebo
     • AOL                                  • Metaweb
     • EHarmony                             • The New York Times
     • eBay                                 • Rackspace
     • Facebook                             • StumbleUpon
     • Fox Interactive Media                • Twitter
     • IBM                                  • Yahoo
     • Last.fm                              • Amazon
     • LinkedIn




Q&A




                                                    www.flytxt.com
THANK YOU
      Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com





More Related Content

Viewers also liked

20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 eZeeshan Huq
 
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
2011  p5_and_p6_principal's_dialogue_collated_for_uploading2011  p5_and_p6_principal's_dialogue_collated_for_uploading
2011 p5_and_p6_principal's_dialogue_collated_for_uploadingalanpillay79
 
Cl introduction of p1_&_p2
Cl introduction of p1_&_p2Cl introduction of p1_&_p2
Cl introduction of p1_&_p2
alanpillay79
 
Recommendation engines : Matching items to users
Recommendation engines : Matching items to usersRecommendation engines : Matching items to users
Recommendation engines : Matching items to usersjobinwilson
 
P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011
alanpillay79
 
20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 dZeeshan Huq
 
TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011
alanpillay79
 
Building apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon BostonBuilding apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon Bostonamansk
 
Brightwater Engineering General Presentation
Brightwater Engineering General PresentationBrightwater Engineering General Presentation
Brightwater Engineering General Presentation
fletcher_mat
 
Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01
Mukesh Thakur
 
Budjettikone
BudjettikoneBudjettikone
Budjettikone
Pluto Finland
 
Pharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportPharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence Report
Viedoc
 
Program Komuniti Tone Plus
Program Komuniti Tone PlusProgram Komuniti Tone Plus
Program Komuniti Tone Plus
Vun Chee Vui
 
Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Viedoc
 
IT & Big Data 2012 Report
IT & Big Data 2012 ReportIT & Big Data 2012 Report
IT & Big Data 2012 Report
Viedoc
 
Mauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea Decalogo
Mauricio Escalante
 
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportCFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
Viedoc
 
20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 dZeeshan Huq
 

Viewers also liked (20)

20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e
 
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
2011  p5_and_p6_principal's_dialogue_collated_for_uploading2011  p5_and_p6_principal's_dialogue_collated_for_uploading
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
 
Monavie Presentation
Monavie PresentationMonavie Presentation
Monavie Presentation
 
Cl introduction of p1_&_p2
Cl introduction of p1_&_p2Cl introduction of p1_&_p2
Cl introduction of p1_&_p2
 
Recommendation engines : Matching items to users
Recommendation engines : Matching items to usersRecommendation engines : Matching items to users
Recommendation engines : Matching items to users
 
P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011
 
Viral marketing
Viral marketingViral marketing
Viral marketing
 
20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d
 
TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011
 
Building apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon BostonBuilding apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon Boston
 
Brightwater Engineering General Presentation
Brightwater Engineering General PresentationBrightwater Engineering General Presentation
Brightwater Engineering General Presentation
 
Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01
 
Budjettikone
BudjettikoneBudjettikone
Budjettikone
 
Pharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportPharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence Report
 
Program Komuniti Tone Plus
Program Komuniti Tone PlusProgram Komuniti Tone Plus
Program Komuniti Tone Plus
 
Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010
 
IT & Big Data 2012 Report
IT & Big Data 2012 ReportIT & Big Data 2012 Report
IT & Big Data 2012 Report
 
Mauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea Decalogo
 
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportCFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
 
20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d
 

Similar to Harnessing hadoop for big data analytics v0.1

Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
Flytxt
 
HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)
Peter Lubbers
 
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Taras Filatov
 
Html5 Flyover
Html5 FlyoverHtml5 Flyover
Html5 Flyover
Skills Matter
 
Putting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresPutting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data Stores
DATAVERSITY
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
Adam Muise
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
Sean Roberts
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
AI4BD GmbH
 
SharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSharePoint from the Forms-Eye View
SharePoint from the Forms-Eye View
Steve Weissman
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
Ameet Paranjape
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Day
javier ramirez
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Cloudera, Inc.
 
Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11
Adrian Treacy
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
markgrover
 
IBM Watson
IBM WatsonIBM Watson
IBM Watson
Mohamed Tawfik
 
Alex Wade, Digital Library Interoperability
Alex Wade, Digital Library InteroperabilityAlex Wade, Digital Library Interoperability
Alex Wade, Digital Library Interoperability
parker01
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
Rohit Agrawal
 
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsVisualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Mia Yuan Cao
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
Peter Wang
 

Similar to Harnessing hadoop for big data analytics v0.1 (20)

Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 
HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)
 
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
 
Html5 Flyover
Html5 FlyoverHtml5 Flyover
Html5 Flyover
 
Putting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresPutting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data Stores
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
 
SharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSharePoint from the Forms-Eye View
SharePoint from the Forms-Eye View
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Day
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
 
Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
IBM Watson
IBM WatsonIBM Watson
IBM Watson
 
Alex Wade, Digital Library Interoperability
Alex Wade, Digital Library InteroperabilityAlex Wade, Digital Library Interoperability
Alex Wade, Digital Library Interoperability
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Plug 20110217
Plug   20110217Plug   20110217
Plug 20110217
 
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsVisualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Newntide latest company Introduction.pdf
Newntide latest company Introduction.pdfNewntide latest company Introduction.pdf
Newntide latest company Introduction.pdf
LucyLuo36
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Newntide latest company Introduction.pdf
Newntide latest company Introduction.pdfNewntide latest company Introduction.pdf
Newntide latest company Introduction.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs

Harnessing Hadoop for big data analytics v0.1

  • 1. Transforming Mobile Marketing & Advertising™ Harnessing Hadoop for Big Data Analytics Jobin Wilson jobin.wilson@flytxt.com Confidential Copyright © 2010 Flytxt B.V. All rights reserved.
  • 2. Who am I? • Architect @ Flytxt (Big Data Analytics & Automation) • Passionate about data, distributed computing, machine learning • Previously • Virtualization & Cloud Lifecycle Management (BMC) • Designed and implemented Cloud Life Cycle Management Interface for BMC • Large Scale Data Centre Automation (AOL) • Implemented Centralized Data Center Management Framework for AOL • Workflow Systems & Automation (Accenture) • Implemented Service Management Suite for various customers
  • 3. Session Agenda! • Data – What's the big deal? • What is Hadoop (& what it is not :-) ) • Map-Reduce Model & HDFS • Hadoop Ecosystem & Tools • Let's get started! • Q&A
  • 4. Five computers & a 640k ;-) • "I think there is a world market for about five computers" - Thomas Watson, Chairman of the board of IBM, 1943 • "640k ought to be enough for anybody" - Attributed to Bill Gates in 1981 • Moore’s Law
  • 5. Data Explosion!
  • 6. Do I also know what you might do next summer? • Does your travel company know you visited Goa & Cochin twice in the last two years? • Collaborative Filtering • Lots of Data + Statistics = WOW!!! • BTW, don’t worry about the eqn :-)
  • 7. Don't throw away data just because it doesn't 'fit' • Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures, videos • User-generated data & system-generated data • Applications need more than structured data • My application is not “dumb” any more!! • “I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.” - Hal Varian (Google’s chief economist)
  • 8. Let's get down to business!! What is Apache Hadoop? • Apache Hadoop is an open-source system to reliably store and process extremely large data sets across many commodity computers • Originally developed to support the Nutch search engine project • Scales linearly with data size or analysis complexity • Scale-out, shared-nothing architecture • Inspired by Google's MapReduce and Google File System (GFS) papers
  • 9. Basics of Hadoop • Two core components – HDFS & Map-Reduce • Machines are unreliable • Separates distributed fault-tolerant computing code from application logic • No need to worry about the identity of a machine • Lets you interact with a cluster, not a bunch of machines • Analysis workloads span multiple machines • Runs as a cloud (cluster) & possibly on a cloud (EC2)
  • 10. Lead Actors • Name Node – Bookkeeping metadata server • Secondary Name Node – Assistant to Name Node (periodic checkpointing of its metadata, not a hot standby) • Job Tracker – Scheduler • Task Tracker – Task execution • Data Node – Block storage
  • 11. HDFS Write Model
  • 12. Map-Reduce Model
  • 13. Map-Reduce Execution Flow
  • 14. Hadoop Ecosystem • Oozie – Open-source workflow/coordination service to manage data processing jobs for Apache Hadoop™ - Developed at Yahoo! • HBase – Column-oriented database modeled on Google’s Bigtable. Holds extremely large data sets (petabytes) • Hive – SQL-based data warehousing application with features for analyzing very large data sets - Developed at Facebook • ZooKeeper – Distributed coordination service providing leader election, service discovery, distributed locking / mutual exclusion • Pig – Platform for analyzing large data sets, built around a high-level language for expressing data analysis steps • Ganglia – A scalable distributed monitoring system for high-performance computing systems such as clusters and grids
  • 15. Hadoop is not a “Holy Grail” • Not a substitute for a database • MapReduce is not always the best algorithm • HDFS is not a substitute for a high-availability SAN-hosted file system • HDFS is not a POSIX file system • Not a place to learn Java programming • Not a place to learn Unix/Linux system administration • Not a place to learn the basics of networking
  • 16. Notable Users of Hadoop (Source: http://en.wikipedia.org/wiki/Hadoop) • A9.com • AOL • eHarmony • eBay • Facebook • Fox Interactive Media • IBM • Last.fm • LinkedIn • Meebo • Metaweb • The New York Times • Rackspace • StumbleUpon • Twitter • Yahoo! • Amazon
  • 17. Q&A www.flytxt.com
  • 18. THANK YOU • Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com • www.flytxt.com
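
The Map-Reduce model named on slides 12 and 13 survives here only as slide titles, so the following is a minimal single-process sketch in Python of the idea, using the classic word-count example. It is not Hadoop's API: the names map_phase, shuffle, reduce_phase and run_job, and the sample input, are all hypothetical and chosen for illustration. On a real cluster the Job Tracker would schedule the map and reduce functions onto Task Trackers, and the framework would handle the shuffle and fault tolerance.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values seen for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence inside one process (illustration only)."""
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one line of a file in HDFS.
    sample = ["big data big deal", "data beats opinion"]
    print(run_job(sample))
```

The application logic lives entirely in map_phase and reduce_phase; everything else is plumbing, which is the separation of "distributed fault-tolerant computing code from application logic" that slide 9 describes.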