SlideShare a Scribd company logo
1 of 12
Hadoop and Microsoft.



Brad Sarsfield | Senior Software Engineer   @bradoop
How Big is Big Data?
It’s all about your
Big Data Problems
Hadoop is for Big Data.
Data is the Platform.
Hadoop Data Science.
Hadoop Capabilities.

      Extract Load     Distributed
       Transform        Compute




 Predictive     Machine         Graph
  Analysis      Learning      Processing
Hadoop architecture.


       Distributed Processing
            (Map Reduce)

     Distributed Storage
            (HDFS)
Hadoop and Microsoft.
     Big engineering investment
       • Big Data Business Intelligence tooling
       • Big Data Apache Hadoop
       • Big Data Parallel Data Warehouse


     Open source Commitment
       • Apache Software Foundation
       • Hortonworks Partnership


     We are delivering
       • Apache Hadoop on Windows Server
       • Apache Hadoop on Windows Azure
Microsoft Hadoop Vision.
      Better on Windows and Azure
        • Active Directory
        • System Center


      Microsoft Data Connectivity
        • SQL Server / SQL Parallel Data Warehouse
        • Azure Storage / Azure Data Market


      Microsoft Business Intelligence (BI)
        • ODBC Connectivity
ACM Hackathon.
  Free Hadoop on Azure
    • Code: acmhackathon



  Free 30 day Azure account
    • No credit card
    • 750h small compute / 35GB storage
    • Email brad@bing.com for code

  Hadoop on Azure demo

More Related Content

What's hot

On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)Stéphane Fréchette
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Databricks
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...DataStax
 
Big Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivotBig Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivotJen Stirrup
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0SpringPeople
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016StampedeCon
 
Hydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven SolutionsHydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven SolutionsFindwise
 
Unleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User GroupUnleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User GroupSergio Zenatti Filho
 
Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup   Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup Qubole
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasThoughtworks
 
Tropos.io - Hadoop in the Cloud - BA4ALL 2016
Tropos.io - Hadoop in the Cloud - BA4ALL 2016Tropos.io - Hadoop in the Cloud - BA4ALL 2016
Tropos.io - Hadoop in the Cloud - BA4ALL 2016Tropos.io
 
When Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
When Databases Meet Big data and Hadoop - Uni of Tromso Online LectureWhen Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
When Databases Meet Big data and Hadoop - Uni of Tromso Online LectureIrfan Elahi
 
Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite Andy Wright
 
Contact Centers Powered by Esgyn
Contact Centers Powered by EsgynContact Centers Powered by Esgyn
Contact Centers Powered by EsgynRajender K Salgam
 
Auckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeAuckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeSergio Zenatti Filho
 

What's hot (20)

On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Cognitives services
Cognitives servicesCognitives services
Cognitives services
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
 
The Ecosystem is too damn big
The Ecosystem is too damn big The Ecosystem is too damn big
The Ecosystem is too damn big
 
Big Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivotBig Data Visualisation with Hadoop and PowerPivot
Big Data Visualisation with Hadoop and PowerPivot
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop data access layer v4.0
Hadoop data access layer v4.0Hadoop data access layer v4.0
Hadoop data access layer v4.0
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016
 
Hydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven SolutionsHydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven Solutions
 
Unleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User GroupUnleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User Group
 
Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup   Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
 
Tropos.io - Hadoop in the Cloud - BA4ALL 2016
Tropos.io - Hadoop in the Cloud - BA4ALL 2016Tropos.io - Hadoop in the Cloud - BA4ALL 2016
Tropos.io - Hadoop in the Cloud - BA4ALL 2016
 
When Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
When Databases Meet Big data and Hadoop - Uni of Tromso Online LectureWhen Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
When Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
 
Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite
 
Contact Centers Powered by Esgyn
Contact Centers Powered by EsgynContact Centers Powered by Esgyn
Contact Centers Powered by Esgyn
 
Auckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeAuckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data Lake
 

Viewers also liked

Hadoop in the Microsoft Enterprise
Hadoop in the Microsoft EnterpriseHadoop in the Microsoft Enterprise
Hadoop in the Microsoft EnterpriseDataWorks Summit
 
Apache hadoop for windows server and windwos azure
Apache hadoop for windows server and windwos azureApache hadoop for windows server and windwos azure
Apache hadoop for windows server and windwos azureBrad Sarsfield
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? DataWorks Summit
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopDataWorks Summit
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationDataWorks Summit
 

Viewers also liked (7)

Hadoop in the Microsoft Enterprise
Hadoop in the Microsoft EnterpriseHadoop in the Microsoft Enterprise
Hadoop in the Microsoft Enterprise
 
Apache hadoop for windows server and windwos azure
Apache hadoop for windows server and windwos azureApache hadoop for windows server and windwos azure
Apache hadoop for windows server and windwos azure
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud?
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on Hadoop
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop Implementation
 

Similar to Hadoop acm presentation

Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetupclive boulton
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxthando80
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Imam Raza
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Andrew Brust
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsightNaoki (Neo) SATO
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...BigDataEverywhere
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for AnalyticsJen Stirrup
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 

Similar to Hadoop acm presentation (20)

Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetup
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Uotm workshop
Uotm workshopUotm workshop
Uotm workshop
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 

Hadoop acm presentation

  • 1. Hadoop and Microsoft. Brad Sarsfield | Senior Software Engineer @bradoop
  • 2.
  • 3. How Big is Big Data?
  • 4. It’s all about your Big Data Problems
  • 5. Hadoop is for Big Data.
  • 6. Data is the Platform.
  • 8. Hadoop Capabilities. Extract Load Distributed Transform Compute Predictive Machine Graph Analysis Learning Processing
  • 9. Hadoop architecture. Distributed Processing (Map Reduce) Distributed Storage (HDFS)
  • 10. Hadoop and Microsoft. Big engineering investment • Big Data Business Intelligence tooling • Big Data Apache Hadoop • Big Data Parallel Data Warehouse Open source Commitment • Apache Software Foundation • Hortonworks Partnership We are delivering • Apache Hadoop on Windows Server • Apache Hadoop on Windows Azure
  • 11. Microsoft Hadoop Vision. Better on Windows and Azure • Active Directory • System Center Microsoft Data Connectivity • SQL Server / SQL Parallel Data Warehouse • Azure Storage / Azure Data Market Microsoft Business Intelligence (BI) • ODBC Connectivity
  • 12. ACM Hackathon. Free Hadoop on Azure • Code: acmhackathon Free 30 day Azure account • No credit card • 750h small compute / 35GB storage • Email brad@bing.com for code Hadoop on Azure demo

Editor's Notes

  1. Good afternoon. Thanks for coming, I know you're going to be really excited about this. I'm going to talk about Big Data, Hadoop and Microsoft It's just simply amazing to see the growing momentum around Big Data conversations happening today. Hadoop is changing the conversations that we have about Data, Big Data. I want to make sure we stay grounded in thinking through how to make money and save money with your Data using Hadoop.<next slide>
  2. Let’s talk about size for a moment. The example I like to use is the US library of congress. The US library of congress has millions of books, recording, photographs, maps, music and manuscripts. All put together they have around 300TB of information. How much is that? That's 838 miles of bookshelves; If you were to stretch those out end to end, then go downstairs, get in your car and start driving at 65mph you'd hit the end of the books 13 hours later in New York City.A little over three times that is a petabyte. Microsoft is managing well over 100 Petabytes of data across our online properties. That single row of bookshelfs from New York to Jacksonville Florida is now half a mile high. That’s stunning.We are adding 7.5PBs per month of new data, running 20k analytic jobs per day to run our online services business.The good news is that hardware is fast and cheap enough that now we can record this data and consume it. This simply wasn’t possible a few years ago. Hard drive density and CPU power continue to double every 18 months.From the Microsoft point of view we have a pretty good understanding of how to build and operate one of these infrastructures and in the end connect it thru to developers and end users. We’re the only ones in the 100+Petabyte Club who also run an enterprise software and cloud business. I see the complete solution where we enable developers to build applications on this data; and connect them through to our end users with BI tools to deliver Breakthrough Big Data Insights. I will talk more about that in the Hadoop and Excel talk and take this down to a practical level in my second talk. This leads me onto the next concept.. It’s all about the data
  3. It’s all about your Data, Actually, it’s all about your big dataIs it big as in Volume? Where your data exceeds limits of physical capabilities of systems today.Is it Velocity? The data is moving at a fast rate and value can decay over time.Is it Variability? of structure from unstructured, semi-structured to highly structured data.Doug Laney http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdfThe answer is it’s all of the above.Now that you have Big Data; you have two problems. You have BIG DATA problemsAndYou have big DATA PROBLEMS
  4. The second thing I want to talk about is Hadoop and how Hadoop is setup to deliver Breakthrough Business Insights from your data.How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today?How many are planning on using Hadoop in the next 12mo? How about in the cloud?When people talk about Hadoop they are often talking about specific computational patterns including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way.   Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off the shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured” but does provide some unique capabilities.  
  5. It's everywhere to be mined, but we have what one can call "the pomegranate problem" Imagine all of your data being inside a pomegranate. When you eat a pomegranate it’s a bit difficult getting into all of the little pieces inside the pomegranate out, it's a bit of work.That’s the process that you need to go through to extract business insights out of your data.It’s useful to think of it in this way; where your data is the platform. Not the tooling that surrounds it.It’s all about the data.I’d like to share with you my favorite big data quotation from a famous Big Data philosopher.<next slide>
  6. We don't have a Hadoop problem they have analytics, pattern mining, trend analysis, statistical inferenceing, economic modeling, market regression level problems. Big Data; in terms of data size, variability and velocity at scale are is the first problem. But the Big Data solutions and technology by themselves don't lead to solving business objectives. Data science starts where the utility class services like Big Data Hadoop end. The real opportunity is for Data science as a hosted petascale service ontop of cloud infrastructure. As powerful as Hadoop is, today it’s still more of a computer scientist’s or academically-trained analyst’s tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business unit personnel. Hadoop data is often more “raw” and “wild” than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I and Microsoft see opportunity.  Essentially; wouldn't it be cool if mere mortals could use this stuff and consume insights that are directly coming from Hadoop?
  7. I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.