Hadoop ACM presentation

Brad Sarsfield

Microsoft Hadoop presentation for ACM Data Mining Hackathon competition.

Hadoop Capabilities.

• Extract Load Transform
• Distributed Compute
• Predictive Analysis
• Machine Learning
• Graph Processing

Hadoop architecture.

Distributed Processing
(Map Reduce)

Distributed Storage
(HDFS)

Hadoop and Microsoft.
Big engineering investment
• Big Data Business Intelligence tooling
• Big Data Apache Hadoop
• Big Data Parallel Data Warehouse

Open source Commitment
• Apache Software Foundation
• Hortonworks Partnership

We are delivering
• Apache Hadoop on Windows Server
• Apache Hadoop on Windows Azure

Microsoft Hadoop Vision.
Better on Windows and Azure
• Active Directory
• System Center

Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market

Microsoft Business Intelligence (BI)
• ODBC Connectivity

ACM Hackathon.
Free Hadoop on Azure
• Code: acmhackathon

Free 30 day Azure account
• No credit card
• 750h small compute / 35GB storage
• Email brad@bing.com for code

Hadoop on Azure demo

Hadoop ACM presentation

1. Hadoop and Microsoft. Brad Sarsfield | Senior Software Engineer @bradoop

3. How Big is Big Data?

4. It’s all about your Big Data Problems

5. Hadoop is for Big Data.

6. Data is the Platform.

7. Hadoop Data Science.

8. Hadoop Capabilities. • Extract Load Transform • Distributed Compute • Predictive Analysis • Machine Learning • Graph Processing

9. Hadoop architecture. Distributed Processing (Map Reduce) Distributed Storage (HDFS)

10. Hadoop and Microsoft. Big engineering investment • Big Data Business Intelligence tooling • Big Data Apache Hadoop • Big Data Parallel Data Warehouse Open source Commitment • Apache Software Foundation • Hortonworks Partnership We are delivering • Apache Hadoop on Windows Server • Apache Hadoop on Windows Azure

11. Microsoft Hadoop Vision. Better on Windows and Azure • Active Directory • System Center Microsoft Data Connectivity • SQL Server / SQL Parallel Data Warehouse • Azure Storage / Azure Data Market Microsoft Business Intelligence (BI) • ODBC Connectivity

12. ACM Hackathon. Free Hadoop on Azure • Code: acmhackathon Free 30 day Azure account • No credit card • 750h small compute / 35GB storage • Email brad@bing.com for code Hadoop on Azure demo

Editor's Notes

Good afternoon. Thanks for coming, I know you're going to be really excited about this. I'm going to talk about Big Data, Hadoop and Microsoft. It's simply amazing to see the growing momentum around Big Data conversations happening today. Hadoop is changing the conversations that we have about data and Big Data. I want to make sure we stay grounded in thinking through how to make money and save money with your data using Hadoop. <next slide>
Let’s talk about size for a moment. The example I like to use is the US Library of Congress. The US Library of Congress has millions of books, recordings, photographs, maps, music and manuscripts. All put together they have around 300TB of information. How much is that? That's 838 miles of bookshelves. If you were to stretch those out end to end, then go downstairs, get in your car and start driving at 65mph, you'd hit the end of the books 13 hours later in New York City.
A little over three times that is a petabyte. Microsoft is managing well over 100 petabytes of data across our online properties. That single row of bookshelves from New York to Jacksonville, Florida is now half a mile high. That’s stunning.
We are adding 7.5PB of new data per month and running 20k analytic jobs per day to run our online services business.
The good news is that hardware is fast and cheap enough that now we can record this data and consume it. This simply wasn’t possible a few years ago. Hard drive density and CPU power continue to double every 18 months.
From the Microsoft point of view we have a pretty good understanding of how to build and operate one of these infrastructures and in the end connect it through to developers and end users. We’re the only ones in the 100+ Petabyte Club who also run an enterprise software and cloud business. I see the complete solution where we enable developers to build applications on this data, and connect them through to our end users with BI tools to deliver Breakthrough Big Data Insights. I will talk more about that in the Hadoop and Excel talk and take this down to a practical level in my second talk. This leads me on to the next concept: it’s all about the data.
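The size comparisons in these notes are simple arithmetic, so they can be checked with a short sketch. The figures (838 miles, 65mph, 300TB, 100PB, 7.5PB per month) come from the notes. The function names are hypothetical, and the choice of 1000TB per petabyte is an assumption, since the notes do not say whether decimal or binary units are meant.

```python
# Figures quoted in the speaker notes.
SHELF_MILES = 838        # bookshelf length for the ~300TB collection
SPEED_MPH = 65           # driving speed used in the analogy
LIBRARY_TB = 300         # approximate size of the collection
MANAGED_PB = 100         # "well over 100 petabytes"
MONTHLY_GROWTH_PB = 7.5  # new data added per month

# Assumption: decimal units (1PB = 1000TB); the notes do not specify.
TB_PER_PB = 1000


def drive_hours(miles, mph):
    """Hours needed to drive past `miles` of shelving at `mph`."""
    return miles / mph


def libraries_in(petabytes, library_tb=LIBRARY_TB, tb_per_pb=TB_PER_PB):
    """How many 300TB collections fit into `petabytes` of data."""
    return petabytes * tb_per_pb / library_tb


if __name__ == "__main__":
    print(f"Drive time: {drive_hours(SHELF_MILES, SPEED_MPH):.1f} hours")
    print(f"Collections per petabyte: {libraries_in(1):.2f}")
    print(f"Collections in {MANAGED_PB}PB: {libraries_in(MANAGED_PB):.0f}")
    print(f"Collections added per month: {libraries_in(MONTHLY_GROWTH_PB):.0f}")
```

Under these assumptions the drive time rounds to the 13 hours quoted, and one petabyte comes out at a little over three collections, which matches the wording in the notes.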
It’s all about your data. Actually, it’s all about your big data.
Is it big as in Volume? Where your data exceeds the limits of the physical capabilities of systems today.
Is it Velocity? The data is moving at a fast rate and its value can decay over time.
Is it Variety? Of structure, from unstructured and semi-structured to highly structured data.
Doug Laney: http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
The answer is it’s all of the above.
Now that you have Big Data, you have two problems. You have BIG DATA problems, and you have big DATA PROBLEMS.
The second thing I want to talk about is Hadoop and how Hadoop is set up to deliver Breakthrough Business Insights from your data.
How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today? How many are planning on using Hadoop in the next 12 months? How about in the cloud?
When people talk about Hadoop they are often talking about specific computational patterns, including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off the shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured”, but it does provide some unique capabilities.
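To make the map reduce pattern mentioned above concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop's API and does nothing distributed or fault tolerant; it only illustrates the map, shuffle and reduce steps that a framework such as Hadoop would spread across many machines. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["big data big insights", "data is the platform"]
    print(run_job(lines))
```

Because each mapper call depends only on its own record and each reducer call only on its own key, both phases can run in parallel on separate machines, which is what makes the pattern scale.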
Data is everywhere to be mined, but we have what one can call "the pomegranate problem". Imagine all of your data being inside a pomegranate. When you eat a pomegranate it’s a bit difficult getting all of the little pieces inside out; it's a bit of work.
That’s the process that you need to go through to extract business insights out of your data.
It’s useful to think of it in this way, where your data is the platform, not the tooling that surrounds it.
It’s all about the data.
I’d like to share with you my favorite big data quotation from a famous Big Data philosopher. <next slide>
We don't have a Hadoop problem; we have analytics, pattern mining, trend analysis, statistical inference, economic modeling, market regression level problems. Big Data, in terms of data size, variety and velocity at scale, is the first problem. But Big Data solutions and technology by themselves don't lead to solving business objectives. Data science starts where utility class services like Big Data Hadoop end. The real opportunity is for data science as a hosted petascale service on top of cloud infrastructure. As powerful as Hadoop is, today it’s still more of a computer scientist’s or academically-trained analyst’s tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business unit personnel. Hadoop data is often more “raw” and “wild” than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where Microsoft and I see opportunity. Essentially, wouldn't it be cool if mere mortals could use this stuff and consume insights that are coming directly from Hadoop?
I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.
