An Introduction to
BIG DATA
CUSO Seminar on Big Data
Prof. Dr. Philippe Cudré-Mauroux
http://exascale.info
May 22, 2014
Fribourg–Switzerland
1
2
On the Menu Today
• Big Data: Context
• Big Data: Buzzwords
– 3 Vs of Big Data
• Big Data Landscape
• Hadoop
• Big Data in Switzerland
Instant Quizz
• 3 Vs of Big Data?
• CAP?
• Hadoop?
• Spark?
3
Exascale Data Deluge
• Science
– Biology
– Astronomy
– Remote Sensing
• Web companies
– Ebay
– Yahoo
• Financial services,
retail companies
governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional
information infrastructures
The Web as the Main Driver
5
© Qmee
Big Data Central Theorem
Data+Technology  Actionable Insight  $$
6
Big Data Buzz
7
Between now and 2015, the firm expects big data to
create some 4.4 million IT jobs globally; of those,
1.9 million will be in the U.S. Applying an economic
multiplier to that estimate, Gartner expects each new
big-data-related IT job to create work for three more
people outside the tech industry, for a total of almost
6 million more U.S. jobs.
Growth in the Asia Pacific Big Data
market is expected to accelerate rapidly
in two to three years time, from a mere
US$258.5 million last year to in excess
of $1.76 billion in 2016, with highest
growth in the storage segment.
Big Data as a New Class of Asset
• The Age of Big Data (NYTimes Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-
world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at
Google and now Facebook, are masters at harnessing the data of the Web
— online searches, posts and messages — with Internet advertising. At the
World Economic Forum last month in Davos, Switzerland, Big Data was a
marquee topic. A report by the forum, “Big Data, Big Impact,” declared data
a new class of economic asset, like currency or gold.”
8
9
The 3-Vs of Big Data
• Volume
– Amount of data
• Velocity
– speed of data in and out
• Variety
– range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-
variety information assets that require new forms of processing to
enable enhanced decision making, insight discovery and process
optimization" 10
What can you do with the data
• Reporting
– Post Hoc
– Real time
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
• …
11
© Mike Franklin
10 ways big data changes everything
• Some concrete examples
– http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/
1. Can gigabytes predict the next Lady Gaga?
2. How big data can curb the world’s energy consumption
3. Big data is now your company’s virtual assistant
4. The future of Foursquare is data-fueled recommendations
5. How Twitter data-tracked cholera in Haiti
6. Revolutionizing Web publishing with big data
7. Can cell phone data cure society’s ills?
8. How data can help predict and create video hits
9. The new face of data visualization
10. One hospital’s embrace of big data
12
Typical Big Data Success Story
• Modeling users through Big Data
– Online ads sale / placement [e.g., Facebook]
– Personalized Coupons [e.g., Target]
– Product Placement [Walmart]
– Content Generation [e.g., NetFlix]
– Personalized learning [e.g., Duolingo]
– HR Recruiting [e.g., Gild]
13
More Data => Better Answers?
• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses
• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget,
provide a guarantee that the quality of inference will increase
monotonically as data accrue (without bound)
• In other words:
=> Data should be a resource, not a load
14
© Mike Jordan
Big Data Infrastructures
15
A Concrete Example: Zynga
16
Leading the Pack of Wolves: Hadoop
• Google: Map/Reduce paper published 2004
• Open source variant: Hadoop
• Map-reduce = high-level programming model and
implementation for large-scale parallel data processing
• Right now most overhyped system in CS
17
What about Swiss Big Data?
• Competitive Research Groups
• Swiss Big Data User Group
• Swiss companies playing catch-up
– Productized Big Data systems at leading telcos & financial
companies
– Big Data is not a new technology: it's a fact;
• Deal with it  POCs in most banks, insurance companies, retailers
18
Tasty Bites of Big Data (1)
Thursday afternoon
• 13:30-15:00: Big Data Profiling
Felix Naumann (Hasso Plattner Institute)
• 15:15-16:45: Realtime Analytics
Christoph Koch (EPFL)
• 16:45-17:45: Current Trends and Challenges in Big Data
Benchmarking
Kais Sachs (SAP / Spec)
19
Tasty Bites of Big Data (2)
Friday
• 9:00 - 10:30: Structured Data in Web Search
Alon Halevy (Google)
• 10:45 - 12:15: Human Computation for Big Data
Gianluca Demartini (UNIFR)
• 13:30-15:00: Analysing and Querying Big Scientific Data
Thomas Heinis (EPFL)
• 15:00-16:30: The Evolution of Big Data Frameworks
Carlo Curino (Microsoft Research)
20
Social Event, Friday – Beer
Tasting!
Basse-Ville Fribourg / 15 CHF per Person
Everything You Always Wanted to Know
About Beer. * But Were Afraid to Ask!
18:00 @ Café du Belvédère, Grand-Rue
36
19:00 @ Fri-Mousse, Rue de la
Samaritaine 19
Limited Places, Inscription is mandatory at:
http://xr.si

An Introduction to Big Data

  • 1.
    An Introduction to BIGDATA CUSO Seminar on Big Data Prof. Dr. Philippe Cudré-Mauroux http://exascale.info May 22, 2014 Fribourg–Switzerland 1
  • 2.
    2 On the MenuToday • Big Data: Context • Big Data: Buzzwords – 3 Vs of Big Data • Big Data Landscape • Hadoop • Big Data in Switzerland
  • 3.
    Instant Quizz • 3Vs of Big Data? • CAP? • Hadoop? • Spark? 3
  • 4.
    Exascale Data Deluge •Science – Biology – Astronomy – Remote Sensing • Web companies – Ebay – Yahoo • Financial services, retail companies governments, etc. © Wired 2009 ➡ New data formats ➡ New machines ➡ Peta & exa-scale datasets ➡ Obsolescence of traditional information infrastructures
  • 5.
    The Web asthe Main Driver 5 © Qmee
  • 6.
    Big Data CentralTheorem Data+Technology  Actionable Insight  $$ 6
  • 7.
    Big Data Buzz 7 Betweennow and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs. Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.
  • 8.
    Big Data asa New Class of Asset • The Age of Big Data (NYTimes Feb. 11, 2012) http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the- world.html “Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.” 8
  • 9.
  • 10.
    The 3-Vs ofBig Data • Volume – Amount of data • Velocity – speed of data in and out • Variety – range of data types and sources • [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high- variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" 10
  • 11.
    What can youdo with the data • Reporting – Post Hoc – Real time • Monitoring (fine-grained) • Exploration • Finding Patterns • Root Cause Analysis • Closed-loop Control • Model construction • Prediction • … 11 © Mike Franklin
  • 12.
    10 ways bigdata changes everything • Some concrete examples – http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/ 1. Can gigabytes predict the next Lady Gaga? 2. How big data can curb the world’s energy consumption 3. Big data is now your company’s virtual assistant 4. The future of Foursquare is data-fueled recommendations 5. How Twitter data-tracked cholera in Haiti 6. Revolutionizing Web publishing with big data 7. Can cell phone data cure society’s ills? 8. How data can help predict and create video hits 9. The new face of data visualization 10. One hospital’s embrace of big data 12
  • 13.
    Typical Big DataSuccess Story • Modeling users through Big Data – Online ads sale / placement [e.g., Facebook] – Personalized Coupons [e.g., Target] – Product Placement [Walmart] – Content Generation [e.g., NetFlix] – Personalized learning [e.g., Duolingo] – HR Recruiting [e.g., Gild] 13
  • 14.
    More Data =>Better Answers? • Not that easy… • More Rows: Algorithmic complexity kicks in • More Columns: Exponentially more hypotheses • Another formulation of the problem: – Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound) • In other words: => Data should be a resource, not a load 14 © Mike Jordan
  • 15.
  • 16.
  • 17.
    Leading the Packof Wolves: Hadoop • Google: Map/Reduce paper published 2004 • Open source variant: Hadoop • Map-reduce = high-level programming model and implementation for large-scale parallel data processing • Right now most overhyped system in CS 17
  • 18.
    What about SwissBig Data? • Competitive Research Groups • Swiss Big Data User Group • Swiss companies playing catch-up – Productized Big Data systems at leading telcos & financial companies – Big Data is not a new technology: it's a fact; • Deal with it  POCs in most banks, insurance companies, retailers 18
  • 19.
    Tasty Bites ofBig Data (1) Thursday afternoon • 13:30-15:00: Big Data Profiling Felix Naumann (Hasso Plattner Institute) • 15:15-16:45: Realtime Analytics Christoph Koch (EPFL) • 16:45-17:45: Current Trends and Challenges in Big Data Benchmarking Kais Sachs (SAP / Spec) 19
  • 20.
    Tasty Bites ofBig Data (2) Friday • 9:00 - 10:30: Structured Data in Web Search Alon Halevy (Google) • 10:45 - 12:15: Human Computation for Big Data Gianluca Demartini (UNIFR) • 13:30-15:00: Analysing and Querying Big Scientific Data Thomas Heinis (EPFL) • 15:00-16:30: The Evolution of Big Data Frameworks Carlo Curino (Microsoft Research) 20
  • 21.
    Social Event, Friday– Beer Tasting! Basse-Ville Fribourg / 15 CHF per Person Everything You Always Wanted to Know About Beer. * But Were Afraid to Ask! 18:00 @ Café du Belvédère, Grand-Rue 36 19:00 @ Fri-Mousse, Rue de la Samaritaine 19 Limited Places, Inscription is mandatory at: http://xr.si