Complex Realtime
Event Analytics using BigQuery
Márton Kodok
Senior Software Engineer at REEA
twitter: martonkodok stackoverflow: pentium10 github: pentium10
Crunch Warm Up - October 2015 - Budapest
Agenda
1. Big Data movement
2. Analytics Project - Background
3. Challenges - Why is it so hard?
4. Approach - Strategy - Application
5. Use Cases - Implementations
6. Exploring Big Data (GDELT, Hackernews, Reddit)
Complex Realtime Event Analytics using BigQuery @martonkodok
Big data analysis movement
Every scientist who needs
big data analytics to save millions of lives
should have that power.
Challenging experience
The simple fact is that
you are brilliant
but your brilliant ideas require
complex big data analytics.
Project: One-size-fits-all problem
Need a backend to store, query, extract for deep analytics:
● Events (product, app, site, email events)
● Achievements (“tag” users on the go, retention)
● Entities (split tests, user profiles, business entities)
● Metrics (app profiler data, custom)
● Email activity (click-map, engagement, ISP, Spam)
● 3rd party Analytics (good to have: Google Analytics)
● System-generated data (log file entries, unstructured)
Desired system/platform
● Terabyte scalable storage
● Real-time event ingestion
● Ask sophisticated queries (ideally without needing a developer)
● Query performance
● Low-maintenance
● Cost effective
● Wire them up easily
Goal: Store everything and make it immediately accessible via SQL.
Equipment strategy
● In-House
● Hosted
● Managed
* people still required
Services:
❏ ELK Stack (Elasticsearch-Logstash-Kibana)...
❏ Cassandra, Hive, Hadoop...
❏ Amazon Redshift, Google BigQuery...
Google BigQuery
What is BigQuery?
● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully-Managed
● Scales into Petabytes
● Ridiculously fast
● Decent pricing (queries $5/TB, storage: $20/TB)
● 100,000 rows/sec Streaming API
* October 2015 pricing
BigQuery: Big Data Analytics in the Cloud
● Convenience of SQL
● Familiar DB Structure (table, column, views, JSON)
● Open Interfaces (REST, Web UI, ODBC)
● Fast atomic imports JSON/CSV (file size up to 5TB)
● Simple data ingest from GCS or Hadoop
● Web UI + bq CLI
● Connectors: Hadoop, Tableau, R, Talend, Logstash
● US or EU zone
BigQuery: Convenience of SQL/JSON/JS
● Append-only tables
● Batch load file size limits: 5TB (CSV or JSON)
● ACL - row level locking (individual or group based)
● Columnar storage (max 10 000 columns in table)
● Rich SQL: JSON, IP, Math, RegExp, Window functions
● Datatypes: String 2MB, Record, Nested …
● UDF (User-defined functions): JavaScript
Note: Store what you can in columns, the rest in JSON.
BigQuery Costs - October 2015
* 1 Petabyte storage, 100 TB rows insert, 100 TB queries => 26,000 USD
Queries:
➔ 1 TB per month free
➔ 5 USD per TB
➔ only pay for the columns you use in your query
Storage:
➔ 20 USD per TB
Ingestion:
➔ Batch load free (CSV/JSON)
➔ Exporting free
➔ Table copy free
➔ 1 USD per 20TB data
Estimate 1
- Storage 5 TB
- Streaming Inserts 5TB
- Queries 3 TB
Monthly total: 110 USD
Estimate 2
- Storage 20 TB
- Streaming Inserts 10TB
- Queries 10 TB
Monthly total: 455 USD
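The arithmetic behind such an estimate can be sketched in Python. This is only a rough illustration: it assumes the October 2015 rates listed on this slide, that the first 1 TB of queries per month is free, and that the "1 USD per 20TB" rate applies to streaming inserts. The function and constant names are made up for the example.

```python
# Rates as listed on the slide (October 2015); treat them as assumptions.
QUERY_USD_PER_TB = 5.0
QUERY_FREE_TB = 1.0                 # first 1 TB of queries per month is free
STORAGE_USD_PER_TB = 20.0
STREAMING_USD_PER_TB = 1.0 / 20.0   # assumption: "1 USD per 20TB" covers streaming inserts


def monthly_cost_usd(storage_tb, streaming_tb, query_tb):
    """Return the estimated monthly bill in USD for the given volumes."""
    storage = storage_tb * STORAGE_USD_PER_TB
    streaming = streaming_tb * STREAMING_USD_PER_TB
    # Only the query volume above the free tier is billed.
    queries = max(query_tb - QUERY_FREE_TB, 0.0) * QUERY_USD_PER_TB
    return storage + streaming + queries


# Estimate 1: 5 TB storage, 5 TB streaming inserts, 3 TB queries.
estimate_1 = monthly_cost_usd(storage_tb=5, streaming_tb=5, query_tb=3)
print(round(estimate_1, 2))  # 110.25
```

Under these assumptions Estimate 1 comes out at roughly 110 USD. The slide's other totals may rest on rounding or rates that are not shown here.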
UDF - Power of JavaScript
● for logic that is hard or impossible to express in SQL: loops, complex
conditionals, string parsing or transformations
● UDFs are similar to map functions in MapReduce
● inline JS or from GCS (gs://some-bucket/js/lib.js)
Some UDF use cases:
● take one row and emit zero or more rows
● decoding URL-encoded strings
● text readability
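BigQuery UDFs themselves are written in JavaScript. The sketch below is only a Python analogue of the "one row in, zero or more rows out" shape, combined with URL decoding. The field names (`user_id`, `url`) and the function name are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def explode_query_params(row):
    """Take one input row and yield zero or more output rows.

    One output row is emitted per query-string parameter of row["url"];
    a URL without parameters yields nothing. parse_qsl also decodes
    URL-encoded names and values.
    """
    query = urlsplit(row["url"]).query
    for name, value in parse_qsl(query):
        yield {"user_id": row["user_id"], "param": name, "value": value}


rows = [
    {"user_id": 1, "url": "http://example.com/?q=big%20query&utm_source=mail"},
    {"user_id": 2, "url": "http://example.com/"},  # emits no rows
]
emitted = [out for row in rows for out in explode_query_params(row)]
print(emitted)
```

The first row produces two output rows (with `big%20query` decoded to `big query`), the second produces none, which is the map-like behaviour described above.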
Append only tables - Get last value
1. Use aggregation MIN/MAX on timestamp to find first/last and join back to the same table.
2. Use analytic functions FIRST_VALUE and LAST_VALUE.
SELECT LAST_VALUE(email) OVER(
PARTITION BY user_id
ORDER BY timestamp ASC) AS email_last ...
3. Use the ROW_NUMBER window function and keep only the newest row per user.
SELECT email, firstname, lastname
FROM
(SELECT email, firstname, lastname,
row_number() over (partition BY user_id
ORDER BY timestamp DESC) seqnum
FROM [profile_event]
)
WHERE seqnum=1
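The idea behind "get last value" on an append-only table can be sketched outside SQL as well. The following Python analogue assumes events are plain dictionaries with hypothetical `user_id`, `timestamp` and `email` keys; it is not how BigQuery executes the query.

```python
def latest_profile_per_user(events):
    """Return {user_id: event}, keeping only the newest event per user.

    Mirrors the ROW_NUMBER() ... ORDER BY timestamp DESC / seqnum = 1 idea:
    rows are never updated, so the current state is the most recent row.
    """
    latest = {}
    for event in events:
        current = latest.get(event["user_id"])
        if current is None or event["timestamp"] > current["timestamp"]:
            latest[event["user_id"]] = event
    return latest


profile_events = [
    {"user_id": 1, "timestamp": 10, "email": "old@example.com"},
    {"user_id": 2, "timestamp": 15, "email": "b@example.com"},
    {"user_id": 1, "timestamp": 20, "email": "new@example.com"},
]
latest = latest_profile_per_user(profile_events)
print(latest[1]["email"])  # new@example.com
```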
Table wildcard functions
This example assumes the following tables exist:
● mydata.people20140323
● mydata.people20140324
● mydata.people20140325
SELECT
name
FROM
(TABLE_DATE_RANGE(mydata.people,
DATE_ADD(CURRENT_TIMESTAMP(), -2, 'DAY'),
CURRENT_TIMESTAMP()))
WHERE
age >= 35
#... another example with RegExp ...
FROM
(TABLE_QUERY(mydata,
'REGEXP_MATCH(table_id, r"^boo[d]{3,5}")'))
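To make the table selection concrete, here is a small Python sketch of which table names the two wildcard functions would pick. It assumes daily tables named as a prefix followed by YYYYMMDD, and that the regular expression is applied as a search over each table id; the helper names are illustrative.

```python
import re
from datetime import date, timedelta


def tables_in_date_range(prefix, start, end):
    """List the daily table names prefix + YYYYMMDD from start to end inclusive."""
    days = (end - start).days
    return [prefix + (start + timedelta(days=i)).strftime("%Y%m%d")
            for i in range(days + 1)]


def tables_matching(table_ids, pattern):
    """Keep the table ids in which the regular expression is found."""
    return [t for t in table_ids if re.search(pattern, t)]


print(tables_in_date_range("people", date(2014, 3, 23), date(2014, 3, 25)))
print(tables_matching(["boodd", "booddd", "bood", "people20140323"],
                      r"^boo[d]{3,5}"))
```

With the fixed dates above, the first call lists the three `people` tables from the example; the second keeps only `booddd`, the one id with at least three `d` characters after `boo`.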
Infrastructure
Schema modelling
+--------------------------+-----------+----------+--+
| order_id | INTEGER | REQUIRED | |
| ... | | | |
| products | RECORD | REPEATED | |
| products.product_id | INTEGER | NULLABLE | |
| products.attributes | STRING | REPEATED | |
| products.price | FLOAT | NULLABLE | |
| products.name | STRING | NULLABLE | |
| ... | | | |
| common | RECORD | NULLABLE | |
| common.insert_id | INTEGER | REQUIRED | |
| common.tenant | INTEGER | REQUIRED | |
| common.event | INTEGER | REQUIRED | |
| common.user_id | INTEGER | REQUIRED | |
| common.timestamp | TIMESTAMP | REQUIRED | |
| .... | | | |
| common.utm | RECORD | NULLABLE | |
| common.utm.source | STRING | NULLABLE | |
| common.utm.medium | STRING | NULLABLE | |
| common.utm.campaign | STRING | NULLABLE | |
| common.utm.content | STRING | NULLABLE | |
| common.utm.term | STRING | NULLABLE | |
| meta | STRING | NULLABLE | |
+--------------------------+-----------+----------+--+
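A row for a schema like this could be prepared as follows. This is a minimal Python sketch under assumptions: it fills only a subset of the `common` fields (a real row would need every REQUIRED column), puts everything else into the `meta` JSON string, and wraps the result in the `rows` / `insertId` / `json` shape used by the streaming `tabledata.insertAll` request body. The event values and the helper name are invented for illustration.

```python
import json

# Columns from the `common` record in the schema above (subset, for illustration).
COMMON_FIELDS = ("insert_id", "tenant", "event", "user_id", "timestamp")


def to_insert_all_row(event):
    """Split an event dict into typed columns plus a JSON `meta` string."""
    common = {name: event[name] for name in COMMON_FIELDS}
    leftover = {k: v for k, v in event.items() if k not in COMMON_FIELDS}
    row = {"common": common, "meta": json.dumps(leftover, sort_keys=True)}
    # insertId lets BigQuery de-duplicate retried streaming inserts (best effort).
    return {"insertId": str(event["insert_id"]), "json": row}


event = {"insert_id": 42, "tenant": 7, "event": 3, "user_id": 1001,
         "timestamp": 1444000000, "browser": "firefox"}
request_body = {"rows": [to_insert_all_row(event)]}
print(json.dumps(request_body, indent=2))
```

Known fields land in queryable columns, while the unexpected `browser` attribute is kept in `meta`, so it can still be read later with the JSON functions.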
Streaming insert time (ms) - last 6M
Achievements
● Funnel Analysis
Attribute orders to first article visited
Example:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2-> page3 -> orderpage2 -> ...
Problem: When an order is made, attribute a credit to the first article visited by that user!
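The attribution rule can be sketched in a few lines of Python. This is an illustration of the logic only, not the query used in the project: it assumes events arrive as hypothetical (user_id, timestamp, page) tuples, that article pages start with "article" and order pages with "orderpage", and that orders placed before any article visit receive no credit.

```python
def credit_first_articles(events):
    """Count, per article, the orders made by users whose first article it was.

    events: iterable of (user_id, timestamp, page) tuples in any order.
    """
    first_article = {}  # user_id -> first article that user visited
    credits = {}        # article -> number of attributed orders
    for user_id, _, page in sorted(events, key=lambda e: (e[0], e[1])):
        if page.startswith("article"):
            first_article.setdefault(user_id, page)
        elif page.startswith("orderpage") and user_id in first_article:
            article = first_article[user_id]
            credits[article] = credits.get(article, 0) + 1
    return credits


events = [
    ("u1", 1, "article1"), ("u1", 2, "page2"), ("u1", 3, "page3"),
    ("u1", 4, "page4"), ("u1", 5, "orderpage1"), ("u1", 6, "thankyoupage1"),
    ("u2", 1, "page1"), ("u2", 2, "article2"), ("u2", 3, "page3"),
    ("u2", 4, "orderpage2"),
]
print(credit_first_articles(events))  # {'article1': 1, 'article2': 1}
```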
Achievements
● Funnel Analysis
● Email URL click heatmap
Email URL clicks map (79GB in 2.4sec)
Achievements Continued
● Funnel Analysis
● Email URL click heatmap
● Email Dashboard (Trends, SPAM, ISP deferral)
● Split tests (by content, region, device, during the day)
● Ability for advanced segmentation as all raw data is stored
● Behavioral analytics (engaged users, recommendations)
Our benefits
● no provisioning/deploy
● no running out of resources
● no more focus on large-scale execution plans
● no need to re-implement tricky concepts
(time windows / join streams)
● pay only for the columns we use in our queries
● run raw ad-hoc queries (either by analysts/sales or Devs)
● no more throwing away, expiring, or aggregating old data.
BigQuery: Sample projects to try out
1. githubarchive.org: 20+ event types available since 2012
a. pull request latency
b. expressions, emotions in commit messages
2. httparchive.org: Trends in web technology
a. popular scripts
b. website performance
3. raw Google Analytics data (*only Premium Customers)
4. GDELT - Global Database of Events, Language, and Tone
GKG - Global Knowledge Graph
5. GSOD - samples of weather (rainfall, temp…)
6. 1.6 billion Reddit comments
7. Hackernews data
8. Wikipedia edits
HttpArchive - .HU Javascript frameworks
GDELT - News Coverage: Orbán Viktor
GDELT - News Coverage: Beata Szydlo
Reddit - books community talks about
Questions?
Thank you.

More Related Content

What's hot

Big query the first step - (MOSG)
Big query the first step - (MOSG)Big query the first step - (MOSG)
Big query the first step - (MOSG)Soshi Nemoto
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud eventPreetyKhatkar
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query BasicsIdo Green
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDatatdc-globalcode
 
Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...
Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...
Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...javier ramirez
 
Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQueryPradeep Bhadani
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at TwitchImply
 
BigQuery for the Big Data win
BigQuery for the Big Data winBigQuery for the Big Data win
BigQuery for the Big Data winKen Taylor
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Charles Allen
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataTreasure Data, Inc.
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaObjectRocket
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuDataiku
 
Google Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery WebinarGoogle Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery WebinarRasel Rana
 
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Imply
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherObjectRocket
 
What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18Imply
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMárton Kodok
 

What's hot (20)

Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
Big query the first step - (MOSG)
Big query the first step - (MOSG)Big query the first step - (MOSG)
Big query the first step - (MOSG)
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud event
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigData
 
Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...
Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...
Big Data analytics with Nginx, Logstash, Redis, Google Bigquery and Neo4j, ja...
 
Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQuery
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at Twitch
 
BigQuery for the Big Data win
BigQuery for the Big Data winBigQuery for the Big Data win
BigQuery for the Big Data win
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin Buzzwords
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - Dataiku
 
Google Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery WebinarGoogle Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery Webinar
 
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18What’s New in Imply 3.3 & Apache Druid 0.18
What’s New in Imply 3.3 & Apache Druid 0.18
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
 

Viewers also liked

Failing at Scale - PNWPHP 2016
Failing at Scale - PNWPHP 2016Failing at Scale - PNWPHP 2016
Failing at Scale - PNWPHP 2016Chris Tankersley
 
LXC - kontener pingwinów
LXC - kontener pingwinówLXC - kontener pingwinów
LXC - kontener pingwinówgnosek
 
Online Communities
Online CommunitiesOnline Communities
Online CommunitiesDawn Foster
 
Ecce de-gids nl
Ecce de-gids nlEcce de-gids nl
Ecce de-gids nlswaipnew
 
Cloud Foundry Logging and Metrics
Cloud Foundry Logging and MetricsCloud Foundry Logging and Metrics
Cloud Foundry Logging and MetricsEd King
 
AtlasCamp 2015: How HipChat ships at the speed of awesome
AtlasCamp 2015: How HipChat ships at the speed of awesomeAtlasCamp 2015: How HipChat ships at the speed of awesome
AtlasCamp 2015: How HipChat ships at the speed of awesomeAtlassian
 
How Docker EE is Finnish Railway’s Ticket to App Modernization
How Docker EE is Finnish Railway’s Ticket to App ModernizationHow Docker EE is Finnish Railway’s Ticket to App Modernization
How Docker EE is Finnish Railway’s Ticket to App ModernizationDocker, Inc.
 
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...Amazon Web Services
 
Roxar Multiphase Meter
Roxar Multiphase MeterRoxar Multiphase Meter
Roxar Multiphase Meterali_elkaseh
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadAll Things Open
 
Docker for PHP Developers - Madison PHP 2017
Docker for PHP Developers - Madison PHP 2017Docker for PHP Developers - Madison PHP 2017
Docker for PHP Developers - Madison PHP 2017Chris Tankersley
 
Using a Canary Microservice to Validate the Software Delivery Pipeline
Using a Canary Microservice to Validate the Software Delivery PipelineUsing a Canary Microservice to Validate the Software Delivery Pipeline
Using a Canary Microservice to Validate the Software Delivery PipelineXebiaLabs
 
Catálogo Elk Sport 2016 2017
Catálogo Elk Sport 2016 2017Catálogo Elk Sport 2016 2017
Catálogo Elk Sport 2016 2017Elk Sport
 
B2B Digital Transformation - Case Study
B2B Digital Transformation - Case StudyB2B Digital Transformation - Case Study
B2B Digital Transformation - Case StudyDivante
 
Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...
Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...
Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...Sean Whalen
 

Viewers also liked (20)

Failing at Scale - PNWPHP 2016
Failing at Scale - PNWPHP 2016Failing at Scale - PNWPHP 2016
Failing at Scale - PNWPHP 2016
 
Distributed cat herding
Distributed cat herdingDistributed cat herding
Distributed cat herding
 
LXC - kontener pingwinów
LXC - kontener pingwinówLXC - kontener pingwinów
LXC - kontener pingwinów
 
114 Numalliance
114 Numalliance114 Numalliance
114 Numalliance
 
Online Communities
Online CommunitiesOnline Communities
Online Communities
 
Ecce de-gids nl
Ecce de-gids nlEcce de-gids nl
Ecce de-gids nl
 
Cloud Foundry Logging and Metrics
Cloud Foundry Logging and MetricsCloud Foundry Logging and Metrics
Cloud Foundry Logging and Metrics
 
AtlasCamp 2015: How HipChat ships at the speed of awesome
AtlasCamp 2015: How HipChat ships at the speed of awesomeAtlasCamp 2015: How HipChat ships at the speed of awesome
AtlasCamp 2015: How HipChat ships at the speed of awesome
 
How Docker EE is Finnish Railway’s Ticket to App Modernization
How Docker EE is Finnish Railway’s Ticket to App ModernizationHow Docker EE is Finnish Railway’s Ticket to App Modernization
How Docker EE is Finnish Railway’s Ticket to App Modernization
 
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
 
Roxar Multiphase Meter
Roxar Multiphase MeterRoxar Multiphase Meter
Roxar Multiphase Meter
 
Gsm jammer
Gsm jammerGsm jammer
Gsm jammer
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
Docker for PHP Developers - Madison PHP 2017
Docker for PHP Developers - Madison PHP 2017Docker for PHP Developers - Madison PHP 2017
Docker for PHP Developers - Madison PHP 2017
 
Using a Canary Microservice to Validate the Software Delivery Pipeline
Using a Canary Microservice to Validate the Software Delivery PipelineUsing a Canary Microservice to Validate the Software Delivery Pipeline
Using a Canary Microservice to Validate the Software Delivery Pipeline
 
Yirgacheffe Chelelelktu Washed Coffee 2015
Yirgacheffe Chelelelktu Washed Coffee 2015Yirgacheffe Chelelelktu Washed Coffee 2015
Yirgacheffe Chelelelktu Washed Coffee 2015
 
Catálogo Elk Sport 2016 2017
Catálogo Elk Sport 2016 2017Catálogo Elk Sport 2016 2017
Catálogo Elk Sport 2016 2017
 
Microservices
MicroservicesMicroservices
Microservices
 
B2B Digital Transformation - Case Study
B2B Digital Transformation - Case StudyB2B Digital Transformation - Case Study
B2B Digital Transformation - Case Study
 
Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...
Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...
Open Secrets of the Defense Industry: Building Your Own Intelligence Program ...
 

Similar to Complex realtime event analytics using BigQuery @Crunch Warmup

Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperMárton Kodok
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryMárton Kodok
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQueryMárton Kodok
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryMárton Kodok
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryMárton Kodok
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryMárton Kodok
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQLYu Ishikawa
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQueryDharmesh Vaya
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTJames Chittenden
 
Batch and Interactive Analytics: From Data to Insight
Batch and Interactive Analytics: From Data to InsightBatch and Interactive Analytics: From Data to Insight
Batch and Interactive Analytics: From Data to InsightWSO2
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Ido Green
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Using Elasticsearch for Analytics
Using Elasticsearch for AnalyticsUsing Elasticsearch for Analytics
Using Elasticsearch for AnalyticsVaidik Kapoor
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...GITS Indonesia
 

Similar to Complex realtime event analytics using BigQuery @Crunch Warmup (20)

Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuery
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQuery
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
Batch and Interactive Analytics: From Data to Insight
Batch and Interactive Analytics: From Data to InsightBatch and Interactive Analytics: From Data to Insight
Batch and Interactive Analytics: From Data to Insight
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Sprint 69
Sprint 69Sprint 69
Sprint 69
 
Using Elasticsearch for Analytics
Using Elasticsearch for AnalyticsUsing Elasticsearch for Analytics
Using Elasticsearch for Analytics
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
 

Complex realtime event analytics using BigQuery @Crunch Warmup

  • 1. Complex Realtime Event Analytics using BigQuery
    Márton Kodok, Senior Software Engineer at REEA
    twitter: martonkodok, stackoverflow: pentium10, github: pentium10
    Crunch Warm Up - October 2015 - Budapest
  • 2. Agenda
    1. Big Data movement
    2. Analytics Project - Background
    3. Challenges - Why is it so hard?
    4. Approach - Strategy - Application
    5. Use Cases - Implementations
    6. Exploring Big Data (GDELT, Hackernews, Reddit)
  • 3. Big data analyses movement
    Every scientist who needs big data analytics to save millions of lives should have that power.
  • 4. Challenging experience
    The simple fact is that you are brilliant, but your brilliant ideas require complex big data analytics.
  • 5. Project: One-size-fits-all problem
    Need a backend to store, query, extract for deep analytics:
    ● Events (product, app, site email events)
    ● Achievements (“tag” users on the go, retention)
    ● Entities (split tests, user profiles, business entities)
    ● Metrics (app profiler data, custom)
    ● Email activity (click-map, engagement, ISP, Spam)
    ● 3rd party Analytics (good to have: Google Analytics)
    ● Systems generated data (log file entries, unstructured)
  • 6. Desired system/platform
    ● Terabyte scalable storage
    ● Real-time event ingestion
    ● Ask sophisticated queries (optional: without Dev)
    ● Query-performance
    ● Low-maintenance
    ● Cost effective
    ● Wire them up easily
    Goal: Store everything accessible by SQL immediately.
  • 7. Equipment strategy
    ● In-House
    ● Hosted
    ● Managed
    * people still required
    Services:
    ❏ ELK Stack (Elastic-Logstash-Kibana)...
    ❏ Cassandra, Hive, Hadoop...
    ❏ Amazon RedShift, Google BigQuery...
  • 8. Google BigQuery
  • 9. What is BigQuery?
    ● Analytics-as-a-Service - Data Warehouse in the Cloud
    ● Fully-Managed
    ● Scales into Petabytes
    ● Ridiculously fast
    ● Decent pricing (queries: $5/TB, storage: $20/TB)
    ● 100,000 rows/sec Streaming API
    * October 2015 pricing
  • 10. BigQuery: Big Data Analytics in the Cloud
    ● Convenience of SQL
    ● Familiar DB Structure (table, column, views, JSON)
    ● Open Interfaces (REST, Web UI, ODBC)
    ● Fast atomic imports JSON/CSV (file size up to 5TB)
    ● Simple data ingest from GCS or Hadoop
    ● Web UI + bq CLI
    ● Connectors: Hadoop, Tableau, R, Talend, Logstash
    ● US or EU zone
  • 11. BigQuery: Convenience of SQL/JSON/JS
    ● Append-only tables
    ● Batch load file size limits: 5TB (CSV or JSON)
    ● ACL - row level locking (individual or group based)
    ● Columnar storage (max 10 000 columns in table)
    ● Rich SQL: JSON, IP, Math, RegExp, Window functions
    ● Datatypes: String 2MB, Record, Nested …
    ● UDF (User defined functions): Javascript
    Note: Store what you can in columns, the rest in JSON.
  • 12. BigQuery Costs - October 2015
    Queries / Storage / Ingestion:
    ➔ 1 TB per month free
    ➔ 5 USD per TB
    ➔ only pay for the columns you use in your query
    ➔ 20 USD per TB
    ➔ Batch load free (CSV/JSON)
    ➔ Exporting free
    ➔ Table copy free
    ➔ 1 USD per 20TB data
    Estimate 1 - Storage 5 TB - Streaming Inserts 5TB - Queries 3 TB
    Monthly total: 110 USD
    Estimate 2 - Storage 20 TB - Streaming Inserts 10TB - Queries 10 TB
    Monthly total: 455 USD
    * 1 Petabyte storage, 100 TB rows insert, 100 TB queries => 26,000 USD
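A rough sketch of how Estimate 1 could be worked out from the rates listed on this slide. It assumes the 1 TB free query tier applies each month and reads "1 USD per 20TB data" as the streaming rate; both readings are assumptions, `monthly_cost` is a hypothetical helper, and the slide's other totals may include rounding or factors that are not shown.

```python
# Rates as listed on the slide (October 2015); how they combine is an assumption.
QUERY_USD_PER_TB = 5.0
QUERY_FREE_TB = 1.0          # "1 TB per month free"
STORAGE_USD_PER_TB = 20.0
INGEST_USD_PER_20TB = 1.0    # "1 USD per 20TB data", read here as the streaming rate


def monthly_cost(storage_tb, streamed_tb, queried_tb):
    """Hypothetical helper: sum storage, streaming and query charges in USD."""
    storage = storage_tb * STORAGE_USD_PER_TB
    streaming = streamed_tb / 20.0 * INGEST_USD_PER_20TB
    # Only the volume above the free tier is billed.
    queries = max(0.0, queried_tb - QUERY_FREE_TB) * QUERY_USD_PER_TB
    return storage + streaming + queries


# Estimate 1 from the slide: 5 TB stored, 5 TB streamed, 3 TB queried.
estimate_1 = monthly_cost(storage_tb=5, streamed_tb=5, queried_tb=3)
print(round(estimate_1, 2))
```

Under these assumptions storage dominates the bill (100 USD of roughly 110 USD), which fits the slide's advice to pay attention to which columns are stored and queried.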
  • 13. UDF - Power of JavaScript
    ● covers what is hard or impossible to express in SQL: loops, complex conditionals, string parsing or transformations
    ● UDFs are similar to map functions in MapReduce
    ● inline JS or from GCS (gs://some-bucket/js/lib.js)
    Some UDF use cases:
    ● take one row and emit zero or more rows
    ● decoding URL-encoded strings
    ● text readability
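The "one row in, zero or more rows out" shape of a UDF can be illustrated outside BigQuery. The sketch below is plain Python, not a BigQuery JavaScript UDF; the row layout (`user_id`, a `|`-separated `urls` string) and the function name `decode_and_split` are hypothetical.

```python
from urllib.parse import unquote


def decode_and_split(row):
    """Take one row (a dict) and yield zero or more output rows.

    Hypothetical input shape: {"user_id": int, "urls": "a%20b|c%2Fd"}.
    """
    raw = row.get("urls") or ""
    for part in raw.split("|"):
        if part:  # a row without URLs emits nothing
            yield {"user_id": row["user_id"], "url": unquote(part)}


rows = [
    {"user_id": 1, "urls": "https%3A%2F%2Fexample.com%2Fa%20b|https%3A%2F%2Fexample.com%2Fc"},
    {"user_id": 2, "urls": ""},
]
# Flatten the per-row outputs, as a map step would.
decoded = [out for row in rows for out in decode_and_split(row)]
print(decoded)
```

The first row fans out into two decoded rows and the second emits none, which is the map-like behaviour the slide compares to MapReduce.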
  • 14. Append only tables - Get last value
    1. Use aggregation MIN/MAX on timestamp to find first/last and join back to the same table.
    2. Use analytic functions FIRST_VALUE and LAST_VALUE.
       SELECT LAST_VALUE(email) OVER (PARTITION BY user_id ORDER BY timestamp ASC) AS email_last ...
    3. Using Window Functions
       SELECT email, firstname, lastname
       FROM (SELECT email, firstname, lastname,
                    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp DESC) seqnum
             FROM [profile_event])
       WHERE seqnum=1
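Approach 1 (aggregate, then join back) has no query on the slide, so here is a minimal sketch of the idea. It runs against an in-memory SQLite database rather than BigQuery, so the dialect differs; the column name `ts` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile_event (user_id INTEGER, email TEXT, ts INTEGER)")
# Append-only history: user 1 changed email, user 2 has a single event.
conn.executemany(
    "INSERT INTO profile_event VALUES (?, ?, ?)",
    [
        (1, "old@example.com", 100),
        (1, "new@example.com", 200),
        (2, "only@example.com", 150),
    ],
)

# Find the latest timestamp per user, then join back to fetch that row's email.
latest = conn.execute(
    """
    SELECT p.user_id, p.email
    FROM profile_event AS p
    JOIN (SELECT user_id, MAX(ts) AS max_ts
          FROM profile_event
          GROUP BY user_id) AS m
      ON p.user_id = m.user_id AND p.ts = m.max_ts
    ORDER BY p.user_id
    """
).fetchall()
print(latest)
```

If two events for the same user share the maximum timestamp, this join returns both rows; the ROW_NUMBER variant in approach 3 returns exactly one.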
  • 15. Table wildcard functions
    This example assumes the following tables exist:
    ● mydata.people20140323
    ● mydata.people20140324
    ● mydata.people20140325
    SELECT name
    FROM (TABLE_DATE_RANGE(mydata.people, DATE_ADD(CURRENT_TIMESTAMP(), -2, 'DAY'), CURRENT_TIMESTAMP()))
    WHERE age >= 35
    # ... another example with RegExp
    ... FROM (TABLE_QUERY(mydata, 'REGEXP_MATCH(table_id, r"^boo[d]{3,5}")'))
  • 16. Infrastructure
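To make the date-range selection concrete, the sketch below enumerates the daily table names that a prefix plus a three-day window would cover. It only illustrates the naming scheme (prefix + YYYYMMDD); it does not call BigQuery, `tables_in_date_range` is a hypothetical helper, and the date is fixed so the output lines up with the tables named on the slide.

```python
from datetime import date, timedelta


def tables_in_date_range(prefix, start, end):
    """Return daily table names prefix + YYYYMMDD for start..end inclusive."""
    days = (end - start).days
    return [
        prefix + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days + 1)
    ]


today = date(2014, 3, 25)  # fixed instead of "now" so the result is reproducible
names = tables_in_date_range("mydata.people", today - timedelta(days=2), today)
print(names)
```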
  • 17. Schema modelling
    +--------------------------+-----------+----------+--+
    | order_id                 | INTEGER   | REQUIRED |  |
    | ...                      |           |          |  |
    | products                 | RECORD    | REPEATED |  |
    | products.product_id      | INTEGER   | NULLABLE |  |
    | products.attributes      | STRING    | REPEATED |  |
    | products.price           | FLOAT     | NULLABLE |  |
    | products.name            | STRING    | NULLABLE |  |
    | ...                      |           |          |  |
    | common                   | RECORD    | NULLABLE |  |
    | common.insert_id         | INTEGER   | REQUIRED |  |
    | common.tenant            | INTEGER   | REQUIRED |  |
    | common.event             | INTEGER   | REQUIRED |  |
    | common.user_id           | INTEGER   | REQUIRED |  |
    | common.timestamp         | TIMESTAMP | REQUIRED |  |
    | ....                     |           |          |  |
    | common.utm               | RECORD    | NULLABLE |  |
    | common.utm.source        | STRING    | NULLABLE |  |
    | common.utm.medium        | STRING    | NULLABLE |  |
    | common.utm.campaign      | STRING    | NULLABLE |  |
    | common.utm.content       | STRING    | NULLABLE |  |
    | common.utm.term          | STRING    | NULLABLE |  |
    | meta                     | STRING    | NULLABLE |  |
    +--------------------------+-----------+----------+--+
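A row matching this schema might look like the sketch below, serialized as one JSON object per line. All values are invented; the fields elided with "..." in the schema are simply left out, and putting a JSON string into `meta` is an assumption based on the earlier note about storing the rest in JSON.

```python
import json

row = {
    "order_id": 1001,
    # REPEATED RECORD: a list of nested product objects.
    "products": [
        {"product_id": 1, "attributes": ["red", "large"], "price": 19.99, "name": "T-shirt"},
        {"product_id": 2, "attributes": [], "price": 5.0, "name": "Sticker"},
    ],
    # Nested RECORD shared by every event type.
    "common": {
        "insert_id": 1,
        "tenant": 7,
        "event": 3,
        "user_id": 42,
        "timestamp": "2015-10-01 12:00:00",
        "utm": {
            "source": "newsletter",
            "medium": "email",
            "campaign": None,
            "content": None,
            "term": None,
        },
    },
    # Free-form extras kept as a JSON string in a STRING column (assumption).
    "meta": json.dumps({"coupon": "WELCOME"}),
}

line = json.dumps(row)  # one row per line, as in newline-delimited JSON
print(line)
```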
  • 18. Streaming insert time (ms) - last 6M
  • 19. Achievements
    ● Funnel Analysis
  • 20. Attribute orders to first article visited
    Example:
    ● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
    ● page1 -> article2 -> page3 -> orderpage2 -> ...
    Problem: When an order is made, attribute a credit to the first article visited by that user!
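The attribution rule can be sketched in a few lines of plain Python. This is not the query used in the project; it assumes visits arrive in chronological order and that page names starting with "article" and "orderpage" identify articles and orders, as in the two example paths. The function name `credit_first_articles` is hypothetical.

```python
from collections import Counter


def credit_first_articles(visits):
    """Credit each order to the first article its user visited.

    visits: iterable of (user_id, page) tuples in chronological order.
    """
    first_article = {}
    credits = Counter()
    for user_id, page in visits:
        if page.startswith("article"):
            # Remember only the first article seen for this user.
            first_article.setdefault(user_id, page)
        elif page.startswith("orderpage") and user_id in first_article:
            credits[first_article[user_id]] += 1
    return credits


visits = [
    ("u1", "article1"), ("u1", "page2"), ("u1", "page3"), ("u1", "page4"),
    ("u1", "orderpage1"), ("u1", "thankyoupage1"),
    ("u2", "page1"), ("u2", "article2"), ("u2", "page3"), ("u2", "orderpage2"),
]
credits = credit_first_articles(visits)
print(dict(credits))
```

Orders placed by users who never visited an article earn no credit in this sketch; whether they should is a product decision the slide leaves open.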
  • 21. Achievements
    ● Funnel Analysis
    ● Email URL click heatmap
  • 22. Email URL clicks map (79GB in 2.4sec)
  • 23. Achievements Continued
    ● Funnel Analysis
    ● Email URL click heatmap
    ● Email Dashboard (Trends, SPAM, ISP deferral)
    ● Split tests (by content, region, device, during the day)
    ● Ability for advanced segmentation as all raw data is stored
    ● Behavioral analytics (engaged users, recommendations)
  • 24. Our benefits
    ● no provisioning/deploy
    ● no running out of resources
    ● no more focus on large scale execution plan
    ● no need to re-implement tricky concepts (time windows / join streams)
    ● pay only for the columns we use in our queries
    ● run raw ad-hoc queries (either by analysts/sales or Devs)
    ● no more throwing away, expiring, or aggregating old data
  • 25. BigQuery: Sample projects to try out
    1. githubarchive.org: 20+ event types available since 2012
       a. pull request latency
       b. expressions, emotions in commit messages
    2. httparchive.org: Trends in web technology
       a. popular scripts
       b. website performance
    3. raw Google Analytics data (*only Premium Customers)
    4. GDELT - Global Database of Events, Language, and Tone
       GKG - Global Knowledge Graph
    5. GSOD - samples of weather (rainfall, temp…)
    6. 1.6 billion Reddit comments
    7. Hackernews data
    8. Wikipedia edits
  • 26. HttpArchive - .HU Javascript frameworks
  • 27. GDELT - News Coverage: Orbán Viktor
  • 28. GDELT - News Coverage: Beata Szydlo
  • 29. Reddit - books community talks about