Energy Usage Insights
with Hadoop & HBase
July 25, 2013
Scott Kuehn, Data Architect
Oren Benjamin, Senior Software Engineer
Our Utility Partners
Australia, New Zealand, France, Nova Scotia, UK
Energy Usage Insights
26 July 2013
Home Energy Report
Energy Savings
[Chart: energy saved (0.0% to 4.5%) by months since program start (1 to 38)]
Average Steady State Savings = ~1.5 – 3.5%
Impact
$300,000,000
2,500,000,000 kWh
4,000,000,000 lbs CO2
Web Portal
Data Overview: Energy Usage Streams
meter  usage    cost   start                end
0001   719.23   57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
0001   742.61   59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
0002   0.2050          2013-01-01T00:00:00  2013-01-01T00:15:00
0002   0.2250          2013-01-01T00:15:00  2013-01-01T00:30:00
0002   0.2350          2013-01-01T00:30:00  2013-01-01T00:45:00
0002   0.2050          2013-01-01T00:45:00  2013-01-01T01:00:00
0002   0.2250          2013-01-01T01:00:00  2013-01-01T01:15:00
0001 – Meter (Bills)
0002 – Smart Meter (Quarter-hourly reads)
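The two kinds of stream in the table above differ only in interval length and in whether a cost is present. A minimal sketch of parsing both row shapes into one type follows; the class name `UsageReadSketch` and its fields are hypothetical and are not taken from the talk.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** One usage read from either a billing meter or a smart meter (hypothetical type). */
final class UsageReadSketch {
    final String meter;
    final double usage;
    final Double cost;            // null when the row carries no cost column
    final LocalDateTime start;
    final LocalDateTime end;

    UsageReadSketch(String meter, double usage, Double cost,
                    LocalDateTime start, LocalDateTime end) {
        this.meter = meter;
        this.usage = usage;
        this.cost = cost;
        this.start = start;
        this.end = end;
    }

    /** Parses one whitespace-separated row; rows with four fields have no cost. */
    static UsageReadSketch parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4 && f.length != 5) {
            throw new IllegalArgumentException("expected 4 or 5 fields: " + line);
        }
        boolean hasCost = f.length == 5;
        int i = 0;
        String meter = f[i++];
        double usage = Double.parseDouble(f[i++]);
        Double cost = hasCost ? Double.valueOf(f[i++]) : null;
        LocalDateTime start = LocalDateTime.parse(f[i++]);
        LocalDateTime end = LocalDateTime.parse(f[i]);
        return new UsageReadSketch(meter, usage, cost, start, end);
    }

    /** Length of the interval the read covers. */
    Duration interval() {
        return Duration.between(start, end);
    }

    public static void main(String[] args) {
        UsageReadSketch bill =
            parse("0001 719.23 57.52 2013-01-04T00:00:00 2013-02-11T00:00:00");
        UsageReadSketch smart =
            parse("0002 0.2050 2013-01-01T00:00:00 2013-01-01T00:15:00");
        System.out.println(bill.meter + ": " + bill.interval().toDays() + " days");
        System.out.println(smart.meter + ": " + smart.interval().toMinutes() + " minutes");
    }
}
```

Treating a bill and a quarter-hourly read as the same shape is one way to let later stages handle every stream uniformly.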
Data Overview: Smart Meter
Data Overview: Entities
[Diagram: relationships among Customer, Account, Site, and Meter entities]
Data Overview: Size
» Billing data: 60M households
» Smart meter data: 15M households
» On disk: 5TB (raw)
» More smart meter data than all other data combined
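A rough count shows why smart meter data dominates. The sketch below assumes one meter per household and one bill per month; neither figure is stated in the slides, and the class name is hypothetical.

```java
/** Back-of-the-envelope read counts per year (assumptions noted inline). */
final class ReadVolumeSketch {
    // Quarter-hourly reads: 4 per hour, 24 hours per day.
    static final int SMART_READS_PER_DAY = 24 * 4;

    static long smartMeterReadsPerYear() {
        return SMART_READS_PER_DAY * 365L;
    }

    // Assumption: one bill per month per household.
    static long billingReadsPerYear() {
        return 12L;
    }

    public static void main(String[] args) {
        long smartTotal = 15_000_000L * smartMeterReadsPerYear(); // 15M smart meter households
        long billTotal = 60_000_000L * billingReadsPerYear();     // 60M billing households
        System.out.println("smart meter reads/year: " + smartTotal);
        System.out.println("billing reads/year:     " + billTotal);
        System.out.println("ratio:                  " + (smartTotal / billTotal));
    }
}
```

Under these assumptions a smart meter produces 35,040 reads per year against 12 bills, so a quarter of the households generate several hundred times as many reads.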
Architecture: Usage Data Store
[Diagram: relationships among Customer, Account, Site, and Meter entities]
HBase + Hadoop Architecture v1.0
[Diagram components: meter metadata; usage data; MySQL report/AMI DBs; Sqoop; HDFS; M/R; HBase; batch workers; web servers]
HBase + Hadoop Architecture v2.0
[Diagram components: meter metadata; usage data; MySQL report/AMI DBs; HDFS file upload; metadata requests; HDFS; M/R; HBase; batch workers; web servers]
Data Schema: Kiji
Kiji Schema
»  Table layout definition
»  Schema management
»  Object serialization
»  Entity-centric data model
Supporting Projects
»  Kiji MR
»  Kiji Hive Adapter
»  Kiji REST
»  ...
Entity-centric Table: Row Key
Hash prefix   Utility company   Site ID
1 byte        4 bytes           8 bytes

"keys_format": {
  "encoding": "FORMATTED",
  "salt": { "hash_type": "MD5", "hash_size": 1 },
  "components": [
    { "name": "utility_company", "type": "INTEGER" },
    { "name": "site_id", "type": "LONG" }
  ]
}
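The layout above gives a 13-byte key: a 1-byte hash prefix, a 4-byte utility company, and an 8-byte Site ID. The sketch below builds a key of that shape with the JDK only. It is an illustration of salting, not Kiji's actual encoder: which bytes Kiji hashes and how it encodes each component may differ, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative salted row key: [1-byte MD5 prefix][4-byte int][8-byte long]. */
final class RowKeySketch {

    static byte[] rowKey(int utilityCompany, long siteId) {
        // Big-endian encoding of the two components (12 bytes).
        byte[] components = ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
            .putInt(utilityCompany)
            .putLong(siteId)
            .array();

        byte[] digest;
        try {
            // Assumption: the salt is the first byte of MD5 over both components.
            digest = MessageDigest.getInstance("MD5").digest(components);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }

        // Prefix the components with one hash byte so that keys spread across regions.
        return ByteBuffer.allocate(1 + components.length)
            .put(digest[0])
            .put(components)
            .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(7, 42L);
        System.out.println("key length: " + key.length);
    }
}
```

The hash prefix trades away ordered scans across sites for an even write distribution, while the same inputs always map to the same key.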
Entity-centric Table: Site
[Diagram: a single row per Site with two column families]
Usage Data Column Family: stream:0, stream:1, stream:2, stream:3
Insights Column Family: uua:0, bill_forecast:0
Example cell values shown: 0.12 kWh; 1.3 Therm; 24 Therm; 356 kWh; UUA; June 18 - July 17; $25
Insights: Jobs & Services
»  M/R jobs to compute insights in batch
»  Services to access pre-computed insights / compute insights on demand
»  Insight for a Site is calculated based on the data in the Site’s row
»  The calculated insight is saved back to the Site row
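The last two points describe a row-local pattern: read a Site's row, compute, and write the result back to the same row. A minimal in-memory sketch of that pattern follows. It uses plain Java maps rather than HBase or Kiji, and the class name, the `total:` column naming, and the per-stream total insight are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for one Site row with two column families. */
final class SiteRowSketch {
    final Map<String, List<Double>> usageData = new HashMap<>(); // e.g. "stream:0" -> reads
    final Map<String, Double> insights = new HashMap<>();        // computed values

    /** Computes one insight per stream from the row's own data and stores it in the row. */
    static void computeStreamTotals(SiteRowSketch row) {
        for (Map.Entry<String, List<Double>> stream : row.usageData.entrySet()) {
            double total = 0.0;
            for (double read : stream.getValue()) {
                total += read;
            }
            // Write back into the same row, under the insights family.
            row.insights.put("total:" + stream.getKey(), total);
        }
    }

    public static void main(String[] args) {
        SiteRowSketch row = new SiteRowSketch();
        row.usageData.put("stream:0", Arrays.asList(0.25, 0.5, 0.25));
        computeStreamTotals(row);
        System.out.println(row.insights);
    }
}
```

Because the computation touches only one row, the same function can run inside a batch job over all rows or on demand for a single Site.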
Insight Example: Rate Calculation
[Diagram: a Rate Calculation MapReduce job reads stream:0 ... stream:n from the site row's usage data column family and writes the rate calculation to the insights column family, which also holds the bill forecast]
Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {

  // Called once per Site row; the computed insight is written back to that row.
  @Override
  public void produce(KijiRowData siteRowData,
                      ProducerContext context) {
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }
}
Rate Calculation: Producer
public class RateCalculationProducer extends KijiProducer {

  // Called once per Site row; the computed insight is written back to that row.
  @Override
  public void produce(KijiRowData siteRowData,
                      ProducerContext context) {
    RateCalculation insight = computeInsight(siteRowData);
    context.put(insight);
  }

  // Column that receives the value passed to context.put().
  @Override
  public String getOutputColumn() {
    return "rate_calculation";
  }

}
public class RateCalculationProducer extends KijiProducer {

  // Request every version in the usage_data family from startTime onward.
  @Override
  public KijiDataRequest getDataRequest() {
    Configuration conf = getConf();
    long startTime = parseLong(conf.get(START_PARAM));

    return KijiDataRequest.builder()
        .withTimeRange(startTime, END_OF_TIME)
        .addColumns(ColumnsDef.create()
            .withMaxVersions(ALL_VERSIONS)
            .addFamily("usage_data"))
        .build();
  }

  @Override
  public void produce(KijiRowData siteRowData, ...
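The data request above asks for all versions of each usage cell within a time range, rather than only the latest one. The sketch below models that selection over a sorted map of timestamped cells using the JDK only. It assumes the lower bound is inclusive and the upper bound exclusive, and it defines its own `END_OF_TIME` constant; both are assumptions for illustration, and the class name is hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Models "all versions within [start, end)" over timestamped cells. */
final class TimeRangeSketch {
    // Assumption: the open upper bound is represented by the largest long value.
    static final long END_OF_TIME = Long.MAX_VALUE;

    /** Returns every cell whose timestamp lies in [start, end). */
    static NavigableMap<Long, Double> versionsInRange(
            NavigableMap<Long, Double> cells, long start, long end) {
        return cells.subMap(start, true, end, false);
    }

    public static void main(String[] args) {
        NavigableMap<Long, Double> cells = new TreeMap<>();
        cells.put(100L, 0.2050);
        cells.put(200L, 0.2250);
        cells.put(300L, 0.2350);

        // A latest-version read would return only the newest cell.
        System.out.println("latest: " + cells.lastEntry());
        // A time-range read with all versions returns every cell from the start time on.
        System.out.println("range:  " + versionsInRange(cells, 200L, END_OF_TIME));
    }
}
```

Passing the start time through job configuration lets one producer class serve both full recalculations and incremental runs.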
  	
  
In practice: Design decisions and challenges
»  ETL to an entity-centric schema
»  Bulk loading
»  Mixed workloads
In practice: ETL to entity-centric schema
meter  usage    cost   start                end
0001   719.23   57.52  2013-01-04T00:00:00  2013-02-11T00:00:00
0001   742.61   59.36  2013-02-11T00:00:00  2013-03-12T00:00:00
0002   0.2050          2013-01-01T00:00:00  2013-01-01T00:15:00
0002   0.2250          2013-01-01T00:15:00  2013-01-01T00:30:00
0002   0.2350          2013-01-01T00:30:00  2013-01-01T00:45:00
0002   0.2050          2013-01-01T00:45:00  2013-01-01T01:00:00
0002   0.2250          2013-01-01T01:00:00  2013-01-01T01:15:00
0001 – Meter (Bills)
0002 – Smart Meter (Quarter-hourly reads)
In practice: ETL to entity-centric schema
»  Use bulk loading for performance
»  Make the ingest process idempotent
»  Introduce a read-log for utility company billing corrections
»  ETL steps:
1. Ingest all reads into a read-log table
2. Load reads into the corresponding Site row
[Diagram: Billing files -> (1) M/R bulk load -> Read-log table -> (2) pivot and M/R bulk load -> Site table]
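The pivot step turns a flat log of reads into one row per Site. A minimal in-memory sketch follows, under two assumptions not stated in the slides: each read already carries its Site ID, and a later log entry for the same stream and start time replaces the earlier one (which is how a billing correction would take effect here). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Pivots a flat read-log into per-Site rows keyed by stream and start time. */
final class PivotSketch {

    static final class Read {
        final long siteId;
        final String stream;
        final String start;   // ISO timestamp of the interval start
        final double usage;

        Read(long siteId, String stream, String start, double usage) {
            this.siteId = siteId;
            this.stream = stream;
            this.start = start;
            this.usage = usage;
        }
    }

    /** Replaying the same log yields the same rows, so the load is idempotent. */
    static Map<Long, Map<String, Double>> pivot(List<Read> readLog) {
        Map<Long, Map<String, Double>> sites = new TreeMap<>();
        for (Read r : readLog) {
            sites.computeIfAbsent(r.siteId, id -> new TreeMap<>())
                 .put(r.stream + "@" + r.start, r.usage); // same cell key overwrites
        }
        return sites;
    }

    public static void main(String[] args) {
        List<Read> log = new ArrayList<>();
        log.add(new Read(1L, "stream:0", "2013-01-01T00:00:00", 0.2050));
        log.add(new Read(1L, "stream:0", "2013-01-01T00:15:00", 0.2250));
        log.add(new Read(1L, "stream:0", "2013-01-01T00:00:00", 0.2100)); // correction
        System.out.println(pivot(log));
    }
}
```

Keying each cell by stream and interval start is what makes a re-run harmless: loading a file twice writes the same cells with the same values.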
In practice: bulk loading
»  Bulk loaded files are not assigned sequence numbers
»  All compactions become major compactions
»  Solution: Find a temporary fix, monitor the HBase JIRA
In practice: Mixed workloads
[Diagram: the Site table serves reporting apps, web servers, and M/R jobs; workloads shown are ad-hoc reads and forecasts, batch insight calculations, and bulk scans]
In practice: Mixed workloads
»  Supporting mixed workloads requires adapting jobs and configurations
»  IO: Switch to bulkloading, enable direct HDFS reads
»  Major compactions: Disabled
»  Memory: increase heap and region sizes, use MSLAB
»  Verify performance by simulating nominal and high load scenarios
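The settings in the list above correspond to configuration properties that would normally live in hbase-site.xml and hdfs-site.xml. The sketch below collects a plausible set as a plain Java map. The slides give no values, so every value here is illustrative, and reading "direct HDFS reads" as HDFS short-circuit local reads is an assumption. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative property set for a mixed read/scan/batch workload. */
final class MixedWorkloadSettingsSketch {

    static Map<String, String> settings() {
        Map<String, String> s = new LinkedHashMap<>();
        // An interval of 0 turns off time-based major compactions.
        s.put("hbase.hregion.majorcompaction", "0");
        // MSLAB reduces heap fragmentation caused by memstore churn.
        s.put("hbase.hregion.memstore.mslab.enabled", "true");
        // Assumption: "direct HDFS reads" refers to short-circuit local reads.
        s.put("dfs.client.read.shortcircuit", "true");
        // Illustrative region size of 10 GiB; the talk does not give a number.
        s.put("hbase.hregion.max.filesize", String.valueOf(10L * 1024 * 1024 * 1024));
        // Heap size is set outside these properties, for example in hbase-env.sh.
        return s;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : settings().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```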
Results Visualized
Animation of jobs in progress
Mixed Workload Success
[Chart with labeled values 9ms and 2ms]
»  Mean read time is ~2ms
»  Nearly 200 forecasts/sec on performance testing cluster
Recap
Opower
»  Save energy
»  Make money
»  Big (enough) data
Scott Kuehn, scott.kuehn@opower.com
Oren Benjamin, oren.benjamin@opower.com
We’re hiring: http://opower.com/careers
Rate Calculation: Rate Engine
public interface RateEngine {
  /**
   * Compute the cost per usage read for the given Site
   * over the requested time interval.
   * @return a RateCalculation containing the result
   */
  RateCalculation calculate(Site site, List<UsageRead> usageReads);
}
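To make the interface concrete, here is a minimal sketch of one possible implementation: a flat price per unit of usage. The `Site`, `UsageRead`, and `RateCalculation` types are stripped-down stand-ins written for this example, and the flat rate is an assumption. Real utility tariffs, and the rate engine described in the talk, are presumably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical flat-rate implementation of the RateEngine interface. */
final class RateEngineSketch {

    static final class Site {
        final long id;
        Site(long id) { this.id = id; }
    }

    static final class UsageRead {
        final double usage;
        UsageRead(double usage) { this.usage = usage; }
    }

    static final class RateCalculation {
        final List<Double> costPerRead;
        final double totalCost;
        RateCalculation(List<Double> costPerRead, double totalCost) {
            this.costPerRead = costPerRead;
            this.totalCost = totalCost;
        }
    }

    interface RateEngine {
        RateCalculation calculate(Site site, List<UsageRead> usageReads);
    }

    /** Charges the same price for every unit of usage, regardless of site or time. */
    static final class FlatRateEngine implements RateEngine {
        private final double pricePerUnit;

        FlatRateEngine(double pricePerUnit) { this.pricePerUnit = pricePerUnit; }

        @Override
        public RateCalculation calculate(Site site, List<UsageRead> usageReads) {
            List<Double> costs = new ArrayList<>();
            double total = 0.0;
            for (UsageRead read : usageReads) {
                double cost = read.usage * pricePerUnit;
                costs.add(cost);
                total += cost;
            }
            return new RateCalculation(costs, total);
        }
    }

    public static void main(String[] args) {
        RateEngine engine = new FlatRateEngine(0.5);
        RateCalculation result = engine.calculate(
            new Site(1L), Arrays.asList(new UsageRead(2.0), new UsageRead(4.0)));
        System.out.println(result.costPerRead + " total=" + result.totalCost);
    }
}
```

Keeping the engine behind an interface means the same calculation can be called from a batch producer or from an on-demand service.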
Rate Calculation: Application Context
public class RateCalculationProducer extends KijiProducer {
  private ConfigurableApplicationContext appContext;
  private RateEngine rateEngine;

  // Runs once per task: load the application context and look up the rate engine.
  @Override
  public void setup(KijiContext context) {
    String contextPath = getConf().get(CONTEXT_PATH_KEY);
    appContext = new XmlAppContext(contextPath);
    rateEngine = appContext.getBean(RateEngine.class);
  }

  @Override
  public void produce(KijiRowData siteRowData, …
