SlideShare a Scribd company logo
1 of 34
Download to read offline
Implementing and Visualizing Click-
  Stream Data with MongoDB	

                      	

Jan 22, 2013 - New York MongoDB User Group	

                        	

            Cameron Sim - LearnVest.com
Agenda	

•  About LearnVest	

•  HL Application Architecture	

•  Data Capture	

•  Event Packaging	

•  MongoDB Data Warehousing	

•  Loading & Visualization	

•  Finishing up
LearnVest Inc.
                            www.learnvest.com	

                             Mission Statement	

    Aiming to making Financial Planning as accessible as having a gym membership	

                                          	

                                          	

           Company	

                                          Key Products	

nded in 2008 by Alexa Von Tobel, CEO	

            Account Aggregation and Managem
                	

                              (Bank, Credit, Loan, Investment, Mort
 50+ People and Growing rapidly	

                                     	

          Based in NYC	

                       Original and Syndicated Newsletter Co
                	

                                                    	

           Platforms	

                                       Financial Planning	

         Web  iPhone	

                                  (tiered product offering)	

                	

                                                    	


                                        Stack	

                                                             Analytics	

        Operational	

                             MongoDB 2.2.0 (3-node replica-set
Wordpress, Backbone.js, Node.js	

                         Java 6, Spring 3	

ava Spring 3, Redis, Memcached,
LearnVest.com	

      Web
LearnVest.com	

     IPhone
High Level Architecture	

      Production	

                            Analytics	

               	

                                  	

elivery               Services	

   Services              Loaders  Dashbo




  HTTPS	

  pyMongo
ure Everything	

                            Collection	

-Driven events over web and mobile	

 m-level exceptions	

ything else	


porary Data	

ok’ with approximate data	

rational Databases are the system of record	


egate events as they come in	

ove the overhead of basic metrics (counts, sums) on core events	

p by user unique id and increment counts per event, over time-dimensions
eek-ending, month, year)
Data Capture	

OS	


 (void) sendAnalyticEventType:(NSString*)eventType
                       object:(NSString*)object
                         name:(NSString*)name
                         page:(NSString*)page
                       source:(NSString*)source;

    NSMutableDictionary *eventData = [NSMutableDictionary dictionary];

    if   (eventType!=nil) [params setObject:eventType forKey:@eventType];
    if   (object!=nil) [eventData setObject:object forKey:@object];
    if   (name!=nil) [eventData setObject:name forKey:@name];
    if   (page!=nil) [eventData setObject:page forKey:@page];
    if   (source!=nil) [eventData setObject:source forKey:@source];
    if   (eventData!=nil) [params setObject:eventData forKey:@eventData];

    [[LVNetworkEngine sharedManager] analytics_send:params];
Data Capture	

WEB (JavaScript)	


unction internalTrackPageView() {
  var cookie = {
            userContext: jQuery.cookie('UserContextCookie'),
      };
  var trackEvent = {
            eventType: pageView,
            eventData: {
                   page: window.location.pathname + window.location.search
            }
      };
      // AJAX
      jQuery.ajax({
             url: /api/track,
             type: POST,
             dataType: json,
             data: JSON.stringify(trackEvent),
             // Set Request Headers
             beforeSend: function (xhr, settings) {
                    xhr.setRequestHeader('Accept', 'application/json');
                    xhr.setRequestHeader('User-Context', cookie.userContext)
                    if(settings.type === 'PUT' || settings.type === 'POST')
                           xhr.setRequestHeader('Content-Type', 'application/js
                    }
             }
      });
Bus Event Packaging	

ng 3 RESTful service layer, controller methods define the eventCode via @tracki
otation	


tom Intercepter class extends HandlerInterceptorAdapter and implements
 Handle() (for each event) to invoke calls via Spring @async to an EventPublisher	


ntPublisher publishes to common event bus queue with multiple subscribers, one o
kages the eventPayload MapString, Object object and forwards to Analytics Rest
Bus Event Packaging	

ing RestController Methods	

ace	


estMapping(value = /user/login, method = RequestMethod.POST,
rs=Accept=application/json)
c MapString, Object userLogin(@RequestBody MapString, Object event,
ervletRequest request);

ete/Impl Class	

ride
king(user.login)
c MapString, Object userLogin(@RequestBody MapString, Object event,
ervletRequest request){

/Implementation

eturn event;
Bus Event Packaging	

stom Intercepter class extends HandlerInterceptorAdapter 	


cted void handleTracking(String trackingCode, MapString, Object modelMap
ervletRequest request) {


MapString, Object responseModel = new HashMapString, Object();

 // remove non-serializables  copy over data from modelMap

 try {
        this.eventPublisher.publish(trackingCode, responseModel, request);
 } catch (Exception e) {
        log.error(Error tracking event ' + trackingCode + ' : 
                     + ExceptionUtils.getStackTrace(e));
 }
Bus Event Packaging	

stom Intercepter class extends HandlerInterceptorAdapter 	

c void publish (String eventCode, MapString,Object eventData,
                                                HttpServletRequest request

MapString,Object payload = new HashMapString,Object();
String eventId=UUID.randomUUID().toString();
MapString, String requestMap = HttpRequestUtils.getRequestHeaders(reques

//Normalize message
payload.put(eventType, eventData.get(eventType));
payload.put(eventData, eventData.get(eventType));
payload.put(version, eventData.get(eventType));
payload.put(eventId, eventId);
payload.put(eventTime, new Date());
payload.put(request, requestMap);
.
.
.
//Send to the Analytics Service for MongoDB persistence




c void sendPost(EventPayload payload){
   HttpEntity request = new HttpEntity(payload.getEventPayload(), headers)
Map m = restTemplate.postForObject(endpoint, request, java.util.Map.class)
Bus Event Packaging	

erialized Json (User Action)	


tCode”   :   “user.login”,
tType”   :   “login”,
ion”     :   “1.0”,
tTime”   :   “1358603157746”,
tData”   :   {
                  “” : “”,
                  “” : “”,
                  “” : “”
             },
est” : {
             “call-source” : “WEB”,
             “user-context” : “00002b4f1150249206ac2b692e48ddb3”,
             “user.agent”   : “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)
                                AppleWebKit/537.11 (KHTML, like Gecko) Chrome/
                                23.0.1271.101 Safari/537.11”,
             “cookie”       : “size=4; CP.mode=B; PHPSESSID=c087908516
                                ee2fae50cef6500101dc89; resolution=1920;
                                JSESSIONID=56EB165266A2C4AFF9
                                46F139669D746F; csrftoken=73bdcd
                                ddf151dc56b8020855b2cb10c8, content-length :
                                204, accept-encoding : gzip,deflate,sdch”,

         }
Bus Event Packaging	

erialized Json (Generic Event)	


tCode”   :   “generic.ui”,
tType”   :   “pageView”,
ion”     :   “1.0”,
tTime”   :   “1358603157746”,
tData”   :   {
                  “page”    : “/learnvest/moneycenter/inbox”,
                  “section” : “transactions”,
                  “name”    : “view transactions”
                  “object” : “page”
             },
est” : {
             “call-source” : “WEB”,
             “user-context” : “00002b4f1150249206ac2b692e48ddb3”,
             “user.agent”   : “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)
                                AppleWebKit/537.11 (KHTML, like Gecko) Chrome/
                                23.0.1271.101 Safari/537.11”,
             “cookie”       : “size=4; CP.mode=B; PHPSESSID=c087908516
                                ee2fae50cef6500101dc89; resolution=1920;
                                JSESSIONID=56EB165266A2C4AFF9
                                46F139669D746F; csrftoken=73bdcd
                                ddf151dc56b8020855b2cb10c8, content-length :
                                204, accept-encoding : gzip,deflate,sdch”,

         }
MongoDB Data Warehousing	

goDB Information	

 0	

 de replica-set	

rge (primary), 2x Medium (secondary) AWS Amazon-Linux machines	

  with single 500GB EBS volumes mounted to /opt/data	


goDB Config File	

  = /opt/data/mongodb/datarest = truereplSet = voyager	

mes	

vents daily on web, ~600K on mobile	

B per day at start, slowed to ~1GB per day	

ntly at 78GB (collecting since August 2012)	


re Scaling Strategy	

p 2nd Replica-Set	

d replica-sets to n at 60% / 250GB per EBS volume	

d key probably based on sequential mix of email_address  additional string
MongoDB Data Warehousing	

OBILE	


 ist all events, bucketed by source, event code and time:-	

EB/MOBILE	

er.login	

 e (day, week-ending, month, year)	


ert into collection e_web / e_mobile	


sert into:- 	

web_user_login_day	

web_user_login_week	

web_user_login_month	

web_user_login_year	


 dictable model for scaling and measuring business growth
MongoDB Data Warehousing	

DBObject newDocument = new BasicDBObject().append($inc
                     new BasicDBObject().append(count, 1));

ate day dimension
ction_day.update(new BasicDBObject().append(user-context, userContext)
               .append(eventType, eventType)
               .append(date, sdf_day.format(d)),newDocument, true, false

ate week dimension
ction_week.update(new BasicDBObject().append(user-context, userContext)
               .append(eventType, eventType)
               .append(date, sdf_day.format(w)), newDocument, true, fals

ate month dimension
ction_month.update(new BasicDBObject().append(user-context, userContext)
               .append(eventType, eventType)
               .append(date, sdf_month.format(d)), newDocument, true, fa

ate month dimension
ction_year.update(new BasicDBObject().append(user-context, userContext)
               .append(eventType, eventType)
               .append(date, sdf_year.format(d)), newDocument, true, fal
MongoDB Data Warehousing	

ount_addManual_weeke_web_account_addManual_year
_user_login_day
_user_login_week
_user_login_month
_user_login_yeare_mobile_generic_ui_daye_mobile_generic_ui_monthe_mobile_g
weeke_mobile_generic_ui_year

e_web_user_login_day.find()
d : ObjectId(50e4b9871b36921910222c42), count   : 5, date : 01/02,
-context : c4ca4238a0b923820dcc509a6f75849b }
d : ObjectId(50cd6cfcb9a80a2b4ee21422), count   : 7, date : 01/02,
-context : c4ca4238a0b923820dcc509a6f75849b }
d : ObjectId(50cd6e51b9a80a2b4ee21427), count   : 2, date : 01/02,
-context : c4ca4238a0b923820dcc509a6f75849b }
d : ObjectId(50e4b9871b36921910222c42), count   : 3, date : 01/03,
-context : 50e49a561b36921910222c33 }
MongoDB Data Warehousing	

1, accept-charset : ISO-8859-1,utf-8;q=0.7,*;q=0.3, cookie : size=
de=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920;
IONID=56EB165266A2C4AFF946F139669D746F;
oken=73bdcdddf151dc56b8020855b2cb10c8, content-length : 255, accept-
ing : gzip,deflate,sdch }, eventType : flick, eventData : { obje
on, name : split transaction button, page : #inbox/79876/, secti
saction_river_details } }
MongoDB Data Warehousing	

xing Strategy	


xes on core collections (e_web and e_mobile) come in under 3GB on 7.5GB Large
ce and 3.75GB on Medium instances	


 datetime in two fields and compound index on date with other fields like eventTyp
unique id (user-context)	


vy insertion rates, much lower read rates....so less indexes the better
MongoDB Data Warehousing	

ing Strategy
e_web.getIndexes()[
        v : 1,            key : {                  request.user-contex
               created_date : 1        },            ns :
ycenter.e_web,             name : request.user-context_1_created_date_

        v : 1,            key : {                  eventData.name : 1
     created_date : 1            },           ns : moneycenter.e_web
 name : eventData.name_1_created_date_1     }]
jective	

Loading  Visualization	

 how historic and intraday stats on core use cases (logins, conversions)	

 how user funnel rates on conversion pages	

 how general usability - how do users really use the Web and IOS platforms?	


on-Functionals	

 traday doesn’t need to be “real-time”, polling is good enough for now	

Overnight batch job for historic must scale horizontally	


 neral Implementation Strategy	

 o all heavy lifting  object manipulation, UI should just display graph or table	

Modularize the service to be able to regenerate any graphs/tables without a full load
Loading  Visualization	

va Batch Service	


a Mongo library to query key collections and return user counts and sum of events

ursor webUserLogins = c.find(
   new BasicDBObject(date, sdf.format(new Date())));

vate HashMapString, Object getSumAndCount(DBCursor cursor){
          HashMapString, Object m = new HashMapString, Object();

           int sum=0;
           int count=0;
           DBObject obj;
           while(cursor.hasNext()){
                  obj=(DBObject)cursor.next();
                  count++;
                  sum=sum+(Integer)obj.get(count);
           }

           m.put(sum, sum);
           m.put(count, count);
           m.put(average, sdf.format(new Float(sum)/count));

           return m;
Loading  Visualization	

va Batch Service	


e Aggregation Framework where required on core collections (e_web) and externa
reate aggregation objects
bject project = new BasicDBObject($project,
 new BasicDBObject(day_value, fields) );
bject day_value = new BasicDBObject( day_value, $day_value);
bject groupFields = new BasicDBObject( _id, day_value);

reate the fields to group by, in this case “number”
upFields.put(number, new BasicDBObject( $sum, 1));

reate the group
bject group = new BasicDBObject($group, groupFields);

xecute
regationOutput output = mycollection.aggregate( project, group );

(DBObject obj : output.results()){
Loading  Visualization	


va Batch Service	


ngoDB Command Line example on aggregation over a time period, e.g. month
b.e_web.aggregate( [      { $match : { created_date : { $gt :
Date(2012-10-25T00:00:00)}}},     { $project : {        day_value : {day
dayOfMonth : $created_date },                          month:{ $month :
reated_date }} }},     { $group : {         _id : {day_value:$day_value}
    number : { $sum : 1 }      } },   { $sort : { day_value : -1 } } ])
Loading  Visualization	

va Batch Service	


sisting events into graph and table collections	


.homeGraphs.find()

_id : ObjectId(50f57b5c1d4e714b581674e2), accounts_natural : 54,
counts_total : 54, date : ISODate(2011-02-06T05:00:00Z), linked_rate
.96, premium_rate : 0, str_date : 2011,01,06, upgrade_rate : 0
ers_avg_linked : 3.43, users_linked : 7 }
_id : ObjectId(50f57b5c1d4e714b581674e3), accounts_natural : 144,
counts_total : 144, date : ISODate(2011-02-07T05:00:00Z), linked_rat
.11, premium_rate : 0, str_date : 2011,01,07, upgrade_rate : 0
ers_avg_linked : 4, users_linked : 16 }
_id : ObjectId(50f57b5c1d4e714b581674e4), accounts_natural : 119,
counts_total : 119, date : ISODate(2011-02-08T05:00:00Z), linked_rat
.13, premium_rate : 0, str_date : 2011,01,08, upgrade_rate : 0
ers_avg_linked : 4.5, users_linked : 18 }
17)
           Loading  Visualization	

day numbers    try:        conn = pymongo.Connection('localhost',
           db = conn['lvanalytics']
accountmetrics.find(
                                           cursor =

           {date : {$gte : dt_from, $lte : dt_to}}).sort(date)
urn buildMetricsDict(cursor)    except Exception as e:
ger.error(e.message)


urn the graph object (as a list or a dict of lists) to the view that called the
thod	

edata={}
edata['accountsGraph']=mongodb_home.getHomeChart()

urn render_to_response('home.html',{'pagedata': pagedata},
text_instance=RequestContext(request))




.homeGraphs.find()

_id : ObjectId(50f57b5c1d4e714b581674e2), accounts_natural : 54,
Loading  Visualization	


ango and HighCharts

pulate the series.. (JavaScript with Django templating)	

iesOptions[0] = {
id: 'naturalAccounts',    name: Natural Accounts,    data: [     {% for
n pagedata.metrics.accounts_natural %}          {% if not forloop.first
 {% endif %}               [Date.UTC({{a.0}}),{{a.1}}]         {% endfor
  ],   tooltip: {      valueDecimals: 2   }   };
Loading  Visualization	

ango and HighCharts

d Create the Charts and Tables...
Loading  Visualization	

ango and HighCharts

d Create the Charts and Tables...
Lessons Learned	

• Date Time managed as two fields, Datetime and Date	

• Aggregating and upserting documents as events are received works for us	

•  Real-time Map-Reduce in pyMongo - too slow, don’t do this.	

	

• Django-noRel - Unstable, use Django and configure MongoDB as a
      datastore only	


• Memcached on Django is good enough (at the moment) - use django-celery
      with rabbitmq to pre-cache all data after data loading	


•  HighCharts is buggy - considering D3  other libraries	

• Don’t need to retrieve data directly from MongoDB to Django, perhaps
      provide all data via a service layer (at the expense of ever-additional
      features in pyMongo)
Next Steps	

• A/B testing framework, experiments and variances	

•  Unauthenticated / Authenticated user tracking	

•  Provide data async over service layer	

• Segmentation with graphical libraries like D3  Cross-Filter (
http://square.github.com/crossfilter/)	


• Saving Query Criteria, expanding out BI tools for internal users	

• MongoDB Connector, Hadoop and Hive (maybe Tableau and other tools)	

• Storm / Kafka for real-time analytics processing	

• Shard the Replica-Set, looking into Gizzard as the middleware
Hrishi Dixit	

  Chief Technology Officer	

                                                       
                                             Kevin Connelly	

                                         Director of Engineering	

                 Will Larche	

                                          kevin@learnvest.com	

   hrishi@learnvest.com	

                                  	

                                  	

                                                                     	

                                                                                Lead IOS Developer	

                                                                                will@learnvest.com	


                                  	

                                  	

                                  	

                                                  	

                   	

                                                                        	

                                                                        	

                                  	

                                   	

                                  	

                                   	

                                                    	

                 	

                                                    	

                 	

              	

                                             Cameron Sim	

                             	

       Jeremy Brennan	

                                        Director of Analytics Tech	

           your name here	

Director of UI/UX Technology	

                                        cameron@learnvest.com	

              New Awesome Develope
   jeremy@learnvest.com	

                                  	

                                           you@learnvest.com	

              	

                                  	

             	

                                             	

                        	

                                                                               HIR

More Related Content

What's hot

Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...Amazon Web Services
 
Three Pillars, Zero Answers: Rethinking Observability
Three Pillars, Zero Answers: Rethinking ObservabilityThree Pillars, Zero Answers: Rethinking Observability
Three Pillars, Zero Answers: Rethinking ObservabilityDevOps.com
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Web Assembly Big Picture
Web Assembly Big PictureWeb Assembly Big Picture
Web Assembly Big PictureYousif Shalaby
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldJignesh Shah
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)Altinity Ltd
 
실전 서버 부하테스트 노하우
실전 서버 부하테스트 노하우 실전 서버 부하테스트 노하우
실전 서버 부하테스트 노하우 YoungSu Son
 
Cloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflakeCloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflakeSANG WON PARK
 
메모리 할당에 관한 기초
메모리 할당에 관한 기초메모리 할당에 관한 기초
메모리 할당에 관한 기초Changyol BAEK
 
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유Hyojun Jeon
 
Event source 학습 내용 공유
Event source 학습 내용 공유Event source 학습 내용 공유
Event source 학습 내용 공유beom kyun choi
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019confluent
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For ArchitectsKevin Brockhoff
 
게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트
게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트
게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트Amazon Web Services Korea
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...ScaleGrid.io
 
Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용흥배 최
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
Dapr: distributed application runtime
Dapr: distributed application runtimeDapr: distributed application runtime
Dapr: distributed application runtimeMoaid Hathot
 
OpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleOpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleMihai Criveti
 

What's hot (20)

Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
 
Three Pillars, Zero Answers: Rethinking Observability
Three Pillars, Zero Answers: Rethinking ObservabilityThree Pillars, Zero Answers: Rethinking Observability
Three Pillars, Zero Answers: Rethinking Observability
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Web Assembly Big Picture
Web Assembly Big PictureWeb Assembly Big Picture
Web Assembly Big Picture
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
 
실전 서버 부하테스트 노하우
실전 서버 부하테스트 노하우 실전 서버 부하테스트 노하우
실전 서버 부하테스트 노하우
 
Cloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflakeCloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflake
 
메모리 할당에 관한 기초
메모리 할당에 관한 기초메모리 할당에 관한 기초
메모리 할당에 관한 기초
 
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유
 
Event source 학습 내용 공유
Event source 학습 내용 공유Event source 학습 내용 공유
Event source 학습 내용 공유
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트
게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트
게임사를 위한 Amazon GameLift 세션 - 이정훈, AWS 솔루션즈 아키텍트
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
 
Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Dapr: distributed application runtime
Dapr: distributed application runtimeDapr: distributed application runtime
Dapr: distributed application runtime
 
OpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleOpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image Lifecycle
 

Viewers also liked

MongoDB ClickStream and Visualization
MongoDB ClickStream and VisualizationMongoDB ClickStream and Visualization
MongoDB ClickStream and VisualizationCameron Sim
 
Clickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customersClickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customersAlbert Hui
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingSpark Summit
 
Web log & clickstream
Web log & clickstream Web log & clickstream
Web log & clickstream Michel Bruley
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Cloudera, Inc.
 

Viewers also liked (6)

MongoDB ClickStream and Visualization
MongoDB ClickStream and VisualizationMongoDB ClickStream and Visualization
MongoDB ClickStream and Visualization
 
Clickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customersClickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customers
 
Clickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache SparkClickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache Spark
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
 
Web log & clickstream
Web log & clickstream Web log & clickstream
Web log & clickstream
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 

Similar to Implementing and Visualizing Clickstream data with MongoDB

Open analytics | Cameron Sim
Open analytics | Cameron SimOpen analytics | Cameron Sim
Open analytics | Cameron SimOpen Analytics
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWAREFIWARE
 
Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22Frédéric Harper
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics PlatformSrinath Perera
 
Taking Web Apps Offline
Taking Web Apps OfflineTaking Web Apps Offline
Taking Web Apps OfflinePedro Morais
 
Developing your first application using FI-WARE
Developing your first application using FI-WAREDeveloping your first application using FI-WARE
Developing your first application using FI-WAREFermin Galan
 
Engage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data CollectionEngage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data CollectionWebtrends
 
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22Frédéric Harper
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices -  Michael HacksteinNoSQL meets Microservices -  Michael Hackstein
NoSQL meets Microservices - Michael Hacksteindistributed matters
 
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09Frédéric Harper
 
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12Frédéric Harper
 
Evolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchEvolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchMongoDB
 
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2
 
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...Naoki (Neo) SATO
 
HTML5 on Mobile
HTML5 on MobileHTML5 on Mobile
HTML5 on MobileAdam Lu
 
Practical AngularJS
Practical AngularJSPractical AngularJS
Practical AngularJSWei Ru
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaMongoDB
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 

Similar to Implementing and Visualizing Clickstream data with MongoDB (20)

Open analytics | Cameron Sim
Open analytics | Cameron SimOpen analytics | Cameron Sim
Open analytics | Cameron Sim
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWARE
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
 
Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stéroïdes - HTML5mtl - 2014-04-22
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Taking Web Apps Offline
Taking Web Apps OfflineTaking Web Apps Offline
Taking Web Apps Offline
 
Developing your first application using FI-WARE
Developing your first application using FI-WAREDeveloping your first application using FI-WARE
Developing your first application using FI-WARE
 
Engage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data CollectionEngage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data Collection
 
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices -  Michael HacksteinNoSQL meets Microservices -  Michael Hackstein
NoSQL meets Microservices - Michael Hackstein
 
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
 
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
 
Evolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchEvolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB Stitch
 
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
 
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
 
HTML5 on Mobile
HTML5 on MobileHTML5 on Mobile
HTML5 on Mobile
 
Practical AngularJS
Practical AngularJSPractical AngularJS
Practical AngularJS
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
NoSQL meets Microservices
NoSQL meets MicroservicesNoSQL meets Microservices
NoSQL meets Microservices
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdfJamie (Taka) Wang
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 

Recently uploaded (20)

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 

Implementing and Visualizing Clickstream data with MongoDB

  • 1. Implementing and Visualizing Click- Stream Data with MongoDB Jan 22, 2013 - New York MongoDB User Group Cameron Sim - LearnVest.com
  • 2. Agenda •  About LearnVest •  HL Application Architecture •  Data Capture •  Event Packaging •  MongoDB Data Warehousing •  Loading & Visualization •  Finishing up
  • 3. LearnVest Inc. www.learnvest.com Mission Statement Aiming to making Financial Planning as accessible as having a gym membership Company Key Products nded in 2008 by Alexa Von Tobel, CEO Account Aggregation and Managem (Bank, Credit, Loan, Investment, Mort 50+ People and Growing rapidly Based in NYC Original and Syndicated Newsletter Co Platforms Financial Planning Web iPhone (tiered product offering) Stack Analytics Operational MongoDB 2.2.0 (3-node replica-set Wordpress, Backbone.js, Node.js Java 6, Spring 3 ava Spring 3, Redis, Memcached,
  • 5. LearnVest.com IPhone
  • 6. High Level Architecture Production Analytics elivery Services Services Loaders Dashbo HTTPS pyMongo
  • 7. ure Everything Collection -Driven events over web and mobile m-level exceptions ything else porary Data ok’ with approximate data rational Databases are the system of record egate events as they come in ove the overhead of basic metrics (counts, sums) on core events p by user unique id and increment counts per event, over time-dimensions eek-ending, month, year)
  • 8. Data Capture OS (void) sendAnalyticEventType:(NSString*)eventType object:(NSString*)object name:(NSString*)name page:(NSString*)page source:(NSString*)source; NSMutableDictionary *eventData = [NSMutableDictionary dictionary]; if (eventType!=nil) [params setObject:eventType forKey:@eventType]; if (object!=nil) [eventData setObject:object forKey:@object]; if (name!=nil) [eventData setObject:name forKey:@name]; if (page!=nil) [eventData setObject:page forKey:@page]; if (source!=nil) [eventData setObject:source forKey:@source]; if (eventData!=nil) [params setObject:eventData forKey:@eventData]; [[LVNetworkEngine sharedManager] analytics_send:params];
  • 9. Data Capture WEB (JavaScript) unction internalTrackPageView() { var cookie = { userContext: jQuery.cookie('UserContextCookie'), }; var trackEvent = { eventType: pageView, eventData: { page: window.location.pathname + window.location.search } }; // AJAX jQuery.ajax({ url: /api/track, type: POST, dataType: json, data: JSON.stringify(trackEvent), // Set Request Headers beforeSend: function (xhr, settings) { xhr.setRequestHeader('Accept', 'application/json'); xhr.setRequestHeader('User-Context', cookie.userContext) if(settings.type === 'PUT' || settings.type === 'POST') xhr.setRequestHeader('Content-Type', 'application/js } } });
  • 10. Bus Event Packaging ng 3 RESTful service layer, controller methods define the eventCode via @tracki otation tom Intercepter class extends HandlerInterceptorAdapter and implements Handle() (for each event) to invoke calls via Spring @async to an EventPublisher ntPublisher publishes to common event bus queue with multiple subscribers, one o kages the eventPayload MapString, Object object and forwards to Analytics Rest
  • 11. Bus Event Packaging ing RestController Methods ace estMapping(value = /user/login, method = RequestMethod.POST, rs=Accept=application/json) c MapString, Object userLogin(@RequestBody MapString, Object event, ervletRequest request); ete/Impl Class ride king(user.login) c MapString, Object userLogin(@RequestBody MapString, Object event, ervletRequest request){ /Implementation eturn event;
  • 12. Bus Event Packaging stom Intercepter class extends HandlerInterceptorAdapter cted void handleTracking(String trackingCode, MapString, Object modelMap ervletRequest request) { MapString, Object responseModel = new HashMapString, Object(); // remove non-serializables copy over data from modelMap try { this.eventPublisher.publish(trackingCode, responseModel, request); } catch (Exception e) { log.error(Error tracking event ' + trackingCode + ' : + ExceptionUtils.getStackTrace(e)); }
  • 13. Bus Event Packaging stom Intercepter class extends HandlerInterceptorAdapter c void publish (String eventCode, MapString,Object eventData, HttpServletRequest request MapString,Object payload = new HashMapString,Object(); String eventId=UUID.randomUUID().toString(); MapString, String requestMap = HttpRequestUtils.getRequestHeaders(reques //Normalize message payload.put(eventType, eventData.get(eventType)); payload.put(eventData, eventData.get(eventType)); payload.put(version, eventData.get(eventType)); payload.put(eventId, eventId); payload.put(eventTime, new Date()); payload.put(request, requestMap); . . . //Send to the Analytics Service for MongoDB persistence c void sendPost(EventPayload payload){ HttpEntity request = new HttpEntity(payload.getEventPayload(), headers) Map m = restTemplate.postForObject(endpoint, request, java.util.Map.class)
  • 14. Bus Event Packaging erialized Json (User Action) tCode” : “user.login”, tType” : “login”, ion” : “1.0”, tTime” : “1358603157746”, tData” : { “” : “”, “” : “”, “” : “” }, est” : { “call-source” : “WEB”, “user-context” : “00002b4f1150249206ac2b692e48ddb3”, “user.agent” : “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/ 23.0.1271.101 Safari/537.11”, “cookie” : “size=4; CP.mode=B; PHPSESSID=c087908516 ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF9 46F139669D746F; csrftoken=73bdcd ddf151dc56b8020855b2cb10c8, content-length : 204, accept-encoding : gzip,deflate,sdch”, }
  • 15. Bus Event Packaging erialized Json (Generic Event) tCode” : “generic.ui”, tType” : “pageView”, ion” : “1.0”, tTime” : “1358603157746”, tData” : { “page” : “/learnvest/moneycenter/inbox”, “section” : “transactions”, “name” : “view transactions” “object” : “page” }, est” : { “call-source” : “WEB”, “user-context” : “00002b4f1150249206ac2b692e48ddb3”, “user.agent” : “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/ 23.0.1271.101 Safari/537.11”, “cookie” : “size=4; CP.mode=B; PHPSESSID=c087908516 ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF9 46F139669D746F; csrftoken=73bdcd ddf151dc56b8020855b2cb10c8, content-length : 204, accept-encoding : gzip,deflate,sdch”, }
  • 16. MongoDB Data Warehousing goDB Information 0 de replica-set rge (primary), 2x Medium (secondary) AWS Amazon-Linux machines with single 500GB EBS volumes mounted to /opt/data goDB Config File = /opt/data/mongodb/datarest = truereplSet = voyager mes vents daily on web, ~600K on mobile B per day at start, slowed to ~1GB per day ntly at 78GB (collecting since August 2012) re Scaling Strategy p 2nd Replica-Set d replica-sets to n at 60% / 250GB per EBS volume d key probably based on sequential mix of email_address additional string
  • 17. MongoDB Data Warehousing OBILE ist all events, bucketed by source, event code and time:- EB/MOBILE er.login e (day, week-ending, month, year) ert into collection e_web / e_mobile sert into:- web_user_login_day web_user_login_week web_user_login_month web_user_login_year dictable model for scaling and measuring business growth
  • 18. MongoDB Data Warehousing DBObject newDocument = new BasicDBObject().append($inc new BasicDBObject().append(count, 1)); ate day dimension ction_day.update(new BasicDBObject().append(user-context, userContext) .append(eventType, eventType) .append(date, sdf_day.format(d)),newDocument, true, false ate week dimension ction_week.update(new BasicDBObject().append(user-context, userContext) .append(eventType, eventType) .append(date, sdf_day.format(w)), newDocument, true, fals ate month dimension ction_month.update(new BasicDBObject().append(user-context, userContext) .append(eventType, eventType) .append(date, sdf_month.format(d)), newDocument, true, fa ate month dimension ction_year.update(new BasicDBObject().append(user-context, userContext) .append(eventType, eventType) .append(date, sdf_year.format(d)), newDocument, true, fal
  • 19. MongoDB Data Warehousing ount_addManual_weeke_web_account_addManual_year _user_login_day _user_login_week _user_login_month _user_login_yeare_mobile_generic_ui_daye_mobile_generic_ui_monthe_mobile_g weeke_mobile_generic_ui_year e_web_user_login_day.find() d : ObjectId(50e4b9871b36921910222c42), count : 5, date : 01/02, -context : c4ca4238a0b923820dcc509a6f75849b } d : ObjectId(50cd6cfcb9a80a2b4ee21422), count : 7, date : 01/02, -context : c4ca4238a0b923820dcc509a6f75849b } d : ObjectId(50cd6e51b9a80a2b4ee21427), count : 2, date : 01/02, -context : c4ca4238a0b923820dcc509a6f75849b } d : ObjectId(50e4b9871b36921910222c42), count : 3, date : 01/03, -context : 50e49a561b36921910222c33 }
  • 20. MongoDB Data Warehousing 1, accept-charset : ISO-8859-1,utf-8;q=0.7,*;q=0.3, cookie : size= de=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; IONID=56EB165266A2C4AFF946F139669D746F; oken=73bdcdddf151dc56b8020855b2cb10c8, content-length : 255, accept- ing : gzip,deflate,sdch }, eventType : flick, eventData : { obje on, name : split transaction button, page : #inbox/79876/, secti saction_river_details } }
  • 21. MongoDB Data Warehousing xing Strategy xes on core collections (e_web and e_mobile) come in under 3GB on 7.5GB Large ce and 3.75GB on Medium instances datetime in two fields and compound index on date with other fields like eventTyp unique id (user-context) vy insertion rates, much lower read rates....so less indexes the better
  • 22. MongoDB Data Warehousing ing Strategy e_web.getIndexes()[ v : 1, key : { request.user-contex created_date : 1 }, ns : ycenter.e_web, name : request.user-context_1_created_date_ v : 1, key : { eventData.name : 1 created_date : 1 }, ns : moneycenter.e_web name : eventData.name_1_created_date_1 }]
  • 23. jective Loading Visualization how historic and intraday stats on core use cases (logins, conversions) how user funnel rates on conversion pages how general usability - how do users really use the Web and IOS platforms? on-Functionals traday doesn’t need to be “real-time”, polling is good enough for now Overnight batch job for historic must scale horizontally neral Implementation Strategy o all heavy lifting object manipulation, UI should just display graph or table Modularize the service to be able to regenerate any graphs/tables without a full load
  • 24. Loading Visualization va Batch Service a Mongo library to query key collections and return user counts and sum of events ursor webUserLogins = c.find( new BasicDBObject(date, sdf.format(new Date()))); vate HashMapString, Object getSumAndCount(DBCursor cursor){ HashMapString, Object m = new HashMapString, Object(); int sum=0; int count=0; DBObject obj; while(cursor.hasNext()){ obj=(DBObject)cursor.next(); count++; sum=sum+(Integer)obj.get(count); } m.put(sum, sum); m.put(count, count); m.put(average, sdf.format(new Float(sum)/count)); return m;
  • 25. Loading Visualization va Batch Service e Aggregation Framework where required on core collections (e_web) and externa reate aggregation objects bject project = new BasicDBObject($project, new BasicDBObject(day_value, fields) ); bject day_value = new BasicDBObject( day_value, $day_value); bject groupFields = new BasicDBObject( _id, day_value); reate the fields to group by, in this case “number” upFields.put(number, new BasicDBObject( $sum, 1)); reate the group bject group = new BasicDBObject($group, groupFields); xecute regationOutput output = mycollection.aggregate( project, group ); (DBObject obj : output.results()){
  • 26. Loading Visualization va Batch Service ngoDB Command Line example on aggregation over a time period, e.g. month b.e_web.aggregate( [ { $match : { created_date : { $gt : Date(2012-10-25T00:00:00)}}}, { $project : { day_value : {day dayOfMonth : $created_date }, month:{ $month : reated_date }} }}, { $group : { _id : {day_value:$day_value} number : { $sum : 1 } } }, { $sort : { day_value : -1 } } ])
  • 27. Loading Visualization va Batch Service sisting events into graph and table collections .homeGraphs.find() _id : ObjectId(50f57b5c1d4e714b581674e2), accounts_natural : 54, counts_total : 54, date : ISODate(2011-02-06T05:00:00Z), linked_rate .96, premium_rate : 0, str_date : 2011,01,06, upgrade_rate : 0 ers_avg_linked : 3.43, users_linked : 7 } _id : ObjectId(50f57b5c1d4e714b581674e3), accounts_natural : 144, counts_total : 144, date : ISODate(2011-02-07T05:00:00Z), linked_rat .11, premium_rate : 0, str_date : 2011,01,07, upgrade_rate : 0 ers_avg_linked : 4, users_linked : 16 } _id : ObjectId(50f57b5c1d4e714b581674e4), accounts_natural : 119, counts_total : 119, date : ISODate(2011-02-08T05:00:00Z), linked_rat .13, premium_rate : 0, str_date : 2011,01,08, upgrade_rate : 0 ers_avg_linked : 4.5, users_linked : 18 }
  • 28. 17) Loading Visualization day numbers try: conn = pymongo.Connection('localhost', db = conn['lvanalytics'] accountmetrics.find( cursor = {date : {$gte : dt_from, $lte : dt_to}}).sort(date) urn buildMetricsDict(cursor) except Exception as e: ger.error(e.message) urn the graph object (as a list or a dict of lists) to the view that called the thod edata={} edata['accountsGraph']=mongodb_home.getHomeChart() urn render_to_response('home.html',{'pagedata': pagedata}, text_instance=RequestContext(request)) .homeGraphs.find() _id : ObjectId(50f57b5c1d4e714b581674e2), accounts_natural : 54,
  • 29. Loading Visualization ango and HighCharts pulate the series.. (JavaScript with Django templating) iesOptions[0] = { id: 'naturalAccounts', name: Natural Accounts, data: [ {% for n pagedata.metrics.accounts_natural %} {% if not forloop.first {% endif %} [Date.UTC({{a.0}}),{{a.1}}] {% endfor ], tooltip: { valueDecimals: 2 } };
  • 30. Loading Visualization ango and HighCharts d Create the Charts and Tables...
  • 31. Loading Visualization ango and HighCharts d Create the Charts and Tables...
  • 32. Lessons Learned • Date Time managed as two fields, Datetime and Date • Aggregating and upserting documents as events are received works for us •  Real-time Map-Reduce in pyMongo - too slow, don’t do this. • Django-noRel - Unstable, use Django and configure MongoDB as a datastore only • Memcached on Django is good enough (at the moment) - use django-celery with rabbitmq to pre-cache all data after data loading •  HighCharts is buggy - considering D3 other libraries • Don’t need to retrieve data directly from MongoDB to Django, perhaps provide all data via a service layer (at the expense of ever-additional features in pyMongo)
  • 33. Next Steps • A/B testing framework, experiments and variances •  Unauthenticated / Authenticated user tracking •  Provide data async over service layer • Segmentation with graphical libraries like D3 Cross-Filter ( http://square.github.com/crossfilter/) • Saving Query Criteria, expanding out BI tools for internal users • MongoDB Connector, Hadoop and Hive (maybe Tableau and other tools) • Storm / Kafka for real-time analytics processing • Shard the Replica-Set, looking into Gizzard as the middleware
  • 34. Hrishi Dixit Chief Technology Officer Kevin Connelly Director of Engineering Will Larche kevin@learnvest.com hrishi@learnvest.com Lead IOS Developer will@learnvest.com Cameron Sim Jeremy Brennan Director of Analytics Tech your name here Director of UI/UX Technology cameron@learnvest.com New Awesome Develope jeremy@learnvest.com you@learnvest.com HIR