Topics-oriented APIs
May 2015 – APIdays Barcelona
Tyler Singletary - @harmophone
Director of Platform
tyler@klout.com
HI.
HI. WITH CONTEXT.
A Practical Application of Social
Media Machine Learning and NLP
1
WHAT IS KLOUT, REALLY?
• Klout is an API client
application of the social
web.
• Federated identity
across platforms
• Macro and micro
understanding of
profile, conversation,
and content.
Content and people linked by Topics.
UNIFYING PRINCIPLE: TOPICS
• TBs of Social Interactions
a Day
• NLP applied to posts
• Aggregated to profiles:
– effects are Klout Score,
topical strengths
– The what becomes topics
– The why becomes TopicSets
• Links crawled, NLP
summarization
Content and people linked by Topics.
TOPIC SETS + USERS + SCORING
• Allow for time-series
slicing
• Aggregate counting
• Slicing of set to create
ordered list
Topic-oriented view
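The three operations on this slide (time-series slicing, aggregate counting, and slicing a set into an ordered list) can be sketched roughly as follows. The observation tuples, function names, and the choice to rank by summed score are all assumptions made for illustration; the slides do not describe Klout's actual storage or ranking.

```python
from collections import Counter
from datetime import date

# Hypothetical per-day topic observations for one user: (day, topic, score).
observations = [
    (date(2015, 5, 1), "APIs", 0.9),
    (date(2015, 5, 1), "Twitter", 0.7),
    (date(2015, 5, 2), "APIs", 0.8),
    (date(2015, 5, 3), "Marketing", 0.6),
]

def slice_by_time(obs, start, end):
    """Time-series slicing: keep observations whose day is in [start, end]."""
    return [o for o in obs if start <= o[0] <= end]

def count_topics(obs):
    """Aggregate counting: how many observations mention each topic."""
    return Counter(topic for _, topic, _ in obs)

def top_topics(obs, n):
    """Slice the set into an ordered list: sum scores per topic, keep the top n."""
    totals = {}
    for _, topic, score in obs:
        totals[topic] = totals.get(topic, 0.0) + score
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:n]]

window = slice_by_time(observations, date(2015, 5, 1), date(2015, 5, 2))
print(count_topics(window))
print(top_topics(window, 2))
```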
NLP-based Building Blocks
2
KLOUT DEALS WITH RIDICULOUS AMOUNTS OF DATA
o Topic assignment at scale:
  o ~650M new pieces of data daily
  o Hundreds of millions of profiles
  o ~10,000 topics in a 3-level hierarchy
  o Daily update
o Multiple social networks and various data sources:
  o Twitter, Facebook, LinkedIn, Google+, Wikipedia
  o User activity, profiles, connections
o Topics normalized to an evolving, managed ontology
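As a rough illustration of what "normalized to a managed ontology" and "3-level hierarchy" could mean in practice, the sketch below maps surface forms to a canonical topic and walks up a parent map. The alias table, the parent topics, and both function names are hypothetical; only the topic name "APIs" comes from the payload slide.

```python
# Hypothetical 3-level hierarchy: child topic -> parent topic (None at the root).
PARENT = {
    "APIs": "Software Development",
    "Software Development": "Technology",
    "Technology": None,
}

# Hypothetical alias table mapping noisy surface forms to canonical topics.
ALIASES = {"api": "APIs", "apis": "APIs", "rest api": "APIs"}

def normalize(surface_form):
    """Return the canonical topic for a surface form, or None if unknown."""
    return ALIASES.get(surface_form.strip().lower())

def path_to_root(topic):
    """List the topic and its ancestors, most specific first."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path

print(normalize(" API "))
print(path_to_root("APIs"))
```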
WEIGHTING, NORMALIZATION, CALIBRATION
Signals are weighted and normalized to
mirror real-world influence
– Machine-learned weighting based on regression
analysis of survey data
Advanced algorithm based on 1500 signal
combinations of relationships and ratios
– Where: Which network is the action taking place?
– What: What action was taken?
– Who: Who acted on your content?
– How much: How many actions and unique actors?
– When: When was the action performed?
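A minimal sketch of how the five questions above might combine into one raw signal. Every weight, the recency formula, and the log scaling by unique actors are assumptions for illustration; the slide says the real weights are machine-learned from survey data and does not give a formula.

```python
import math
from dataclasses import dataclass

@dataclass
class Action:
    network: str     # where: which network the action took place on
    kind: str        # what: which action was taken
    actor: str       # who: identifier of the account that acted
    age_days: float  # when: days elapsed since the action

# Hypothetical weights, not Klout's learned values.
NETWORK_WEIGHT = {"twitter": 1.0, "facebook": 0.8}
KIND_WEIGHT = {"retweet": 1.0, "like": 0.3}

def action_value(action):
    """Weight one action by network, action type, and recency."""
    recency = 1.0 / (1.0 + action.age_days / 7.0)
    return NETWORK_WEIGHT[action.network] * KIND_WEIGHT[action.kind] * recency

def raw_signal(actions):
    """How much: sum of action values, scaled by the number of unique actors."""
    total = sum(action_value(a) for a in actions)
    unique_actors = len({a.actor for a in actions})
    return total * math.log1p(unique_actors)

actions = [
    Action("twitter", "retweet", "u1", 0.0),
    Action("twitter", "retweet", "u1", 7.0),
    Action("facebook", "like", "u2", 0.0),
]
print(raw_signal(actions))
```

This sketch treats "who" only as an identity for counting unique actors; weighting by the actor's own influence would be a natural extension.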
TOPIC SETS FOR CONTEXT
• User’s Influence: with various Scores
• User’s Interests: with various Scores
• User’s Self-selection: based on registered, self-declared interest
• Audience Influence: rollup of User’s Influence within a user’s downlevel and uplevel networks
• Audience Interests: rollup of User’s Interests within a user’s downlevel (and uplevel) networks
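The audience rollups can be pictured as an aggregation of per-user topic sets over a network. In the sketch below, "downlevel network" is assumed to be a plain list of audience members, and the rollup is assumed to be a per-topic mean; the slides do not define either, and all names and scores are made up.

```python
from collections import defaultdict

# Hypothetical per-user influence topic scores.
influence = {
    "alice": {"APIs": 0.9, "Twitter": 0.5},
    "bob": {"APIs": 0.4, "Marketing": 0.8},
    "carol": {"Marketing": 0.6},
}

# Hypothetical downlevel network: the accounts that make up dave's audience.
downlevel = {"dave": ["alice", "bob", "carol"]}

def audience_rollup(user, network, topic_scores):
    """Average each topic's score across the members of the user's network."""
    members = network.get(user, [])
    totals = defaultdict(float)
    for member in members:
        for topic, score in topic_scores.get(member, {}).items():
            totals[topic] += score
    count = len(members) or 1  # avoid dividing by zero for an empty network
    return {topic: total / count for topic, total in totals.items()}

rollup = audience_rollup("dave", downlevel, influence)
print(sorted(rollup.items(), key=lambda kv: kv[1], reverse=True))
```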
CHALLENGES IN BIG DATA
● Message size: Overall data size may be
huge, but message size per user may be
small.
● Text Sparsity: Many users may be
passive consumers of content.
● Noise: colloquial language, slang,
grammatical errors, abbreviations.
● Context: Need to expand context to get
more information
● False positives are embarrassing when
user-facing
CHALLENGES TO SCALE
NLP* - StanfordNLP english.conll.4class.distsim.crf.ser.gz
● Speed Matters (650M messages a day):
○ Stanford Named Entity Extraction - 10.959 ms (82.0 CPU days)
○ Dictionary - 0.056ms (0.42 CPU days)
● Corpus
○ Stanford Named Entity Extraction:
■ {‘the rule of law’=1.0}
○ Dictionary based:
■ {‘the rule of law’=1.0, ‘nsa’=1.0, ‘eff’=1.0}
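The CPU-day figures follow from per-message latency times daily volume, and the dictionary approach amounts to a lookup of known terms. The sketch below reproduces the arithmetic (it lands at roughly 82 and 0.42 CPU days) and shows a naive dictionary matcher. The matcher, its function name, and the fixed confidence of 1.0 are assumptions; the slides do not show Klout's dictionary implementation.

```python
import re

MESSAGES_PER_DAY = 650_000_000
MS_PER_DAY = 24 * 60 * 60 * 1000

def cpu_days(ms_per_message, messages=MESSAGES_PER_DAY):
    """CPU days needed to process one day of messages at a given latency."""
    return ms_per_message * messages / MS_PER_DAY

print(round(cpu_days(10.959), 1))  # Stanford NER latency from the slide
print(round(cpu_days(0.056), 2))   # dictionary latency from the slide

# Dictionary entries taken from the slide's example output.
DICTIONARY = {"the rule of law", "nsa", "eff"}

def dictionary_entities(text, dictionary=DICTIONARY):
    """Return {term: 1.0} for each dictionary term found as a whole word or phrase."""
    lowered = text.lower()
    found = {}
    for term in dictionary:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found[term] = 1.0
    return found

print(dictionary_entities("The NSA and EFF debate the rule of law"))
```

Looping over every term is fine for a three-entry example; at the volumes quoted above, a trie or multi-pattern matcher would be the more plausible choice.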
WEBSTER
MACHINE LEARNING AT KLOUT
We leverage our past machine learning and NLP
classification assets to:
• Train new models for additional data sources
• Retrain Topics classification
• Predict “actionability” of support
• Predict virality of content [macro and micro]
• Predict the “personhood” of a social media account
• Content-targeting based on downlevel predictions
How do you productize this in APIs?
3
INPUTS AND OUTPUTS
• People-Specific Insights
  Input: People(s)
  Output: TopicSet(s)
  GET user.json/[id]/insights/influence-topics
• Topic-Specific People
  Input: Topic(s)
  Output: People
  GET topic.json/[ids]/people
• Topic-Aggregate Insights
  Input: Topic(s)
  Output: Metadata, Aggregation
  GET topic.json/[ids]/insights
• People-Aggregate Insights
  Input: User(s)
  Output: Metadata, Aggregate Sets
  GET user.json/insights/aggregated/influence-topics?userIds=1,2,3
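One way a client might assemble the four request URLs listed on this slide is sketched below. The base URL is a placeholder, the function names are hypothetical, and joining multiple topic ids with commas in the path is an assumption (the slide only shows `[ids]`); authentication is omitted because the slides do not cover it.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder, not the real host

def people_specific_insights(user_id):
    """Input: one person. Output: that person's influence TopicSet."""
    return f"{BASE_URL}/user.json/{user_id}/insights/influence-topics"

def people_aggregate_insights(user_ids):
    """Input: several users. Output: aggregate sets across them."""
    ids = ",".join(str(u) for u in user_ids)
    query = urlencode({"userIds": ids}, safe=",")  # keep commas unescaped
    return f"{BASE_URL}/user.json/insights/aggregated/influence-topics?{query}"

def topic_specific_people(topic_ids):
    """Input: topics. Output: people. Comma-joined ids are an assumption."""
    return f"{BASE_URL}/topic.json/{','.join(topic_ids)}/people"

def topic_aggregate_insights(topic_ids):
    """Input: topics. Output: metadata and aggregation."""
    return f"{BASE_URL}/topic.json/{','.join(topic_ids)}/insights"

print(people_specific_insights(42))
print(people_aggregate_insights([1, 2, 3]))
print(topic_specific_people(["7516448513106795305"]))
print(topic_aggregate_insights(["7516448513106795305", "8961164588331655920"]))
```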
PAYLOADS
{
topicSetType: "expertise",
topicSet: [
{
topicId: "7516448513106795305",
score: 0.999596145670965,
strength: "strong",
displayName: "APIs",
name: "APIs",
slug: "api",
imageUrl: "http://kcdn3.klout.com/static/images/topics/api_6bae2a67e1a5a9b68d526b4d483c4eb8.png",
displayType: "visible",
topicType: "entity"
},
{
topicId: "10000000000000008253",
score: 0.9992839644220868,
strength: "strong",
displayName: "Twitter",
name: "Twitter",
slug: "twitter",
imageUrl: "http://kcdn3.klout.com/static/images/icons/generic-topic.png",
displayType: "visible",
topicType: "entity"
},
{
topicId: "8961164588331655920",
score: 0.9992326280041798,
strength: "strong",
displayName: "Klout",
name: "Klout",
slug: "klout",
imageUrl: "http://kcdn3.klout.com/static/images/klout-topic-image-1333588028647.jpg",
displayType: "visible",
topicType: "entity"
}
]
}
{
topicSetType: "interest",
topicSet: [
{
topicId: "10000000000000008253",
score: 0.9946672348339362,
strength: "strong",
displayName: "Twitter",
name: "Twitter",
slug: "twitter",
imageUrl: "http://kcdn3.klout.com/static/images/icons/generic-topic.png",
displayType: "visible",
topicType: "entity"
},
{
topicId: "6485494992525344250",
score: 0.9918719149780779,
strength: "strong",
displayName: "Marketing",
name: "Marketing",
slug: "marketing",
imageUrl: "http://kcdn3.klout.com/static/images/topics/people.png",
displayType: "visible",
topicType: "sub"
},
{
topicId: "7516448513106795305",
score: 0.9888798650771197,
strength: "strong",
displayName: "APIs",
name: "APIs",
slug: "api",
imageUrl: "http://kcdn3.klout.com/static/images/topics/api_6bae2a67e1a5a9b68d526b4d4…",
displayType: "visible",
topicType: "entity"
},
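A client consuming a payload like the ones above might filter and order the topics before display. The sketch below assumes the response is strict JSON with quoted keys (the slide shows them unquoted) and uses a trimmed copy of the first two "expertise" entries; the function name and `min_score` parameter are hypothetical.

```python
import json

# Trimmed, strictly quoted copy of two entries from the "expertise" payload.
payload_text = """
{
  "topicSetType": "expertise",
  "topicSet": [
    {"topicId": "10000000000000008253", "score": 0.9992839644220868,
     "strength": "strong", "displayName": "Twitter", "slug": "twitter",
     "displayType": "visible", "topicType": "entity"},
    {"topicId": "7516448513106795305", "score": 0.999596145670965,
     "strength": "strong", "displayName": "APIs", "slug": "api",
     "displayType": "visible", "topicType": "entity"}
  ]
}
"""

def visible_topics(payload, min_score=0.0):
    """Return display names of visible topics at or above min_score, best first."""
    topics = [
        t for t in payload["topicSet"]
        if t["displayType"] == "visible" and t["score"] >= min_score
    ]
    topics.sort(key=lambda t: t["score"], reverse=True)
    return [t["displayName"] for t in topics]

payload = json.loads(payload_text)
print(visible_topics(payload))
print(visible_topics(payload, min_score=0.9995))
```

Note that `topicId` arrives as a string: a value such as 10000000000000008253 does not fit in a signed 64-bit integer, so keeping ids as strings on the client side avoids overflow or precision loss.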
Let’s get practical, prescriptive and
talk about the future
4
PARAMETERIZATION
• Topics Scoring uses different models in each topic
set
• Overall Topic Scoring is based on hundreds of
features, weights, decays, spanning short and
long term
• Parameterize scoring for different contexts
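Parameterizing scoring per context could look roughly like the sketch below: one scoring function, with the decay half-lives and the short-term versus long-term blend supplied as parameters. The profile names, half-lives, and blend weights are invented for illustration and are not Klout's models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoringParams:
    short_half_life_days: float
    long_half_life_days: float
    short_weight: float  # long-term weight is 1 - short_weight

# Hypothetical parameter profiles for two contexts.
PROFILES = {
    "trending": ScoringParams(3.0, 90.0, 0.8),
    "expertise": ScoringParams(14.0, 365.0, 0.2),
}

def decay(age_days, half_life_days):
    """Exponential decay: a signal halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def topic_score(events, params):
    """events: list of (age_days, strength) pairs for one user and topic."""
    short = sum(s * decay(a, params.short_half_life_days) for a, s in events)
    long_term = sum(s * decay(a, params.long_half_life_days) for a, s in events)
    return params.short_weight * short + (1 - params.short_weight) * long_term

old_event = [(365.0, 1.0)]
print(topic_score(old_event, PROFILES["trending"]))
print(topic_score(old_event, PROFILES["expertise"]))
```

With these made-up profiles, a year-old signal still counts noticeably toward "expertise" but almost nothing toward "trending", which is the kind of contextual difference the slide describes.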
EXAMPLES
Use interchangeable, specified models with rule modifiers
EXAMPLES
• Treated like a product, you must think through
implementations others would make.
• Maybe even make them your own.
POLICY
• Data is great.
• Representation of data is hard.
• Raw data rarely if ever needs to be displayed.
• Balance innovation on data assets with brand and
utility, allowed use cases.
KLOUT RESEARCH ONLINE
• LASTA
Bye!
May 2015 – APIdays
Tyler Singletary - @harmophone
Director of Platform
tyler@klout.com

Klout as an Example Application of Topics-oriented NLP APIs

  • 1. Topics-oriented APIs May 2015 – APIdays Barcelona Tyler Singletary - @harmophone Director of Platform tyler@klout.com
  • 2. HI.
  • 4. A Practical Application of Social Media Machine Learning and NLP1
  • 5. WHAT IS KLOUT, REALLY? • Klout is an API client application of the social web. • Federated identity across platforms • Macro and micro understanding of profile, conversation, and content. ple linked by Topics.
  • 6. UNIFYING PRINCIPLE: TOPICS • TBs of Social Interactions a Day • NLP applied to posts • Aggregated to profiles: – effects are Klout Score, topical strengths – The what becomes topics – The why becomes TopicSets • Links crawled, NLP summarization tent and people linked by Topics.
  • 7. TOPIC SETS + USERS + SCORING • Allow for time-series slicing • Aggregate counting • Slicing of set to create ordered list Topic-oriented view
  • 9. KLOUT DEALS WITH RIDICULOUS AMOUNTS OF DATA o Topic assignment at scale: o ~650 M new pieces of data daily o hundreds of millions of profiles o ~10,000 topics in 3-level hierarchy o Daily update o Multiple Social networks and various data sources: o Twitter, Facebook, LinkedIn, Google+, Wikipedia o User activity, profiles, connections o Topics normalized to an evolving, managed ontology
  • 10. WEIGHTING, NORMALIZATION, CALIBRATION Signals are weighted and normalized to mirror real-world influence – Machine-learned weighting based on regression analysis of survey data Advanced algorithm based on 1500 signal combinations of relationships and ratios – Where: Which network is the action taking place? – What: What action was taken? – Who: Who acted on your content? – How much: How many actions and unique actors? – When: When was the action performed?
  • 11. TOPIC SETS FOR CONTEXT User’s Influence With various Scores User’s Interests With various Scores User’s Self- selection Based on registered self- declared interest Audience Influence Rollup of User’s Influence within a user’s downlevel and uplevel networks Audience Interests Rollup of User’s Interests within a User’s downlevel (and uplevel) networks
  • 12. CHALLENGES IN BIG DATA ● Message size: Overall data size may be huge, but message size per user may be small. ● Text Sparsity: Many users may be passive consumers of content. ● Noise: colloquial language, slang, grammatical errors, abbreviations. ● Context: Need to expand context to get more information ● False positives are embarrassing when user-facing
  • 13. CHALLENGES TO SCALE NLP*
      ● Speed Matters (650M messages a day):
          ○ Stanford Named Entity Extraction: 10.959 ms (82.0 CPU days)
          ○ Dictionary: 0.056 ms (0.42 CPU days)
      ● Corpus
          ○ Stanford Named Entity Extraction:
              ■ {‘the rule of law’=1.0}
          ○ Dictionary based:
              ■ {‘the rule of law’=1.0, ‘nsa’=1.0, ‘eff’=1.0}
      * StanfordNLP english.conll.4class.distsim.crf.ser.gz
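The CPU-day figures on slide 13 follow from multiplying the per-message latency by the daily message volume. The sketch below reproduces that arithmetic and adds a toy dictionary matcher to show why a lookup is so much cheaper than a statistical tagger. The function names and the sample sentence are hypothetical, and the matcher is a simplification, not Klout's extractor.

```python
import re

MESSAGES_PER_DAY = 650_000_000          # volume quoted on the slide
MS_PER_DAY = 24 * 60 * 60 * 1000        # milliseconds in one day


def cpu_days(ms_per_message, messages=MESSAGES_PER_DAY):
    """CPU time, in days, needed to process one day of messages."""
    return ms_per_message * messages / MS_PER_DAY


def dictionary_entities(text, dictionary):
    """Return {phrase: 1.0} for every dictionary phrase present as whole words.

    The text is lowercased and reduced to space-separated tokens, so each
    phrase is matched with a plain substring test against the padded string.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    padded = " " + " ".join(tokens) + " "
    return {phrase: 1.0 for phrase in dictionary if f" {phrase} " in padded}


stanford_days = cpu_days(10.959)    # roughly 82 CPU days
dictionary_days = cpu_days(0.056)   # roughly 0.42 CPU days

found = dictionary_entities(
    "The NSA and the EFF disagree about the rule of law.",
    {"the rule of law", "nsa", "eff", "klout"},
)
```

At the quoted latencies the dictionary approach is about 200 times cheaper per message, and, as the Corpus bullet shows, it also picks up terms such as ‘nsa’ and ‘eff’ that the general-purpose model missed.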
  • 15. MACHINE LEARNING AT KLOUT We leverage our past machine learning and NLP classification assets to: • Train new models for adding additional data sources • Retrain Topics classification • Predict “actionability” of support • Predict virality of content [macro and micro] • Predict the “personhood” of a social media account • Target content based on downlevel predictions
  • 16. Section 3: How do you productize this in APIs?
  • 17. INPUTS AND OUTPUTS
      – People-Specific Insights. Input: People(s). Output: TopicSet(s). GET user.json/[id]/insights/influence-topics
      – Topic-Specific People. Input: Topic(s). Output: People. GET topic.json/[ids]/people
      – Topic-Aggregate Insights. Input: Topic(s). Output: Metadata, Aggregation. GET topic.json/[ids]/insights
      – People-Aggregate Insights. Input: User(s). Output: Metadata, Aggregate Sets. GET user.json/insights/aggregated/influence-topics?userIds=1,2,3
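The four request shapes on slide 17 can be assembled with a few small helpers. This is a sketch under stated assumptions: `BASE_URL` is a placeholder (the slide shows only relative paths), the helper names are made up, and joining multiple topic ids with commas is a guess modelled on the `userIds=1,2,3` example.

```python
from urllib.parse import urlencode

# Placeholder host: the slide gives only the relative paths.
BASE_URL = "https://api.example.com"


def _join_ids(ids):
    """Comma-join ids (assumed separator, mirroring userIds=1,2,3)."""
    return ",".join(str(i) for i in ids)


def user_influence_topics_url(user_id):
    """People-specific insights: one user in, TopicSets out."""
    return f"{BASE_URL}/user.json/{user_id}/insights/influence-topics"


def aggregated_influence_topics_url(user_ids):
    """People-aggregate insights: several users in, aggregate sets out."""
    query = urlencode({"userIds": _join_ids(user_ids)}, safe=",")
    return f"{BASE_URL}/user.json/insights/aggregated/influence-topics?{query}"


def topic_people_url(topic_ids):
    """Topic-specific people: topics in, people out."""
    return f"{BASE_URL}/topic.json/{_join_ids(topic_ids)}/people"


def topic_insights_url(topic_ids):
    """Topic-aggregate insights: topics in, metadata and aggregation out."""
    return f"{BASE_URL}/topic.json/{_join_ids(topic_ids)}/insights"
```

Each helper maps to one quadrant of the slide, so the input type (people or topics) picks the resource and the output type (specific or aggregate) picks the path suffix.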
  • 18. PAYLOADS
      {
        topicSetType: "expertise",
        topicSet: [
          {
            topicId: "7516448513106795305",
            score: 0.999596145670965,
            strength: "strong",
            displayName: "APIs",
            name: "APIs",
            slug: "api",
            imageUrl: "http://kcdn3.klout.com/static/images/topics/api_6bae2a67e1a5a9b68d526b4d483c4eb8.png",
            displayType: "visible",
            topicType: "entity"
          },
          {
            topicId: "10000000000000008253",
            score: 0.9992839644220868,
            strength: "strong",
            displayName: "Twitter",
            name: "Twitter",
            slug: "twitter",
            imageUrl: "http://kcdn3.klout.com/static/images/icons/generic-topic.png",
            displayType: "visible",
            topicType: "entity"
          },
          {
            topicId: "8961164588331655920",
            score: 0.9992326280041798,
            strength: "strong",
            displayName: "Klout",
            name: "Klout",
            slug: "klout",
            imageUrl: "http://kcdn3.klout.com/static/images/klout-topic-image-1333588028647.jpg",
            displayType: "visible",
            topicType: "entity"

        topicSetType: "interest",
        topicSet: [
          {
            topicId: "10000000000000008253",
            score: 0.9946672348339362,
            strength: "strong",
            displayName: "Twitter",
            name: "Twitter",
            slug: "twitter",
            imageUrl: "http://kcdn3.klout.com/static/images/icons/generic-topic.png",
            displayType: "visible",
            topicType: "entity"
          },
          {
            topicId: "6485494992525344250",
            score: 0.9918719149780779,
            strength: "strong",
            displayName: "Marketing",
            name: "Marketing",
            slug: "marketing",
            imageUrl: "http://kcdn3.klout.com/static/images/topics/people.png",
            displayType: "visible",
            topicType: "sub"
          },
          {
            topicId: "7516448513106795305",
            score: 0.9888798650771197,
            strength: "strong",
            displayName: "APIs",
            name: "APIs",
            slug: "api",
            imageUrl: "http://kcdn3.klout.com/static/images/topics/api_6bae2a67e1a5a9b68d526b4d4
            displayType: "visible",
            topicType: "entity"
          },
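A client consuming these payloads might map each entry onto a small typed record and then compare TopicSets. The sketch below uses values copied from the slide, trimmed to the fields it needs; the `Topic` class and both function names are hypothetical, and the payloads are written as Python dicts because the slide's object notation leaves keys unquoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """Subset of the fields shown in the slide's payload."""
    topic_id: str      # kept as a string: 10000000000000008253 exceeds signed 64-bit
    score: float
    strength: str
    display_name: str
    slug: str
    topic_type: str


def parse_topic_set(payload):
    """Convert one payload dict into a list of Topic records, in payload order."""
    return [
        Topic(
            topic_id=t["topicId"],
            score=t["score"],
            strength=t["strength"],
            display_name=t["displayName"],
            slug=t["slug"],
            topic_type=t["topicType"],
        )
        for t in payload["topicSet"]
    ]


def shared_slugs(first, second):
    """Slugs present in both topic lists, sorted alphabetically."""
    return sorted({t.slug for t in first} & {t.slug for t in second})


def _entry(topic_id, score, name, slug, topic_type):
    """Build a trimmed payload entry (imageUrl and displayType omitted)."""
    return {"topicId": topic_id, "score": score, "strength": "strong",
            "displayName": name, "slug": slug, "topicType": topic_type}


EXPERTISE = {"topicSetType": "expertise", "topicSet": [
    _entry("7516448513106795305", 0.999596145670965, "APIs", "api", "entity"),
    _entry("10000000000000008253", 0.9992839644220868, "Twitter", "twitter", "entity"),
    _entry("8961164588331655920", 0.9992326280041798, "Klout", "klout", "entity"),
]}

INTEREST = {"topicSetType": "interest", "topicSet": [
    _entry("10000000000000008253", 0.9946672348339362, "Twitter", "twitter", "entity"),
    _entry("6485494992525344250", 0.9918719149780779, "Marketing", "marketing", "sub"),
    _entry("7516448513106795305", 0.9888798650771197, "APIs", "api", "entity"),
]}

overlap = shared_slugs(parse_topic_set(EXPERTISE), parse_topic_set(INTEREST))
```

Comparing the two sets this way shows which topics a user is both influential in and interested in, which is the kind of context the TopicSet types are meant to expose.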
  • 19. Section 4: Let’s get practical, prescriptive and talk about the future
  • 20. PARAMETERIZATION • Topics Scoring uses different models in each topic set • Overall Topic Scoring is based on hundreds of features, weights, decays, spanning short and long term • Parameterize scoring for different contexts
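One way to read "parameterize scoring for different contexts" is to keep the scoring function fixed and swap the parameter set per topic set. The sketch below does that with per-feature weights and an exponential decay; the feature names, weights, and half-lives are invented for illustration, and the real models use hundreds of features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringParams:
    """Hypothetical parameter set for one context (for example one topic set)."""
    half_life_days: float   # how quickly old activity stops counting
    weights: dict           # feature name -> weight


def decayed_score(events, params):
    """Sum weighted feature values, halving each one every half_life_days.

    events is an iterable of (feature, value, age_days) tuples.
    Features without a weight contribute nothing.
    """
    total = 0.0
    for feature, value, age_days in events:
        decay = 0.5 ** (age_days / params.half_life_days)
        total += params.weights.get(feature, 0.0) * value * decay
    return total


# Two illustrative contexts: one favours recent activity, one long-term history.
SHORT_TERM = ScoringParams(half_life_days=7.0, weights={"post": 1.0, "reshare": 2.0})
LONG_TERM = ScoringParams(half_life_days=90.0, weights={"post": 1.0, "reshare": 2.0})

events = [("post", 1.0, 7.0), ("reshare", 1.0, 0.0)]
short_score = decayed_score(events, SHORT_TERM)
long_score = decayed_score(events, LONG_TERM)
```

The same events score differently under each parameter set, so a caller could request a "what is this person talking about this week" view or a "what are they known for" view without any change to the underlying data.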
  • 21. EXAMPLES Use interchangeable, specified models with rule modifiers
  • 22. EXAMPLES • If you treat this like a product, you must think through the implementations others would make. • Maybe even make them your own.
  • 23. POLICY • Data is great. • Representation of data is hard. • Raw data rarely if ever needs to be displayed. • Balance innovation on data assets with brand, utility, and allowed use cases.
  • 25. Bye! May 2015 – APIdays Tyler Singletary - @harmophone Director of Platform tyler@klout.com

Editor's Notes

  1. Klout is best known for the Klout Score. For better or worse. We have more.
  2. Now we know a bit more about me. We don’t really know what it all means here. Expertise tells us a bit more. Things towards the bottom start to look like interests. I’m mostly known for talking about Politics of APIs issues. I won’t be doing that here.
  3. I’m not going to re-cover Louis’ Predictive APIs talk – I’m not a machine learning expert.
  4. I lied.