Klout, across its iterations, is a prime example of applying large-scale NLP data science to topical assignment. Klout makes this available through its website, http://klout.com, and through its developer API, http://developers.klout.com
5. WHAT IS KLOUT, REALLY?
• Klout is an API client application of the social web.
• Federated identity across platforms
• Macro and micro understanding of profile, conversation, and content.
Content and people linked by Topics.
6. UNIFYING PRINCIPLE: TOPICS
• TBs of social interactions a day
• NLP applied to posts
• Aggregated to profiles:
– effects are Klout Score, topical strengths
– the what becomes Topics
– the why becomes TopicSets
• Links crawled, NLP summarization
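The aggregation step above — per-post topic tags rolled up into profile-level topical strengths — can be sketched as follows. This is a minimal illustration, not Klout's pipeline: the dictionary, topic names, and normalization are invented for the example.

```python
from collections import Counter

# Hypothetical topic dictionary; Klout's real system maps terms into a
# managed ontology of ~10,000 topics.
TOPIC_DICTIONARY = {
    "nsa": "Privacy",
    "eff": "Privacy",
    "api": "APIs",
    "rest": "APIs",
}

def tag_topics(post_text):
    """NLP applied to a post: here, a simple dictionary lookup per token."""
    tokens = post_text.lower().split()
    return Counter(TOPIC_DICTIONARY[t] for t in tokens if t in TOPIC_DICTIONARY)

def aggregate_profile(posts):
    """Aggregate per-post topic counts into profile-level topical strengths."""
    strengths = Counter()
    for post in posts:
        strengths.update(tag_topics(post))
    total = sum(strengths.values()) or 1
    # Normalize counts so strengths are comparable across profiles.
    return {topic: n / total for topic, n in strengths.items()}

posts = ["The NSA and EFF are in the news", "A REST API for everything"]
print(aggregate_profile(posts))  # {'Privacy': 0.5, 'APIs': 0.5}
```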
7. TOPIC SETS + USERS + SCORING
• Allow for time-series slicing
• Aggregate counting
• Slicing of a set to create an ordered list
Topic-oriented view
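The three operations above — time-series slicing, aggregate counting, and ordering the slice — might look like this in miniature. The event schema and scores are assumptions for illustration only.

```python
from datetime import date

# Invented event records: (user, topic, score contribution, day).
events = [
    {"user": "alice", "topic": "APIs", "score": 3.0, "day": date(2015, 5, 1)},
    {"user": "bob",   "topic": "APIs", "score": 5.0, "day": date(2015, 5, 2)},
    {"user": "alice", "topic": "APIs", "score": 4.0, "day": date(2015, 5, 3)},
]

def topic_leaderboard(events, topic, start, end):
    """Slice a topic set by time window, aggregate, and order the result."""
    totals = {}
    for e in events:
        if e["topic"] == topic and start <= e["day"] <= end:  # time-series slice
            totals[e["user"]] = totals.get(e["user"], 0.0) + e["score"]  # count
    # Ordered list: highest aggregate score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(topic_leaderboard(events, "APIs", date(2015, 5, 1), date(2015, 5, 3)))
# [('alice', 7.0), ('bob', 5.0)]
```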
9. KLOUT DEALS WITH RIDICULOUS AMOUNTS OF DATA
• Topic assignment at scale:
○ ~650M new pieces of data daily
○ hundreds of millions of profiles
○ ~10,000 topics in a 3-level hierarchy
○ daily updates
• Multiple social networks and various data sources:
○ Twitter, Facebook, LinkedIn, Google+, Wikipedia
○ user activity, profiles, connections
• Topics normalized to an evolving, managed ontology
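Normalizing raw topic strings into a managed, 3-level ontology could be sketched like this. The ontology entries, aliases, and slugging rules here are invented stand-ins for whatever the real managed ontology contains.

```python
# Hypothetical 3-level hierarchy: slug -> (level 1, level 2, level 3).
ONTOLOGY = {
    "machine-learning": ("Technology", "Data Science", "Machine Learning"),
    "apis": ("Technology", "Software", "APIs"),
}

# Alias table capturing the "evolving" part: new surface forms get mapped
# onto existing canonical slugs over time.
ALIASES = {"ml": "machine-learning", "machinelearning": "machine-learning",
           "api": "apis"}

def normalize(raw):
    """Map a raw topic string onto a canonical ontology node, or None."""
    slug = raw.strip().lower().replace(" ", "")
    slug = ALIASES.get(slug, slug)
    return ONTOLOGY.get(slug)  # None if the topic isn't in the ontology yet

print(normalize("ML"))  # ('Technology', 'Data Science', 'Machine Learning')
```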
10. WEIGHTING, NORMALIZATION, CALIBRATION
Signals are weighted and normalized to mirror real-world influence:
– Machine-learned weighting based on regression analysis of survey data
An advanced algorithm based on 1,500 signal combinations of relationships and ratios:
– Where: On which network is the action taking place?
– What: What action was taken?
– Who: Who acted on your content?
– How much: How many actions and unique actors?
– When: When was the action performed?
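One way to picture how the where/what/who/how-much/when dimensions combine into a single weighted signal is the sketch below. Every weight, the half-life, and the functional form are invented for illustration; the slide only says the real weights are machine-learned from survey regression.

```python
import math

# Illustrative parameters only — the real system learns its weights.
NETWORK_WEIGHT = {"twitter": 1.0, "facebook": 0.9}   # Where
ACTION_WEIGHT = {"retweet": 2.0, "like": 1.0}        # What
HALF_LIFE_DAYS = 30.0                                 # When: decay horizon

def signal_score(network, action, actor_score, count, age_days):
    """Combine the five signal dimensions into one weighted value."""
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)        # When: older counts less
    return (NETWORK_WEIGHT[network] * ACTION_WEIGHT[action]
            * actor_score          # Who: influence of the actor matters
            * math.log1p(count)    # How much: diminishing returns on volume
            * decay)
```

Note the log on the volume term: one influential actor retweeting should outweigh many repeated low-value actions, which is the kind of ratio a calibrated model can express.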
11. TOPIC SETS FOR CONTEXT
• User’s Influence: with various scores
• User’s Interests: with various scores
• User’s Self-selection: based on registered, self-declared interests
• Audience Influence: rollup of User’s Influence within a user’s downlevel and uplevel networks
• Audience Interests: rollup of User’s Interests within a user’s downlevel (and uplevel) networks
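The "rollup" topic sets above can be sketched as a simple aggregation over a user's network. The per-user influence scores and follower graph here are invented; "downlevel" is taken to mean followers, per the slide.

```python
# Invented per-user topical influence scores.
user_influence = {
    "alice": {"APIs": 0.8, "Privacy": 0.2},
    "bob":   {"APIs": 0.5},
    "carol": {"Privacy": 0.9},
}
# dave's downlevel network (his followers), also invented.
followers = {"dave": ["alice", "bob", "carol"]}

def audience_influence(user):
    """Roll up followers' topical influence into an Audience Influence set."""
    rollup = {}
    for follower in followers.get(user, []):
        for topic, score in user_influence.get(follower, {}).items():
            rollup[topic] = rollup.get(topic, 0.0) + score
    return rollup

print(audience_influence("dave"))  # APIs ~1.3, Privacy ~1.1
```

The same shape works for Audience Interests: swap the per-user influence table for a per-user interests table and roll up identically.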
12. CHALLENGES IN BIG DATA
● Message size: Overall data size may be huge, but message size per user may be small.
● Text sparsity: Many users may be passive consumers of content.
● Noise: colloquial language, slang, grammatical errors, abbreviations.
● Context: Need to expand context to get more information.
● False positives are embarrassing when user-facing.
13. CHALLENGES TO SCALE
NLP* - StanfordNLP english.conll.4class.distsim.crf.ser.gz
● Speed Matters (650M messages a day):
○ Stanford Named Entity Extraction - 10.959 ms (82.0 CPU days)
○ Dictionary - 0.056ms (0.42 CPU days)
● Corpus
○ Stanford Named Entity Extraction:
■ {‘the rule of law’=1.0}
○ Dictionary based:
■ {‘the rule of law’=1.0, ‘nsa’=1.0, ‘eff’=1.0}
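The ~200x speed gap makes sense when you see what dictionary extraction actually does: a hash lookup per n-gram instead of CRF sequence inference. A minimal sketch, using the three entities from the slide (the n-gram cap and matching rules are assumptions):

```python
# Entity list from the slide's example; a real dictionary would hold many more.
ENTITIES = {"the rule of law", "nsa", "eff"}
MAX_NGRAM = 4  # assumed cap on phrase length

def dictionary_extract(text):
    """Find known entities by probing every n-gram against a hash set."""
    tokens = text.lower().split()
    found = {}
    for n in range(1, MAX_NGRAM + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ENTITIES:
                found[phrase] = 1.0
    return found

print(dictionary_extract("The NSA and the EFF defend the rule of law"))
# {'nsa': 1.0, 'eff': 1.0, 'the rule of law': 1.0}
```

This also shows the recall difference from the slide: the dictionary surfaces ‘nsa’ and ‘eff’, which the 4-class NER model misses, at a tiny fraction of the CPU cost.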
15. MACHINE LEARNING AT KLOUT
We our leverage past machine learning and NLP
classification assets to:
• Train new models for adding additional data sources
• Retraining Topics classification
• Predict “actionability” of support
• Predict virality of content [macro and micro]
• Predict the “personhood” of a social media account
• Content-targeting based on downlevel predictions
20. PARAMETERIZATION
• Topics Scoring uses different models in each topic
set
• Overall Topic Scoring is based on hundreds of
features, weights, decays, spanning short and
long term
• Parameterize scoring for different contexts
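Parameterizing one scoring function per context might look like the sketch below: the same code path, with per-topic-set parameters swapped in. The contexts, half-lives, and thresholds are invented for illustration.

```python
# Hypothetical per-context parameters: influence demands sustained,
# long-horizon signal; interests react faster with a lower bar.
CONTEXT_PARAMS = {
    "influence": {"half_life_days": 90.0, "min_events": 5},
    "interests": {"half_life_days": 30.0, "min_events": 1},
}

def score(events, context):
    """Score (value, age_days) events under a context's parameters."""
    p = CONTEXT_PARAMS[context]
    if len(events) < p["min_events"]:
        return 0.0  # not enough evidence in this context
    # Exponential decay: short half-life for fast-moving contexts,
    # long half-life for stable ones.
    return sum(v * 0.5 ** (age / p["half_life_days"]) for v, age in events)
```

The same single event scores 1.0 as an interest but 0.0 as influence, which is the point: one model shape, different behavior per topic set.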
22. EXAMPLES
• Treated like a product, your data requires you to think through the implementations others would make.
• Maybe even make them your own.
23. POLICY
• Data is great.
• Representation of data is hard.
• Raw data rarely, if ever, needs to be displayed.
• Balance innovation on data assets with brand, utility, and allowed use cases.
25. Bye!
May 2015 – APIdays
Tyler Singletary - @harmophone
Director of Platform
tyler@klout.com
Editor's Notes
Klout is best known for the Klout Score. For better or worse.
We have more.
Now we know a bit more about me. We don’t really know what it all means here. Expertise tells us a bit more.
Things towards the bottom start to look like interests.
I’m mostly known for talking about Politics of APIs issues. I won’t be doing that here.
Not going to re-cover Louis’ Predictive APIs talk – I’m not a machine learning expert.