What do you know about
an alligator when you know
the company it keeps?
Katrin Erk
University of Texas at Austin
STARSEM 2017
Distributional semantics and
you
• Distributional models/Embeddings: An incredible
success story in computational linguistics
• Do you make use of distributional information, too?
• Landauer & Dumais, 1997: “A solution to Plato’s problem”
• How do humans acquire such a gigantic vocabulary in such a
short time?
• Much debate in psychology,
experimental support: McDonald & Ramscar, 2001;
Lazaridou et al., 2016
• But how about the linguistic side of the story?
“A solution to Plato’s problem”
“Many well-read adults know that Buddha sat long
under a banyan tree (whatever that is) and Tahitian
natives lived idyllically on breadfruit and poi (whatever
those are). More or less correct usage often precedes
referential knowledge” (Landauer & Dumais, 1997)
“A solution to Plato’s problem”
“Many well-read adults know that Buddha sat long
under a banyan tree (whatever that is) and Tahitian
natives lived idyllically on breadfruit and poi (whatever
those are). More or less correct usage often precedes
referential knowledge” (Landauer & Dumais, 1997)
But wait: How can you use the word “banyan” more or
less correctly when you are not aware of its reference?
When you couldn’t point out a banyan in a yard?
Learning about word meaning
from textual context
• Main aim: insight
• What information is present in distributional
representations, and why?
• Assuming a learner with grounded concepts:
How can distributional information contribute?
Learning about meaning from
textual context
Suppose you do not know what an alligator is. What
do these sentences tell you about alligators?
• On our last evening, the boatman killed an alligator as
it crawled past our camp-fire to go hunting in the reeds
beyond.
• A study done by Edwin Colbert and his colleagues
showed that a tiny 50 gramme (1.76 oz) alligator heated
up 1 °C every minute and a half from the Sun […]
• The throne was occupied by a pipe-smoking alligator.
Learning about word meaning
from textual context
• Setting: adult learner
• What kind of information can you get from text?
• How does it enable you to use “alligator” more or less
correctly?
• Why can you learn anything from text?
• Textual clues are rarely 100% reliable
• “An alligator was lying at the bottom of a pool”
• Could be an animal, a pool-cleaning implement…
The story in a nutshell
• How can I successfully use the word “alligator”
when I don’t know what it refers to?
• I know some properties of alligators: they are
animals, dangerous, …
• So then I use “alligator” in animal-like textual
contexts
The story in a nutshell
• How does distributional information help?
• It lets me infer properties of words:
• Suppose I don’t know what an alligator is
• But it appears in contexts similar to those of “crocodile”
• So it must be something like a crocodile:
• That is, it must share properties with a crocodile
• So it may be an animal, it may be dangerous…
The story in a nutshell
• But distributional information can never yield
certain knowledge
• Instead uncertain, probabilistic information
• Formal semantics framework
• Probabilistic semantics:
• Probability distribution over worlds that could be the
current one
• Probability of a world influenced by distributional
information
Plan
• What can an agent learn from distributional context?
• A probabilistic information state
• Influencing a probabilistic information state with
distributional information
• A toy experiment
What is in an embedding?
• What information can be encoded in an embedding
computed from text data?
• Lots of things, given the right objective function
• But:
• What objective function can we assume a human agent
to use?
• What individual linguistic phenomena have been
shown to be encoded?
• So, restrict ourselves to a simple model
What is in an embedding?
• Count-based models of textual context
• (and neural models like word2vec,
see Levy&Goldberg 2015)
• Long-standing criticism in psychology, e.g. Murphy (2002):
only a vague notion of “similarity”
• But in fact distributional models can distinguish between
semantic relations
• by choice of what “context” means
• through relation-specific classifiers (Fu et al., 2014; Levy et al.,
2015; Shwartz et al., 2016; Roller & Erk, 2016, …)
The effect of context window size
• Peirsman 2008 (Dutch):
• Narrow context window: high ratings to “similar” words
• Particularly to co-hyponyms
• Syntactic context even more so
• Wide context window: high ratings to “related” words
• Baroni/Lenci 2011 (English):
• Narrow context window: highest ratings to co-hyponyms
• Wide context window: ratings equal across many relations
What is narrow-window
similarity?
• High ratings for co-hyponyms, also synonyms, some
hypernyms, antonyms (well-known bug)
• What semantic relation is that?
• Co-hyponymy is an odd relation
• dictionary-specific
• can be incompatible (cat/dog) or compatible
(hotel/restaurant)
• Proposal: property overlap
• Alligator, crocodile have many properties in common:
animal, reptile, scaly, dangerous, …
Why does narrow-window
similarity do this?
• Focus on noun targets
• Narrow-window and syntactic contexts contain:
• Modifiers
• Verbs that take target as argument
• Selectional constraints
• Traditionally formulated in terms of taxonomic
properties
• subject of “crawl”: animate
But wait, where do the
probabilities come from?
• Frequency in text is not frequency in real life
• Reporting bias: Almost no one says “Bananas are
yellow” (Bruni et al, 2012)
• Genre bias: “captive” and “westerner” are each other’s
nearest neighbors in Lin (1998)
• Then how can counts in text lead us to probabilities
relevant to grounded concepts?
But wait, where do the
probabilities come from?
• Two tricks in this study
1. Only consider properties that apply to all members of
a category (like “being an animal”)
2. Use distributional context only indirectly: Learn
correlation between distributional context and real-
world properties
• More recent work: trick 2 without trick 1
• I think we can use distributional context directly
and properly to get probabilities – more later
Learning properties from
distributional data
• Concrete noun concepts
• To learn: properties of a concept
• Focus on properties applying to all members of a
category (like taxonomic properties)
• Broad definition of a property: can be expressed as an
adjective, can be a hypernym, …
Property overlap
• Proportion of properties that are shared
• Jaccard coefficient on sets
• A, B, sets of properties:
• Degrees of property overlap
• Idea: The more properties in common, the higher the
distributional similarity
$\mathrm{Jac}(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$

Example: $\mathrm{Jac} = 2/6 = 0.33$ (see the sketch below)
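To make the overlap measure concrete, here is a minimal Python sketch of the Jaccard coefficient over property sets. The property sets are illustrative stand-ins, not entries from the McRae norms.

```python
def jaccard(a: set, b: set) -> float:
    """Property overlap as the Jaccard coefficient of two property sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Illustrative property sets (hypothetical, not from the McRae norms):
crocodile = {"animal", "reptile", "scaly", "dangerous"}
lizard = {"animal", "reptile", "small", "green"}
print(jaccard(crocodile, lizard))  # 2 shared / 6 total = 0.33...
```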
Plan
• What can an agent learn from distributional context?
• A probabilistic information state
• Influencing a probabilistic information state with
distributional information
• A toy experiment
Information states
• Information state of Agent: set of worlds that the agent
considers possible
• Agent not omniscient
• As far as Agent is concerned, any of these worlds could be
the actual world
• Update semantics: Information state updated through
communication (Veltman 1996)
• Probabilistic information state: probability distribution
over worlds (van Benthem et al. 2009, Zeevat 2013)
• Not all worlds equally likely to be the actual world
Probabilistic logics
• Uncertainty about the world we are in
• Probability distribution over worlds
• Nilsson 1986
• Probability that a sentence is true depends on the
probabilities of the worlds in which it is true
$P(\varphi) = \sum_{w:\, \|\varphi\|_w = t} P(w)$
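A minimal sketch of this sum in Python, with worlds represented as truth assignments paired with probabilities; the toy worlds and numbers below are assumptions for illustration only.

```python
# Worlds as (truth assignment, probability) pairs; values are illustrative.
worlds = [
    ({"alligator_is_animal": True},  0.7),
    ({"alligator_is_animal": False}, 0.3),
]

def prob(sentence, worlds):
    """P(phi) = sum of P(w) over the worlds w in which phi is true."""
    return sum(p for w, p in worlds if sentence(w))

print(prob(lambda w: w["alligator_is_animal"], worlds))  # 0.7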
Generating a probability
distribution over worlds
• Text understanding as a generative process
• Agent mentally simulates (i.e., probabilistically
generates) the situation described in the text
• Goodman et al, 2015; Goodman and Lassiter, 2016
• To generate a person:
• draw gender: flip a fair coin
• draw height from the normal distribution of heights for
that gender.
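As a sketch, the slide’s two-step recipe as a tiny Python generator; the height means and standard deviations are made-up placeholder values.

```python
import random

def generate_person():
    # Draw gender: flip a fair coin.
    gender = random.choice(["female", "male"])
    # Draw height from that gender's normal distribution
    # (means/sds below are illustrative placeholders, in cm).
    mean, sd = (162.0, 7.0) if gender == "female" else (176.0, 7.0)
    return gender, random.gauss(mean, sd)

print(generate_person())
```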
Properties in a probabilistic
information state
• Property applies in a particular world: extension of
predicate included in extension of property in that
world
• Focus here: Properties that the agent is certain
about: apply in all worlds that have non-zero
probability
Plan
• What can an agent learn from distributional context?
• A probabilistic information state
• Influencing a probabilistic information state with
distributional information
• A toy experiment
Bayesian update on the probability
distribution over worlds
• Prior distribution over worlds $P_0$
• Then we see distributional evidence $E_{dist}$
• e.g.: Distributional similarity of “crocodile” and
“alligator” is 0.93
• Posterior distribution $P_1$ given $E_{dist}$
• How do we determine the likelihood?
$P_1(w) = P(w \mid E_{dist}) = \dfrac{P(E_{dist} \mid w)\, P_0(w)}{P(E_{dist})}$
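A minimal sketch of this update over a finite set of worlds, assuming some likelihood function is given (one candidate is sketched a few slides below); the toy worlds are hypothetical.

```python
def bayes_update(worlds_with_prior, likelihood):
    """Reweight each world by likelihood * prior, then normalize.
    The normalizer z plays the role of P(E_dist)."""
    weights = [(w, likelihood(w) * p0) for w, p0 in worlds_with_prior]
    z = sum(p for _, p in weights)
    return [(w, p / z) for w, p in weights]

# Toy usage with hypothetical worlds carrying an 'overlap' field:
worlds = [({"overlap": 0.1}, 0.5), ({"overlap": 0.5}, 0.5)]
posterior = bayes_update(worlds, lambda w: w["overlap"])  # stand-in likelihood
print(posterior)
```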
Interpreting distributional data
• Speaker observes words with known properties, and their
distributional similarity
Property overlap from McRae feature norms (McRae et al., 2005).
Similarities from a narrow-context model computed on
UKWaC + Wikipedia + BNC
word 1     word 2      overlap  similarity
peacock    raven       0.29     0.70
mixer      toaster     0.19     0.72
crocodile  frog        0.17     0.86
bagpipe    banjo       0.10     0.72
scissors   typewriter  0.04     0.62
crocodile  lime        0.03     0.33
coconut    porcupine   0.03     0.42
Observing regularities: high property overlap
goes with high distributional similarity
word 1     word 2      overlap  similarity
peacock    raven       0.29     0.70
mixer      toaster     0.19     0.72
crocodile  frog        0.17     0.86
bagpipe    banjo       0.10     0.72
scissors   typewriter  0.04     0.62
crocodile  lime        0.03     0.33
coconut    porcupine   0.03     0.42
[Plot: “Property overlap versus similarity (artificial data)”; property overlap on the x-axis, distributional similarity on the y-axis.]
In the simplest case: linear regression.
Given the regularities I observed, and the
distributional evidence, what do I now
think of world w?
• World w:
• property overlap of crocodile and alligator is o = 0.1
• Predicted similarity: $\beta_0 + \beta_1 o = 0.53$
• Distributional evidence: sim(crocodile, alligator) = 0.93
• How likely are we to observe a distributional
similarity of 0.93 if the predicted similarity is 0.53?
• Standard move in hypothesis testing: How likely are we
to see an observed value this high or higher,
given the predicted distribution?
Likelihood of the distributional
evidence in this world
• What distribution?
• Equivalent view of linear regression:
Observed similarity = predicted similarity + normally
distributed error
• Normal distribution with mean $f(o) = \beta_0 + \beta_1 o$
[Plot: probability density of the distributional rating, a normal curve centered at f(o).]
Likelihood of the distributional
evidence in this world
• Distributional similarity s = sim(crocodile, alligator)
• Hypothesis testing: How likely are we to see a similarity
value as high as s or higher, given property overlap o?
[Plot: the normal density centered at f(o), with the observed similarity s marked and the tail at or above s shaded.]
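As a sketch, this one-sided tail probability under the error distribution, in Python with SciPy. The regression coefficients and error standard deviation below are assumed illustrative values, chosen so that overlap o = 0.1 predicts similarity 0.53 as on the earlier slide.

```python
from scipy.stats import norm

beta0, beta1, sigma = 0.43, 1.0, 0.15  # assumed regression fit and error sd

def likelihood(s, o):
    """P(observing similarity >= s | property overlap o):
    one-sided tail of Normal(f(o), sigma)."""
    f_o = beta0 + beta1 * o
    return norm.sf(s, loc=f_o, scale=sigma)  # survival function = 1 - CDF

print(likelihood(0.93, 0.1))  # small: 0.93 sits ~2.7 sd above the predicted 0.53
```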
Computing posterior probabilities in
a probabilistic generative framework
• Probabilistically generate worlds:
• “To generate a person, flip a fair coin to determine their
gender…”
• Approximately determine probability distribution
over worlds: Sample n probabilistically generated
worlds
• Sample from posterior:
• Rejection sampling
• Formulate likelihood as a sampling condition
Computing posterior probabilities in
a probabilistic generative framework
• Property overlap o between crocodiles and alligators
in world w
• Distributional similarity s = sim(crocodile, alligator)
• Keep w if similarity as high as s or higher is likely
given o
• Sample s’ from the normal
distribution with mean f(o)
• Keep world w if s’ >= s
[Plot: the normal density centered at f(o) with the observed similarity s marked; a world is kept when the sampled s′ lands at or above s.]
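A minimal Python sketch of this sampling condition, reusing the assumed regression line from the likelihood sketch above; the acceptance-rate demo at the end is just an illustration.

```python
import random

def f(o, beta0=0.43, beta1=1.0):
    # Assumed regression line from overlap to predicted similarity.
    return beta0 + beta1 * o

def keep_world(o, s, sigma=0.15):
    """Rejection-sampling condition: draw s' ~ Normal(f(o), sigma),
    keep the world iff s' >= s."""
    return random.gauss(f(o), sigma) >= s

# Worlds with higher crocodile/alligator overlap survive far more often
# when the observed similarity is high:
for o in (0.1, 0.5):
    rate = sum(keep_world(o, s=0.93) for _ in range(10_000)) / 10_000
    print(f"overlap {o}: kept {rate:.3f}")
```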
Plan
• What can an agent learn from distributional context?
• A probabilistic information state
• Influencing a probabilistic information state with
distributional information
• A toy experiment
Toy experiments
• Property collection: McRae et al., 2005
• Human-generated definitional features for concrete noun
concepts
• Distributional model: narrow context, UKWaC + Wikipedia +
BNC
• Hold out alligator as unknown word
• Given distributional evidence, how likely are we to believe…
1. All alligators are dangerous
2. All alligators are edible
3. All alligators are animals
Toy experiments
• All alligators are dangerous:
• Known word: crocodile. sim(alligator, crocodile) = 0.93
• Crocodiles are animals, dangerous, scaly, and crocodiles
• All alligators are edible:
• Known word: trout. sim(alligator, trout) = 0.68
• Trout are animals, aquatic, edible, and trout
• Probability should be lower because similarity is lower
• All alligators are animals:
• Known words: crocodile, trout.
• Can evidence accumulate with multiple similarity ratings?
Generative story for the
prior probability
• Fix domain size to 10
• For each entity in the domain:
• Flip a fair coin to determine if it is a crocodile. Likewise for
alligator.
• For each entity in the domain:
• If it is a crocodile, it is also an animal, dangerous, and scaly.
• Otherwise, flip a fair coin to see if it is an animal (dangerous,
scaly).
Implemented in Church.
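The slides note the model was implemented in Church; as a rough illustration only, here is the same generative story re-sketched in Python, with a Monte Carlo estimate of a prior. Under this story the prior for “all alligators are dangerous” works out to about 0.875^10 ≈ 0.26, matching the results table below.

```python
import random

def generate_world(domain_size=10):
    """One world from the slide's generative story (Python re-sketch;
    the original was implemented in Church)."""
    world = []
    for _ in range(domain_size):
        is_croc = random.random() < 0.5  # fair coin: crocodile?
        is_alli = random.random() < 0.5  # fair coin: alligator?
        if is_croc:
            animal = dangerous = scaly = True  # crocodile properties by stipulation
        else:
            animal = random.random() < 0.5
            dangerous = random.random() < 0.5
            scaly = random.random() < 0.5
        world.append(dict(crocodile=is_croc, alligator=is_alli,
                          animal=animal, dangerous=dangerous, scaly=scaly))
    return world

def all_alligators_are(prop, world):
    return all(e[prop] for e in world if e["alligator"])

# Monte Carlo estimate of the prior P(all alligators are dangerous):
n = 20_000
prior = sum(all_alligators_are("dangerous", generate_world()) for _ in range(n)) / n
print(prior)  # about 0.26
```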
Results: All alligators are…
Sentence      words                 sim   prior  posterior
…dangerous    alligator, crocodile  0.93  0.26   0.47
…edible       alligator, trout      0.68  0.26   0.38
• Aim: Significant increase in probability
• Absolute probabilities depend on domain size,
problem formulation
• Higher similarities lead to significantly more confident inferences
• “Crocodile” much more similar to “alligator” than “trout”:
Agent more confidently ascribes crocodile properties to alligators
Probability of property
overlap: prior versus posterior
[Histograms: number of sampled worlds (y-axis) by property overlap (x-axis, 0 to 1), without distributional evidence (prior) and with distributional evidence (posterior). Left panel: property overlap of ‘alligator’ and ‘crocodile’; right panel: property overlap of ‘alligator’ and ‘trout’.]
Accumulating evidence:
“All alligators are animals”
sim of alligator to…            prior  posterior
crocodile: 0.93                 0.53   0.68
trout: 0.68                     0.53   0.63
crocodile: 0.93, trout: 0.68    0.53   0.80
• Does distributional evidence accumulate?
• Both crocodiles and trout are known to be animals
• Posterior significantly higher
when two pieces of evidence present
Summary
• How can people use a word whose reference they don’t
know?
• Suppose we don’t know what an alligator is: can we still
infer from context clues that it’s an animal?
• Proposal:
• (Narrow-window) distributional evidence is property overlap
evidence
• Distributional evidence affects probabilistic information state
• Can be described in probabilistic generative framework
Next questions
• Learning from a single sentence only
• On our last evening, the boatman killed an alligator as it
crawled past our camp-fire to go hunting in the reeds beyond.
• Distributional one-shot learning
• Doable: same setup, learn McRae et al. definitional features
using selectional constraints of neighboring predicates
• Properties that do not apply to all members of a category
• Some but not all crocodiles are dangerous
• Learn probability of generating a property for “alligator”
Next questions
• Here: Learn from context only indirectly,
from correlation with grounded properties
• Can we learn from what is said in the text?
• On our last evening, the boatman killed an alligator as it
crawled past our camp-fire to go hunting in the reeds beyond.
• Alligators are entities that generally crawl, hunt, and are
found in reeds
• P(q is a generic property of alligators that would be
mentioned by people)
• Relevant to “human experience of alligators”
(Thill/Padó/Ziemke 2014)
Thanks
Gemma Boleda, Louise McNally, Judith Tonhauser
(best editor on earth!), Nicholas Asher, Marco Baroni,
David Beaver, John Beavers, Ann Copestake, Ido
Dagan, Aurélie Herbelot, Hans Kamp, Alexander
Koller, Alessandro Lenci, Sebastian Löbner, Julian
Michael, Ray Mooney, Sebastian Padó, Manfred
Pinkal, Stephen Roller, Hinrich Schütze, Jan van Eijck,
Leah Velleman, Steve Wechsler, Roberto Zamparelli,
and the Foundations of Semantic Spaces reading group