Semantic Analysis in Language Technology
http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm

Semantic Word Clouds
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016

Previous lecture: Ontologies

Semantic Web & Ontologies
•  The goal of the Semantic Web is to allow web information and services to be more effectively exploited by humans and automated tools.
•  Essentially, the focus of the Semantic Web is to share data instead of documents.
•  This data must be "meaningful" both for humans and for machines (i.e. automated tools and web applications).
•  Q: How are we going to represent meaning and knowledge on the web?
•  A: … via annotation.
•  Knowledge is represented in the form of rich conceptual schemas/formalisms called ontologies.
•  Therefore, ontologies are the backbone of the Semantic Web.
•  Ontologies give formally defined meanings to the terms used in annotations, transforming them into semantic annotations.
Ontologies are…
•  … concepts that are hierarchically organized
[Figures: Tree of Porphyry, III AD; WordNet, XXI AD (see Lecture 5, e.g. similarity measures)]

Reasoning: RDF/OWL vs Databases (and other data structures)
OWL axioms behave like inference rules rather than database constraints.

Class: Phoenix
    SubClassOf: isPetOf only Wizard

Individual: Fawkes
    Types: Phoenix
    Facts: isPetOf Dumbledore

•  Fawkes is said to be a Phoenix and to be the pet of Dumbledore, and it is also stated that only a Wizard can have a pet Phoenix.
•  In OWL, this leads to the implication that Dumbledore is a Wizard. That is, if we were to query the ontology for instances of Wizard, then Dumbledore would be part of the answer.
•  In a database setting the schema could include a similar statement about the Phoenix class, but in this case it would be interpreted as a constraint on the data: adding the fact that Fawkes isPetOf Dumbledore without Dumbledore being already known to be a Wizard would lead to an invalid database state, and such an update would therefore be rejected by a database management system as a constraint violation.
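The difference between the two readings can be made concrete with a minimal Python sketch. It is not an OWL reasoner: the dictionaries and both function names are hypothetical, and only the single Phoenix restriction is hard-coded. The first function treats the axiom as an inference rule, the second as a database constraint.

```python
# Hypothetical toy knowledge base for the Phoenix example (not an OWL reasoner).
types = {"Fawkes": {"Phoenix"}}                    # individual -> asserted classes
facts = [("Fawkes", "isPetOf", "Dumbledore")]      # (subject, property, object)


def infer_like_owl(types, facts):
    """Read 'Phoenix SubClassOf: isPetOf only Wizard' as an inference rule."""
    inferred = {ind: set(classes) for ind, classes in types.items()}
    for subj, prop, obj in facts:
        if prop == "isPetOf" and "Phoenix" in inferred.get(subj, set()):
            inferred.setdefault(obj, set()).add("Wizard")   # derive a new type
    return inferred


def check_like_database(types, facts):
    """Read the same statement as a constraint: report the facts that violate it."""
    return [(subj, prop, obj) for subj, prop, obj in facts
            if prop == "isPetOf" and "Phoenix" in types.get(subj, set())
            and "Wizard" not in types.get(obj, set())]


wizards = [ind for ind, cls in infer_like_owl(types, facts).items() if "Wizard" in cls]
print(wizards)                             # ['Dumbledore']
print(check_like_database(types, facts))   # [('Fawkes', 'isPetOf', 'Dumbledore')]
```

Querying the inferred types for Wizard returns Dumbledore, whereas the constraint check flags the isPetOf fact as a violation, i.e. the update a database would reject.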
  
So, what is an ontology for us?
“An ontology is a FORMAL, EXPLICIT specification of a SHARED conceptualization”
Studer, Benjamins, Fensel. Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering 25 (1998) 161-197

An ontology is an explicit specification of a conceptualization
Gruber, T. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, Vol. 5, 1993, 199-220

•  Conceptualization: an abstract model and simplified view of some phenomenon in the world that we want to represent
•  Formal: machine-readable
•  Explicit: concepts, properties, relations, functions, constraints and axioms are explicitly defined
•  Shared: consensual knowledge
How to build an ontology
Generally speaking (and roughly said), when designing an ontology, four main components are used:
1.  Classes
2.  Relations
3.  Axioms
4.  Instances

Practical Activity: emotions

Your remarks:
•  Emotions are ambiguous: e.g. happiness can also be ill-directed
•  The polarity of some emotions cannot be assessed…
•  etc.
Classes, Relations, Axioms, Instances, etc.

Occupational psychology (Wikipedia)
•  Industrial and organizational psychology (also known as I–O psychology, occupational psychology, work psychology, WO psychology, IWO psychology and business psychology) is the scientific study of human behavior in the workplace and applies psychological theories and principles to organizations and individuals in their workplace.
•  I–O psychologists are trained in the scientist–practitioner model. I–O psychologists contribute to an organization's success by improving the performance, motivation, job satisfaction, occupational safety and health as well as the overall health and well-being of its employees. An I–O psychologist conducts research on employee behaviors and attitudes, and how these can be improved through hiring practices, training programs, feedback, and management systems.

In summary…
Why build an ontology?
•  To share common understanding of the structure of information among people or machines
•  To make domain assumptions explicit
•  Often based on controlled vocabulary
•  To analyze domain knowledge
•  To enable reuse of domain knowledge

Ontologies and Tags
•  Ontologies and tagging systems are two different ways to organize the knowledge present on the Web.
•  The first one has a formal foundation that derives from description logic and artificial intelligence. Domain experts decide the terms.
•  The other one is simpler: it integrates heterogeneous contents and is based on the collaboration of users in Web 2.0 (user-generated annotation).

Folksonomies
•  Tagging facilities within Web 2.0 applications have shown how it might be possible for user communities to collaboratively annotate web content, and create simple forms of ontology via the development of loosely hierarchically organised sets of tags, often called folksonomies…

Folksonomy = Social Tagging
•  Folksonomies (also known as social tagging) are user-defined metadata collections.
•  Users do not deliberately create folksonomies and there is rarely a prescribed purpose, but a folksonomy evolves when many users create or store content at particular sites and identify what they think the content is about.
•  "Tag clouds" pinpoint the frequency of certain tags.

•  A common way to organize tags is in tag clouds…
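As an illustration of how a tag cloud turns frequency into visual weight, here is a minimal Python sketch. The linear mapping, the font-size range (10 to 40) and the function name are assumptions; real tools often use logarithmic scaling instead.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_size=10, max_size=40):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if hi == lo:                     # every tag is equally frequent
            sizes[tag] = max_size
        else:
            sizes[tag] = round(min_size + (n - lo) * (max_size - min_size) / (hi - lo))
    return sizes


tags = ["python", "nlp", "python", "ontology", "nlp", "python"]
print(tag_cloud_sizes(tags))   # {'python': 40, 'nlp': 25, 'ontology': 10}
```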
  
Automatic folksonomy construction
•  The collective knowledge expressed through user-generated tags has great potential.
•  However, we need tools to efficiently aggregate data from large numbers of users with highly idiosyncratic vocabularies and invented words or expressions.
•  Many approaches to automatic folksonomy construction combine tags using statistical methods...
•  Ample space for improvement…
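One simple statistical signal for combining tags is co-occurrence: two tags that are often assigned to the same resource are likely to be related. The sketch below only counts such pairs; the input format (a dict from resource to tag set) and the function name are hypothetical, and a real folksonomy builder would normalise these counts and derive a hierarchy from them.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(annotations):
    """Count how often two tags are assigned to the same resource.

    annotations: dict mapping a resource id to the set of tags users gave it.
    Pairs are stored in alphabetical order, e.g. ('programming', 'python').
    """
    pairs = Counter()
    for tags in annotations.values():
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


annotations = {
    "url1": {"python", "programming"},
    "url2": {"python", "programming", "tutorial"},
    "url3": {"python", "snake"},
}
print(tag_cooccurrence(annotations).most_common(1))   # [(('programming', 'python'), 2)]
```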
  
Ontology, taxonomy, folksonomy, etc.
•  Many different definitions…
•  A good summary and interpretation is here: http://www.ideaeng.com/taxonomies-ontologies-0602

Today…
•  We will talk more generally about word clouds…

Further Reading
Semantic Similarity from Natural Language and Ontology Analysis
by Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain
Synthesis Lectures on Human Language Technologies, May 2015, Vol. 8, No. 1
•  The two state-of-the-art approaches for estimating and quantifying semantic similarities/relatedness of semantic entities are presented in detail: the first one relies on corpora analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies.

Previous lecture: the end

Acknowledgements
This presentation is based on the following paper:
•  Barth et al. (2014). Experimental Comparison of Semantic Word Clouds. In Experimental Algorithms, Volume 8504 of the series Lecture Notes in Computer Science, pp. 247-258
   –  Link: https://www.cs.arizona.edu/~kobourov/wordle2.pdf
Some slides have been borrowed from Sergey Pupyrev.

Today
•  Experiments on semantics-preserving word clouds, in which semantically related words are close to each other.

Outline
•  What is a Word Cloud?
•  3 early algorithms
•  3 new algorithms
•  Metrics & Quantitative Evaluation

Word Clouds
•  Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
•  We could apply the same or similar techniques to the huge amounts of tags produced by users interacting in social networks.

Comparison & Conceptualization Tool
•  Word clouds as a tool for "conceptualizing" documents. Cf. Ontologies
•  Ex: 2008, comparison of speeches: Obama vs McCain
Cf. Lecture 10: Extractive summarization & Abstractive summarization

Word Clouds and Tag Clouds…
•  … are often used to represent importance among terms (e.g. band popularity) or serve as a navigation tool (e.g. Google search results).

The Problem…
•  How to compute semantics-preserving word clouds in which semantically related words are close to each other?

Wordle
http://www.wordle.net
•  Practical tools, like Wordle, make word cloud visualization easy. They offer an appealing way to SUMMARIZE text…
Shortcoming: they do not capture the relationships between words in any way, since word placement is independent of context.

Many word clouds are arranged randomly (look also at the scattered colours)

Patterns and Vicinity/Adjacency
Humans are spontaneous pattern-seekers: if they see two words close to each other in a word cloud, they spontaneously think they are related…

In Linguistics and NLP…
•  This natural tendency to link spatial vicinity to semantic relatedness is exploited as evidence that words are semantically related or semantically similar…
Remember? "You shall know a word by the company it keeps" (Firth, J. R. 1957:11)

So, it makes sense to place such related words close to each other (look also at the color distribution)

Semantic word clouds have higher user satisfaction compared to other layouts…

All recent word cloud visualization tools aim to incorporate semantics in the layout…

… but none of them provides any guarantee about the quality of the layout in terms of semantics

Early algorithms: Force-Directed Graph
•  Most of the existing algorithms are based on force-directed graph layout.
•  Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way:
   –  Attractive forces between pairs of words reduce empty space
   –  Repulsive forces ensure that words do not overlap
   –  The final force preserves semantic relations between words.

Some of the most flexible algorithms for calculating layouts of simple undirected graphs belong to a class known as force-directed algorithms. Such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. Graphs drawn with these algorithms tend to be aesthetically pleasing, exhibit symmetries, and tend to produce crossing-free layouts for planar graphs.
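A minimal force-directed layout can be sketched in pure Python. The force laws below (repulsion k²/d between every pair, attraction d²/k along edges, moves capped by a linearly cooling "temperature") follow the classic Fruchterman-Reingold spring embedder; they are an illustrative choice, not necessarily the forces used by the word cloud algorithms discussed here, and all function and parameter names are hypothetical.

```python
import math
import random


def _direction(p, q):
    """Unit vector pointing from q to p, and the distance between them."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return 0.0, 0.0, 1e-9            # coincident points: no defined direction
    return dx / d, dy / d, d


def force_directed_layout(nodes, edges, k=1.0, iterations=200, t0=0.1, seed=0):
    """Spring-embedder sketch: every pair repels with k^2/d, every edge attracts with d^2/k."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):                 # repulsion keeps words apart
            for v in nodes[i + 1:]:
                ux, uy, d = _direction(pos[u], pos[v])
                f = k * k / d
                disp[u][0] += ux * f
                disp[u][1] += uy * f
                disp[v][0] -= ux * f
                disp[v][1] -= uy * f
        for u, v in edges:                            # attraction pulls related words together
            ux, uy, d = _direction(pos[u], pos[v])
            f = d * d / k
            disp[u][0] -= ux * f
            disp[u][1] -= uy * f
            disp[v][0] += ux * f
            disp[v][1] += uy * f
        temp = t0 * (1 - it / iterations)             # cooling: smaller moves over time
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temp)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
    return pos


pos = force_directed_layout(["data", "computer"], [("data", "computer")])
print(round(math.dist(pos["data"], pos["computer"]), 2))   # approximately k = 1.0
```

With a single edge, attraction d²/k and repulsion k²/d balance at d = k, so two related words settle about one unit apart.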
  
Newer Algorithms: rectangle representation of graphs
•  Vertex-weighted and edge-weighted graph:
   –  The vertices of the graph are the words
      •  Their weights correspond to some measure of importance (e.g. word frequencies)
   –  The edges capture the semantic relatedness of pairs of words (e.g. co-occurrence)
      •  Their weights correspond to the strength of the relation
   –  Each vertex can be drawn as a box (rectangle) with dimensions determined by its weight
   –  A realized adjacency is the sum of the edge weights for all pairs of touching boxes.
   –  The goal is to maximize the realized adjacencies.
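The realized-adjacency objective can be computed directly from box coordinates. In this sketch (the `Box` class and the function names are hypothetical) two boxes count as touching when they share a boundary segment of positive length; meeting only at a corner, or overlapping, does not count.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float   # left edge
    y: float   # bottom edge
    w: float
    h: float


def touching(a: Box, b: Box, eps: float = 1e-9) -> bool:
    """True if the boxes share a boundary segment of positive length without overlapping."""
    # Length of the overlap of the projections on each axis (negative means a gap)
    ox = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    oy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return (abs(ox) <= eps and oy > eps) or (abs(oy) <= eps and ox > eps)


def realized_adjacency(boxes, weights):
    """Sum the edge weights of all word pairs whose boxes touch.

    boxes: dict word -> Box; weights: dict (word1, word2) -> similarity.
    """
    return sum(w for (u, v), w in weights.items() if touching(boxes[u], boxes[v]))


boxes = {
    "data": Box(0, 0, 2, 1),
    "computer": Box(2, 0, 2, 1),    # shares the edge x = 2 with "data"
    "apricot": Box(10, 10, 1, 1),   # isolated
}
weights = {("data", "computer"): 0.8, ("data", "apricot"): 0.1}
print(realized_adjacency(boxes, weights))   # 0.8
```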
  
Purpose of the experiments that are shown here:
•  Semantics preservation in terms of closeness/vicinity/adjacency

Example
•  A contact of 2 boxes is a common boundary.
•  The contact of two boxes is interpreted as semantic relatedness.
•  The contact of 2 boxes can be calculated, so the adjacency can be computed and evaluated.

Preprocessing:
1)  Term Extraction
2)  Ranking
3)  Similarity/Dissimilarity Computation
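Steps 1 and 2 can be sketched in a few lines of Python. The regular-expression tokenizer, the tiny stop-word list, the minimum term length and the function name are assumptions made for illustration; a real pipeline would typically add lemmatization or stemming and might rank by tf-idf rather than raw frequency.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on"}


def extract_and_rank(text, top_n=10, min_len=3):
    """Step 1: extract candidate terms; step 2: rank them by raw frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]
    return Counter(terms).most_common(top_n)


text = "Semantic word clouds: word clouds place each related word close."
print(extract_and_rank(text, top_n=2))   # [('word', 3), ('clouds', 2)]
```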
  
•  Similarity/dissimilarity matrix

Lecture 6: Repetition

               large   data   computer
apricot          1       0       0
digital          0       1       2
information      1       6       1

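Which pair of words is more similar?

$$\cos(\vec{v},\vec{w}) = \frac{\vec{v}\cdot\vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\vec{v}}{|\vec{v}|}\cdot\frac{\vec{w}}{|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

$$\text{cosine}(\text{apricot},\text{information}) = \frac{1+0+0}{\sqrt{1+0+0}\,\sqrt{1+36+1}} = \frac{1}{\sqrt{38}} = .16$$

$$\text{cosine}(\text{digital},\text{information}) = \frac{0+6+2}{\sqrt{0+1+4}\,\sqrt{1+36+1}} = \frac{8}{\sqrt{38}\,\sqrt{5}} = .58$$

$$\text{cosine}(\text{apricot},\text{digital}) = \frac{0+0+0}{\sqrt{1+0+0}\,\sqrt{0+1+4}} = 0$$

Lecture 6: Other possible similarity measures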
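The same numbers can be checked with a short Python sketch. The vectors are the rows of the table above; turning similarities into a dissimilarity matrix as 1 - cosine is an assumption, one common choice among several.

```python
import math

# Rows of the table above; columns are the context words large, data, computer
vectors = {
    "apricot": [1, 0, 0],
    "digital": [0, 1, 2],
    "information": [1, 6, 1],
}


def cosine(v, w):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    return dot / (math.sqrt(sum(vi * vi for vi in v)) * math.sqrt(sum(wi * wi for wi in w)))


pairs = [("apricot", "information"), ("digital", "information"), ("apricot", "digital")]
for a, b in pairs:
    print(a, b, round(cosine(vectors[a], vectors[b]), 2))
# apricot information 0.16
# digital information 0.58
# apricot digital 0.0

# Assumed conversion for the layout algorithms: dissimilarity = 1 - similarity
dissimilarity = {(a, b): 1 - cosine(vectors[a], vectors[b]) for a, b in pairs}
```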
  
Input - Output
•  The input for all algorithms is:
   –  a collection of n rectangles, each with a fixed width and height proportional to the rank of the word
   –  a similarity/dissimilarity matrix
•  The output is a set of non-overlapping positions for the rectangles.
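The output requirement (non-overlapping positions) is easy to verify. A minimal sketch, assuming each rectangle is given as (x, y, w, h) with (x, y) its lower-left corner; the function names are hypothetical, and boxes that merely touch are not considered overlapping.

```python
from itertools import combinations


def overlaps(a, b):
    """a, b = (x, y, w, h); True if their interiors intersect (touching is allowed)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_valid_layout(rects):
    """Check the output requirement: no two word rectangles overlap."""
    return not any(overlaps(a, b) for a, b in combinations(rects, 2))


print(is_valid_layout([(0, 0, 2, 1), (2, 0, 1, 1)]))   # True  (the boxes only touch)
print(is_valid_layout([(0, 0, 2, 1), (1, 0, 1, 1)]))   # False (the boxes overlap)
```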
  
Early Algorithms
1.  Wordle (Random)
2.  Context-Preserving Word Cloud Visualization (CPWCV)
3.  Seam Carving

Wordle → Random
•  The Wordle algorithm places one word at a time in a greedy fashion, i.e. aiming to use space as efficiently as possible.
•  First the words are sorted by weight/rank in decreasing order.
•  Then for each word in that order, a position is picked at random.
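The greedy loop described above can be sketched as follows. Retrying random positions until a word fits is an assumption made here to guarantee non-overlap (the actual Wordle implementation has its own placement strategy); the canvas size, retry limit, tuple format and function name are hypothetical.

```python
import random


def random_greedy_layout(words, width=100.0, height=100.0, max_tries=1000, seed=0):
    """Place words one at a time, heaviest first, at random free positions.

    words: list of (word, w, h, weight). Returns dict word -> (x, y, w, h),
    with (x, y) the lower-left corner. A word that finds no free spot within
    max_tries random attempts is left out.
    """
    rng = random.Random(seed)
    placed = {}

    def is_free(x, y, w, h):
        return all(not (x < px + pw and px < x + w and y < py + ph and py < y + h)
                   for px, py, pw, ph in placed.values())

    for word, w, h, _weight in sorted(words, key=lambda t: t[3], reverse=True):
        for _ in range(max_tries):
            x = rng.uniform(0, width - w)
            y = rng.uniform(0, height - h)
            if is_free(x, y, w, h):
                placed[word] = (x, y, w, h)
                break
    return placed


words = [("cloud", 30, 12, 5), ("word", 20, 8, 3), ("layout", 24, 6, 1)]
layout = random_greedy_layout(words)
print(sorted(layout))   # ['cloud', 'layout', 'word']
```

Because positions ignore context, related words end up wherever free space happens to be, which is exactly the shortcoming noted for Wordle.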
  
[Figures 1-6: the Random (Wordle) layout, step by step]

Context-Preserving Word Cloud Visualization (CPWCV)
•  First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed
•  Second, an effort is made to create a compact layout

Multidimensional Scaling (MDS) aims at detecting meaningful underlying dimensions in the data.
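Classical (Torgerson) MDS, one standard variant, can be sketched without external libraries: square the dissimilarities, double-centre them, and take the top eigenvectors. Power iteration with deflation is used here only to keep the example self-contained; in practice one would call a linear-algebra routine such as NumPy's `numpy.linalg.eigh`. The function name and parameters are hypothetical, and the dissimilarities are assumed to be close to Euclidean distances.

```python
import math


def classical_mds(D, dims=2, iterations=500):
    """Classical (Torgerson) MDS: points whose Euclidean distances approximate D.

    D: symmetric n x n list of dissimilarities. Eigenvectors of the double-centred
    matrix are found by power iteration with deflation (fine for small n).
    """
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [1.0 + i for i in range(n)]                  # arbitrary starting vector
        for _ in range(iterations):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * Bv[i] for i in range(n)) / sum(x * x for x in v)  # Rayleigh quotient
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so that the next pass finds the next-largest eigenvalue
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


pts = [(0, 0), (3, 0), (0, 4)]                           # a 3-4-5 right triangle
D = [[math.dist(p, q) for q in pts] for p in pts]
X = classical_mds(D)
print([[round(math.dist(a, b), 3) for b in X] for a in X])   # recovers the distances in D
```

The recovered coordinates may be rotated or mirrored with respect to the original points; only the distances, which are what the word layout needs, are preserved.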
  	
  
[Figures: 1: Context-Preserving; 2: Context-Preserving: repulsive force; 3: Context-Preserving: attractive force]

Seam Carving
•  Basically, an algorithm for image resizing
•  It was invented at Mitsubishi Electric Research Laboratories (MERL)
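The core of seam carving is a dynamic program that finds the connected top-to-bottom path (seam) of lowest total energy. In the word cloud setting one can think of empty cells as zero-energy, so the cheapest seams run through white space and removing them compacts the layout; that mapping, and the function names below, are assumptions for illustration.

```python
def min_vertical_seam(energy):
    """Column index, per row, of the lowest-energy top-to-bottom seam.

    energy: 2-D list (rows x cols). Between consecutive rows the seam may move
    at most one column left or right.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]          # cumulative minimum energy
    back = [[0] * cols for _ in range(rows)]   # best predecessor column
    for r in range(1, rows):
        for c in range(cols):
            prev = min(range(max(0, c - 1), min(cols, c + 2)),
                       key=lambda k: cost[r - 1][k])
            cost[r][c] += cost[r - 1][prev]
            back[r][c] = prev
    c = min(range(cols), key=lambda k: cost[rows - 1][k])
    seam = [c]
    for r in range(rows - 1, 0, -1):           # walk back up to the first row
        c = back[r][c]
        seam.append(c)
    return seam[::-1]


def remove_seam(grid, seam):
    """Delete one cell per row, shrinking the grid by one column."""
    return [row[:c] + row[c + 1:] for row, c in zip(grid, seam)]


occupancy = [[5, 0, 5],
             [5, 0, 5],
             [5, 0, 5]]      # 0 = empty space between two words
seam = min_vertical_seam(occupancy)
print(seam, remove_seam(occupancy, seam))   # [1, 1, 1] [[5, 5], [5, 5], [5, 5]]
```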
  
[Figures 1-7: Seam Carving step by step; 2 and 6: the space is divided into regions; 3: empty paths are trimmed out iteratively]

3 New Algorithms
1.  Inflate and Push
2.  Star Forest
3.  Cycle Cover

Inflate-and-Push
•  A simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
•  Based on:
   1.  Heuristics: scaling down all word rectangles by some constant;
   2.  Computing MDS (multidimensional scaling) on the dissimilarity matrix;
   3.  Iteratively increasing the size of the rectangles by 5% (i.e. "inflating" words);
   4.  When words overlap, applying a force-directed algorithm to "push" words away.
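The four steps can be sketched in Python. This is a simplified reading of the method: step 2 is assumed to have already produced the rectangle centres (for instance with an MDS routine), and the "push" step here simply separates each overlapping pair along its axis of smaller overlap instead of running a full force-directed algorithm. All names and defaults are hypothetical.

```python
def inflate_and_push(centers, sizes, start_scale=0.1, growth=1.05, max_rounds=100, gap=1e-9):
    """Sketch of Inflate-and-Push on axis-aligned rectangles.

    centers: [x, y] rectangle centres, e.g. MDS output rescaled to layout units.
    sizes: (w, h) final size of each word rectangle; start_scale must be > 0.
    """
    centers = [list(c) for c in centers]
    n = len(centers)
    scale = start_scale                                  # step 1: shrink all rectangles
    while True:
        scale = min(1.0, scale * growth)                 # step 3: inflate by 5%
        for _ in range(max_rounds):                      # step 4: push until stable
            moved = False
            for i in range(n):
                for j in range(i + 1, n):
                    dx = centers[j][0] - centers[i][0]
                    dy = centers[j][1] - centers[i][1]
                    # Positive values are the overlap depth along each axis
                    ox = scale * (sizes[i][0] + sizes[j][0]) / 2 - abs(dx)
                    oy = scale * (sizes[i][1] + sizes[j][1]) / 2 - abs(dy)
                    if ox <= 0 or oy <= 0:
                        continue                         # no overlap
                    moved = True
                    axis, depth, delta = (0, ox, dx) if ox < oy else (1, oy, dy)
                    sign = 1.0 if delta >= 0 else -1.0
                    shift = (depth + gap) / 2            # each box moves half the depth
                    centers[i][axis] -= sign * shift
                    centers[j][axis] += sign * shift
            if not moved:
                break
        if scale == 1.0:
            return centers


# Two words that MDS placed at the same spot end up stacked, just touching
print(inflate_and_push([[0.0, 0.0], [0.0, 0.0]], [(2.0, 1.0), (2.0, 1.0)]))
```

Because rectangles start small and grow gradually, pushes stay small and words tend to remain near their MDS positions, which is the point of inflating rather than placing full-size boxes at once.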
  
[Figures: Inflate, starting point; Inflate, scaling down; Inflate, semantically-related words are placed close to each other and "inflate words" (5%) is applied iteratively; Inflate, "push words": a repulsive force resolves overlaps; Inflate, final stage]

Star Forest
•  A star is a tree in which one central vertex is adjacent to all the other vertices (the leaves).
•  A star forest is a forest whose connected components are all stars.

Repetition: trees and graphs
•  A tree is a special form of graph, i.e. a minimally connected graph with only one path between any two vertices.
•  In a general graph there can be more than one path between two vertices (cycles are allowed); a graph can also have uni-directional (directed) or bi-directional (undirected) edges between nodes.
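The two definitions translate directly into checks on a simple undirected graph. A minimal sketch (function names hypothetical): a tree is connected with exactly n - 1 edges; a star forest is a graph whose every component is a tree with one vertex adjacent to all the others.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the components (as sets) and the adjacency map of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps, adj


def is_tree(nodes, edges):
    """Connected with exactly n - 1 edges, hence exactly one path between any two vertices."""
    comps, _ = connected_components(nodes, edges)
    return len(comps) == 1 and len(edges) == len(nodes) - 1


def is_star_forest(nodes, edges):
    """Every component is a tree with a centre adjacent to all its other vertices."""
    comps, adj = connected_components(nodes, edges)
    for comp in comps:
        n_edges = sum(1 for u, _ in edges if u in comp)
        if n_edges != len(comp) - 1:
            return False                        # the component contains a cycle
        if len(comp) > 2 and not any(len(adj[x]) == len(comp) - 1 for x in comp):
            return False                        # a tree, but not a star
    return True


print(is_star_forest(["a", "b", "c", "d", "e"], [("a", "b"), ("a", "c"), ("d", "e")]))  # True
```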
  
Three steps
1.  Extracting the star forest: partition a graph into disjoint stars
2.  Realising a star: build a word cloud for every star
3.  Pack all the stars together

Star Forest: star = tree
1.  Extract stars greedily from a dissimilarity matrix → disjoint stars = star forest
2.  Compute the optimal stars, i.e. the best set of words to be adjacent
3.  Attractive force to get a compact layout
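The slide does not spell out the greedy rule, so the following is only one plausible heuristic, stated as an assumption: repeatedly pick as centre the unassigned word with the largest total similarity to the other unassigned words, and attach every unassigned word related to it as a leaf. Similarities can be obtained from the dissimilarity matrix, e.g. as 1 - dissimilarity; the function name and input format are hypothetical.

```python
def greedy_star_forest(words, sim):
    """Partition words into disjoint stars with a greedy heuristic.

    sim: dict frozenset({u, v}) -> similarity; missing pairs count as 0.
    Returns a list of (centre, leaves) pairs.
    """
    def s(u, v):
        return sim.get(frozenset((u, v)), 0.0)

    remaining = list(words)
    stars = []
    while remaining:
        centre = max(remaining, key=lambda c: sum(s(c, o) for o in remaining if o != c))
        leaves = [o for o in remaining if o != centre and s(centre, o) > 0]
        stars.append((centre, leaves))
        remaining = [o for o in remaining if o != centre and o not in leaves]
    return stars


words = ["data", "computer", "information", "apricot", "fruit"]
sim = {frozenset(pair): value for pair, value in [
    (("data", "computer"), 0.9), (("data", "information"), 0.8),
    (("computer", "information"), 0.5), (("apricot", "fruit"), 0.7)]}
print(greedy_star_forest(words, sim))
# [('data', ['computer', 'information']), ('apricot', ['fruit'])]
```

Each star is then laid out around its centre word, so only centre-leaf similarities can be realized as adjacencies; the edge computer-information is given up.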
  
Cycle Cover
•  This algorithm is based on a similarity matrix.
•  First, a similarity path is created
•  Then, the optimal level of compactness is computed
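The slide only outlines the method. As a simplified stand-in (an assumption, not the paper's cycle-cover computation), here is a greedy sketch of the first step: chaining words into a path whose consecutive words are highly similar, so that neighbours in the path can be placed next to each other. The function name and input format are hypothetical.

```python
def greedy_similarity_path(words, sim):
    """Order words so that neighbours in the sequence are highly similar (greedy).

    sim: dict frozenset({u, v}) -> similarity; missing pairs count as 0.
    Starts from the most similar pair, then repeatedly attaches the unused word
    that is most similar to either end of the current path.
    """
    def s(u, v):
        return sim.get(frozenset((u, v)), 0.0)

    if len(words) < 2:
        return list(words)
    pairs = [(u, v) for i, u in enumerate(words) for v in words[i + 1:]]
    a, b = max(pairs, key=lambda p: s(*p))
    path = [a, b]
    unused = sorted(set(words) - {a, b})        # sorted for deterministic tie-breaking
    while unused:
        head = max(unused, key=lambda x: s(path[0], x))
        tail = max(unused, key=lambda x: s(path[-1], x))
        if s(path[0], head) > s(path[-1], tail):
            path.insert(0, head)
            unused.remove(head)
        else:
            path.append(tail)
            unused.remove(tail)
    return path


sim = {frozenset(pair): value for pair, value in [
    (("a", "b"), 0.9), (("b", "c"), 0.8), (("c", "d"), 0.7),
    (("a", "d"), 0.1), (("a", "c"), 0.2), (("b", "d"), 0.3)]}
print(greedy_similarity_path(["a", "b", "c", "d"], sim))   # ['a', 'b', 'c', 'd']
```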
  
Quantitative Metrics

1.  Realized Adjacencies
    –  how close are similar words to each other?
2.  Distortion
    –  how distant are dissimilar words?
3.  Uniform Area Utilization
    –  uniformity of the distribution (overpopulated vs sparse areas in the word cloud)
4.  Compactness
    –  how well utilized is the drawing area?
5.  Aspect Ratio
    –  width and height of the bounding box
6.  Running Time
    –  execution time
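Two of these metrics have simple geometric readings that can be sketched directly. The definitions below are assumptions for illustration (compactness as the share of the bounding box covered by word rectangles, aspect ratio as bounding-box width over height); the paper's exact definitions may differ. Rectangles are (x, y, w, h) tuples and are assumed not to overlap.

```python
def bounding_box(rects):
    """Width and height of the smallest axis-aligned box containing all rectangles."""
    x0 = min(x for x, _, _, _ in rects)
    y0 = min(y for _, y, _, _ in rects)
    x1 = max(x + w for x, _, w, _ in rects)
    y1 = max(y + h for _, y, _, h in rects)
    return x1 - x0, y1 - y0


def compactness(rects):
    """Share of the bounding box covered by word rectangles (1.0 = no wasted space)."""
    bw, bh = bounding_box(rects)
    return sum(w * h for _, _, w, h in rects) / (bw * bh)


def aspect_ratio(rects):
    """Bounding-box width divided by its height."""
    bw, bh = bounding_box(rects)
    return bw / bh


layout = [(0, 0, 4, 2), (4, 0, 2, 2), (0, 2, 3, 1)]
print(compactness(layout), aspect_ratio(layout))   # 0.8333333333333334 2.0
```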
  
2 datasets
(1)  WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words
(2)  PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.

[Result figures: Cycle Cover wins; Seam Carving wins; Random wins; Inflate wins; Random and Seam Carving win; All ok except Seam Carving]

Demo

The end
Towards Contextualized Information: How Automatic Genre Identification Can HelpMarina Santini
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational SemanticsMarina Santini
 
Lecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsLecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsMarina Santini
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and InductionLeon Derczynski
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked DataIsabelle Augenstein
 
Natural Language Processing for the Semantic Web
Natural Language Processing for the Semantic WebNatural Language Processing for the Semantic Web
Natural Language Processing for the Semantic WebIsabelle Augenstein
 

Viewers also liked (20)

Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationLecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
 
Information Gain
Information GainInformation Gain
Information Gain
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
How Emotional Are Users' Needs? Emotion in Query Logs
How Emotional Are Users' Needs? Emotion in Query LogsHow Emotional Are Users' Needs? Emotion in Query Logs
How Emotional Are Users' Needs? Emotion in Query Logs
 
Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012
 
An Introduction to Causal Discovery, a Bayesian Network Approach
An Introduction to Causal Discovery, a Bayesian Network ApproachAn Introduction to Causal Discovery, a Bayesian Network Approach
An Introduction to Causal Discovery, a Bayesian Network Approach
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Towards Contextualized Information: How Automatic Genre Identification Can Help
Towards Contextualized Information: How Automatic Genre Identification Can HelpTowards Contextualized Information: How Automatic Genre Identification Can Help
Towards Contextualized Information: How Automatic Genre Identification Can Help
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
Lecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsLecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest Neighbors
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and Induction
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked Data
 
Natural Language Processing for the Semantic Web
Natural Language Processing for the Semantic WebNatural Language Processing for the Semantic Web
Natural Language Processing for the Semantic Web
 

Similar to Lecture: Semantic Word Clouds

SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalgowthamnaidu0986
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalMauro Dragoni
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 
Swoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchSwoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchIDES Editor
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdfISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdfDeborah McGuinness
 
An Ontology Model for Knowledge Representation over User Profiles
An Ontology Model for Knowledge Representation over User ProfilesAn Ontology Model for Knowledge Representation over User Profiles
An Ontology Model for Knowledge Representation over User ProfilesIJMER
 
Experiencai significativa 1 webquest
Experiencai significativa 1 webquestExperiencai significativa 1 webquest
Experiencai significativa 1 webquestSandra Delgado
 
Gadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLGadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLLawrie Hunter
 
Collaborative Ontology Building Project
Collaborative Ontology Building Project  Collaborative Ontology Building Project
Collaborative Ontology Building Project Jie Bao
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...Marko Grobelnik
 
A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...ijcnes
 
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...María Poveda Villalón
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionKent State University
 
Introduction
IntroductionIntroduction
Introductionsriniefs
 
Topic Pages. From articles to answers.
Topic Pages. From articles to answers.Topic Pages. From articles to answers.
Topic Pages. From articles to answers.Deep Kayal
 

Similar to Lecture: Semantic Word Clouds (20)

Ontology
OntologyOntology
Ontology
 
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
The basics of ontologies
The basics of ontologiesThe basics of ontologies
The basics of ontologies
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Swoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchSwoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic Search
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdfISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
 
An Ontology Model for Knowledge Representation over User Profiles
An Ontology Model for Knowledge Representation over User ProfilesAn Ontology Model for Knowledge Representation over User Profiles
An Ontology Model for Knowledge Representation over User Profiles
 
Experiencai significativa 1 webquest
Experiencai significativa 1 webquestExperiencai significativa 1 webquest
Experiencai significativa 1 webquest
 
Gadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLGadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALL
 
Collaborative Ontology Building Project
Collaborative Ontology Building Project  Collaborative Ontology Building Project
Collaborative Ontology Building Project
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
 
A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...
 
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Topic Pages. From articles to answers.
Topic Pages. From articles to answers.Topic Pages. From articles to answers.
Topic Pages. From articles to answers.
 

More from Marina Santini

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Marina Santini
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsMarina Santini
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-Marina Santini
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesMarina Santini
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Marina Santini
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Marina Santini
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Marina Santini
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Marina Santini
 
Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities Marina Santini
 
Mathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability TheoryMathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability TheoryMarina Santini
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 
Lecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular LanguagesLecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular LanguagesMarina Santini
 

More from Marina Santini (16)

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology Applications
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability Features
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1)
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)
 
Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities
 
Mathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability TheoryMathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability Theory
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
Lecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular LanguagesLecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular Languages
 
Lecture: Automata
Lecture: AutomataLecture: Automata
Lecture: Automata
 

Recently uploaded

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Recently uploaded (20)

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Lecture: Semantic Word Clouds

  • 1. Semantic Analysis in Language Technology
    http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
    Semantic Word Clouds
    Marina Santini
    santinim@stp.lingfil.uu.se
    Department of Linguistics and Philology
    Uppsala University, Uppsala, Sweden
    Spring 2016
  • 3. Semantic Web & Ontologies
    • The goal of the Semantic Web is to allow web information and services to be more effectively exploited by humans and automated tools.
    • Essentially, the focus of the Semantic Web is to share data instead of documents.
    • This data must be "meaningful" both for humans and for machines (i.e. automated tools and web applications).
    • Q: How are we going to represent meaning and knowledge on the web?
    • A: … via annotation.
    • Knowledge is represented in the form of rich conceptual schemas/formalisms called ontologies.
    • Therefore, ontologies are the backbone of the Semantic Web.
    • Ontologies give formally defined meanings to the terms used in annotations, transforming them into semantic annotations.
  • 4. Ontologies are…
    • … concepts that are hierarchically organized.
    [Figures: Tree of Porphyry, III AD; WordNet, XXI AD (see Lecture 5, e.g. similarity measures)]
  • 5. Reasoning: RDF/OWL vs Databases (and other data structures)
    OWL axioms behave like inference rules rather than database constraints.

        Class: Phoenix
            SubClassOf: isPetOf only Wizard

        Individual: Fawkes
            Types: Phoenix
            Facts: isPetOf Dumbledore

    •  Fawkes is said to be a Phoenix and to be the pet of Dumbledore, and it is also stated that only a Wizard can have a pet Phoenix.
    •  In OWL, this leads to the implication that Dumbledore is a Wizard. That is, if we were to query the ontology for instances of Wizard, then Dumbledore would be part of the answer.
    •  In a database setting the schema could include a similar statement about the Phoenix class, but in this case it would be interpreted as a constraint on the data: adding the fact that Fawkes isPetOf Dumbledore without Dumbledore being already known to be a Wizard would lead to an invalid database state, and such an update would therefore be rejected by a database management system as a constraint violation.
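A toy Python sketch can make the contrast concrete. It is not an OWL reasoner: it hard-codes only the single Phoenix axiom from the slide, and the names `KB`, `owl_style_infer` and `db_style_insert` are hypothetical. The same statement is applied once as an inference rule (OWL reading) and once as a constraint checked on update (database reading).

```python
from dataclasses import dataclass, field


@dataclass
class KB:
    """Hypothetical toy knowledge base: class memberships plus isPetOf facts."""
    types: dict = field(default_factory=dict)   # individual -> set of class names
    pet_of: list = field(default_factory=list)  # (pet, owner) pairs

    def add_type(self, individual, cls):
        self.types.setdefault(individual, set()).add(cls)


def owl_style_infer(kb):
    """OWL reading: 'Phoenix SubClassOf isPetOf only Wizard' derives new facts."""
    for pet, owner in kb.pet_of:
        if "Phoenix" in kb.types.get(pet, set()):
            kb.add_type(owner, "Wizard")  # infer that the owner is a Wizard
    return kb


def db_style_insert(kb, pet, owner):
    """Database reading: the same statement is a constraint checked on update."""
    if "Phoenix" in kb.types.get(pet, set()) and "Wizard" not in kb.types.get(owner, set()):
        raise ValueError(f"constraint violation: {owner} is not known to be a Wizard")
    kb.pet_of.append((pet, owner))


if __name__ == "__main__":
    kb = KB()
    kb.add_type("Fawkes", "Phoenix")
    kb.pet_of.append(("Fawkes", "Dumbledore"))
    owl_style_infer(kb)
    print(kb.types["Dumbledore"])  # {'Wizard'}

    db = KB()
    db.add_type("Fawkes", "Phoenix")
    try:
        db_style_insert(db, "Fawkes", "Dumbledore")
    except ValueError as err:
        print(err)  # the update is rejected
```

The inference adds knowledge; the constraint only accepts or rejects an update.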
  • 6. So, what is an ontology for us?
    "An ontology is a FORMAL, EXPLICIT specification of a SHARED conceptualization" (Studer, Benjamins, Fensel. Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering 25 (1998) 161-197)
    "An ontology is an explicit specification of a conceptualization" (Gruber, T. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, Vol. 5, 1993, 199-220)
    •  Conceptualization: an abstract model and simplified view of some phenomenon in the world that we want to represent
    •  Formal: machine-readable
    •  Explicit: concepts, properties, relations, functions, constraints and axioms are explicitly defined
    •  Shared: consensual knowledge
  • 7. How to build an ontology
    Generally speaking (and roughly), when designing an ontology, four main components are used:
    1. Classes
    2. Relations
    3. Axioms
    4. Instances
  • 8. Practical Activity: emotions
    Your remarks:
    •  Emotions are ambiguous: e.g. happiness can also be ill-directed
    •  The polarity of some emotions cannot be assessed…
    •  etc.
    Classes, Relations, Axioms, Instances, etc.
  • 9. Occupational psychology (Wikipedia)
    •  Industrial and organizational psychology (also known as I–O psychology, occupational psychology, work psychology, WO psychology, IWO psychology and business psychology) is the scientific study of human behavior in the workplace and applies psychological theories and principles to organizations and individuals in their workplace.
    •  I–O psychologists are trained in the scientist–practitioner model. I–O psychologists contribute to an organization's success by improving the performance, motivation, job satisfaction, occupational safety and health as well as the overall health and well-being of its employees. An I–O psychologist conducts research on employee behaviors and attitudes, and how these can be improved through hiring practices, training programs, feedback, and management systems.
  • 10. In summary…
    Why build an ontology?
    •  To share common understanding of the structure of information among people or machines
    •  To make domain assumptions explicit
    •  Often based on controlled vocabulary
    •  To analyze domain knowledge
    •  To enable reuse of domain knowledge
  • 11. Ontologies and Tags
    •  Ontologies and tagging systems are two different ways to organize the knowledge present on the Web.
    •  The first has a formal foundation that derives from description logic and artificial intelligence. Domain experts decide the terms.
    •  The second is simpler: it integrates heterogeneous content and is based on the collaboration of users in Web 2.0 (user-generated annotation).
  • 12. Folksonomies
    •  Tagging facilities within Web 2.0 applications have shown how it might be possible for user communities to collaboratively annotate web content, and create simple forms of ontology via the development of loosely hierarchically organised sets of tags, often called folksonomies…
  • 13. Folksonomy = Social Tagging
    •  Folksonomies (also known as social tagging) are user-defined metadata collections.
    •  Users do not deliberately create folksonomies and there is rarely a prescribed purpose, but a folksonomy evolves when many users create or store content at particular sites and identify what they think the content is about.
    •  "Tag clouds" visualize the frequency of certain tags.
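The frequency-to-size idea behind a tag cloud fits in a few lines. This is a minimal sketch, not any specific tool's method: linear scaling between a hypothetical 10-40 pt range is an assumption (many tools use logarithmic scaling instead).

```python
from collections import Counter


def tag_cloud_sizes(tags, min_pt=10.0, max_pt=40.0):
    """Map each tag to a font size that grows linearly with its frequency.

    Linear scaling and the 10-40 pt range are illustrative assumptions.
    """
    if not tags:
        return {}
    counts = Counter(tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all tags are equally frequent
    return {tag: min_pt + (n - lo) * (max_pt - min_pt) / span
            for tag, n in counts.items()}


if __name__ == "__main__":
    tags = ["python", "nlp", "python", "ontology", "python", "nlp"]
    for tag, size in sorted(tag_cloud_sizes(tags).items(), key=lambda kv: -kv[1]):
        print(f"{tag}: {size:.0f}pt")
```

Here "python" (3 uses) gets 40 pt, "nlp" (2) gets 25 pt and "ontology" (1) gets 10 pt.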
  • 14. A common way to organize tags is in tag clouds…
  • 15. Automatic folksonomy construction
    •  The collective knowledge expressed through user-generated tags has great potential.
    •  However, we need tools to efficiently aggregate data from large numbers of users with highly idiosyncratic vocabularies and invented words or expressions.
    •  Many approaches to automatic folksonomy construction combine tags using statistical methods...
    •  Ample space for improvement…
  • 16. Ontology, taxonomy, folksonomy, etc.
    •  Many different definitions…
    •  A good summary and interpretation is here: http://www.ideaeng.com/taxonomies-ontologies-0602
  • 17. Today…
    •  We will talk more generally about word clouds…
  • 18. Further Reading
    Semantic Similarity from Natural Language and Ontology Analysis, by Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain. Synthesis Lectures on Human Language Technologies, May 2015, Vol. 8, No. 1.
    •  The two state-of-the-art approaches for estimating and quantifying semantic similarities/relatedness of semantic entities are presented in detail: the first relies on corpus analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies.
  • 19. Previous lecture: the end
  • 20. Acknowledgements
    This presentation is based on the following paper:
    •  Barth et al. (2014). Experimental Comparison of Semantic Word Clouds. In Experimental Algorithms, Volume 8504 of the series Lecture Notes in Computer Science, pp. 247-258.
       –  Link: https://www.cs.arizona.edu/~kobourov/wordle2.pdf
    Some slides have been borrowed from Sergey Pupyrev.
  • 21. Today
    •  Experiments on semantics-preserving word clouds, in which semantically related words are close to each other.
  • 22. Outline
    •  What is a Word Cloud?
    •  3 early algorithms
    •  3 new algorithms
    •  Metrics & Quantitative Evaluation
  • 23. Word Clouds
    •  Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
    •  We could apply the same or similar techniques to the huge amounts of tags produced by users interacting on social networks.
  • 24. Comparison & conceptualization tool
    •  Word clouds as a tool for "conceptualizing" documents. Cf. ontologies
    •  E.g. 2008, comparison of speeches: Obama vs McCain
    Cf. Lect 10: Extractive summarization & Abstractive summarization
  • 25. Word Clouds and Tag Clouds…
    •  … are often used to represent importance among terms (e.g. band popularity) or serve as a navigation tool (e.g. Google search results).
  • 26. The Problem…
    •  How to compute semantics-preserving word clouds in which semantically related words are close to each other?
  • 27. Wordle http://www.wordle.net
    •  Practical tools, like Wordle, make word cloud visualization easy. They offer an appealing way to SUMMARIZE text…
    Shortcoming: they do not capture the relationships between words in any way, since word placement is independent of context.
  • 28. Many word clouds are arranged randomly (look also at the scattered colours)
  • 29. Patterns and Vicinity/Adjacency
    Humans are spontaneously pattern-seekers: if they see two words close to each other in a word cloud, they spontaneously think they are related…
  • 30. In Linguistics and NLP…
    •  This natural tendency to link spatial vicinity to semantic relatedness is exploited as evidence that words are semantically related or semantically similar…
    Remember? "You shall know a word by the company it keeps" (Firth, J. R. 1957:11)
  • 31. So, it makes sense to place such related words close to each other (look also at the color distribution)
  • 32. Semantic word clouds have higher user satisfaction compared to other layouts…
  • 33. All recent word cloud visualization tools aim to incorporate semantics in the layout…
  • 34. … but none of them provide any guarantee about the quality of the layout in terms of semantics
  • 35. Early algorithms: Force-Directed Graph
    •  Most of the existing algorithms are based on force-directed graph layout.
    •  Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way:
       –  Attractive forces between pairs reduce empty space
       –  Repulsive forces ensure that words do not overlap
       –  A final force preserves semantic relations between words.
    Some of the most flexible algorithms for calculating layouts of simple undirected graphs belong to a class known as force-directed algorithms. Such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. Graphs drawn with these algorithms tend to be aesthetically pleasing, exhibit symmetries, and tend to produce crossing-free layouts for planar graphs.
  • 36. Newer Algorithms: rectangle representation of graphs
    •  Vertex-weighted and edge-weighted graph:
       –  The vertices of the graph are the words
          •  Their weights correspond to some measure of importance (e.g. word frequencies)
       –  The edges capture the semantic relatedness of pairs of words (e.g. co-occurrence)
          •  Their weights correspond to the strength of the relation
       –  Each vertex can be drawn as a box (rectangle) with dimensions determined by its weight
       –  A realized adjacency is the sum of the edge weights for all pairs of touching boxes.
       –  The goal is to maximize the realized adjacencies.
  • 37. Purpose of the experiments shown here:
    •  Semantics preservation in terms of closeness/vicinity/adjacency
  • 38. Example
    •  A contact between 2 boxes is a common boundary.
    •  The contact of two boxes is interpreted as semantic relatedness.
    •  Contacts between boxes can be computed, so the adjacency can be computed and evaluated.
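Contacts and the realized adjacency can be computed directly from box coordinates. A minimal sketch, assuming axis-aligned boxes given as `(x, y, w, h)` and edge weights keyed by word pairs; the function names are hypothetical. Two boxes are in contact when they share a boundary segment of positive length, so touching only at a corner does not count.

```python
def _overlap_len(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1] (0 if disjoint)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def in_contact(a, b, eps=1e-9):
    """True if axis-aligned boxes a, b = (x, y, w, h) share a boundary segment.

    Touching only at a corner does not count as a contact.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    side_by_side = (abs(ax + aw - bx) <= eps or abs(bx + bw - ax) <= eps) \
        and _overlap_len(ay, ay + ah, by, by + bh) > eps
    stacked = (abs(ay + ah - by) <= eps or abs(by + bh - ay) <= eps) \
        and _overlap_len(ax, ax + aw, bx, bx + bw) > eps
    return side_by_side or stacked


def realized_adjacency(boxes, edge_weights):
    """Sum of edge weights over all pairs of words whose boxes are in contact.

    boxes: word -> (x, y, w, h); edge_weights: (word_u, word_v) -> similarity.
    """
    return sum(wt for (u, v), wt in edge_weights.items()
               if in_contact(boxes[u], boxes[v]))


if __name__ == "__main__":
    boxes = {"cloud": (0, 0, 2, 1), "word": (2, 0, 1, 1),
             "tag": (0, 1, 1, 1), "graph": (5, 5, 1, 1)}
    weights = {("cloud", "word"): 0.5, ("cloud", "tag"): 0.25,
               ("word", "graph"): 0.9, ("word", "tag"): 0.2}
    print(realized_adjacency(boxes, weights))  # 0.75
```

Only "cloud"/"word" and "cloud"/"tag" touch, so the strong but non-adjacent "word"/"graph" relation contributes nothing.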
  • 39. Preprocessing:
    1) Term Extraction
    2) Ranking
    3) Similarity/Dissimilarity Computation
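Steps 1 and 2 can be sketched as follows. Every choice here is an assumption, not taken from the paper: plain-text input, a regular-expression tokenizer, a tiny hand-written stop-word list, a minimum term length, and raw frequency as the ranking score. Step 3 is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use much larger ones).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are", "for", "on", "with"}


def extract_and_rank(text, top_n=50, min_len=3):
    """Step 1: extract candidate terms. Step 2: rank them by frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = (t for t in tokens if t not in STOPWORDS and len(t) >= min_len)
    return Counter(terms).most_common(top_n)


if __name__ == "__main__":
    text = "The word cloud shows word frequency; a cloud of words and word pairs."
    print(extract_and_rank(text, top_n=2))  # [('word', 3), ('cloud', 2)]
```

Note that "word" and "words" are counted separately; a real pipeline would typically add stemming or lemmatization.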
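  • 41. Lect 6: Repetition
    Which pair of words is more similar?

                   large   data   computer
    apricot          1      0        0
    digital          0      1        2
    information      1      6        1

    $$\cos(\vec{v},\vec{w}) = \frac{\vec{v}\cdot\vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\vec{v}}{|\vec{v}|}\cdot\frac{\vec{w}}{|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

    $$\text{cosine}(\text{apricot},\text{information}) = \frac{1+0+0}{\sqrt{1+0+0}\,\sqrt{1+36+1}} = \frac{1}{\sqrt{38}} \approx .16$$
    $$\text{cosine}(\text{digital},\text{information}) = \frac{0+6+2}{\sqrt{0+1+4}\,\sqrt{1+36+1}} = \frac{8}{\sqrt{38}\,\sqrt{5}} \approx .58$$
    $$\text{cosine}(\text{apricot},\text{digital}) = \frac{0+0+0}{\sqrt{1+0+0}\,\sqrt{0+1+4}} = 0$$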
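The three cosine values on this slide can be checked with a direct implementation of the formula, using the co-occurrence counts on the slide (dimensions: large, data, computer).

```python
import math


def cosine(v, w):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_v * norm_w)  # undefined (ZeroDivisionError) for all-zero vectors


# Co-occurrence counts from the slide; dimensions are (large, data, computer).
VECTORS = {
    "apricot": [1, 0, 0],
    "digital": [0, 1, 2],
    "information": [1, 6, 1],
}

if __name__ == "__main__":
    for a, b in [("apricot", "information"), ("digital", "information"), ("apricot", "digital")]:
        print(f"cosine({a}, {b}) = {cosine(VECTORS[a], VECTORS[b]):.2f}")
```

This prints 0.16, 0.58 and 0.00, so "digital" and "information" are the most similar pair.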
  • 42. Lect 6: Other possible similarity measures
  • 43. Input - Output
    •  The input for all algorithms is
       –  a collection of n rectangles, each with a fixed width and height proportional to the rank of the word
       –  a similarity/dissimilarity matrix
    •  The output is a set of non-overlapping positions for the rectangles.
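The input/output contract can be expressed as a small data structure plus a validity check. A sketch under stated assumptions: the sizing rule (height proportional to a weight in (0, 1], width growing with word length) and the parameters `base_height` and `char_aspect` are hypothetical; boxes that merely touch are allowed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class WordBox:
    word: str
    w: float
    h: float
    x: float = 0.0  # lower-left corner, to be filled in by a layout algorithm
    y: float = 0.0


def make_boxes(ranked_words, base_height=10.0, char_aspect=0.6):
    """Build one rectangle per word from (word, weight) pairs, weight in (0, 1].

    base_height and char_aspect are hypothetical sizing parameters.
    """
    boxes = []
    for word, weight in ranked_words:
        h = base_height * weight
        boxes.append(WordBox(word, w=len(word) * char_aspect * h, h=h))
    return boxes


def overlaps(a, b):
    """Strict interior overlap; boxes that merely touch do not overlap."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def is_valid_output(boxes):
    """The required output: positions such that no two rectangles overlap."""
    return not any(overlaps(a, b) for a, b in combinations(boxes, 2))
```

Every layout algorithm that follows only has to assign `x` and `y` so that `is_valid_output` holds.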
  • 44. Early Algorithms
    1. Wordle (Random)
    2. Context-Preserving Word Cloud Visualization (CPWCV)
    3. Seam Carving
  • 45. Wordle → Random
    •  The Wordle algorithm places one word at a time in a greedy fashion, i.e. aiming to use space as efficiently as possible.
    •  First the words are sorted by weight/rank in decreasing order.
    •  Then, for each word in that order, a position is picked at random.
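The greedy random placement described on this slide can be sketched as below. This follows only the slide's description; the real Wordle implementation is more elaborate. Retrying fresh random positions until one is free, and the `max_tries` and `seed` parameters, are assumptions of this sketch.

```python
import random


def overlaps(a, b):
    """Axis-aligned boxes (x, y, w, h) overlap if their interiors intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def random_greedy_layout(words, width, height, max_tries=10_000, seed=0):
    """words: list of (word, weight, w, h). Returns a dict word -> (x, y, w, h).

    Words are processed by decreasing weight; each gets random candidate
    positions until one does not overlap anything already placed.
    """
    rng = random.Random(seed)
    placed = {}
    for word, _weight, w, h in sorted(words, key=lambda t: t[1], reverse=True):
        for _ in range(max_tries):
            box = (rng.uniform(0, width - w), rng.uniform(0, height - h), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break
        # In this sketch, a word that never fits within max_tries is dropped.
    return placed


if __name__ == "__main__":
    words = [("cloud", 5, 40, 20), ("word", 3, 30, 15), ("tag", 1, 20, 10)]
    for word, box in random_greedy_layout(words, 200, 200).items():
        print(word, [round(c, 1) for c in box])
```

Since positions ignore the similarity matrix entirely, related words end up adjacent only by chance, which is the shortcoming noted earlier.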
  • 52. Context-Preserving Word Cloud Visualization (CPWCV)
    •  First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed.
    •  Second, an effort is made to create a compact layout.
    Multidimensional Scaling (MDS) aims at detecting meaningful underlying dimensions in the data.
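The MDS step turns a dissimilarity matrix into 2D positions. Below is a stdlib-only sketch of classical (Torgerson) MDS, one common MDS variant; CPWCV may use a different variant. Eigenvectors are found by power iteration with deflation, which assumes the doubly centred matrix is positive semidefinite (true for Euclidean distances). Deriving dissimilarities from similarities, e.g. as 1 - s, would be a further assumption.

```python
import math
import random


def classical_mds(D, dims=2, iters=1000, seed=0):
    """Classical (Torgerson) MDS on an n x n dissimilarity matrix D.

    Returns n points in `dims` dimensions whose Euclidean distances approximate D.
    """
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J
    B = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration for the largest eigenvector
            Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in Bv))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in Bv]
        # Rayleigh quotient: eigenvalue estimate for the unit vector v
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflation: remove the found component before looking for the next one
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For the corners of a 2 x 1 rectangle, the recovered points reproduce all pairwise distances (up to rotation and reflection). The O(iters * n^2) cost is fine for a few hundred words.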
  • 54. 2: Context-Preserving: repulsive force
  • 55. 3: Context-Preserving: attractive force
  • 56. Seam Carving
    •  Basically, an algorithm for image resizing
    •  It was invented at Mitsubishi's
  • 58. 2: Seam Carving: space is divided into regions
  • 59. 3: Seam Carving: empty paths trimmed out iteratively
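Trimming empty paths can be illustrated with the standard seam-carving dynamic program. A simplified sketch, assuming the layout has been rasterized into an occupancy grid (1 = cell covered by a word, 0 = empty): it finds a minimum-cost vertical seam and removes it only when its cost is zero, i.e. the seam crosses empty cells only. Horizontal seams can be handled by transposing the grid. How the word-cloud algorithm actually defines regions and seams may differ.

```python
def min_vertical_seam(grid):
    """Return (cost, seam) where seam[r] is the column removed in row r."""
    rows, cols = len(grid), len(grid[0])
    cost = [row[:] for row in grid]  # energy of a cell = its occupancy
    for r in range(1, rows):
        for c in range(cols):
            # a seam may move at most one column left or right per row
            cost[r][c] += min(cost[r - 1][max(c - 1, 0):min(c + 2, cols)])
    c = min(range(cols), key=lambda j: cost[-1][j])
    total = cost[-1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):  # backtrack through the cheapest predecessors
        c = min(range(max(c - 1, 0), min(c + 2, cols)), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return total, seam


def carve_empty_seams(grid):
    """Iteratively remove vertical seams that pass only through empty cells."""
    grid = [row[:] for row in grid]
    while len(grid[0]) > 1:
        total, seam = min_vertical_seam(grid)
        if total > 0:  # every remaining seam crosses a word
            break
        for r, c in enumerate(seam):
            del grid[r][c]
    return grid


if __name__ == "__main__":
    layout = [[1, 0, 0, 1],
              [1, 0, 1, 1],
              [1, 0, 0, 1]]
    for row in carve_empty_seams(layout):
        print(row)
```

In the example one empty seam is removed; after that the middle row is full, so every seam would cut a word and carving stops.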
  • 62. 6: Seam Carving: space divided into regions
  • 64. 3 New Algorithms
    1. Inflate and Push
    2. Star Forest
    3. Cycle Cover
  • 65. Inflate-and-Push
    •  A simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
    •  Based on:
       1. Heuristics: scaling down all word rectangles by some constant;
       2. Computing MDS (multidimensional scaling) on the dissimilarity matrix;
       3. Iteratively increasing the size of rectangles by 5% (i.e. "inflating" words);
       4. When words overlap, applying a force-directed algorithm to "push" words away.
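The inflate/push loop can be sketched as follows, assuming the initial centres already come from MDS (step 2 is not repeated here). The "push" step is a simplified repulsion that moves each overlapping pair apart along the axis of smaller overlap; it stands in for the force-directed step and is not the paper's exact force model. The shrink factor 0.1 is a hypothetical choice; 5% growth follows the slide.

```python
from itertools import combinations


def overlap_xy(a, b):
    """Penetration depth along x and y of boxes [cx, cy, w, h] (centre-based).

    Both values are positive exactly when the two boxes overlap.
    """
    dx = (a[2] + b[2]) / 2 - abs(a[0] - b[0])
    dy = (a[3] + b[3]) / 2 - abs(a[1] - b[1])
    return dx, dy


def push_apart(boxes, max_rounds=1000):
    """'Push' step: move each overlapping pair apart along its cheaper axis."""
    for _ in range(max_rounds):
        moved = False
        for a, b in combinations(boxes, 2):
            dx, dy = overlap_xy(a, b)
            if dx > 0 and dy > 0:
                moved = True
                axis, depth = (0, dx) if dx < dy else (1, dy)
                sign = 1 if a[axis] <= b[axis] else -1
                a[axis] -= sign * depth / 2  # move a away from b
                b[axis] += sign * depth / 2  # and b away from a
        if not moved:
            return


def inflate_and_push(centres, sizes, shrink=0.1, growth=1.05):
    """centres: initial (x, y) per word, e.g. from MDS; sizes: final (w, h) per word."""
    scale = shrink  # step 1: start from scaled-down rectangles
    boxes = [[x, y, w * scale, h * scale] for (x, y), (w, h) in zip(centres, sizes)]
    push_apart(boxes)
    while scale < 1.0:
        scale = min(1.0, scale * growth)  # step 3: inflate by 5%
        for box, (w, h) in zip(boxes, sizes):
            box[2], box[3] = w * scale, h * scale
        push_apart(boxes)  # step 4: resolve the overlaps created by inflating
    return boxes
```

Because words start tiny and grow slowly, each push only has to correct small overlaps, so words tend to stay near their MDS positions.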
  • 67. Inflate: scaling down
  • 68. Inflate: semantically related words are placed close to each other. Apply "inflate words" (5%) iteratively.
  • 69. Inflate: "push words": repulsive force to resolve overlaps
  • 71. Star Forest
    •  A star is a tree in which one vertex (the center) is adjacent to all the other vertices.
    •  A star forest is a forest whose connected components are all stars.
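The definition can be checked with a short test. In a simple undirected graph, every component is a star exactly when every edge has at least one endpoint of degree 1: an edge between two vertices of degree 2 or more (which every cycle contains) rules out a star, and if every edge touches a leaf, each component has only one non-leaf vertex. A minimal sketch; isolated vertices are not listed in the edge list and count as trivial stars.

```python
from collections import Counter


def is_star_forest(edges):
    """True if every connected component of the simple undirected graph is a star."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(degree[u] == 1 or degree[v] == 1 for u, v in edges)


if __name__ == "__main__":
    print(is_star_forest([("a", "b"), ("a", "c"), ("d", "e")]))  # True
    print(is_star_forest([("a", "b"), ("b", "c"), ("c", "d")]))  # False: path of length 3
```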
  • 72. Repetition: trees and graphs
    •  A tree is a special form of graph, i.e. a minimally connected graph with only one path between any two vertices.
    •  In a graph there can be more than one path between two vertices, and a graph can have uni-directional or bi-directional paths (edges) between nodes.
  • 73. Three steps
    1. Extracting the star forest: partition a graph into disjoint stars
    2. Realizing a star: build a word cloud for every star
    3. Pack all the stars together
  • 74. Star Forest: star = tree
    1. Extract stars greedily from a dissimilarity matrix → disjoint stars = star forest
    2. Compute the optimal stars, i.e. the best set of words to be adjacent
    3. Attractive force to get a compact layout
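Step 1 could look like the following hypothetical sketch, which is not necessarily the exact greedy rule of Barth et al. It repeatedly picks as centre the unused word with the largest total similarity to the other unused words, attaches every unused word with positive similarity to it as a leaf, and removes that star. It works on similarities; a dissimilarity d in [0, 1] could be converted with 1 - d (an assumption). Steps 2 and 3 are not shown.

```python
def greedy_star_forest(words, sim):
    """sim: dict frozenset({u, v}) -> similarity. Returns a list of (centre, leaves)."""
    def w(u, v):
        return sim.get(frozenset((u, v)), 0.0)

    remaining = set(words)
    stars = []
    while remaining:
        # sorted() makes ties deterministic: max() keeps the first maximal word
        centre = max(sorted(remaining),
                     key=lambda c: sum(w(c, v) for v in remaining if v != c))
        leaves = sorted(v for v in remaining if v != centre and w(centre, v) > 0)
        stars.append((centre, leaves))
        remaining -= {centre, *leaves}
    return stars


if __name__ == "__main__":
    sim = {frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.8,
           frozenset(("d", "e")): 0.5, frozenset(("b", "c")): 0.1}
    print(greedy_star_forest(["a", "b", "c", "d", "e"], sim))
    # [('a', ['b', 'c']), ('d', ['e'])]
```

The weak b/c relation is lost: a star forest keeps only centre-to-leaf edges, which is the price for stars being easy to realize as contacts.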
  • 75. Cycle Cover
    •  This algorithm is based on a similarity matrix.
    •  First, a similarity path is created.
    •  Then, the optimal level of compactness is computed.
  • 76. Quantitative Metrics
    1. Realized Adjacencies: how close are similar words to each other?
    2. Distortion: how distant are dissimilar words?
    3. Uniform Area Utilization: uniformity of the distribution (overpopulated vs sparse areas in the word cloud)
    4. Compactness: how well utilized is the drawing area?
    5. Aspect Ratio: width and height of the bounding box
    6. Running Time: execution time
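The slide lists the metrics without formulas. Two of them are sketched here under simple assumptions: compactness as the fraction of the bounding box covered by word rectangles (assuming no overlaps), and aspect ratio as bounding-box width divided by height. The paper's exact definitions may differ, e.g. in the reference area used.

```python
def bounding_box(boxes):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) containing all boxes (x, y, w, h)."""
    x0 = min(x for x, _, _, _ in boxes)
    y0 = min(y for _, y, _, _ in boxes)
    x1 = max(x + w for x, _, w, _ in boxes)
    y1 = max(y + h for _, y, _, h in boxes)
    return x0, y0, x1, y1


def compactness(boxes):
    """Share of the bounding box covered by words (assumes no overlaps)."""
    x0, y0, x1, y1 = bounding_box(boxes)
    used = sum(w * h for _, _, w, h in boxes)
    return used / ((x1 - x0) * (y1 - y0))


def aspect_ratio(boxes):
    """Width divided by height of the bounding box."""
    x0, y0, x1, y1 = bounding_box(boxes)
    return (x1 - x0) / (y1 - y0)


if __name__ == "__main__":
    layout = [(0, 0, 2, 1), (2, 0, 2, 1), (0, 1, 1, 1)]
    print(compactness(layout), aspect_ratio(layout))  # 0.625 2.0
```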
  • 77. 2 datasets
    (1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words
    (2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.
  • 82. Random and Seam Carving win
  • 83. All ok except Seam Carving