Uprising microblogs: A Bayesian network retrieval model for tweet search

 Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem
 IRIT, Université Paul Sabatier

     Outline

1.   Microblogging service
2.   Tweet search
3.   Bayesian network topology
4.   Computing conditional probabilities
5.   Experimental evaluation
6.   Conclusion and future work




Microblogging service

        Microblog?

“   Microblogging is a new form of communication […] that enables users to broadcast and share information about their activities, opinions and status.” (Java et al., 2007)

• Microblog post
    –   Short (140 characters)
    –   Real-time
    –   Social motivation
    –   Mobile device
• Key figures
    –   1 billion publications per week
    –   50 million publications per day
    –   177 million publications in March 2011
    –   More than 106 million user accounts
Microblogging service

          Tweet, retweet and hashtag?

Jack Dorsey, 21 March 2006: first tweet, inviting coworkers
#oilspill

Stephen Colbert, 21 June 2010: Golden Tweet Award 2010
In honor of oil-soaked birds, 'tweets' are now 'gurgles. http://bit.ly/cIhZNf

Wendy's, 8 June 2011: Golden Tweet Award 2011
RT for a good cause. Each Retweet sends 50¢ to help kids in foster care. #TreatItFwd

CORIA11, 16 March 2010
CORIA 2011 : Université d'Avignon #CORIA11 http://yfrog.com/h3y

MohBoughanem, 17 March 2010
@coria2011 well visualized, quickly found

MohBoughanem / CORIA11, 17 March 2010
@coria2011 well visualized, quickly found
Microblogging service

Social information network




Tweet search

       Microblog IR

• Users overwhelmed by the huge quantity of tweets
   – High publication rate
   – Diverse sources of information
   → Difficulty accessing interesting posts

• Microblog IR tasks
   –   Person search and follower suggestion
   –   Trend extraction
   –   Opinion search
   –   Tweet search
Tweet search

        Tweet search task

“   real-time search task, where the user wishes to see the most recent but relevant information to the query.” (Ounis et al., 2011)

“   adhoc search on Twitter, where a user’s information need is represented by a query at a specific time.” (Ounis et al., 2011)

• Search motivations
    –   access to concise and credible information
    –   access to fresh and real-time news
    –   follow an event
    –   collect opinions and public sentiments
Tweet search

     Related work

1. Spatio-temporal context
   – TwitterStand (Sankaranarayanan et al., 2009), TweetSieve (Grinev et al., 2009)

2. Microblog features
   – followership, tweets, retweets, replies, hashtags, URLs
   – Linear combination (Nagmoti et al., 2010)
   – Learning to rank (Duan et al., 2010)
Tweet search

    Related work

3. Social network structure
   – Indegree, retweet and mention influence (Cha et al., 2010); TweetRank, FollowerRank (Nagmoti et al., 2010)
   – Authority (Kwak et al., 2010)
   – Influence (Kwak et al., 2010), TwitterRank (Weng et al., 2010), Popularity (Duan et al., 2010)
Tweet search

        Contributions
•   Relevance features:
    –     Term occurrence (topical)
    –     Social influence (social)
    –     Time magnitude (temporal)

• Bayesian network model
Bayesian network topology

    Definitions and notations

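•   Query: $q \in \{0, 1\}$, i.e. $\{q, \bar{q}\}$
•   Term: $k_i \in \{0, 1\}$, i.e. $\{k_i, \bar{k}_i\}$
•   Term configuration: $\mathbf{k}$
    example with $k_1, k_2$:
    $\mathbf{k} \in \{(k_1, k_2), (k_1, \bar{k}_2), (\bar{k}_1, k_2), (\bar{k}_1, \bar{k}_2)\}$
•   Tweet: $t_j \in \{0, 1\}$, i.e. $\{t_j, \bar{t}_j\}$
•   Microblogger: $u_k \in \{0, 1\}$, i.e. $\{u_k, \bar{u}_k\}$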
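The term configuration $\mathbf{k}$ ranges over every on/off assignment of the term nodes, so $n$ terms yield $2^n$ configurations. A minimal Python sketch of that enumeration, listed in the same order as the two-term example; the function names `term_configurations` and `on` are illustrative, not taken from the model's implementation:

```python
from itertools import product


def term_configurations(terms):
    """Enumerate every configuration k over the given term nodes.

    Each term is binary: True stands for k_i (active), False for its
    negation. n terms give 2**n configurations, ordered as
    (k1, k2), (k1, not k2), (not k1, k2), (not k1, not k2) for two terms.
    """
    return list(product([True, False], repeat=len(terms)))


def on(i, config):
    """on(i, k): 1 if term i is active in configuration k, else 0."""
    return 1 if config[i] else 0


if __name__ == "__main__":
    for config in term_configurations(["k1", "k2"]):
        print(config, [on(i, config) for i in range(len(config))])
```

The exponential growth is why summing over $\mathbf{k}$ explicitly is only practical for a handful of terms.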
Bayesian network topology

Network nodes and edges

            Query           q
            Terms           k1   k2   k3
            Tweets          t1   t2   t3
            Microbloggers   u1   u2
Computing conditional probabilities
         Query evaluation


[Figure: network with query q; terms k1, k2, k3; tweets t1, t2, t3; microbloggers u1, u2]

$$P(q \wedge t_j) = \sum_{\mathbf{k}} P(q \mid \mathbf{k})\, P(\mathbf{k} \mid t_j)\, P(t_j \mid u_k)\, P(u_k)$$

$$P(q \wedge t_j) = \sum_{\mathbf{k}} P(q \mid \mathbf{k})\, P(t_j \mid u_k)\, P(u_k) \prod_{k_i \mid on(i,\mathbf{k}) = 1} P(k_i \mid t_j) \prod_{k_i \mid on(i,\mathbf{k}) = 0} P(\bar{k}_i \mid t_j)$$
Computing conditional probabilities

        Query

$$P(q \wedge t_j) = \sum_{\mathbf{k}} P(q \mid \mathbf{k})\, P(t_j \mid u_k)\, P(u_k) \prod_{k_i \mid on(i,\mathbf{k}) = 1} P(k_i \mid t_j) \prod_{k_i \mid on(i,\mathbf{k}) = 0} P(\bar{k}_i \mid t_j)$$

$$P(q \mid \mathbf{k}) = \prod_{i,\, k_i \in q} on(i, \mathbf{k})$$
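If $P(q \mid \mathbf{k})$ is read as the product $\prod_{k_i \in q} on(i, \mathbf{k})$ (an assumption: the operator is not legible in the slide), only configurations in which every query term is on contribute, and summing out each remaining term gives $P(k_i \mid t_j) + P(\bar{k}_i \mid t_j) = 1$. The sum then collapses to $P(t_j \mid u_k)\, P(u_k) \prod_{k_i \in q} P(k_i \mid t_j)$. The sketch below checks this by brute force; the function names and numbers are illustrative:

```python
from itertools import product
from math import prod


def p_query_and_tweet(query_terms, p_term, p_tweet_given_author, p_author):
    """Brute-force sum of P(q | k) P(k | t_j) P(t_j | u) P(u) over all k.

    query_terms: set of term names in q.
    p_term: dict mapping every term node k_i to P(k_i | t_j).
    Assumes P(q | k) = product of on(i, k) over the query terms and
    P(not k_i | t_j) = 1 - P(k_i | t_j).
    """
    terms = list(p_term)
    total = 0.0
    for config in product([True, False], repeat=len(terms)):
        state = dict(zip(terms, config))
        p_q_given_k = prod(1 if state[t] else 0 for t in query_terms)
        if p_q_given_k == 0:
            continue  # this configuration does not satisfy the query
        p_k_given_tweet = prod(
            p_term[t] if state[t] else 1.0 - p_term[t] for t in terms
        )
        total += p_q_given_k * p_k_given_tweet
    return total * p_tweet_given_author * p_author


def p_query_and_tweet_closed_form(query_terms, p_term, p_tweet_given_author, p_author):
    """Same quantity after the non-query terms are summed out."""
    return prod(p_term[t] for t in query_terms) * p_tweet_given_author * p_author


if __name__ == "__main__":
    probs = {"k1": 0.6, "k2": 0.3, "k3": 0.9}
    print(p_query_and_tweet({"k1", "k2"}, probs, 0.5, 0.2))
    print(p_query_and_tweet_closed_form({"k1", "k2"}, probs, 0.5, 0.2))
```

Under this reading the ranking cost is linear in the number of query terms rather than exponential in the vocabulary.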
Computing conditional probabilities

        Tweet

$$P(k_i \mid t_j) = (1 - \alpha)\, F(k_i, t_j)\, H(k_i, t_j) + \alpha\, T(k_i, t_j)\, L(t_j)$$

Term occurrence: $(1 - \alpha)\, F(k_i, t_j)\, H(k_i, t_j)$; tweet properties: $\alpha\, T(k_i, t_j)\, L(t_j)$.

$$P(\bar{k}_i \mid t_j) = 1 - P(k_i \mid t_j)$$
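The term probability $P(k_i \mid t_j) = (1 - \alpha) F H + \alpha T L$ is a convex mixture controlled by α: with α = 0 only term occurrence counts, with α = 1 only the tweet properties do. A sketch with the default α = 0.25 taken from the experimental settings; F, H, T and L are passed in as already computed values, and the function names are illustrative:

```python
def p_term_given_tweet(f, h, t, l, alpha=0.25):
    """P(k_i | t_j) = (1 - alpha) * F * H + alpha * T * L.

    f, h, t, l: values of F(k_i, t_j), H(k_i, t_j), T(k_i, t_j), L(t_j),
    each expected in [0, 1], so the result also stays in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    term_occurrence = (1.0 - alpha) * f * h
    tweet_properties = alpha * t * l
    return term_occurrence + tweet_properties


def p_not_term_given_tweet(p):
    """P(not k_i | t_j) = 1 - P(k_i | t_j)."""
    return 1.0 - p


if __name__ == "__main__":
    p = p_term_given_tweet(f=0.75, h=0.6, t=0.2, l=0.5)
    print(p, p_not_term_given_tweet(p))
```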
Computing conditional probabilities

        Term frequency

$$F(k_i, t_j) = \begin{cases} 1 - \dfrac{a}{tf_{k_i,t_j}} & \text{if } k_i \in t_j \\ 0 & \text{otherwise} \end{cases}$$

[Figure: F(k_i, t_j) against tf_{k_i,t_j} (0 to 10) for a = 0.1, 0.25, 0.5, 0.75 and 1]
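Assuming the reading $F(k_i, t_j) = 1 - a / tf_{k_i,t_j}$ for a term present in the tweet (and 0 otherwise), F starts at $1 - a$ for a single occurrence and saturates towards 1, so repeating a term brings diminishing returns. A sketch with the default a = 0.25 from the experimental settings; the function name is illustrative:

```python
def term_frequency_score(tf, a=0.25):
    """F(k_i, t_j) = 1 - a / tf if the term occurs in the tweet, else 0.

    tf: number of occurrences of k_i in t_j.
    a: saturation parameter in (0, 1]; a smaller a gives a higher
       score already at tf = 1.
    """
    if tf <= 0:
        return 0.0
    return 1.0 - a / tf


if __name__ == "__main__":
    for a in (0.1, 0.25, 0.5, 0.75, 1.0):
        print(a, [round(term_frequency_score(tf, a), 3) for tf in range(0, 6)])
```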
Computing conditional probabilities

        Hashtag

$$H(k_i, t_j) = \begin{cases} 1 - \dfrac{b}{tf_{\#k_i,t_j}} & \text{if } \#k_i \in t_j \\ b & \text{otherwise} \end{cases}$$
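Under the reading $H(k_i, t_j) = 1 - b / tf_{\#k_i,t_j}$ when the term appears as a hashtag, and $b$ otherwise, a hashtag occurrence lifts the term above the baseline value b. With the experimental b = 0.4, one hashtag occurrence gives 0.6 against 0.4 without; note that a single occurrence only beats the baseline while b < 0.5. Illustrative sketch (function name hypothetical):

```python
def hashtag_score(hashtag_tf, b=0.4):
    """H(k_i, t_j) = 1 - b / tf_# if #k_i occurs in t_j, else b.

    hashtag_tf: number of times the term appears as a hashtag (#k_i).
    A single hashtag occurrence scores above the default b only when b < 0.5.
    """
    if hashtag_tf <= 0:
        return b
    return 1.0 - b / hashtag_tf


if __name__ == "__main__":
    print([round(hashtag_score(n), 3) for n in range(0, 4)])
```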
Computing conditional probabilities

        Time magnitude

$$T(k_i, t_j) = \frac{df_{k_i,\Delta_j}}{|\Delta_j|}, \qquad \Delta_j = \{\, t_k : \theta_{t_j} - \theta_{t_k} \leq \Delta t \,\}$$

where $\theta_t$ is the publication time of tweet $t$.

[Figure: number of tweets per term (terms 1 to 5) over time, with tweets t1 and t2 marked]
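A sketch of the time magnitude $T(k_i, t_j) = df_{k_i,\Delta_j} / |\Delta_j|$: the share of tweets in the window $\Delta_j$ that contain the term. Here $\Delta_j$ is taken as the tweets published within $\Delta t$ before $t_j$; excluding tweets posted after $t_j$ is an assumption made to fit real-time search. The corpus layout as (timestamp, set of terms) pairs and the function name are illustrative, and Δt defaults to the 1 hour used in the experiments:

```python
from datetime import datetime, timedelta


def time_magnitude(term, tweet_time, corpus, window=timedelta(hours=1)):
    """T(k_i, t_j) = df_{k_i, Delta_j} / |Delta_j|.

    corpus: iterable of (timestamp, set_of_terms) pairs, one per tweet.
    Delta_j: tweets published in [tweet_time - window, tweet_time]
    (assumption: tweets posted after t_j are not visible yet).
    """
    in_window = [
        terms for timestamp, terms in corpus
        if timedelta(0) <= tweet_time - timestamp <= window
    ]
    if not in_window:
        return 0.0
    df = sum(1 for terms in in_window if term in terms)
    return df / len(in_window)


if __name__ == "__main__":
    noon = datetime(2011, 1, 25, 12, 0)
    corpus = [
        (noon - timedelta(minutes=50), {"tahrir", "egypt"}),
        (noon - timedelta(minutes=30), {"tahrir"}),
        (noon - timedelta(minutes=10), {"egypt"}),
        (noon - timedelta(hours=2), {"tahrir"}),  # outside the 1 h window
        (noon, {"cairo"}),                         # t_j itself
    ]
    print(time_magnitude("tahrir", noon, corpus))
```

A term that is bursting around $t_j$ thus gets a high T even if it occurs only once in the tweet itself.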
Computing conditional probabilities

        Tweet length

$$L(t_j) = \frac{1}{1 + |avgtl - tl_{t_j}|}$$

where $avgtl$ is the average tweet length and $tl_{t_j}$ the length of $t_j$.
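Assuming the absolute-distance reading $L(t_j) = 1 / (1 + |avgtl - tl_{t_j}|)$, a tweet of exactly average length scores 1 and the score decays hyperbolically as the length moves away from the average in either direction. Sketch; lengths are in whatever unit avgtl uses (for example terms), and the function name is illustrative:

```python
def length_score(tweet_length, avg_length):
    """L(t_j) = 1 / (1 + |avgtl - tl_{t_j}|); equals 1 at the average length."""
    return 1.0 / (1.0 + abs(avg_length - tweet_length))


if __name__ == "__main__":
    average = 12.0
    for length in (4, 11, 12, 15, 20):
        print(length, round(length_score(length, average), 3))
```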
Computing conditional probabilities

        Microblogger

$$P(t_j \mid u_k) = \frac{1}{|u_k|}$$

where $|u_k|$ is the number of tweets published by $u_k$.
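With $P(t_j \mid u_k) = 1 / |u_k|$, each microblogger spreads a unit of probability uniformly over their own tweets, so a prolific account contributes less per tweet. A sketch assuming $|u_k|$ counts the tweets authored by $u_k$ in the collection; the input format (one author id per tweet) and the function name are illustrative:

```python
from collections import Counter


def tweet_given_author(authors):
    """Return {author: P(t_j | u_k)} with P = 1 / (number of tweets by u_k).

    authors: sequence holding the author id of every tweet in the collection.
    """
    counts = Counter(authors)
    return {author: 1.0 / n for author, n in counts.items()}


if __name__ == "__main__":
    print(tweet_given_author(["u1", "u1", "u2", "u1"]))
```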
Computing conditional probabilities

        Social influence

$$P(u_k) = \mathrm{Inf}(u_k)$$

PageRank on the retweet social network:

$$\mathrm{Inf}_{G_k}(u_i) = d\,\frac{1}{|U|} + (1 - d) \sum_{u_j,\, e(u_j, u_i) \in E} w_{j,i}\, \frac{\mathrm{Inf}_{G_{k-1}}(u_j)}{O(u_j)}$$

$$w_{j,i} = \frac{|\Gamma(u_j) \cap \Gamma(u_i)|}{|\Gamma(u_j)|}$$
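The influence score is a weighted PageRank iterated over the retweet graph: each user receives a teleport share $d / |U|$ plus $(1 - d)$ times the weighted, out-degree-normalised influence of the users pointing to them, with d = 0.15 as in the experimental settings. The sketch makes several assumptions: $O(u_j)$ is taken to be the out-degree of $u_j$, an edge `(u_j, u_i)` means influence flows from $u_j$ to $u_i$, the weights $w_{j,i}$ are supplied precomputed, and a fixed number of iterations replaces a convergence test. Function and argument names are illustrative:

```python
def influence(users, edges, weights, d=0.15, iterations=50):
    """Iterate Inf(u_i) = d/|U| + (1 - d) * sum_j w_ji * Inf(u_j) / O(u_j).

    users: list of user ids (U).
    edges: list of (u_j, u_i) pairs, the retweet edges E.
    weights: dict mapping each edge (u_j, u_i) to w_{j,i}.
    O(u_j) is assumed to be the out-degree of u_j.
    """
    n = len(users)
    out_degree = {u: 0 for u in users}
    for src, _ in edges:
        out_degree[src] += 1

    inf = {u: 1.0 / n for u in users}  # uniform starting scores
    for _ in range(iterations):
        new_inf = {u: d / n for u in users}  # teleport term d * 1/|U|
        for src, dst in edges:
            new_inf[dst] += (1.0 - d) * weights[(src, dst)] * inf[src] / out_degree[src]
        inf = new_inf
    return inf


if __name__ == "__main__":
    users = ["u1", "u2", "u3"]
    edges = [("u2", "u1"), ("u3", "u1")]
    print(influence(users, edges, {e: 1.0 for e in edges}))
```

Like the formula as written, this version does not redistribute the score of users without outgoing edges, so the scores need not sum to 1 on graphs with such users.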
Computing conditional probabilities

        Social influence

$$w_{i,j} = \frac{|\Gamma(u_j) \cap \Gamma(u_i)|}{|\Gamma(u_i)|}$$
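The edge weights are overlap ratios between two per-user sets, normalised by the size of one user's set, so $w_{i,j}$ and $w_{j,i}$ generally differ. The set symbol is not legible in the slide; the sketch writes it as Γ, reads the weight as $w_{i,j} = |\Gamma(u_j) \cap \Gamma(u_i)| / |\Gamma(u_i)|$, and simply takes both sets as inputs, which leaves what they contain as an assumption. The function name is illustrative:

```python
def overlap_weight(gamma_i, gamma_j):
    """w_{i,j} = |Gamma(u_j) & Gamma(u_i)| / |Gamma(u_i)|.

    gamma_i, gamma_j: the sets associated with users u_i and u_j
    (their content is an assumption left to the caller).
    Returns 0 when Gamma(u_i) is empty.
    """
    if not gamma_i:
        return 0.0
    return len(gamma_i & gamma_j) / len(gamma_i)


if __name__ == "__main__":
    g1 = {"a", "b", "c", "d"}
    g2 = {"b", "d", "e"}
    print(overlap_weight(g1, g2), overlap_weight(g2, g1))
```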
Experimental evaluation

TREC 2011 Microblog
                                                                  NESTOR
                                                        Microblog Search Engine

Tweets         16 141 812      Microbloggers                            5 356 432
Retweets        1 128 179      Retweet relationships                    1 060 551
Tweet           1 860 112      Social network of retweets: nodes        5 495 081
Terms           7 781 775      Social network of retweets: edges        1 024 914
Hashtags         455 179       Giant component                            11.12%


Term frequency, hashtags and length distributions [figure with three panels: Term frequency, Hashtags, Tweet length]
Experimental evaluation

   Queries and ground truth

• “Arab Spring” query dataset (25 queries)
  – Topical
     “Number of protesters in Tahrir”, “Tunisian revolution”

  – Temporal
     “ElBaradei arrives in Egypt”, “Clashes in Tahrir”, “SMS Down Egypt”

  – Social
     “Wael Ghonim”, “Mubarak dissolves government”

• User ratings (relevant, not relevant)
• Tweets ranked by score; evaluation with p@10 and p@20
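Precision at rank k is the fraction of the top k tweets judged relevant. A minimal sketch; following the usual TREC convention (an assumption, the slides do not say), it divides by k even when fewer than k tweets are returned. The function name and example ids are illustrative:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """p@k: share of the first k ranked tweets that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for tweet_id in top_k if tweet_id in relevant_ids)
    return hits / k


if __name__ == "__main__":
    ranking = list(range(20))         # tweet ids ordered by score
    relevant = {0, 2, 4, 6, 8, 15}    # ids rated relevant by users
    print(precision_at_k(ranking, relevant, 10), precision_at_k(ranking, relevant, 20))
```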
Experimental evaluation

       Configurations and baselines

BNTS         Bayesian network model for tweet search*
BNTS-L       BNTS, Tweet length feature disabled
BNTS-T       BNTS, Time magnitude feature disabled
BNTS-H       BNTS, Hashtag feature disabled
BNTS-S       BNTS, Social influence feature disabled
BM25         Okapi BM25
VSM          Vector Space Model
BM           Boolean Model


*   α = 0.25, a = 0.25, b = 0.4, Δt = 1h, d = 0.15
Experimental evaluation

 Features impact

             p@10     p@20
BNTS         0,552    0,548
BNTS-L       0,532    0,502
BNTS-T       0,256    0,294
BNTS-H       0,584    0,542
BNTS-S       0,58     0,528
Experimental evaluation

Features impact
Topical
             p@10     p@20
BNTS         0,66     0,6867
BNTS-L       0,6867   0,6833
BNTS-T       0,2867   0,3767
BNTS-H       0,7533   0,7233
BNTS-S       0,7333   0,6833
Experimental evaluation

 Features impact
Temporal
             p@10     p@20
BNTS         0,4333   0,35
BNTS-L       0,2333   0,2
BNTS-T       0,0667   0,1
BNTS-H       0,3333   0,3
BNTS-S       0,4      0,3167
Experimental evaluation

 Features impact
Social
             p@10     p@20
BNTS         0,3714   0,3357
BNTS-L       0,3286   0,2429
BNTS-T       0,2714   0,2
BNTS-H       0,3286   0,2571
BNTS-S       0,3286   0,2857
Experimental evaluation

 Retrieval effectiveness




          p@10     Gain      p@20     Gain
BNTS      0,552              0,548
BM25      0,576    -4%       0,494    11%
BM        0,416    ** 33%    0,382    ** 43%
VSM       0,376    ** 47%    0,36     ** 52%
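The percentages appear to be the relative difference of BNTS over each baseline, (BNTS − baseline) / baseline, so they can be recomputed from the precision values as a consistency check. A sketch (function name illustrative):

```python
def relative_gain(model_score, baseline_score):
    """Relative difference of the model over a baseline, in percent."""
    return 100.0 * (model_score - baseline_score) / baseline_score


if __name__ == "__main__":
    bnts = {"p@10": 0.552, "p@20": 0.548}
    baselines = {
        "BM25": {"p@10": 0.576, "p@20": 0.494},
        "BM": {"p@10": 0.416, "p@20": 0.382},
        "VSM": {"p@10": 0.376, "p@20": 0.36},
    }
    for name, scores in baselines.items():
        gains = {m: round(relative_gain(bnts[m], scores[m])) for m in bnts}
        print(name, gains)
```

Rounded, this gives -4% and 11% for BM25, 33% and 43% for BM, and 47% and 52% for VSM.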

        Conclusion and future work

•   Tweet search model
    –     Normalized term frequency
    –     Time magnitude
    –     Social influence
•   Integrating relevance factors within a Bayesian network
•   The query profile (topical, temporal or social) affects how much each feature contributes.
•   Our model outperforms the Boolean and vector space baselines, and BM25 at p@20.
•   Future work
    –     Automatically detect the optimal time window
    –     Select the appropriate features depending on the query profile
Thank you for your attention!




            Follow me on Twitter!
             http://twitter.com/amjedbj
Computing conditional probabilities
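      Query evaluation

$$P(t_j \mid q) \propto \sum_{\mathbf{k}} P(q \mid \mathbf{k})\, P(t_j \mid \mathbf{k})\, P(\mathbf{k})$$

$$P(t_j \mid q) \propto \sum_{\mathbf{k}} P(q \mid \mathbf{k})\, P(t^k_j \mid \mathbf{k})\, P(t^o_j \mid \mathbf{k})\, P(t^s_j \mid \mathbf{k})\, P(\mathbf{k})$$

[Figure: network with query q; terms k1, k2, k3; nodes o1, o2 and u1; tweet components tk1, tk2, tk3, to2, to3, ts1, ts2, ts3; tweets t1, t2, t3]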
Experimental evaluation

      Term frequency normalization

•    BNTS.K
$P(t^k_j \mid \mathbf{k})$ is computed from the term frequencies $tf_{k_i,t_j}$ of the terms $k_i \in \mathbf{k}$ that occur in $t_j$.

[Figure: p@30 (y-axis, 0 to 0.35) for BNTS.K against a parameter varied from 0 to 1 (x-axis)]
Experimental evaluation

               Time window

•    BNTS.KO
$$o_e : \left[\, o_e - \frac{\Delta t}{2},\; o_e + \frac{\Delta t}{2} \,\right]$$

[Figure: p@30 (y-axis, 0.29 to 0.32) for BNTS.KO against the time window Δt in days (x-axis, 0 to 17)]
Experimental evaluation

       Retrieval effectiveness
[Figure: p@30 and MAP (y-axis, 0 to 0.5) for isiFDL, DFReeKLIM30, BNTS, Median, Nestor, BM25 and Disjunctive]
Experimental evaluation

         TREC Microblog 2011
                  Ranked by time                           Ranked by score
                  All rel             High rel             All rel
                  p@30      MAP       p@30      MAP        p@30      MAP
Nestor*           0.2027    0.1305    0.0838    0.1287     0.2218    0.1384
Nestor-S*         0.2027    0.1305    0.0838    0.1286     0.2184    0.1360
Nestor-T          0.2082    0.1343    0.0585    0.0912     0.1912    0.1196
Nestor-L          0.2048    0.1306    0.0565    0.0867     0.2293    0.1426
Median            0.2592    0.1433    0.2646    0.1381
Experimental evaluation

        TREC Microblog 2011
     System                           Threshold   p@10     p@20     p@30     MAP
1    Sum of IDF of present terms      30          0,3633   0,3316   0,3333   0,1759
2    BM25                             30          0,3571   0,3245   0,2973   0,1546
3    Proportion of present terms      30          0,2653   0,2561   0,2782   0,14
4    Sum of Boolean frequencies       30          0,2571   0,2663   0,2755   0,1387
5    EBM (AND)                        30          0,3041   0,2918   0,2714   0,1282
6    Bayesian inference network       30          0,302    0,2888   0,2687   0,1274
7    Sum of TF*IDF                    30          0,302    0,2888   0,2687   0,1274
8    VSM                              30          0,302    0,2888   0,2687   0,1274
9    Sum of TF                        30          0,2327   0,2276   0,2238   0,1066
10   Nestor                                       0,2857   0,2347   0,2027   0,1305
11   EBM (OR)                         30          0,1837   0,1786   0,166    0,0541
12   Sum of hashtag frequencies       30          0,1612   0,1541   0,1469   0,0512
13   Lucene-Baseline                  1000        0,1612   0,1143   0,0986   0,1411
14   Sum of TF (length-normalized)    30          0,0816   0,0673   0,0612   0,0223
15   Reverse chronological order      30          0,0184   0,0255   0,0218   0,0082
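MAP averages, over queries, the average precision of each ranking: the sum of the precision values at the ranks where relevant tweets appear, divided by the total number of relevant tweets for that query. A minimal sketch of this standard definition, not the official trec_eval implementation; the function names are illustrative:

```python
def average_precision(ranked_ids, relevant_ids):
    """Sum of p@rank at each relevant hit, divided by |relevant|.

    Relevant tweets that are never retrieved contribute 0.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, tweet_id in enumerate(ranked_ids, start=1):
        if tweet_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
    print(mean_average_precision(runs))
```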

Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Uprising microblogs: A Bayesian network retrieval model for tweet search

  • 1. Uprising microblogs: A Bayesian network retrieval model for tweet search Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem IRIT, Université Paul Sabatier
  • 2. A Bayesian network retrieval model for tweet search. Outline: 1. Microblogging service; 2. Tweet search; 3. Bayesian network topology; 4. Computing conditional probabilities; 5. Experimental evaluation; 6. Conclusion and future work
  • 3. Microblogging service. Microblog? “Microblogging is a new form of communication [….] that enables users to broadcast and share information about their activities, opinions and status.” [Java et al., 2007] • Microblog post – Short (140 characters) – Real-time – Social motivation – Mobile device • Figures: 1 billion publications/week; 50 million publications/day; 177 million publications in March 2011; 106+ million user accounts
  • 4. Microblogging service. Tweet, retweet and hashtag? “Jack Dorsey, 21 March 2006: 1st tweet, inviting coworkers” “Stephen Colbert, 21 June 2010 (Golden Tweet Award 2010): In honor of oil-soaked birds, 'tweets' are now 'gurgles.' #oilspill http://bit.ly/cIhZNf” “Wendy's, 8 June 2011 (Golden Tweet Award 2011): RT for a good cause. Each Retweet sends 50¢ to help kids in foster care. #TreatItFwd” “CORIA11, 16 March 2010: CORIA 2011 : Université d'Avignon #CORIA11 http://yfrog.com/h3y” “MohBoughanem, 17 March 2010: @coria2011 well visualized, quickly found”
  • 6. Tweet search. Microblog IR • Users are overwhelmed by the huge quantity of tweets – High publication rate – Diverse sources of information • Difficulty accessing interesting posts • Microblog IR tasks – Person search and follower suggestion – Trend extraction – Opinion search – Tweet search
  • 7. Tweet search. Tweet search task “real-time search task, where the user wishes to see the most recent but relevant information to the query.” (Ounis et al., 2011) “ad hoc search on Twitter, where a user’s information need is represented by a query at a specific time.” (Ounis et al., 2011) • Search motivations – access concise and credible information – access fresh, real-time news – follow an event – collect opinions and public sentiment
  • 8. Tweet search. Related work 1. Spatio-temporal context – TwitterStand (Sankaranarayanan et al., 2009) – TweetSieve (Grinev et al., 2009) 2. Microblog features – followership, tweets, retweets, replies, hashtags, URLs – Linear combination (Nagmoti et al., 2010) – Learning to rank (Duan et al., 2010)
  • 9. Tweet search. Related work 3. Social network structure – Indegree, retweet and mention influence (Cha et al., 2010); TweetRank, FollowerRank (Nagmoti et al., 2010) – Authority (Kwak et al., 2010) – Influence (Kwak et al., 2010), TwitterRank (Weng et al., 2010), Popularity (Duan et al., 2010)
  • 10. Tweet search. Contributions • Relevance features: – Term occurrence (topical) – Social influence (social) – Time magnitude (temporal) • Bayesian network model
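  • 11. Bayesian network topology. Definitions and notations • Query: $q \in \{0,1\}$, with states $q$ and $\bar{q}$ • Term: $k_i \in \{0,1\}$, with states $k_i$ and $\bar{k}_i$ • Term configuration: $\vec{k}$; example for $k_1, k_2$: $\vec{k} \in \{(k_1,k_2), (k_1,\bar{k}_2), (\bar{k}_1,k_2), (\bar{k}_1,\bar{k}_2)\}$ • Tweet: $t_j \in \{0,1\}$, with states $t_j$ and $\bar{t}_j$ • Microblogger: $u_k \in \{0,1\}$, with states $u_k$ and $\bar{u}_k$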
  • 12. Bayesian network topology. Network nodes and edges [Figure: layered network with the query node q; term nodes k1, k2, k3; tweet nodes t1, t2, t3; microblogger nodes u1, u2]
  • 13. Computing conditional probabilities. Query evaluation: $P(q \wedge t_j) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(\vec{k} \mid t_j)\, P(t_j \mid u_k)\, P(u_k)$, where $P(\vec{k} \mid t_j) = \prod_{i \mid on(i,\vec{k})=1} P(k_i \mid t_j) \prod_{i \mid on(i,\vec{k})=0} P(\bar{k}_i \mid t_j)$
  • 14. Computing conditional probabilities. Query: $P(q \mid \vec{k}) = \prod_{i \mid k_i \in q} on(i, \vec{k})$, where $on(i, \vec{k}) = 1$ if term $k_i$ is active in configuration $\vec{k}$ and $0$ otherwise
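A minimal Python sketch of the query evaluation on slides 13 and 14, assuming P(q | k) is the product of on(i, k) over the query terms (an AND-style query node) and that the author factor P(t_j | u_k) P(u_k) is supplied as plain numbers. The function names are hypothetical. Because non-query terms marginalize out of the sum, it enumerates only the 2^n configurations of the n query terms.

```python
from itertools import product
from math import prod


def p_query_given_config(config: dict[str, bool], query_terms: list[str]) -> float:
    """P(q | k): product of on(i, k) over the query terms (AND-style, an assumption)."""
    return prod(1.0 if config[term] else 0.0 for term in query_terms)


def p_config_given_tweet(config: dict[str, bool], p_term: dict[str, float]) -> float:
    """P(k | t_j): P(k_i | t_j) for active terms, 1 - P(k_i | t_j) for inactive ones."""
    return prod(p_term[term] if on else 1.0 - p_term[term] for term, on in config.items())


def score_tweet(query_terms: list[str], p_term: dict[str, float],
                p_tweet_given_author: float, p_author: float) -> float:
    """Brute-force P(q AND t_j) up to a constant, summing over all term configurations."""
    total = 0.0
    for states in product([False, True], repeat=len(query_terms)):
        config = dict(zip(query_terms, states))
        total += p_query_given_config(config, query_terms) * p_config_given_tweet(config, p_term)
    # Multiply by the microblogger layer: P(t_j | u_k) * P(u_k).
    return total * p_tweet_given_author * p_author


if __name__ == "__main__":
    # Hypothetical per-term probabilities P(k_i | t_j) for one tweet.
    p_term = {"tahrir": 0.6, "clashes": 0.5}
    print(score_tweet(["tahrir", "clashes"], p_term, p_tweet_given_author=0.1, p_author=0.02))
```

With an AND-style P(q | k), only the all-active configuration contributes, so the score reduces to 0.6 × 0.5 × 0.1 × 0.02 (about 0.0006) up to floating-point rounding. A different choice of P(q | k), such as a normalized sum over query terms, is what would make the full enumeration necessary.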
  • 15. Computing conditional probabilities. Tweet: $P(k_i \mid t_j) = (1-\alpha)\, F(k_i,t_j)\, H(k_i,t_j) + \alpha\, T(k_i,t_j)\, L(t_j)$, where the first product covers term occurrence and the second covers tweet properties; $P(\bar{k}_i \mid t_j) = 1 - P(k_i \mid t_j)$
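The mixture on slide 15 can be sketched as below, assuming the four component scores F, H, T and L have already been computed and lie in [0, 1]. The default α = 0.25 is the value listed in the experimental setup; the function names are hypothetical.

```python
def p_term_given_tweet(f: float, h: float, t: float, l: float, alpha: float = 0.25) -> float:
    """P(k_i | t_j) = (1 - alpha) * F * H + alpha * T * L.

    The first product scores term occurrence (term frequency, hashtag);
    the second scores tweet properties (time magnitude, tweet length).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * f * h + alpha * t * l


def p_not_term_given_tweet(f: float, h: float, t: float, l: float, alpha: float = 0.25) -> float:
    """P(not k_i | t_j) = 1 - P(k_i | t_j)."""
    return 1.0 - p_term_given_tweet(f, h, t, l, alpha)


if __name__ == "__main__":
    # Hypothetical component scores for one term/tweet pair.
    print(p_term_given_tweet(f=0.75, h=0.6, t=0.1, l=0.5))
```

Since the result is a convex combination of two products of values in [0, 1], it also stays in [0, 1], which keeps the complement P(not k_i | t_j) a valid probability.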
  • 16. Computing conditional probabilities. Term frequency: $F(k_i,t_j) = \begin{cases} 1 - \frac{a}{tf_{k_i,t_j}} & \text{if } k_i \in t_j \\ 0 & \text{otherwise} \end{cases}$ [Figure: $F(k_i,t_j)$ against $tf_{k_i,t_j}$ (0 to 10) for a = 0.1, 0.25, 0.5, 0.75, 1]
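A small sketch of the term frequency score on slide 16, assuming the form F = 1 − a / tf for a term that occurs in the tweet and 0 otherwise; a = 0.25 is the setup value and the function name is hypothetical. The demo loop prints the same family of curves as the slide's plot.

```python
def term_frequency_score(tf: int, a: float = 0.25) -> float:
    """F(k_i, t_j) = 1 - a / tf if k_i occurs in t_j (tf >= 1), else 0."""
    if tf < 0:
        raise ValueError("term frequency cannot be negative")
    if tf == 0:
        return 0.0
    return 1.0 - a / tf


if __name__ == "__main__":
    # One row per value of a, for tf = 1..5.
    for a in (0.1, 0.25, 0.5, 0.75, 1.0):
        print(a, [round(term_frequency_score(tf, a), 3) for tf in range(1, 6)])
```

The score saturates towards 1 as tf grows, and a controls how strongly repeated occurrences are rewarded: with a = 0.25, one occurrence gives 0.75 and two give 0.875, while with a = 1 a single occurrence gives 0.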
  • 17. Computing conditional probabilities. Hashtag: $H(k_i,t_j) = \begin{cases} 1 - \frac{b}{tf_{\#k_i,t_j}} & \text{if } \#k_i \in t_j \\ b & \text{otherwise} \end{cases}$
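The hashtag factor on slide 17 can be sketched the same way, assuming H = 1 − b / tf of the hashtag #k_i when it occurs and b otherwise; b = 0.4 is the setup value and the function name is hypothetical.

```python
def hashtag_score(hashtag_tf: int, b: float = 0.4) -> float:
    """H(k_i, t_j) = 1 - b / tf_{#k_i,t_j} if #k_i occurs in t_j, else b."""
    if hashtag_tf < 0:
        raise ValueError("hashtag frequency cannot be negative")
    if hashtag_tf == 0:
        # Term not used as a hashtag: constant floor b.
        return b
    return 1.0 - b / hashtag_tf


if __name__ == "__main__":
    print([hashtag_score(n) for n in range(4)])
```

Here b acts as a floor for tweets that mention the term without tagging it: with b = 0.4, one hashtag occurrence raises H from 0.4 to 0.6. Note that 1 − b > b only when b < 0.5, so this form rewards a single hashtag only for b below 0.5.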
  • 18. Computing conditional probabilities. Time magnitude: $T(k_i,t_j) = \frac{df_{k_i,\Delta_j}}{|\Delta_j|}$, with the time window $\Delta_j = \{ t_k : |t_j - t_k| \le \Delta t \}$ over publication times [Figure: tweet volume over time around tweets t1 and t2]
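A sketch of the time magnitude on slide 18, assuming T is the share of tweets inside the window Δ_j around t_j that contain k_i, and assuming a symmetric window of ±Δt (Δt = 1h in the setup). The function name and the toy corpus are hypothetical; each tweet is represented as a (timestamp, set of terms) pair.

```python
from datetime import datetime, timedelta


def time_magnitude(term: str, tweet_time: datetime,
                   corpus: list[tuple[datetime, set[str]]],
                   window: timedelta = timedelta(hours=1)) -> float:
    """T(k_i, t_j): fraction of tweets in the time window around t_j that contain k_i.

    Assumption: the window is symmetric, |time(t_j) - time(t_k)| <= window.
    """
    in_window = [terms for ts, terms in corpus if abs(ts - tweet_time) <= window]
    if not in_window:
        return 0.0
    df = sum(1 for terms in in_window if term in terms)
    return df / len(in_window)


if __name__ == "__main__":
    # Hypothetical toy corpus.
    corpus = [
        (datetime(2011, 2, 2, 10, 0), {"tahrir", "clashes"}),
        (datetime(2011, 2, 2, 10, 20), {"tahrir"}),
        (datetime(2011, 2, 2, 10, 50), {"egypt"}),
        (datetime(2011, 2, 2, 13, 0), {"tahrir"}),
    ]
    print(time_magnitude("tahrir", datetime(2011, 2, 2, 10, 20), corpus))
```

For the tweet at 10:20, the window holds the first three tweets, two of which contain "tahrir", so T is 2/3. A one-sided window looking only at earlier tweets would be a natural variant for a strictly real-time setting.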
  • 19. Computing conditional probabilities. Tweet length: $L(t_j) = \frac{1}{1 + avgtl - tl_{t_j}}$, where $avgtl$ is the average tweet length and $tl_{t_j}$ the length of $t_j$
  • 20. Computing conditional probabilities. Microblogger: $P(t_j \mid u_k) = \frac{1}{|u_k|}$, where $|u_k|$ is the number of tweets posted by $u_k$
  • 21. Computing conditional probabilities. Social influence: $P(u_k) = Inf(u_k)$, computed with PageRank on the retweet social network: $Inf^{k}_{G}(u_i) = d\,\frac{1}{|U|} + (1-d) \sum_{u_j,\, e(u_j,u_i) \in E} w_{j,i}\, \frac{Inf^{k-1}_{G}(u_j)}{|O(u_j)|}$, where $O(u_j)$ is the set of outgoing edges of $u_j$ and $w_{j,i}$ is the weight of edge $e(u_j,u_i)$, defined as a normalized set-overlap ratio
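A minimal sketch of the influence iteration on slide 21. Assumptions, not taken from the slides: edges point from the retweeting user u_j to the retweeted user u_i, the weights w_ji are precomputed, the iteration starts from the uniform score 1/|U|, and it runs a fixed number of steps. The function name is hypothetical.

```python
def influence(users: list[str], edges: dict[tuple[str, str], float],
              d: float = 0.15, iterations: int = 100) -> dict[str, float]:
    """Weighted PageRank-style influence on a retweet graph.

    Inf^k(u_i) = d / |U| + (1 - d) * sum over edges (u_j, u_i) of
                 w_ji * Inf^{k-1}(u_j) / |O(u_j)|
    edges maps (u_j, u_i) to the edge weight w_ji (assumed precomputed).
    Assumption: uniform start Inf(u_i) = 1 / |U|.
    """
    n = len(users)
    out_degree = {u: 0 for u in users}
    for src, _dst in edges:
        out_degree[src] += 1
    scores = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Uniform jump term d / |U| for every user.
        new_scores = {u: d / n for u in users}
        # Each edge passes a weighted share of the source's previous score.
        for (src, dst), w in edges.items():
            new_scores[dst] += (1.0 - d) * w * scores[src] / out_degree[src]
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Hypothetical graph: a and c retweet b, b retweets a.
    print(influence(["a", "b", "c"], {("a", "b"): 1.0, ("c", "b"): 1.0, ("b", "a"): 1.0}))
```

In this formula d multiplies the uniform jump term, so d = 0.15 from the setup plays the role that 1 − 0.85 plays in the usual PageRank convention. As written, users with no outgoing edges leak score mass; a fuller implementation would need an explicit dangling-node policy.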
  • 22. Computing conditional probabilities. Social influence (continued): weight $w_{i,j}$ of edge $e(u_i,u_j)$, defined as the overlap between a set associated with $u_j$ and a set associated with $u_i$, normalized by the size of the latter
  • 23. Experimental evaluation. TREC 2011 Microblog, NESTOR Microblog Search Engine
      – Tweets: 16,141,812
      – Microbloggers: 5,356,432
      – Retweets: 1,128,179
      – Retweet relationships: 1,060,551
      – Tweet: 1,860,112
      – Social network of retweets, nodes: 5,495,081
      – Terms: 7,781,775
      – Social network of retweets, edges: 1,024,914
      – Hashtags: 455,179
      – Giant component: 11.12%
      [Figure: term frequency, hashtag and tweet length distributions]
  • 24. Experimental evaluation. Queries and ground truth • “Arab Spring” query dataset (25 queries) – Topical: “Number of protesters in Tahrir”, “Tunisian revolution” – Temporal: “ElBaradei arrives in Egypt”, “Clashes in Tahrir”, “SMS Down Egypt” – Social: “Wael Ghonim”, “Mubarak dissolves government” • User rating (relevant / not relevant) • Tweets ranked by score; evaluation with p@10 and p@20
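Precision at k, the measure used on slide 24, can be sketched as follows. The function name, tweet IDs and judgments are hypothetical; the convention of dividing by k even when fewer than k tweets are returned is an assumption (it matches common TREC practice).

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """p@k: fraction of the top-k ranked tweets judged relevant.

    Convention (assumption): divide by k even if fewer than k tweets were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for tweet_id in ranked[:k] if tweet_id in relevant) / k


if __name__ == "__main__":
    ranked = [f"t{i}" for i in range(1, 21)]      # hypothetical ranking by score
    relevant = {"t1", "t3", "t4", "t12", "t19"}   # hypothetical user judgments
    print(precision_at_k(ranked, relevant, 10), precision_at_k(ranked, relevant, 20))
```

In this toy run, three of the top 10 and five of the top 20 tweets are relevant, giving p@10 = 0.3 and p@20 = 0.25.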
  • 25. Experimental evaluation. Configurations and baselines
      – BNTS: Bayesian network model for tweet search*
      – BNTS-L: BNTS with the tweet length feature disabled
      – BNTS-T: BNTS with the time magnitude feature disabled
      – BNTS-H: BNTS with the hashtag feature disabled
      – BNTS-S: BNTS with the social influence feature disabled
      – BM25: Okapi BM25
      – VSM: Vector Space Model
      – BM: Boolean Model
      * α = 0.25, a = 0.25, b = 0.4, Δt = 1h, d = 0.15
  • 26. Experimental evaluation. Features impact [Figure: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S; bar values: 0.584, 0.58, 0.552, 0.532, 0.548, 0.542, 0.528, 0.502, 0.294, 0.256]
  • 27. Experimental evaluation. Features impact, topical queries [Figure: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S; bar values: 0.7533, 0.7333, 0.7233, 0.66, 0.6867, 0.6867, 0.6833, 0.6833, 0.3767, 0.2867]
  • 28. Experimental evaluation. Features impact, temporal queries [Figure: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S; bar values: 0.4333, 0.4, 0.3333, 0.35, 0.3, 0.3167, 0.2333, 0.2, 0.1, 0.0667]
  • 29. Experimental evaluation. Features impact, social queries [Figure: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S; bar values: 0.3714, 0.3286, 0.3286, 0.3286, 0.3357, 0.2714, 0.2857, 0.2429, 0.2571, 0.2]
  • 30. Experimental evaluation. Retrieval effectiveness (p@10, p@20)
      – BNTS: 0.552, 0.548
      – BM25: 0.576 (-4%), 0.494 (11%)
      – BM: 0.416** (33%), 0.382** (34%)
      – VSM: 0.376** (47%), 0.36** (52%)
  • 31. A Bayesian network retrieval model for tweet search. Conclusion and future work • Tweet search model – Normalized term frequency – Time magnitude – Social influence • Relevance factors integrated within a Bayesian network • The query profile affects feature performance • Our model outperforms traditional IR baselines • Future work – Automatically detect the optimal time window – Select appropriate features depending on the query profile
  • 32. Thank you for your attention! Follow me on Twitter! http://twitter.com/amjedbj
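  • 33. Computing conditional probabilities. Query evaluation: $P(t_j \mid q) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_j \mid \vec{k})\, P(\vec{k})$ and, with the tweet node split into $t^k_j$, $t^o_j$ and $t^s_j$: $P(t_j \mid q) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(t^k_j \mid \vec{k})\, P(t^o_j \mid \vec{k})\, P(t^s_j \mid \vec{k})\, P(\vec{k})$ [Figure: network with query q; terms k1, k2, k3; nodes o1, o2 and u1, u2; intermediate nodes tk1, tk2, tk3, to1, to2, to3, ts1, ts2, ts3; tweets t1, t2, t3]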
  • 34. Experimental evaluation. Term frequency normalization • BNTS.K: $P(t^k_j \mid \vec{k})$ computed from the normalized term frequencies $tf_{k_i,t_j}$ of the terms of $\vec{k}$ in $t_j$ [Figure: p@30 (0 to 0.35) against the normalization parameter (0 to 1)]
  • 35. Experimental evaluation. Time window • BNTS.KO: window centered on $t_{oe}$, $[t_{oe} - \frac{\Delta t}{2},\, t_{oe} + \frac{\Delta t}{2}]$ [Figure: p@30 (about 0.29 to 0.32) against the window size in days (0 to 17)]
  • 36. Experimental evaluation. Retrieval effectiveness [Figure: p@30 and MAP (0 to 0.5) for isiFDL, DFReeKLIM30, BNTS, Median, Nestor, BM25 and Disjunctive]
  • 37. Experimental evaluation. TREC Microblog 2011 (ranked by time, all relevant: p@30, MAP; ranked by time, highly relevant: p@30, MAP; ranked by score, all relevant: p@30, MAP)
      – Nestor*: 0.2027, 0.1305; 0.0838, 0.1287; 0.2218, 0.1384
      – Nestor-S*: 0.2027, 0.1305; 0.0838, 0.1286; 0.2184, 0.1360
      – Nestor-T: 0.2082, 0.1343; 0.0585, 0.0912; 0.1912, 0.1196
      – Nestor-L: 0.2048, 0.1306; 0.0565, 0.0867; 0.2293, 0.1426
      – Median (four values): 0.2592, 0.1433, 0.2646, 0.1381
  • 38. Experimental evaluation. TREC Microblog 2011 (system, threshold: p@10, p@20, p@30, MAP)
      1. Sum of IDF of matching terms, threshold 30: 0.3633, 0.3316, 0.3333, 0.1759
      2. BM25, threshold 30: 0.3571, 0.3245, 0.2973, 0.1546
      3. Proportion of matching terms, threshold 30: 0.2653, 0.2561, 0.2782, 0.14
      4. Sum of Boolean frequencies, threshold 30: 0.2571, 0.2663, 0.2755, 0.1387
      5. EBM (AND), threshold 30: 0.3041, 0.2918, 0.2714, 0.1282
      6. Bayesian inference network, threshold 30: 0.302, 0.2888, 0.2687, 0.1274
      7. Sum of TF*IDF, threshold 30: 0.302, 0.2888, 0.2687, 0.1274
      8. VSM, threshold 30: 0.302, 0.2888, 0.2687, 0.1274
      9. Sum of TF, threshold 30: 0.2327, 0.2276, 0.2238, 0.1066
      10. Nestor: 0.2857, 0.2347, 0.2027, 0.1305
      11. EBM (OR), threshold 30: 0.1837, 0.1786, 0.166, 0.0541
      12. Sum of hashtag frequencies, threshold 30: 0.1612, 0.1541, 0.1469, 0.0512
      13. Lucene baseline, threshold 1000: 0.1612, 0.1143, 0.0986, 0.1411
      14. Sum of TF (length-normalized), threshold 30: 0.0816, 0.0673, 0.0612, 0.0223
      15. Reverse chronological order, threshold 30: 0.0184, 0.0255, 0.0218, 0.0082