https://imatge.upc.edu/web/publications/semantic-summarization-egocentric-photo-stream-events
With the rapid increase in the number of wearable camera users in recent years, and in the amount of data they produce, there is a strong need for automatic retrieval and summarization techniques. This work addresses the problem of automatically summarizing egocentric photo streams captured through a wearable camera by taking an image retrieval perspective. After non-informative images are removed by a new CNN-based filter, the remaining images are ranked by relevance to ensure semantic diversity and finally re-ranked by a novelty criterion to reduce redundancy. To assess the results, a new evaluation metric is proposed that takes into account the non-uniqueness of the solution. Experiments on a database of 7,110 images from 6 different subjects, evaluated by experts, gave 95.74% expert satisfaction and a Mean Opinion Score of 4.57 out of 5.0.
Semantic and Diverse Summarization of Egocentric Photo Stream Events
1. Lifelogging Tools & Applications Workshop, ACM Multimedia
Mountain View, CA, USA
23 October 2017
Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Technical University of Catalonia
Semantic Summarization of Egocentric Photo Stream Events
#ACMMM #lifelogging @DocXavi
2. Joint Work
Aniol Lidon, Marc Bolaños, Mariella Dimiccoli, Petia Radeva, Maite Garolera, Xavi Giró
5. Motivation
P. Piasek, K. Irving, and A.F. Smeaton, “SenseCam intervention based on Cognitive Stimulation Therapy
framework for early-stage dementia”. In Pervasive Computing Technologies for Healthcare 2011.
19. Relevance ranking
Saliency maps
Aim:
Relevance score: Average saliency per image.
Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel E. O’Connor, Xavier Giró-i-Nieto, “Shallow and Deep Convolutional Networks for Saliency Prediction”. CVPR 2016.
20. Relevance ranking
Objects
Aim:
Relevance score: Sum of all object detection scores.
Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Jeff Donahue, Ross B. Girshick, Trevor Darrell, and Kate Saenko. 2014. LSDA: Large Scale Detection Through Adaptation. CoRR abs/1407.5035 (2014).
21. Relevance ranking
Faces
Aim:
Relevance score: Exponential sum of face detection scores.
Xiangxin Zhu and Deva Ramanan. 2012. Face detection, pose estimation, and landmark localization in the
wild. CVPR 2012.
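The three relevance scores on the slides above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the inputs (a saliency map as a list of rows, and lists of detector confidences) are assumed formats, and "exponential sum" is read here as the sum of exp(score) over detected faces, which is only one possible interpretation.

```python
import math

def saliency_relevance(saliency_map):
    """Average saliency over all pixels of a 2-D map given as a list of rows."""
    values = [v for row in saliency_map for v in row]
    return sum(values) / len(values)

def object_relevance(detection_scores):
    """Sum of all object detection scores found in one image."""
    return sum(detection_scores)

def face_relevance(detection_scores):
    """Assumed reading of 'exponential sum': sum of exp(score) over faces."""
    return sum(math.exp(s) for s in detection_scores)

# Hypothetical inputs for a single image.
print(saliency_relevance([[0.0, 1.0], [0.5, 0.5]]))
print(object_relevance([0.2, 0.3]))
print(face_relevance([0.0, 0.0]))
```

Under this reading, an image with no detections gets a relevance of 0 for objects and faces, while every additional face adds at least a positive amount to the score.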
22. Relevance ranking
A ranked list r_k(x) is built for each of the k semantic-aware rankings of M elements, with scores normalized by position.
[Figure: three ranked lists (Saliency, Objects, Faces), each with scores running from 1 at the top to 0 at the bottom.]
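A position-normalized ranking of this kind could be computed as in the sketch below. It assumes linear spacing between 1 (top-ranked image) and 0 (last image), which matches the endpoints shown on the slide but is otherwise an assumption; the function name and the dictionary input are hypothetical.

```python
def position_normalized_ranking(raw_scores):
    """Map each image id to a score in [0, 1] that depends only on its rank.

    raw_scores: dict of image id -> raw relevance score (higher is better).
    The top-ranked image gets 1, the last gets 0, with linear spacing between
    them (the spacing is an assumption, not taken from the slides).
    """
    ordered = sorted(raw_scores, key=raw_scores.get, reverse=True)
    m = len(ordered)
    if m == 1:
        return {ordered[0]: 1.0}
    return {img: 1.0 - pos / (m - 1) for pos, img in enumerate(ordered)}

# Hypothetical raw saliency scores for three images.
print(position_normalized_ranking({"a": 0.9, "b": 0.1, "c": 0.5}))
```

Normalizing by position rather than by raw value puts rankings from detectors with very different score ranges on a common scale before they are combined.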
24. Weighted Fusion of Ranks
A weighted linear combination of scores to build r(x).
[Figure: the Saliency, Objects and Faces ranked lists, each scored from 1 to 0, are combined into a single Fused list.]
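The weighted linear combination can be illustrated as follows. The sketch assumes that each ranking is a dictionary of position-normalized scores and that the weights are given; the names `fuse_rankings`, `rankings` and `weights` are hypothetical.

```python
def fuse_rankings(rankings, weights):
    """Fuse several rankings into r(x) by a weighted linear combination.

    rankings: dict of ranking name -> {image id: r_k(x)}
    weights:  dict of ranking name -> w_k
    Returns the images sorted by fused score (best first) and the scores.
    """
    fused = {}
    for name, scores in rankings.items():
        for img, score in scores.items():
            fused[img] = fused.get(img, 0.0) + weights[name] * score
    ordered = sorted(fused, key=fused.get, reverse=True)
    return ordered, fused

# Hypothetical two-image example with two of the three rankings.
rankings = {"saliency": {"a": 1.0, "b": 0.0}, "faces": {"a": 0.0, "b": 1.0}}
weights = {"saliency": 0.25, "faces": 0.75}
print(fuse_rankings(rankings, weights))
```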
25. Weighted Fusion of Ranks
Weights can be estimated based on the performance of each
relevance rank separately in solving the summarization task.
[Figure: the ground-truth set is compared with the predicted ranked list. Good match?]
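One simple way to turn per-ranking performance into weights is to normalize the performance values so that they sum to 1. The slides do not say how the weights are derived, so this is only an assumed scheme; the function name and the performance figures are hypothetical.

```python
def weights_from_performance(performance):
    """Turn per-ranking performance values into weights that sum to 1.

    performance: dict of ranking name -> score obtained when that ranking
    alone is used to summarize (higher is better). Proportional weighting
    is an assumption; the slides do not specify the estimation procedure.
    """
    total = sum(performance.values())
    if total == 0:
        # No ranking is informative: fall back to uniform weights.
        return {name: 1.0 / len(performance) for name in performance}
    return {name: value / total for name, value in performance.items()}

# Hypothetical performance figures for the three rankings.
print(weights_from_performance({"saliency": 1.0, "objects": 1.0, "faces": 2.0}))
```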
26. Weighted Fusion of Ranks
Summaries of any size T can be built from the ranked list by thresholding, that is, by keeping its top T images.
[Figure, repeated on slides 26 to 30 for T = 1, 2, 3, 4, 5: the ground-truth set is compared with the first T images of the predicted ranked list. Good match?]
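Building summaries of increasing size from a ranked list, and checking them against a ground-truth set, might look like the sketch below. The image identifiers and the ground truth are made up for illustration, and counting verbatim overlaps is just the most literal way to answer "good match?".

```python
def summary_of_size(ranked_list, t):
    """Keep the top-t images of a ranked list (best first)."""
    return ranked_list[:t]

def exact_matches(summary, ground_truth):
    """Count predicted images that appear verbatim in the ground-truth set."""
    return len(set(summary) & set(ground_truth))

ranked = ["img3", "img7", "img1", "img9", "img4"]  # hypothetical ranked list
truth = {"img7", "img4"}                           # hypothetical ground truth

for t in range(1, 6):
    summary = summary_of_size(ranked, t)
    print(t, summary, exact_matches(summary, truth))
```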
31. Weighted Fusion of Ranks
Requiring an exact match between predictions and ground truth is too strict a criterion.
34. Weighted Fusion of Ranks
[Figure, repeated on slides 34 to 38 for T = 1, 2, 3, 4, 5: the ground-truth set is compared with the predicted ranked list.]
P: Size of the ground-truth summary
x_vi: Images in the ground-truth set
Y_T: Predicted summary of size T
s(x, Y_T): Maximum similarity of x with respect to all elements in the set Y_T
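A soft matching score built from these definitions can be sketched as follows. The function `max_similarity` follows the definition of s(x, Y_T) given above; averaging it over the P ground-truth images is an assumption, since the aggregate formula is not recoverable from the slide text. The similarity function `toy_sim` and the one-dimensional "images" are purely illustrative.

```python
def max_similarity(x, summary, sim):
    """s(x, Y_T): similarity of x to its closest match in the predicted summary."""
    return max(sim(x, y) for y in summary)

def soft_match_score(ground_truth, summary, sim):
    """Assumed aggregate: mean of s(x, Y_T) over the P ground-truth images."""
    p = len(ground_truth)
    return sum(max_similarity(x, summary, sim) for x in ground_truth) / p

def toy_sim(a, b):
    """Hypothetical similarity in (0, 1] between two 1-D 'images'."""
    return 1.0 / (1.0 + abs(a - b))

ground_truth = [0.0, 2.0]   # hypothetical ground-truth set, P = 2
summary = [0.0, 1.0]        # hypothetical predicted summary, T = 2
print(soft_match_score(ground_truth, summary, toy_sim))
```

With a score of this kind, a predicted image that is visually close to a ground-truth image still earns credit, which reflects the non-uniqueness of a good summary.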
48. Novelty-based re-ranking
We estimate the novelty of a candidate image x* with respect to a set Y_t based on its visual similarity to its closest match in Y_t.
[Equation shown as an image on the slide.]
Visual similarity is computed from the Euclidean distance between FC7 features of an AlexNet-like CNN (CaffeNet).
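The novelty estimate could take a form like the one below. Both the mapping from Euclidean distance to similarity, 1 / (1 + d), and the definition of novelty as 1 minus the similarity to the closest match are assumptions made for illustration; the slide's own equation is not available in the text. Short lists of floats stand in for the FC7 feature vectors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def similarity(a, b):
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean(a, b))

def novelty(candidate, selected):
    """Assumed form: 1 minus similarity to the closest match in `selected`."""
    if not selected:
        return 1.0  # nothing selected yet, so any candidate is fully novel
    return 1.0 - max(similarity(candidate, y) for y in selected)

# Hypothetical 2-D stand-ins for FC7 features.
print(novelty([0.0, 0.0], [[0.0, 0.0], [3.0, 4.0]]))
print(novelty([3.0, 4.0], [[0.0, 0.0]]))
```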
49. Novelty-based re-ranking
Re-ranking is a greedy selection among candidate images x*: at each step, one candidate is added to the current set Y_t to build a set of t+1 images, leveraging relevance r(x) and novelty n(x, Y).
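The greedy selection can be sketched as below. How relevance and novelty are combined is not stated in the slide text, so the convex combination with a trade-off parameter `alpha` is an assumption, as is the novelty definition (1 minus similarity to the closest selected image, with similarity 1 / (1 + Euclidean distance)). All names and the toy features are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def novelty(x, selected, features):
    # Assumed form: 1 minus similarity to the closest already-selected image,
    # with similarity = 1 / (1 + Euclidean distance). An empty set gives 1.
    if not selected:
        return 1.0
    closest = min(euclidean(features[x], features[y]) for y in selected)
    return 1.0 - 1.0 / (1.0 + closest)

def greedy_rerank(relevance, features, alpha=0.5):
    """Repeatedly pick the candidate with the best relevance/novelty trade-off.

    relevance: dict of image id -> r(x); features: dict of image id -> vector.
    alpha weighs relevance against novelty (an assumed combination rule).
    """
    remaining = sorted(relevance)  # sorted for deterministic tie-breaking
    selected = []
    while remaining:
        best = max(remaining, key=lambda x: alpha * relevance[x]
                   + (1.0 - alpha) * novelty(x, selected, features))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical example: "b" is a near-duplicate of "a", "c" is different.
relevance = {"a": 1.0, "b": 0.9, "c": 0.5}
features = {"a": [0.0, 0.0], "b": [0.0, 0.0], "c": [3.0, 4.0]}
print(greedy_rerank(relevance, features))
```

In this toy example the duplicate image "b" is pushed below the less relevant but visually different image "c", which is the redundancy reduction that the re-ranking step aims for.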