The RoboCup Rescue Dataset

▶ robocup.tugraz.at
* LORENZ Peter and STEINBAUER Gerald
August 2018
Institute of Software Technology
Graz University of Technology, Austria

Content
● Introduction - RoboCup Rescue League
● Data Set
○ Properties (Database Size, Image Size, Ground Truth)
○ Definition of the Confusion Matrix
● Standard Algorithms
○ Haar Cascade
○ CNN-Overfeat
○ Histogram of Oriented Gradients (HOG)
● Own Method: Bag of Visual Words
● Results

RoboCup Rescue League
● Competition between international teams
● Several disciplines; our focus: autonomy
Figure: German Open in Magdeburg, 2015.

Dataset Properties
● Size: 640×480 pixels
● Images for binary classification
○ 955 positive images: a victim is visible.
○ 746 negative images: no victim.
● XML files: each contains a polygon that labels the
victim’s face → ground truth (positive images only).

Confusion Matrix (1/4)
● True Positive (TP): The ground truth and the
detector's prediction overlap by more than a given
threshold.
Rectangles:
- Ground Truth
- Prediction

● True Negative (TN): There is no victim on the
image and there is no victim predicted.
Rectangles:
- Ground Truth
- Prediction

● False Positive (FP): There is no victim on the
image, but there is one predicted.
Rectangles:
- Ground Truth
- Prediction

● False Negative (FN): either
○ there is a victim, but no prediction, or
○ the prediction does not intersect with the ground
truth.
Rectangles:
- Ground Truth
- Prediction
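
The four cases above can be sketched in code. This is a minimal sketch, not the authors' evaluation script: it assumes that ground truth and prediction are both axis-aligned rectangles, that overlap is measured as intersection over union, and that a detection below the threshold counts as a false negative. The names `iou` and `outcome` and the 0.3 default are illustrative.

```python
from typing import Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def outcome(truth: Optional[Box], pred: Optional[Box],
            threshold: float = 0.3) -> str:
    """Map one image to TP, TN, FP or FN (None means 'no rectangle')."""
    if truth is None:
        return "TN" if pred is None else "FP"
    if pred is None:
        return "FN"  # victim present, nothing predicted
    # Assumption: overlap at or below the threshold is treated as a miss.
    return "TP" if iou(truth, pred) > threshold else "FN"
```

Calling `outcome(None, None)` covers the negative images, where no ground-truth polygon exists.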

Haar Cascade [Viola and Jones]
● Runs continuously because of its low computational
cost.
● Standard face detection algorithm.
● Faces can only be recognized in portrait view. Solution:

Haar Cascade (Recap)
● Haar Features:
○ Similar to a convolution.
○ Each feature is a single value obtained by subtracting the sum of pixels
under the white rectangle from the sum of pixels under the black rectangle.
○ Examples shown: edge features, line features.
● Integral Image
● Adaboost
● Cascade
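
As an illustration of how the integral image makes such a feature cheap to evaluate, here is a minimal pure-Python sketch. The function names are hypothetical, and the layout of the edge feature (white left half, black right half) is an assumption chosen for the example.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle at (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle edge feature: black (right) sum minus white (left) sum."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return black - white
```

Each rectangle sum costs four table lookups regardless of its size, which is what keeps the detector fast enough to run continuously.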

Haar Cascade
● No big difference between the 30% and 70% thresholds
on the overlapping area.
Figures: Haar cascade at 30% threshold; Haar cascade at 70% threshold.
Rectangles:
- Ground Truth
- Prediction

CNN-Overfeat [Sermanet, Eigen, Zhang, Mathieu, Fergus and LeCun]
● Takes about 2 seconds per prediction.
● Only triggered by a preceding event → heat detected.
● Not trained on this dataset, but on a private dataset
built from public sources.
● Last layer is trained for classification.

CNN-Overfeat
● The results tend to be positive!
Figure: CNN-Overfeat.
● Trained on whole images instead of only faces, so it
learns the whole image:
○ Whole body
○ Clothes
○ Environment
● Different dataset

Histogram of Oriented Gradients [Dalal and Triggs]
● HOG feature extraction (Recap)
○ Compute centered horizontal and vertical gradients.
○ Compute gradient orientation and magnitudes.
○ Example: Divide the image into 16x16 blocks with 50%
overlap. Each block consists of 2x2 cells.
○ Quantize the gradient orientation into 9 bins
■ The vote is the gradient magnitude
■ Interpolate votes bi-linearly between neighboring
bin centers.
○ Concatenate histograms
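
The per-cell part of the recap can be sketched as follows. This is a simplified illustration, not the detector used in the talk: it assumes unsigned orientations (0 to 180 degrees), interpolates each vote only between the two nearest orientation bin centers, skips border pixels, and leaves out block normalization. The name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(img, bins=9):
    """Orientation histogram of one cell, weighted by gradient magnitude."""
    h, w = len(img), len(img[0])
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Centered horizontal and vertical gradients.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            # Position measured in bins relative to the bin centers.
            pos = ang / bin_width - 0.5
            lower = math.floor(pos)
            frac = pos - lower
            # Split the vote between the two neighboring bins (with wrap-around).
            hist[lower % bins] += mag * (1.0 - frac)
            hist[(lower + 1) % bins] += mag * frac
    return hist
```

A full descriptor would compute one such histogram per cell and concatenate them block by block.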

Histogram of Oriented Gradients
● Goal of the HOG detection: a real human face [Haghighat]
is taken, and on the right side signs of the nose, eye
sockets and lips can be seen.
● Parameters: cell size = 2x2, block size = 1x1.
● TP is low!
Figure: HOG.

Bag of Visual Words (BoVW)
● Given:
○ positive training images containing the object class
“victim” → we cropped them to the faces only.
○ negative training images that do not.
● Classify:
○ whether a test image contains the object class or
not.

Several Stages for BoVW
1. Extract SIFT features from each image
● Invariant to:
○ Image transformations
○ Lighting variations
○ Occlusions
2. Set up a visual vocabulary from the training data
● Mini-batch k-means: the centroids are the new words
● A histogram (length corresponds to the number of words) is
created for each image.
● A main histogram is set up, combined from all
sub-histograms
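
Once the centroids exist, the histogram step in stage 2 amounts to counting nearest words. A minimal sketch with toy 2-D descriptors follows; real SIFT descriptors have 128 dimensions, and the vocabulary would come from the clustering step rather than being written by hand. The function names are illustrative.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the centroid closest to the descriptor (squared Euclidean)."""
    def sq_dist(k):
        return sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[k]))
    return min(range(len(vocabulary)), key=sq_dist)


def word_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs in one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist
```

The histogram length equals the vocabulary size, so images with different numbers of keypoints still yield vectors of the same length.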

3. Pyramid Matching (Grauman and Darrell)
● Makes the data easier to discriminate.
● We do not know how far away the victim is from
our camera.
● Merges information from 3 levels.
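
The idea behind pyramid matching can be illustrated on toy data. The sketch below follows the pyramid match kernel of Grauman and Darrell for 1-D integer features: matches are counted by histogram intersection at increasingly coarse levels, and matches that first appear at level i are weighted by 1/2^i. The slides do not say which variant or feature space was used, so the 1-D setting and all names here are assumptions.

```python
from collections import Counter


def level_histogram(points, bin_size):
    """Histogram of non-negative integer features for a given bin width."""
    return Counter(p // bin_size for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of features matched at this level."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def pyramid_match(xs, ys, levels=3):
    """Weighted sum of newly found matches over `levels` resolutions."""
    score, previous = 0.0, 0
    for i in range(levels):
        size = 2 ** i  # bin width doubles at each level
        matches = intersection(level_histogram(xs, size),
                               level_histogram(ys, size))
        score += (matches - previous) / size  # new matches, weight 1 / 2^i
        previous = matches
    return score
```

Matches found at fine levels count more than those that only appear after coarsening, which is what makes the comparison tolerant to scale.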

4. Recognize Victims via SVM
● x_i and x_j are the i-th and j-th histograms.
● Similarity:
● SVM is trained:
○ Kernel: G_ij
○ {victim, no_victim}
○ {dark_light, hole, normal_light, no_victim}
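
The similarity formula is not preserved in this transcript. As a hedged illustration of how a kernel matrix G_ij over histograms can be built, the sketch below uses histogram intersection as the similarity; that choice is an assumption, not necessarily the one used in the talk, and the function names are hypothetical.

```python
def histogram_intersection(x_i, x_j):
    """Assumed similarity: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(x_i, x_j))


def gram_matrix(histograms):
    """G[i][j] = similarity between the i-th and j-th histogram."""
    return [[histogram_intersection(x_i, x_j) for x_j in histograms]
            for x_i in histograms]
```

A matrix of this form can be handed to an SVM implementation that accepts precomputed kernels, together with one label per histogram.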

2 Categories
We split the dataset into 2 categories:
1. with_puppet: There is at least one victim in the scene.
2. without_puppet: There is no victim in the scene.
● Accuracy is about 78.57%.
● The algorithm distinguishes well between the 2
categories.
● Sometimes, the algorithm finds faces in the wood
pattern.

More Categories
We split the dataset into 4 categories:
1. dark_light: Faces are not well illuminated.
2. hole: Victim is situated in a hole in the wall.
3. normal_light: The victim’s face is well lit.
4. without_puppet: A scene of the arena, where no victim can be seen.
● Accuracy drops to 56.90%.
● The diagonal of the confusion matrix would be dark blue
for a perfect prediction.
● Confuses dark_light and hole.
● Confuses dark_light and normal_light.
● The algorithm distinguishes well between puppet and
without_puppet.
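
For reference, the accuracy quoted on these slides is the share of the confusion matrix that lies on its diagonal. A minimal sketch, assuming rows are true categories and columns are predicted categories; the function names are illustrative and any counts passed in are up to the caller, none are taken from the talk.

```python
def accuracy(matrix):
    """Fraction of samples on the diagonal (rows: truth, columns: prediction)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each true category, the fraction predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]
```

Per-class recall shows which categories are confused with each other, which the overall accuracy alone hides.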

Thank you!
Q&A
Q: When will you provide a download link for the database?
A: I will provide a download link within the next few weeks. https://osf.io/dwsnm/

References
1. K. Lassnig and S. Loigge, “RoboCup Rescue 2016 Team Description Paper Tedusar,” in Proc. of the Intern. RoboCup
Symposium, 2016, robocup2016.org/Tedusar.pdf.
2. P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated Recognition, Localization
and Detection using Convolutional Networks,” CoRR, vol. abs/1312.6229, 2014. [Online]. Available:
http://arxiv.org/abs/1312.6229.
3. D. Sculley, “Web-scale K-means Clustering,” in Proc. Intern. Conf. on World Wide Web. New York, NY, USA: ACM, 2010,
pp. 1177–1178. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772862
4. K. Grauman and T. Darrell, “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features,” in
Intern. Conf. on Computer Vision, 2005. [Online]. Available: http://www.cs.utexas.edu/users/ai-lab/?kgrauman:iccv2005
5. N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” in Intern. Conf. on Computer Vision and
Pattern Recognition, vol. 1, 2005, pp. 886–893.
6. X. Peng, L. Wang, X. Wang, and Y. Qiao, “Bag of Visual Words and Fusion Methods for Action Recognition:
Comprehensive Study and Good Practice,” Computer Vision and Image Understanding, vol. 150, pp. 109–125,
2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1077314216300091
7. Q. Zhu, Y. Zhong, B. Zhao, G. S. Xia, and L. Zhang, “Bag-of-Visual-Words Scene Classifier With Local and Global
Features for High Spatial Resolution Remote Sensing Imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 13,
no. 6, pp. 747–751, 2016.
8. M. Haghighat, “Biometrics for Cybersecurity and Unconstrained Environments,” 2016,
scholarlyrepository.miami.edu/oa_dissertations/1675
9. B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, “LabelMe: A Database and Web-Based Tool for Image
Annotation,” Intern. Journal Computer Vision, vol. 77, no. 1-3, pp. 157–173, 2008. [Online]. Available:
http://dx.doi.org/10.1007/s11263-007-0090-8
10. P. Viola and M. Jones, “Robust Real-Time Face Detection,” Intern. Journal Computer Vision, vol. 57, no. 2, pp. 137–154,
2004. [Online]. Available: https://doi.org/10.1023/B:VISI.0000013087.49260.fb

Future Work
● Replace Haar Cascade → Aggregate Channel
Features (ACF) https://github.com/pdollar/toolbox
○ Faster.
○ Might be more accurate.
○ Can detect faces in profile. http://www.cbsr.ia.ac.cn/users/zlei/papers/Yang-IJCB-14.pdf
● Replace CNN-Overfeat → DCNN by Li
et al. http://users.eecs.northwestern.edu/~xsh835/assets/cvpr2015_cascnn.pdf
○ 14 FPS on a single CPU.
○ Robust to:
■ Occlusion.
■ Lighting.
