3D Human Pose and Shape Estimation from Multi-view Imagery
Atul Kanaujia
ObjectVideo, Inc.
akanaujia@objectvideo.com
Niels Haering
ObjectVideo, Inc.
nhaering@objectvideo.com
Graham Taylor
New York University
gwtaylor@cs.nyu.edu
Chris Bregler
New York University
chris.bregler@nyu.edu
Abstract
In this study we present a robust solution for estimating the
3D pose and shape of human targets from multiple synchronized
video streams. The objective is to automatically estimate
physical attributes of the targets that would allow us to
analyze their behavior non-intrusively. The proposed system
estimates the anthropometric skeleton, pose and shape of the
human target from the 3D visual hull reconstructed from
multiple silhouettes of the target. A discriminative
(bottom-up) method is first used to initialize the 3D pose of
the target using low-level features extracted from the 2D
images. The pose is then refined using a generative (top-down)
method that also estimates the optimal skeleton of the target
using anthropometric prior models learned from the CAESAR
dataset. Statistical shape models are also learned from the
CAESAR dataset and are used to model both global and local
shape variability of human body parts. We also propose a novel
optimization scheme that fits the 3D shape by searching in the
parametric space of the local parts models while constraining
the overall shape using a global shape model. The system
provides a useful framework for automatically identifying
disproportionate body parts, estimating the size of backpacks
and inferring attributes such as the gender, age and ethnicity
of the human target.
1. Introduction
With rapid advancements in computer vision technology and the
emergence of mature technologies for detection and tracking of
human targets from a significant stand-off point, there is a
growing need for cognitive video analytics with the ability to
infer subtle attributes of humans and analyze human behavior.
Tremendous progress has been made in sensor technology in
recent years, enabling the development of advanced video
sensors with gigapixel resolution that are capable of providing
sufficient image resolution for detailed analysis of human
targets from a large distance.
In this paper we propose a framework for inferring various
human attributes from multi-view video based 3D human pose and
shape estimates. Detailed 3D human shape estimation from
multi-view imagery is still a difficult problem that does not
have a satisfactory solution. Our fully automated system
estimates the skeleton, 3D pose and shape of human targets from
multi-view images obtained from synchronized and calibrated
sensors, in a non-intrusive way. Concealed objects typically
manifest as local artifacts on the human body surface and are
difficult to fit using global human shape models. In order to
accurately fit local bulges on the human body, we use an
iterative mechanism to locally fit body part shapes to the
sensor data. The estimated 3D shape of the target is used to
classify its gender, to determine whether it is carrying a
backpack or concealing an object, and to infer the dimensions
of various body parts.
Contributions: Our approach combines the strengths of the
discriminative (bottom-up) approach with model-based generative
(top-down) algorithms for efficient estimation of the 3D pose
and shape of the target. In that respect, our work is similar
to [19]. However, as discussed in section 3, our human modeling
technique differs significantly from theirs in representation.
In addition, we use local, parts-based shape models for fitting
3D shapes to the data. Local parts-based shape models provide
richer and more flexible representations of human body shapes,
enabling improved fitting to abnormal target shapes. The key
contributions of our approach are the following: (1) We propose
a coarse-to-fine 3D pose and shape fitting algorithm that uses
an intermediate step of pose refinement using a cylindrical
parts based human shape model. This allows us to easily enforce
anthropometric constraints such as non-self-penetration of body
parts, which is costly to impose when modeling finer surface
deformation; (2) To efficiently infer both the skeleton and the
shape of the human target in any pose, we model the human body
using a joint subspace of anthropometric skeletons and 3D
shapes; (3) We have developed a novel parts-based 3D shape and
pose optimization scheme that fits the part shapes locally to
the observation while constraining the overall shape globally;
(4) We extend the 3D shape model to fit the shapes of humans
carrying accessories such as a backpack.
Related work: Initial work on marker-less motion-capture
focused on accurate 3D pose estimation from single and
multi-view imagery. A comprehensive survey of existing
state of the art techniques in vision-based motion capture
is provided in [15]. Bregler and Malik [4] proposed a rep-
resentation for articulated human models using twists that
has been widely employed in a number of single and multi-
ple camera based motion capture systems [9, 17, 21, 8, 7].
Compared to earlier approaches [15] that modeled hu-
man shapes with cylindrical or superquadrics parts, current
methods use more accurate modeling of 3D human shapes
using SCAPE body models [2] or CAESAR dataset [1]. A
number of recent multi-camera based systems proposed by
Balan and Sigal [2, 3, 19] employed SCAPE data to model
variability in 3D human shapes due to anthropometry and
pose. They have used these shape models to estimate hu-
man body shape under loose clothing and also efficiently
track across multiple frames. Guan et al. [12] used a SCAPE
based shape model to perform height-constrained estimation of
body shape. These approaches, however, lack an articulated
skeleton underlying the human body shape. The 3D shape
deformation of the body surface is captured by tracking the 3D
mesh surfaces directly. Deforming the 3D mesh while maintaining
surface smoothness is not only computationally demanding but
also ill-constrained, occasionally causing poor surface
deformation due to noisy silhouettes (or visual hull). In
parallel to the above approaches, Munderman et al. [16]
developed a SCAPE model with an underlying skeleton to track
the 3D shapes of a human target in multi-view image sequences
using an extension of the Iterative Closest Point (ICP)
algorithm. Our proposed system most closely resembles the work
of Gall et al. [8, 21]. In addition to proposing combined
skeleton and 3D shape based human models, they fit the 3D pose
to multi-view image data using a combined local and global
optimization scheme. An extension of this work [9] used
action-based priors to improve pose tracking and 3D shape
estimation from multi-view image data. Moll et al. [17]
proposed a multi-modal system to improve 3D human pose and
shape estimation from multi-view imagery by using both visual
cues and global orientation information from inertial sensors.
Chen et al. [5] developed a non-linear manifold representation
of the 3D shape variability of humans due to pose and
anthropometry. They use non-linear optimization to search in
the low-dimensional parameter space of shape and camera
parameters to optimally fit the 3D shape to the silhouette.
2. System Overview
Fig. 1 shows an overview of the system. The system takes as
input synchronized multi-view image sequences of a human target
from a set of calibrated cameras. It generates a 3D volumetric
reconstruction (visual hull) of the target from the target
silhouettes using space carving (fig. 1(a)). We use bottom-up
predictors to generate initial hypotheses of the articulated 3D
pose of the human independently from each sensor and fuse them
at the semantic 3D pose level (fig. 1(c)).
Figure 1. Overview of the proposed system for 3D human pose and
shape estimation: (a) 3D data is acquired as a volumetric
reconstruction using space carving; (b) human shape is modeled
by registering a 3D template mesh to laser scans of human
subjects and using it to learn a PCA subspace; (c) bottom-up
predictors are used to generate initial human pose hypotheses
using features extracted from the image; (d) pose predictions
are refined by top-down search in pose and shape space using a
coarse 3D human shape model; (e) detailed 3D shape is estimated
by searching in the parametric space of shape models of
individual parts; (f) the estimated pose and shape are used for
inferring human attributes and anomalous shapes.
The 3D pose is refined by a top-down (generative) method that
uses Markov Chain Monte Carlo (MCMC) based search to
efficiently fit a coarse 3D human shape model (with cylindrical
body parts) to the extracted visual hull (fig. 1(d)). The
top-down models are used to search in both the pose space and
the parametric space of skeletons and coarse 3D human shapes to
maximize the overlap with the visual hull. We model the space
of detailed human shape variation using Principal Component
Analysis (PCA). The human 3D shape model is learned by first
establishing one-to-one correspondence between a hole-filled
template 3D mesh model and a corpus of human body scans from
the CAESAR dataset [18] (fig. 1(b)). The registered 3D mesh
data is used to learn low-dimensional models for local
parts-based and global shape variability in humans. The
detailed 3D shape of a target human is obtained by searching in
the PCA-based low-dimensional parametric shape space for the
best fitting match (fig. 1(e)).
The developed system is used for analyzing 3D human shapes and
inferring attributes of the human target such as gender and the
dimensions of body parts. In all of our experiments we employed
4 calibrated cameras placed along directions chosen to
maximally cover the viewing sphere around the target. Although
using fewer cameras introduces ambiguity, we overcome this
problem by using efficient anthropometric priors for searching
in both pose and shape space.
Figure 2. The 3D mesh surface and underlying skeleton of a
template human model are iteratively deformed to align them to
the human body scan data (CAESAR dataset). All scans have 73
landmark points on the body surface that are used for 3D shape
registration.
3. 3D Human Pose and Shape Modeling
We model the human body as a combination of an articulated
skeleton and a 3D shape. The shape is modeled both coarsely
(using cylindrical parts) and finely (using a detailed 3D
surface mesh). We learn 3D shape models for both the entire
human body and the individual body parts (15 components). We
assume that the human body shape is deformed only by the
underlying skeleton (and not by other factors such as
clothing). Using a skeleton to deform the 3D mesh surface is
more robust to noisy silhouettes than skeleton-free shape
estimation [2], as it puts additional constraints on the shape
fitting by searching in the parametric space of human shape
models.
3D Data Acquisition: Targets are localized using change
detection. We model the background pixel intensity distribution
as a non-parametric kernel density estimate to extract
silhouettes of moving targets. Image streams from multiple
calibrated sensors are used to reconstruct a 3D volumetric
representation (visual hull) of the human target using space
carving. We use a fast, octree-based iterative space carving
algorithm to extract the volumetric reconstruction of the
target. A single volume (cube) that completely encloses the
working space of the acquisition system is defined. Based on
its projection onto the camera image planes, each voxel is
classified as inside, outside or on the boundary of the visual
hull using the target silhouettes. The boundary voxels are
iteratively subdivided into eight parts (voxels) until the size
of the voxels is less than the threshold size.
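The octree subdivision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's implementation: the cameras are idealized as three axis-aligned orthographic views, each silhouette is a hypothetical disc, and a voxel is classified by probing a 3x3x3 grid of sample points instead of projecting its exact footprint. The names `classify`, `carve`, `views` and `min_edge` are made up for this sketch.

```python
from itertools import product

def classify(center, half, views):
    """Label a cube 'inside', 'outside' or 'boundary' of the visual hull.

    `views` is a list of (project, in_silhouette) pairs. The cube is probed
    on a 3x3x3 grid of sample points, which is only an approximation.
    """
    samples = [tuple(c + o * half for c, o in zip(center, offs))
               for offs in product((-1, 0, 1), repeat=3)]
    fully_inside = True
    for project, in_silhouette in views:
        hits = [in_silhouette(project(p)) for p in samples]
        if not any(hits):
            return "outside"          # carved away by this view
        if not all(hits):
            fully_inside = False
    return "inside" if fully_inside else "boundary"

def carve(center, half, views, min_edge):
    """Subdivide boundary voxels into eight until their edge < min_edge."""
    label = classify(center, half, views)
    if label == "outside":
        return []
    if label == "inside" or 2 * half < min_edge:
        return [(center, half, label)]
    leaves = []
    for offs in product((-0.5, 0.5), repeat=3):
        child = tuple(c + o * half for c, o in zip(center, offs))
        leaves.extend(carve(child, half / 2, views, min_edge))
    return leaves

# Hypothetical setup: orthographic views along x, y and z, each seeing a
# disc-shaped silhouette of radius 0.6 centred on the optical axis.
def disc(uv):
    return uv[0] ** 2 + uv[1] ** 2 <= 0.6 ** 2

views = [(lambda p: (p[1], p[2]), disc),
         (lambda p: (p[0], p[2]), disc),
         (lambda p: (p[0], p[1]), disc)]

# The root cube (edge length 2) encloses the whole working volume.
hull = carve((0.0, 0.0, 0.0), 1.0, views, min_edge=0.3)
```

Each entry of `hull` is a `(center, half_edge, label)` leaf. Interior voxels stay coarse while only boundary voxels are refined, which is what makes the octree variant cheaper than carving a uniform grid.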
As the 2D shape of the silhouette plays a critical role in
discriminative 3D pose prediction (see section 4), the visual
hull is back-projected to obtain clean silhouettes of the
target using Z-buffering. The improved silhouettes generate
cleaner shape descriptors for improved 3D pose estimation using
bottom-up methods.
Human 3D Shape Registration: Laser scans of the human body from
the CAESAR dataset are used to learn parametric models for 3D
human shapes. Human body scans are first registered to a
perfect, hole-filled reference template human model composed of
both a 3D mesh surface and an accurately aligned skeleton. We
use a detailed template model of standard anthropometry in
order to capture a subtle and wide range of variations in human
3D shapes. The CAESAR dataset has 73 landmark points at various
positions, and these are used to guide the 3D shape
registration. The deformation is an iterative process that
gradually brings the template surface mesh vertices (and the
skeleton) close to the laser scan data points by translating
them along the surface normal while maintaining surface
smoothness.
Anthropometric Prior and Coarse Human Shape Modeling: We learn
parametric models for the space of human skeletons and a coarse
representation $L$ of the 3D shape of the human body using
cylindrical parts (see fig. 3). Principal Component Analysis
(PCA) is used to learn the space of human skeletons and the
variability of the dimensions of the cylindrical body parts
from the registered CAESAR dataset [18] (see fig. 2). The space
of human skeletons is parameterized using a 5-dimensional PCA
subspace, capturing 94% of the variability in the length of the
skeletal links. The coarse 3D human shape model parameters
$L = [\,l\ r_1\ r_2\,]$ include the length and the two radii of
the tapered cylindrical human parts.
Global and Part-based Shape Modeling: We characterize the space
of human body shapes and of the individual body parts using
Principal Component Analysis (PCA). Global 3D human shape
models are excessively restrictive in capturing shape
variability due to a concealed object or a disproportionate or
abnormally sized body part. In comparison, parts-based 3D shape
models are richer in modeling asymmetries and surface
protrusions arising from object concealment. We use PCA to
learn a subspace for each of the body parts from the part
vertices of the registered shapes, which are in one-to-one
correspondence with the pre-segmented template mesh model.
Figure 3. (Top left) Space of articulated human skeletons; (Top
right) Coarse human shape model used in our system; (Bottom
left) Average detailed human shape model; (Bottom right) Coarse
human shape model with size of parts estimated from the
detailed 3D shape.
Detailed Parts Shapes from Coarse Human Model: In order to
efficiently initialize the detailed 3D parts from the coarse
cylindrical body parts, we employ an approach similar to [1]
for learning the relation between the PCA coefficients of the
$i$-th body part and the dimensions of its corresponding
cylindrical shape model
$L^{(i)} = [\,l^{(i)}\ r_1^{(i)}\ r_2^{(i)}\,]$. Specifically,
we learn a linear regression map from the PCA coefficients
$[P]_{N \times k}$ of the $N$ data points in the
$k$-dimensional PCA subspace. The regression function is
$$M\,[\,l^{(i)}\ r_1^{(i)}\ r_2^{(i)}\ 1\,]^T = [\,P_1^{(i)} \cdots P_k^{(i)}\,]^T .$$
The mapping is learned as a pseudo-inverse:
$$M = P (L L^T + \lambda I)^{-1} \qquad (1)$$
where $\lambda$ is the regularization constant of the ridge
regression. The PCA coefficients of the detailed 3D shape of
the $i$-th body part can be directly computed from the
dimensions of the cylindrical body part as
$M\,[\,l^{(i)}\ r_1^{(i)}\ r_2^{(i)}\ 1\,]^T$.
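As a reading aid, the ridge regression behind Eq. (1) can be written out with explicit matrix shapes. The data layout below is an assumption, since the text does not fix one: $\tilde{L} \in \mathbb{R}^{N \times 4}$ stacks the rows $[\,l\ r_1\ r_2\ 1\,]$ of the $N$ training examples of one body part, $P \in \mathbb{R}^{N \times k}$ stacks the corresponding PCA coefficients, and $M \in \mathbb{R}^{k \times 4}$ is the map being learned.

```latex
\begin{align*}
M^{*} &= \arg\min_{M}\;
         \lVert \tilde{L} M^{T} - P \rVert_F^{2}
         + \lambda \lVert M \rVert_F^{2}, \\
0 &= \tilde{L}^{T}\bigl(\tilde{L} M^{T} - P\bigr) + \lambda M^{T}
  \;\Longrightarrow\;
  M^{*T} = \bigl(\tilde{L}^{T}\tilde{L} + \lambda I\bigr)^{-1} \tilde{L}^{T} P, \\
M^{*} &= P^{T} \tilde{L}\,\bigl(\tilde{L}^{T}\tilde{L} + \lambda I\bigr)^{-1}.
\end{align*}
```

If the examples are instead stored one per column, the same solution reads $M = P L^{T} (L L^{T} + \lambda I)^{-1}$, which has the structure of Eq. (1). The term $\lambda I$ keeps the $4 \times 4$ matrix invertible even when the cylinder dimensions in the training set are strongly correlated.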
4. Bottom-up 3D Pose Estimation
Due to the high degree of articulation of the human body,
searching in the high-dimensional pose space is prone to local
optima. We overcome this problem by initializing the search
near the global optimum using discriminative (bottom-up)
methods. To this end, we employ a regression-based framework to
directly predict multiple plausible 3D poses (obtained as a
probability distribution over pose space) using the visual cues
extracted from individual sensors. The predictive distribution
from multiple sensors is then obtained by simply summing these
distributions. Inferring 3D pose using only 2D visual
observations is an ill-posed problem, due to the loss of depth
information under perspective projection. Learning therefore
involves modeling an inverse perspective mapping that is
one-to-many, as several 3D human configurations can generate
similar 2D visual observations. We therefore model these
relations as multi-valued mappings using the Bayesian Mixture
of Experts (BME) [20] model.
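Formally, the BME model is
$$p(x|r) = \sum_{i=1}^{M} g_i(r)\, p_i(x|r) \qquad (2)$$
$$g_i(r) = \frac{\exp(\lambda_i^{\top} r)}{\sum_k \exp(\lambda_k^{\top} r)} \qquad (3)$$
$$p_i(x|r) = G(x \,|\, W_i r, \Omega_i^{-1}) \qquad (4)$$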
i ) (4)
where $r$ is the input or predictor variable (image
descriptors), $x$ is the output or response (3D pose
parameters), and the $g_i$ are the input-dependent positive
gate functions. The gates $g_i$ output values in $[0, 1]$ and
are computed using (3). For a particular input $r$, the gates
output the probability of each expert function being used to
map $r$ to the output pose $x$. In the model, $p_i$ refers to
Gaussian distributions with covariances $\Omega_i^{-1}$
centered at the different "expert" predictions. BME is learned
in the Sparse Bayesian Learning (SBL) paradigm, which uses the
Automatic Relevance Determination (ARD) mechanism to train
sparse (less parameterized) regression models. We use an
accelerated training algorithm based on forward basis selection
[6] to train our discriminative models on a large database of
labeled poses observed from different viewpoints.
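Equations (2) to (4) can be evaluated directly once the parameters are known. The sketch below is a simplified illustration, not the trained model: it assumes a scalar pose output $x$ so that each expert is a 1-D Gaussian with a variance in place of the covariance $\Omega_i^{-1}$, and the function names and the parameter layout (`lams`, `weights`, `variances`) are hypothetical.

```python
import math

def gates(lams, r):
    """Softmax gates g_i(r) of Eq. (3); one weight vector per expert."""
    scores = [sum(l * ri for l, ri in zip(lam, r)) for lam in lams]
    top = max(scores)                      # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian(x, mean, var):
    """1-D Gaussian density, standing in for G(x | W_i r, Omega_i^-1)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def bme_density(x, r, lams, weights, variances):
    """Mixture density p(x|r) of Eq. (2) for a scalar output x."""
    g = gates(lams, r)
    means = [sum(w * ri for w, ri in zip(w_i, r)) for w_i in weights]
    return sum(gi * gaussian(x, mu, var)
               for gi, mu, var in zip(g, means, variances))
```

Because the gates depend on the input $r$, the same model can place most of its mass on one expert for unambiguous descriptors and spread it over several experts when several poses explain the image equally well.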
In multi-camera settings, visual cues can be fused at the
feature level to train a single discriminative model that
predicts 3D pose from a concatenated feature vector obtained
from multiple sensors. However, such a model will be dependent
on the camera configuration. Instead, we train a single mixture
of experts model to predict 3D pose from single-camera input,
but with training examples captured from multiple viewpoints.
We use this model to predict poses from each of the viewpoints
independently. The combined predictive distribution is obtained
by simply summing the mixtures of Gaussians obtained from each
of the sensor models $C = \{C_1, \cdots, C_N\}$, with the gate
weights re-weighted to sum to one:
$$p(x|r, W, \Omega, \lambda) = \sum_{j=1}^{N} \sum_{i=1}^{M} g_{ij}(r|\lambda_{ij})\, p_{ij}(x|r, W_{ij}, \Omega_{ij}^{-1}) \qquad (5)$$
where $j$ indexes the sensor models $C_j$, $N$ is the number of
sensors and $M$ is the number of experts in each Mixture of
Experts model used to learn the mapping.
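The fusion step in Eq. (5) amounts to pooling the Gaussian components predicted from every view and rescaling their gate weights. A minimal sketch, assuming each sensor's prediction is already available as a list of `(gate_weight, mean, variance)` tuples (a hypothetical layout) and that all sensors are trusted equally:

```python
def fuse_mixtures(per_sensor):
    """Sum per-sensor Gaussian mixtures and renormalize the gate weights.

    `per_sensor` has one entry per sensor; each entry is a list of
    (gate_weight, mean, variance) components predicted from that view.
    """
    n_sensors = len(per_sensor)
    fused = []
    for components in per_sensor:
        total = sum(w for w, _, _ in components)
        for w, mean, var in components:
            # Each sensor contributes 1/N of the total probability mass.
            fused.append((w / (total * n_sensors), mean, var))
    return fused
```

The result is again a mixture of Gaussians, now with N times M components, so it can be sampled or evaluated in exactly the same way as a single-sensor prediction.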
5. Top-down 3D Pose Refinement and Coarse
Shape Estimation
A generative (top-down) model-based feedback stage is used to
further refine the 3D pose estimates obtained from the
bottom-up methods. Our generative model consists of a coarse 3D
human shape model with each body part represented using simple
geometric primitives such as tapered cylinders. Geometric
shapes allow fast image likelihood computation and make it easy
to enforce the non-self-penetration constraint for the body
parts. The top-down search fits the human model to the visual
hull by optimizing the parameters of the human skeleton model
(5 dimensional), the coarse 3D shapes (5 dimensional) and the
joint angles (≈ 15 after variance-based pruning). We use the
predictive distribution from the feed-forward methods to prune
the joint angles having low variance. The likelihood cost is
computed as the sum of the degree of overlap of each part with
the visual hull, with an added cost for each pair of
intersecting parts (see fig. 4). In computing the
self-penetration cost, we compute the shortest distance $D$
between the two axes of the cylindrical body parts of radii
$R_1$ and $R_2$. For two intersecting parts, we add a penalty
term proportional to $(R_1 + R_2 - D)$ to the likelihood
function.
Figure 4. (left) Top-down model fitting is initialized by
aligning the root joint and the centroid of the visual hull
(shown in blue); (right) Overlap cost is computed as the number
of voxels (visual hull elements) lying inside the cylindrical
body part. Part self-intersection is penalized by adding an
additional cost proportional to $(R_1 + R_2 - D)$ for every
self-penetrating part.
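The self-penetration term can be illustrated with a short sketch. It treats each part axis as a 3D line segment, computes the shortest segment-to-segment distance with the standard clamped closest-point construction, and returns a penalty proportional to the overlap depth. The function names, the single radius per part (the tapered cylinders in the text have two) and the `weight` factor are assumptions of this sketch.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _clamp01(v):
    return max(0.0, min(1.0, v))

def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Shortest distance D between segments p1-q1 and p2-q2."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:              # both segments are points
        s = t = 0.0
    elif a <= eps:                         # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:                       # second segment is a point
            s, t = _clamp01(-c / a), 0.0
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b          # zero for parallel segments
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                s, t = _clamp01(-c / a), 0.0
            elif t > 1.0:
                s, t = _clamp01((b - c) / a), 1.0
    c1 = tuple(p + s * d for p, d in zip(p1, d1))
    c2 = tuple(p + t * d for p, d in zip(p2, d2))
    return math.sqrt(_dot(_sub(c1, c2), _sub(c1, c2)))

def penetration_penalty(axis_a, radius_a, axis_b, radius_b, weight=1.0):
    """Penalty proportional to (R1 + R2 - D); zero if the parts are apart."""
    d = segment_distance(axis_a[0], axis_a[1], axis_b[0], axis_b[1])
    return weight * max(0.0, radius_a + radius_b - d)
```

For example, two parallel limb axes one unit apart with radii 0.6 and 0.7 overlap by 0.3, while radii 0.3 and 0.4 give no penalty. Working with axes and radii only is what makes this check cheap compared with testing mesh surfaces against each other.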
Stochastic Optimization using MCMC: We use Markov Chain Monte
Carlo (MCMC) simulation for searching in the parameter space of
the human skeletal links ($L$), the coarse shape models ($S$)
and the 3D pose ($\theta$). MCMC is a suitable methodology for
computing a maximum a posteriori (MAP) solution
$\arg\max_x p(x|r)$ by drawing samples from the proposal
density (which approximates the posterior) using a random-walk
based Metropolis algorithm [14]. At the $t$-th iteration, a
candidate $x'$ is sampled from a proposal distribution
$q(x'|x_{t-1})$ and accepted as the new state with probability
$a(x_{t-1} \to x')$, where
$$a(x_{t-1} \to x') = \min\left\{1,\ \frac{p(x'|r)\, q(x_{t-1}|x')}{p(x_{t-1}|r)\, q(x'|x_{t-1})}\right\} \qquad (6)$$
and $x' = \{L, S, \theta\}$ are the parameters that are
optimized to maximize the overlap between the coarse 3D human
model and the visual hull. Here $S$ denotes the low-dimensional
PCA coefficients of the anthropometric prior. In order to avoid
local optima, we use simulated annealing, which gradually
concentrates the distribution to be maximized,
$p(x|r)^{1/T_i}$, around its global optima. The parameter $T_i$
is gradually decreased under the assumption that
$p(x|r)^{\infty}$ mostly concentrates around the global maxima
[10].
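The acceptance rule of Eq. (6) combined with annealing can be sketched as follows. This is a toy illustration on a scalar state, not the system's optimizer: the geometric cooling schedule, the Gaussian random-walk proposal, the bimodal `target` density and all parameter values are assumptions chosen for the example. The ratio is evaluated in the log domain so that small temperatures do not overflow.

```python
import math
import random

def acceptance(p_new, p_old, q_back, q_fwd, temperature=1.0):
    """Eq. (6) applied to the annealed target p(x|r)^(1/T)."""
    if p_new <= 0.0 or q_back <= 0.0:
        return 0.0
    log_ratio = ((math.log(p_new) - math.log(p_old)) / temperature
                 + math.log(q_back) - math.log(q_fwd))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def annealed_search(density, x0, steps=2000, step_size=0.5,
                    t_start=2.0, t_end=0.05, seed=0):
    """Random-walk Metropolis with a geometric cooling schedule."""
    rng = random.Random(seed)
    x = best = x0
    for k in range(steps):
        temperature = t_start * (t_end / t_start) ** (k / max(1, steps - 1))
        candidate = x + rng.gauss(0.0, step_size)
        # Symmetric proposal: q(x|x') == q(x'|x), so the q terms cancel.
        a = acceptance(density(candidate), density(x), 1.0, 1.0, temperature)
        if rng.random() < a:
            x = candidate
        if density(x) > density(best):
            best = x                      # keep the best state seen so far
    return best

def target(x):
    """Hypothetical bimodal posterior with its global mode near x = 3."""
    return (0.3 * math.exp(-0.5 * (x + 2.0) ** 2)
            + 0.7 * math.exp(-0.5 * (x - 3.0) ** 2 / 0.25))

estimate = annealed_search(target, x0=0.0)
```

At high temperature the flattened target lets the chain cross between modes; as the temperature drops, downhill moves are accepted less and less often and the chain settles on a mode.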
Proposal Map Computation: The proposal distribution plays a
critical role in MCMC search and is assumed to be independent
for shape and pose parameters. We adopt the Metropolis
algorithm for sampling from our proposal maps, which are not
conditioned on the current state $x_{t-1}$. The proposal
distribution $q(\theta)$ is obtained as a mixture of Gaussians
from the bottom-up predictors (5), but these angular priors are
ill-suited for searching in the joint angle space: sampling
from the angular priors of joints higher in the skeletal
hierarchy (such as the shoulder and femur joints) may produce
larger spatial motion than sampling from those of the lower
joints (such as the elbow and knee joints). Optimizing
simultaneously in the entire 3D pose space may cause
instability and require more iterations for convergence. This
problem may be resolved by fitting the joints higher in the
skeletal hierarchy first. We adopt a more principled approach
[13] whereby we sample from the spatial prior as opposed to the
angular prior. Specifically, for the $i$-th skeletal link, we
sample from $p(\theta_i, \Sigma_{\theta_i}) = N(F(\theta_i), \Sigma_F)$
with $F(\theta_i) = F(\theta_i^{(p)}) * R(\theta_i) + T(\theta_i)$,
where $F(\theta_i)$ is the end location of the $i$-th joint
link and $\theta_i^{(p)}$ is its parent joint. Sampling from
$F(\theta_i)$ is not straightforward because, unlike
$\theta_i$, it spans a non-linear manifold $\mathcal{M}$. In
order to compute the covariance, we linearly approximate the
manifold at a point by the tangent space at that point. We
compute the Jacobian $J$ and use it to compute the covariance
as $\Sigma_F = J_{\theta_i} \Sigma_{\theta_i} J_{\theta_i}^T$.
At the $t$-th iteration, sampling from the distribution
$N(F(\theta_i), \Sigma_F)$ generates the end-effector location
$F'(t)$ of the joint, which is used to compute the angle by
minimizing
$$\theta_i^{(t)} = \arg\min_{\theta_i} \|F'(t) - F(\theta_i)\|^2 \quad \text{s.t.} \quad \theta_i^{\min} \le \theta_i \le \theta_i^{\max} \qquad (7)$$
The minimization is performed using the standard
Levenberg-Marquardt optimization algorithm.
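The spatial sampling scheme can be made concrete for the simplest possible case. The sketch below assumes a single planar link with one joint angle, so that the Jacobian is a 2x1 vector and $\Sigma_F$ is rank one. It also swaps the Levenberg-Marquardt solver of Eq. (7) for a plain bounded grid search to stay dependency-free. All function names and parameters are hypothetical.

```python
import math
import random

def end_point(parent, length, theta):
    """F(theta): end location of a planar link attached at `parent`."""
    return (parent[0] + length * math.cos(theta),
            parent[1] + length * math.sin(theta))

def spatial_covariance(length, theta, var_theta):
    """Sigma_F = J * var_theta * J^T with J = dF/dtheta (2x1)."""
    j = (-length * math.sin(theta), length * math.cos(theta))
    return [[j[0] * var_theta * j[0], j[0] * var_theta * j[1]],
            [j[1] * var_theta * j[0], j[1] * var_theta * j[1]]]

def sample_end_point(parent, length, theta, var_theta, rng):
    """Draw from N(F(theta), Sigma_F) by perturbing along the tangent."""
    fx, fy = end_point(parent, length, theta)
    delta = rng.gauss(0.0, math.sqrt(var_theta))
    return (fx - length * math.sin(theta) * delta,
            fy + length * math.cos(theta) * delta)

def recover_angle(target, parent, length, lo, hi, steps=2000):
    """Bounded grid search for argmin ||target - F(theta)||^2 (cf. Eq. 7)."""
    best_theta, best_cost = lo, float("inf")
    for k in range(steps + 1):
        theta = lo + (hi - lo) * k / steps
        fx, fy = end_point(parent, length, theta)
        cost = (target[0] - fx) ** 2 + (target[1] - fy) ** 2
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta

# One proposal step for a link of length 2 currently at 45 degrees.
rng = random.Random(0)
sampled = sample_end_point((0.0, 0.0), 2.0, math.pi / 4, 0.01, rng)
proposed_theta = recover_angle(sampled, (0.0, 0.0), 2.0, 0.0, math.pi / 2)
```

Because the covariance is expressed in end-effector space, a given angular variance produces a spatial spread that scales with the link length, which is the effect the spatial prior is meant to account for.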
6. Detailed 3D Shape Estimation
The 3D pose and coarse shape estimated by the top-down method
are used to initialize the search in the parameter space of
detailed 3D human shapes. We model the 3D shape of humans using
polygonal 3D mesh surfaces skinned to an underlying skeleton.
We assume that the 3D mesh surface undergoes deformation only
under the influence of the skeleton attached to it. The shape
of the human body can vary due to both the anthropometry and
the pose of the target. Anthropometric variability is modeled
by the learned 3D shape models for humans. The shape
deformation due to pose is obtained by first skinning the 3D
mesh to the skeleton and then transforming the vertices under
the influence of the associated skeletal joints.
Skinning 3D Mesh to the Skeleton: We use Linear Blend Skinning
(LBS) for efficient non-rigid deformation of the skin as a
function of the underlying skeleton. LBS is achieved by
associating each vertex with its two nearest joints. The
transformation is computed as a weighted sum of the
transformations due to each of the joints, where the weights
are computed as the inverse distance from the joints. Fig. 5
illustrates the computation of the transformation of vertices
associated with different body segments.
Figure 5. Linear Blend Skinning is used to deform the 3D mesh
under the influence of the skeleton: (left) Rigidly deforming
human body parts causes artifacts around the joints; (middle)
Vertices are transformed using a weighted sum of the
transformations due to multiple associated joints; (right)
Shape deformation with a backpack accessory attached to the
torso.
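The skinning rule described above (two nearest joints, inverse-distance weights) can be sketched for a single vertex. This is a minimal illustration under assumptions: each joint's effect is passed in as a callable that maps a rest-pose point to its posed position, and the small `eps` that guards against division by zero when a vertex coincides with a joint is an addition of this sketch.

```python
import math

def skin_vertex(vertex, joints, transforms, eps=1e-9):
    """Blend the transforms of the two joints nearest to `vertex`.

    `joints` are rest-pose joint positions; `transforms[j]` maps a
    rest-pose point to its posed position under joint j alone.
    """
    dists = sorted((math.dist(vertex, joint), j)
                   for j, joint in enumerate(joints))
    nearest = dists[:2]
    inv = [1.0 / (d + eps) for d, _ in nearest]   # inverse-distance weights
    total = sum(inv)
    posed = [0.0] * len(vertex)
    for w, (_, j) in zip(inv, nearest):
        moved = transforms[j](vertex)
        for axis, value in enumerate(moved):
            posed[axis] += (w / total) * value
    return tuple(posed)
```

A vertex halfway between a fixed joint and a joint translated by (2, 0) ends up translated by (1, 0), which is the smooth fall-off around joints that rigid per-part deformation lacks.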
Although rich in terms of representation, a global 3D human
shape representation cannot model 3D shapes with
disproportionately sized body parts. In order to support a rich
set of human shapes, we use a combined local part-based and
global optimization scheme that first searches in the local
subspace of human body parts to match the observation, and then
constrains the whole shape using the global human shape model.
Fitting body parts independently causes discontinuities along
the joints and generates unrealistic shapes (see fig. 6).
Constraining the shape to lie in the global shape space
therefore ensures that it is a valid shape. For linear PCA
based shape models, this is efficiently done by ensuring that
the PCA coefficients of the shape (when projected to the
subspace) lie within a range of variance.
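The global constraint can be sketched as a project, clamp and reconstruct step. The sketch assumes an orthonormal PCA basis with known per-component standard deviations, and the bound of plus or minus `k` standard deviations (default 3) is an assumption, since the text only says "a range of variance". Names are hypothetical.

```python
def constrain_to_shape_space(shape, mean, basis, stds, k=3.0):
    """Project `shape` onto a PCA basis, clamp coefficients, reconstruct.

    `basis` holds orthonormal principal directions (one per row) and
    `stds` the standard deviation of the training data along each.
    """
    centered = [s - m for s, m in zip(shape, mean)]
    coeffs = [sum(b * c for b, c in zip(direction, centered))
              for direction in basis]
    clamped = [max(-k * sd, min(k * sd, c)) for c, sd in zip(coeffs, stds)]
    result = list(mean)
    for coeff, direction in zip(clamped, basis):
        for i, b in enumerate(direction):
            result[i] += coeff * b
    return result, clamped
```

Projection discards any component of the shape outside the global subspace, and clamping pulls implausibly large coefficients back toward the mean, so independently fitted parts are brought back to a shape the global model considers valid.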
Stochastic Search in Local and Global Shape Space: Our
algorithm alternates the search between the parameter spaces of
3D human pose ($\theta$) and shape ($S$) to simultaneously
refine the pose and fit the detailed 3D shape to the
observation. The search is performed using Data Driven MCMC
with the Metropolis-Hastings method, wherein the proposal map
does not use the predictive distribution obtained from
bottom-up methods but is instead modeled as a Gaussian
distribution conditioned on the current state,
$q(x'|x_{t-1})$, where $x_{t-1} = \{\theta_{t-1}, S_{t-1}\}$.
The likelihood distribution is modeled as a symmetrical chamfer
distance map [2] to match the 2D projection of the model to the
observed image silhouettes from multiple sensors. For
optimizing the 3D pose, we use the current 3D shape to search
in the parameter space of articulated human pose. The
regression function $M$ (1), which maps the coarse human shape
model to the detailed shape PCA coefficients, is used to
initialize the search. Plausible 3D shapes are sampled from the
Gaussian distributions that the PCA-based subspace represents
for each of the body parts. The search is performed by
alternately fitting the 3D pose first, followed by optimization
of the shape parameters of the individual body parts. At every
iteration, the 3D shape of the human body is constrained using
the global shape model to ensure a valid shape (see fig. 6).
Figure 6. Detailed 3D shape fitting by sampling from PCA based
shape models of various body components: (left) Average human
shape model; (middle) Shape with each body part sampled from
the parts shape model; (right) 3D shape obtained after
constraining the shape using the global shape model.
Figure 7. Accurate 3D surface reconstruction of the human body
is provided for all the poses in the I3DPost [11] dataset. 3D
shape fitting algorithms are evaluated by matching the fitted
3D shape (shown as red colored vertices) with the ground truth
surface reconstruction (shown as blue colored vertices).
7. Experimental Evaluation
We conducted experiments on both publicly available datasets
and data captured at our motion capture facility. In all our
experiments, we used 4 synchronized image streams from
calibrated sensors to estimate the 3D pose and shape of the
human targets. 3D motion capture data was used to train our
bottom-up predictors. The BME model was trained with 3 experts.
For training the bottom-up methods, we used vector-quantized
shape context histograms computed over both the outer contour
and the internal edges of the foreground object as the inputs
for regression. Fig. 8 illustrates the results of our framework
on walking sequences with and without a backpack. The I3DPost
data [11] also provide accurate 3D surface reconstructions of
subjects in different walking poses. We evaluate the accuracy
of our shape fitting algorithms using these as ground truth.
The error is computed as the sum of the distances from each
surface vertex to the nearest vertex of the fitted 3D shape.
Fig. 7 illustrates the technique on an example image.
Figure 8. 3D pose and shape fitting results for different
sequences. The three columns on the right show the results with
a backpack accessory from a walking sequence.
7.1. Shape Fitting to Accessories
Our system also supports automatic estimation of the size of an
accessory bag carried by a human. The backpack is modeled as a
trapezoidal shape and is assumed to be rigidly attached to the
torso, such that the translation and orientation of the
backpack can be directly computed from those of the torso. The
two parameters of the trapezoid (the thickness and the
orientation of the non-perpendicular face) are iteratively
estimated during the 3D shape fitting. The shape of the
accessory is initialized to the mean thickness of the human
torso. The framework functions as a generative classifier to
identify whether a human is carrying a backpack or not. An
improvement in the likelihood of fit for the model with the
attached accessory implies the presence of a backpack. This is
illustrated in fig. 9(b), where use of the model with an
attached accessory improved the likelihood of fit from 1.043 to
1.3441.
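The generative classification idea (fit both hypotheses, keep the one that explains the observation better) can be sketched with a point-set version of the symmetric chamfer cost. This is a simplified stand-in and not the system's likelihood: it compares 2D outline points from a single view rather than distance maps over multiple sensors, and the function names are hypothetical.

```python
import math

def chamfer_cost(model_pts, observed_pts):
    """Symmetric chamfer cost: mean nearest-neighbour distance, both ways."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(model_pts, observed_pts) + one_way(observed_pts, model_pts)

def carries_backpack(observed_pts, body_pts, body_with_pack_pts):
    """True if the with-accessory model matches the outline more closely."""
    cost_with = chamfer_cost(body_with_pack_pts, observed_pts)
    cost_without = chamfer_cost(body_pts, observed_pts)
    return cost_with < cost_without
```

In this formulation a lower cost means a better fit, so the decision reduces to comparing the two matching costs after each model has been fitted to the same observation.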
7.2. Human Attribute Inference Using 3D Shape
Analysis
The estimated 3D shape of the human target can be used to infer
a variety of human attributes that are useful for identifying
potentially hostile behavior. Demographic features such as
gender and ethnicity, and physical attributes such as height,
weight and body appearance, can be inferred either by computing
spatial statistics of different regions of the fitted 3D shape
or by determining the anthropometric variations that
characterize these features. Various anthropometric
measurements can be directly inferred from the 3D shape fitted
to the observed multi-sensor data. Fig. 9(c) shows the
measurements of different body parts estimated from the 3D
shapes fitted to the observations.
Gender Classification: We use Linear Discriminant Analysis
(LDA) to find the feature projections that best discriminate
the shape profiles of the two gender classes. LDA essentially
learns a linear classification boundary between the two classes
under the assumption that the samples from each of the two
classes are normally distributed. The LDA vector can be used to
classify a person's gender based on the fitted 3D shape.
Similar to gender classification, the age and ethnicity
attributes of a person can be inferred from the body stature.
Fig. 9(a) shows the gender classification results using LDA.
Here the threshold for gender classification is set to 0, and
negative LDA coefficients denote female shapes.
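A two-class LDA with the decision threshold at 0 can be sketched as below. To stay self-contained it assumes only two hypothetical shape features per person (so the within-class scatter is a 2x2 matrix that can be inverted by hand) and equal class priors. The real classifier operates on the fitted 3D shape, whose feature set is not specified here.

```python
def fit_lda(class_neg, class_pos):
    """Return (w, b) such that w.x + b < 0 for the negative class.

    Each class is a list of 2-D feature tuples. The threshold sits at the
    midpoint of the class means, i.e. equal priors are assumed.
    """
    def mean(rows):
        n = len(rows)
        return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

    mu_n, mu_p = mean(class_neg), mean(class_pos)
    s = [[0.0, 0.0], [0.0, 0.0]]            # pooled within-class scatter
    for rows, mu in ((class_neg, mu_n), (class_pos, mu_p)):
        for x in rows:
            d = (x[0] - mu[0], x[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (mu_p[0] - mu_n[0], mu_p[1] - mu_n[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(mu_p[0] + mu_n[0]) / 2, (mu_p[1] + mu_n[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def lda_score(w, b, x):
    """Signed LDA coefficient; the sign gives the predicted class."""
    return w[0] * x[0] + w[1] * x[1] + b
```

Following the convention in the text, the class passed as `class_neg` would hold the female training shapes, so a negative score denotes a female shape.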
8. Conclusions
We have proposed an integrated approach that combines bottom-up
and top-down methods for 3D pose and shape estimation of human
targets from multi-view imagery. We limit the number of sensors
used in our framework to 4. To overcome the ambiguity and
ill-constrained nature of the problem, we use efficient
anthropometric priors of human shape and pose learned from the
CAESAR dataset. The accurate 3D pose and shape estimated by our
framework can be used for inferring attributes such as gender,
age, ethnicity and body weight. Currently our framework does
not use tracking, but fits pose and shape for every frame
independently. Pose and surface tracking will be employed in
the future to obtain smoother 3D shape deformation in a video.
Figure 9. Human attribute inference using shape analysis: (a)
Gender classification; (b) 3D shape fitting without and with
backpack in the middle and bottom rows respectively. The
observation matching costs (using chamfer distance) without and
with the backpack model were 1.3441 and 1.043 respectively; (c)
3D shape estimation can be used to estimate the dimensions of
various body parts.
Acknowledgements: We thank George Williams, Peter
Birdsall and Kirill Smoleskiy for assisting us in data collec-
tion. We thank Asaad Hakeem for discussions and useful
comments on the work. This work was supported by Air
Force Research Lab, contract number FA8650-10-M-6094.
References
[1] B. Allen, B. Curless, and Z. Popovic. The space of human body shapes: recon-
struction and parameterization from range scans. ACM SIGGRAPH, 2003. 50,
51
[2] A. Balan, L. Sigal, M. Black, J. Davis, and H. Haussecker. Detailed human
shape and pose from images. CVPR, 2007. 50, 51, 54
[3] A. O. Balan and M. J. Black. The naked truth: Estimating body shape under
clothing. In ECCV (2), pages 15–29, 2008. 50
[4] C. Bregler, J. Malik, and K. Pullen. Twist based acquisition and tracking
of animal and human kinematics. International Journal of Computer Vision,
56(3):179–194, 2004. 50
[5] Y. Chen, T.-K. Kim, and R. Cipolla. Inferring 3d shapes and deformations from
single views. In ECCV (3), pages 300–313, 2010. 50
[6] A. C. Faul and M. E. Tipping. Analysis of sparse bayesian learning. Proc.
Neural Information Processing Systems, pages 383–389, 2001. 52
[7] J. Gall, B. Rosenhahn, and H.-P. Seidel. Drift-free tracking of rigid and articu-
lated objects. In CVPR. IEEE Computer Society, 2008. 50
[8] J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel.
Motion capture using joint skeleton tracking and surface estimation. In IEEE
Computer Society Conference on Computer Vision and Pattern Recognition,
pages 1746–1753, 2009. 50
[9] J. Gall, A. Yao, and L. J. V. Gool. 2d action recognition serves 3d human pose
estimation. In ECCV (3), pages 425–438, 2010. 50
[10] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the
Bayesian restoration of images. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 6(6):721–741, 1984. 53
[11] N. Gkalelis, H. Kim, A. Hilton, N. Nikolaidis, and I. Pitas. The i3dpost multi-
view and 3d human action/interaction. In Proc. Conference on Visual Media
Production, 1(1):159–168, 2009. 54
[12] P. Guan, A. Weiss, A. O. Balan, and M. J. Black. Estimating human shape and
pose from a single image. In ICCV, pages 1381–1388. IEEE, 2009. 50
[13] S. Hauberg, S. Sommer, and K. S. Pedersen. Gaussian-like spatial priors for
articulated tracking. ECCV, 2010. 53
[14] M. Lee and I. Cohen. Proposal maps driven mcmc for estimating human body
pose in static images. Proc. Computer Vision and Pattern Recognition Conf.,
pages 334–341, 2004. 53
[15] T. B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based
human motion capture and analysis. Computer Vision and Image Understand-
ing, 104(2-3):90–126, 2006. 50
[16] L. Mündermann, S. Corazza, and T. P. Andriacchi. Accurately measuring hu-
man movement using articulated icp with soft-joint constraints and a repository
of articulated models. In CVPR. IEEE Computer Society, 2007. 50
[17] G. Pons-Moll, A. Baak, T. Helten, M. Müller, H.-P. Seidel, and B. Rosenhahn.
Multisensor-fusion for 3d full-body human motion capture. In CVPR, pages
663–670, 2010. 50
[18] K. Robinette and H. Daanen. The caesar project: A 3-d surface anthropometry
survey. Second International Conference on 3-D Imaging and Modeling, 1999.
50, 51
[19] L. Sigal, A. O. Balan, and M. J. Black. Combined discriminative and genera-
tive articulated pose and non-rigid shape estimation. In J. C. Platt, D. Koller,
Y. Singer, and S. T. Roweis, editors, NIPS. MIT Press, 2007. 49, 50
[20] C. Sminchisescu, A. Kanaujia, Z. Li, and D. N. Metaxas. Discriminative density
propagation for 3d human motion estimation. In Proc. Computer Vision Pattern
Recognition, 2005. 52
[21] C. Stoll, J. Gall, E. de Aguiar, S. Thrun, and C. Theobalt. Video-based re-
construction of animatable human characters. ACM Trans. Graph., 29(6):139,
2010. 50
3D Human Pose And Shape Estimation From Multi-View Imagery

  • 1. 3D Human Pose and Shape Estimation from Multi-view Imagery Atul Kanaujia ObjectVideo, Inc. akanaujia@objectvideo.com Niels Haering ObjectVideo, Inc. nhaering@objectvideo.com Graham Taylor New york University gwtaylor@cs.nyu.edu Chris Bregler New York University chris.bregler@nyu.edu Abstract In this study we present robust solution for estimating 3D pose and shape of human targets from multiple, syn- chronized video streams. The objective is to automatically estimate physical attributes of the targets that would allow us to analyze its behavior non-intrusively. Proposed system estimates the anthropometric skeleton, pose and shape of the human target from the 3D visual hull reconstructed from multiple silhouettes of the target. Discriminative (bottom- up) method is used to first initialize 3D pose of the tar- gets using low-level features extracted from the 2D image. The pose is refined using generative (top-down) method that also estimates the optimal skeleton of the target us- ing anthropometric prior models learned from the CAESAR dataset. Statistical shape models are also learned from the CAESAR dataset and are used to model both global and lo- cal shape variability of human body parts. We also propose a novel optimization scheme to fit 3D shape by searching in the parametric space of local parts model and constraining the overall shape using a global shape model. The system provides a useful framework for automatically identifying dispropotionate body parts, estimating size of backpacks and inferring attributes like gender, age and ethnicity of the human target. 1. Introduction With rapid advancements in computer vision technology and the emergence of matured technologies for detection and tracking of human targets from a significant stand-off point, there is a greater need for cognitive video analytics with the ability to infer subtle attributes of humans and ana- lyze human behavior. 
Tremendous progress has been made in sensor technology in recent years, thus enabling develop- ment of advanced video sensors with gigapixel resolution, that are capable of providing sufficient image resolution for detailed analysis of human targets from a large distance. In this paper we propose a framework for inferring var- ious human attributes from multi-view video based 3D hu- man pose and shape estimates. Detailed 3D human shape estimation from multi-view imagery is still a difficult prob- lem that does not have satisfactory solution. Our fully auto- mated system estimates the skeleton, 3D pose and shape of human targets from multi-view images obtained from syn- chronized and calibrated sensors, in a non-intrusive way. Concealed objects typically manifests as local artifacts on human body surface and are difficult to fit using global hu- man shape models. In order to accurately fit local bulges on the human body, we use an iterative mechanism to locally fit a body part shapes to the sensor data. The estimated 3D shape of the target is used to classify its gender, whether it is carrying backpack or not, concealing an object and to infer dimensions of various body parts. Contributions: Our approach combines the strengths of discriminative (bottom-up) approach with model-based generative (top-down) algorithms for efficient estimation of 3D pose and shape of the target. In that respect, our work is similar to [19]. However, as discussed in section 3, our human modeling technique is significantly differ- ent in representation compared to theirs. In addition, we use local, parts-based shape models for fitting 3D shapes to the data. Local parts-based shape models provide richer and more flexible representations of human body shapes en- abling improved fitting to the abnormal target shapes. 
The following are the key contributions of our approach: (1) We propose a coarse-to-fine 3D pose and shape fitting algorithm that uses an intermediate step of pose refinement with a cylindrical parts-based human shape model. This allows us to easily enforce anthropometric constraints such as non-self-penetration of body parts, which are costly to impose when modeling finer surface deformation; (2) To efficiently infer both the skeleton and the shape of the human target in any pose, we model the human body using a joint subspace of anthropometric skeletons and 3D shapes; (3) We have developed a novel parts-based 3D shape and pose optimization scheme that fits the part shapes locally to the observation while constraining the overall shape globally; (4) We extend the 3D shape model to fit the shapes of humans carrying accessories such as a backpack.
Related work: Initial work on marker-less motion capture focused on accurate 3D pose estimation from single and multi-view imagery.
  • 2. A comprehensive survey of state-of-the-art techniques in vision-based motion capture is provided in [15]. Bregler and Malik [4] proposed a representation for articulated human models using twists that has been widely employed in a number of single and multiple camera based motion capture systems [9, 17, 21, 8, 7]. Compared to earlier approaches [15] that modeled human shapes with cylindrical or superquadric parts, current methods model 3D human shapes more accurately using SCAPE body models [2] or the CAESAR dataset [1]. A number of recent multi-camera systems proposed by Balan and Sigal [2, 3, 19] employed SCAPE data to model variability in 3D human shapes due to anthropometry and pose. They used these shape models to estimate human body shape under loose clothing and to track efficiently across multiple frames. Guan et al. [12] used a SCAPE-based shape model to perform height-constrained estimation of body shape. These approaches, however, lack an articulated skeleton underlying the human body shape. The 3D shape deformation of the body surface is captured by tracking the 3D mesh surfaces directly. Deforming the 3D mesh while maintaining surface smoothness is not only computationally demanding but also ill-constrained, occasionally causing poor surface deformation due to noisy silhouettes (or a noisy visual hull). In parallel to the above approaches, Mündermann et al. [16] developed a SCAPE model with an underlying skeleton to track 3D shapes of a human target in multi-view image sequences using an extension of the Iterative Closest Point (ICP) algorithm. Our proposed system most closely resembles the work of Gall et al. [8, 21].
In addition to proposing combined skeleton and 3D shape based human models, they fit 3D pose to multi-view image data using a combined local and global optimization scheme. An extension of the above work [9] used action-based priors to improve pose tracking and 3D shape estimation from multi-view image data. Pons-Moll et al. [17] proposed a multi-modal system that improves 3D human pose and shape estimation from multi-view imagery by using both visual cues and global orientation information from inertial sensors. Chen et al. [5] developed a non-linear manifold representation of the 3D shape variability of humans due to pose and anthropometry. They use non-linear optimization to search the low-dimensional parameter space of shape and camera parameters to optimally fit the 3D shape to the silhouette.
2. System Overview
Fig. 1 shows an overview of the system. The system takes as input synchronized streams of multi-view image sequences of a human target from a set of calibrated cameras. It generates a 3D volumetric reconstruction (visual hull) of the target from the target silhouettes using space carving (fig. 1(a)). We use bottom-up predictors to generate initial hypotheses of the articulated 3D pose of the human independently from each sensor and fuse them at the semantic 3D pose level (fig. 1(c)).
Figure 1. Overview of the proposed system for 3D human pose and shape estimation: (a) 3D data is acquired as a volumetric reconstruction using space carving; (b) human shape is modeled by registering a 3D template mesh to laser scans of human subjects and using it to learn a PCA subspace; (c) bottom-up predictors are used to generate initial human pose hypotheses using features extracted from the image; (d) pose predictions are refined by top-down search in pose and shape space using a coarse 3D human shape model; (e) detailed 3D shape is estimated by searching in the parametric space of shape models of individual parts; (f) estimated pose and shape are used for inferring human attributes and anomalous shapes.
The 3D pose is refined by top-down (generative) methods that use Markov Chain Monte Carlo (MCMC) based search to efficiently fit a coarse 3D human shape model (with cylindrical body parts) to the extracted visual hull (fig. 1(d)). The top-down models are used to search in both the pose space and the parametric space of skeletons and coarse 3D human shapes to maximize the overlap with the visual hull. We model the space of detailed human shape variation using Principal Component Analysis (PCA). The human 3D shape model is learned by first establishing one-to-one correspondence between a hole-filled template 3D mesh model and a corpus of human body scans from the CAESAR dataset [18] (fig. 1(b)). The registered 3D mesh data is used to learn low-dimensional models for local parts-based and global shape variability in humans. The detailed 3D shape of a target human is obtained by searching in the PCA-based low-dimensional parametric shape space for the best fitting match (fig. 1(e)).
The developed system is used for analyzing 3D human shapes and inferring attributes of the human target such as gender and the dimensions of body parts.
  • 3. Figure 2. The 3D mesh surface and underlying skeleton of a template human model are iteratively deformed to align them with the human body scan data (CAESAR dataset). All scans have 73 landmark points on the body surface that are used for 3D shape registration.
In all of our experiments we employed 4 calibrated cameras placed along directions chosen to maximally capture the entire viewing sphere around the target. Although using fewer cameras introduces ambiguity, we overcome this problem by using efficient anthropometric priors for searching in both pose and shape space.
3. 3D Human Pose and Shape Modeling
We model the human body as a combination of an articulated skeleton and a 3D shape. The shape is modeled both coarsely (using cylindrical parts) and finely (using a detailed 3D surface mesh). We learn the 3D shape models for both the entire human body and the individual body parts (15 components). We assume that the human body shape is deformed only by the underlying skeleton (and not by other factors such as clothing). Using a skeleton to deform a 3D mesh surface is more robust to noisy silhouettes than skeleton-free shape estimation [2], as it puts additional constraints on the shape fitting by searching in the parametric space of human shape models.
3D Data Acquisition: Targets are localized using change detection. We model the background pixel intensity distribution as a non-parametric kernel density estimate to extract silhouettes of moving targets. Image streams from multiple calibrated sensors are used to reconstruct a 3D volumetric representation (visual hull) of the human target using space carving. We use an octree-based fast iterative space carving algorithm to extract the volumetric reconstruction of the target. A single volume (cube) that completely encloses the working space of the acquisition system is defined.
Based on its projection onto the camera image planes, each voxel is classified as inside, outside or on the boundary of the visual hull using the target silhouettes. The boundary voxels are iteratively subdivided into eight parts (voxels) until the voxel size falls below a threshold.
As the 2D shape of the silhouette plays a critical role in discriminative 3D pose prediction (see Section 4), the visual hull is back-projected to obtain clean silhouettes of the target using Z-buffering. The improved silhouettes generate cleaner shape descriptors for improved 3D pose estimation using bottom-up methods.
Human 3D Shape Registration: Laser scans of human bodies from the CAESAR dataset are used to learn parametric models for 3D human shapes. The human body scans are first registered to a perfect, hole-filled reference template human model composed of both a 3D mesh surface and an accurately aligned skeleton. We use a detailed template model of standard anthropometry in order to capture a subtle and wide range of variations in human 3D shapes. The CAESAR scans carry 73 landmark points at various positions, and these are used to guide the 3D shape registration. The deformation is an iterative process that gradually brings the template surface mesh vertices (and the skeleton) close to the laser scan data points by translating them along the surface normal while maintaining surface smoothness.
Anthropometric Prior and Coarse Human Shape Modeling: We learn parametric models for the space of human skeletons and a coarse representation of the 3D shape of the human body L using cylindrical parts (see fig. 3). Principal Component Analysis (PCA) is used to learn the space of human skeletons and the variability of the dimensions of the cylindrical body parts from the registered CAESAR dataset [18] (see fig. 2). The space of human skeletons is parameterized using a 5-dimensional PCA subspace capturing 94% of the variability in the lengths of skeletal links.
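The choice of subspace dimension can be illustrated with a minimal sketch. It assumes the PCA eigenvalues of the skeletal link lengths are already available. The function name `components_for_variance` and the eigenvalue spectrum below are hypothetical, chosen only so that five components reach the 94% level mentioned above; they are not values from the CAESAR data.

```python
def components_for_variance(eigenvalues, target=0.94):
    """Return the smallest number of leading PCA components whose
    cumulative share of the total variance reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalue spectrum (illustrative numbers only).
example_eigenvalues = [40.0, 25.0, 15.0, 9.0, 6.0, 3.0, 2.0]
n_components = components_for_variance(example_eigenvalues, target=0.94)
```

With this illustrative spectrum, four components explain 89% of the variance and five explain 95%, so five is the smallest dimension that meets a 94% target.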
The coarse 3D human shape model parameters $L = [l \; r_1 \; r_2]$ include the length and the two radii of the tapered cylindrical human parts.
Global and Part-based Shape Modeling: We characterize the space of human body shapes and of the individual body parts using Principal Component Analysis (PCA). Global 3D human shape models are excessively restrictive in capturing shape variability due to a concealed object or a disproportionate or abnormally sized body part. In comparison, parts-based 3D shape models are richer in modeling asymmetries and surface protrusions arising from object concealment. We use PCA to learn a subspace for each of the body parts from the part vertices of the registered shape, which are in one-to-one correspondence with the pre-segmented template mesh model.
Detailed Parts Shapes from Coarse Human Model: In order to efficiently initialize the detailed 3D parts from the coarse cylindrical body parts, we employ an approach similar to [1] to learn the relation between the PCA coefficients of the $i$th body part and the dimensions of its corresponding cylindrical shape model, $L^{(i)} = [l^{(i)} \; r_1^{(i)} \; r_2^{(i)}]$. Specifically, we learn a linear regression map using the PCA coefficients $[P]_{N \times k}$ of the $N$ data points in the $k$-dimensional PCA subspace. The regression function is

$$M \left[ l^{(i)} \; r_1^{(i)} \; r_2^{(i)} \; 1 \right]^T = \left[ P_1^{(i)} \cdots P_k^{(i)} \right]^T .$$

  • 4. Figure 3. (Top left) Space of articulated human skeletons; (Top right) Coarse human shape model used in our system; (Bottom left) Average detailed human shape model; (Bottom right) Coarse human shape model with the size of parts estimated from the detailed 3D shape.
The mapping is learned as a regularized pseudo-inverse:

$$M = P^T L^T (L L^T + \lambda I)^{-1} \tag{1}$$

where $L$ here denotes the $4 \times N$ matrix whose columns are the vectors $[l \; r_1 \; r_2 \; 1]^T$ of the $N$ training examples and $\lambda$ is the regularization constant of the ridge regression. The PCA coefficients of the detailed 3D shape of the $i$th body part can be computed directly from the dimensions of the cylindrical body part as $M [l^{(i)} \; r_1^{(i)} \; r_2^{(i)} \; 1]^T$.
4. Bottom-up 3D Pose Estimation
Due to the high degree of articulation of the human body, searching in the high-dimensional pose space is prone to local optima. We overcome this problem by initializing the search near the global optimum using discriminative (bottom-up) methods. To this end, we employ a regression-based framework to directly predict multiple plausible 3D poses (obtained as a probability distribution over pose space) using the visual cues extracted from individual sensors. The predictive distributions from multiple sensors are then combined by simply summing them. Inferring 3D pose using only 2D visual observations is an ill-posed problem due to the loss of depth information under perspective projection. Learning therefore involves modeling an inverse perspective mapping that is one-to-many, as several 3D human configurations can generate similar 2D visual observations. We therefore model these relations as multi-valued mappings using the Bayesian Mixture of Experts (BME) model [20]. Formally, the BME model is

$$p(x \mid r) = \sum_{i=1}^{M} g_i(r) \, p_i(x \mid r) \tag{2}$$

$$g_i(r) = \frac{\exp(\lambda_i^\top r)}{\sum_k \exp(\lambda_k^\top r)} \tag{3}$$

$$p_i(x \mid r) = G(x \mid W_i r, \Omega_i^{-1}) \tag{4}$$

where $r$ is the input or predictor variable (image descriptors), $x$ is the output or response (3D pose parameters), and $g_i$ are the input-dependent positive gate functions.
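To make Eqs. (2)-(4) concrete, the following is a minimal sketch of how the mixture density could be evaluated for given parameters. It is not the trained model: for brevity it assumes each expert has an isotropic covariance (a single variance per expert) rather than a full $\Omega_i^{-1}$, and all function names and parameter values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax, used for the gates g_i(r) of Eq. (3)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def isotropic_gaussian(x, mean, variance):
    """Density of N(mean, variance * I) at x (assumption: isotropic covariance)."""
    dim = len(x)
    sq_dist = sum((u - v) ** 2 for u, v in zip(x, mean))
    norm = (2.0 * math.pi * variance) ** (dim / 2.0)
    return math.exp(-0.5 * sq_dist / variance) / norm


def bme_density(x, r, gate_params, expert_weights, expert_variances):
    """Evaluate p(x | r) = sum_i g_i(r) * N(x | W_i r, var_i * I), cf. Eqs. (2)-(4)."""
    gates = softmax([dot(lam, r) for lam in gate_params])
    density = 0.0
    for gate, W, variance in zip(gates, expert_weights, expert_variances):
        mean = [dot(row, r) for row in W]  # expert prediction W_i r
        density += gate * isotropic_gaussian(x, mean, variance)
    return density
```

Because the gates come from a softmax they are positive and sum to one, so the result is a valid mixture density whose modes sit at the individual expert predictions.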
The gates $g_i$ output values in $[0, 1]$ and are computed using (3). For a particular input $r$, the gates output the probability of each expert function being used to map $r$ to the output pose $x$. In the model, $p_i$ refers to Gaussian distributions with covariances $\Omega_i^{-1}$ centered at the different "expert" predictions. BME is learned in the Sparse Bayesian Learning (SBL) paradigm, which uses the Automatic Relevance Determination (ARD) mechanism to train sparse (less parameterized) regression models. We use an accelerated training algorithm based on forward basis selection [6] to train our discriminative models on a large database of labeled poses observed from different viewpoints.
In multi-camera settings, visual cues can be fused at the feature level to train a single discriminative model that predicts 3D pose from a concatenated feature vector obtained from multiple sensors. However, such a model would depend on the camera configuration. Instead, we train a single mixture of experts model to predict 3D pose from a single camera input, but with training examples captured from multiple viewpoints. We use this model to predict poses from each of the viewpoints independently. The combined predictive distribution is obtained by simply summing the mixture of Gaussian distributions obtained from each of the sensors $C = \{C_1, \cdots, C_N\}$, with the gate weights re-weighted to sum to one:

$$p(x \mid r, W, \Omega, \lambda) = \sum_{j=1}^{N} \sum_{i=1}^{M} g_{ij}(r \mid \lambda_{ij}) \, p_{ij}(x \mid r, W_{ij}, \Omega_{ij}^{-1}) \tag{5}$$

where $N$ is the number of sensors, the index $j$ runs over the sensors $C_j$, and $M$ is the number of experts in each of the Mixture of Experts models used to learn the mapping.
5. Top-down 3D Pose Refinement and Coarse Shape Estimation
A generative (top-down) model-based feedback stage is used to further refine the 3D pose estimates obtained from the bottom-up methods. Our generative model consists of a coarse 3D human shape model with each body part represented using simple geometric primitive shapes such as tapered cylinders.
Geometric shapes allow fast image likelihood computation and make it easy to enforce the non-self-penetration constraint for the body parts. The top-down search fits the human model to the visual hull by optimizing the parameters of the human skeleton model (5 dimensional), the coarse 3D shapes (5 dimensional) and the joint angles (≈ 15 after variance-based pruning).
  • 5. Figure 4. (Left) Top-down model fitting is initialized by aligning the root joint with the centroid of the visual hull (shown in blue). (Right) The overlap cost is computed as the number of voxels (visual hull elements) lying inside the cylindrical body part. Part self-intersection is penalized by adding a cost proportional to $(R_1 + R_2 - D)$ for every self-penetrating part.
We use the predictive distribution from the feed-forward methods to prune the joint angles having low variance. The likelihood cost is computed as the sum of the degree of overlap of each part with the visual hull, with an added cost for each pair of intersecting parts (see fig. 4). To compute the self-penetration cost, we compute the shortest distance $D$ between the axes of two cylindrical body parts of radii $R_1$ and $R_2$. For two intersecting parts, we add a penalty term proportional to $(R_1 + R_2 - D)$ to the likelihood function.
Stochastic Optimization using MCMC: We use Markov Chain Monte Carlo (MCMC) simulation to search the parameter space of the human skeletal links ($L$), the coarse shape models ($S$) and the 3D pose ($\theta$). MCMC is a suitable methodology for computing a maximum a posteriori (MAP) solution $\arg\max_x p(x \mid r)$ by drawing samples from the proposal density (which approximates the posterior) using a random-walk-based Metropolis algorithm [14]. At the $t$th iteration, a candidate $x'$ is sampled from a proposal distribution $q(x' \mid x_{t-1})$ and accepted as the new state with probability $a(x_{t-1} \to x')$, where

$$a(x_{t-1} \to x') = \min\left\{ 1, \frac{p(x' \mid r) \, q(x_{t-1} \mid x')}{p(x_{t-1} \mid r) \, q(x' \mid x_{t-1})} \right\} \tag{6}$$

and $x' = \{L, S, \theta\}$ are the parameters that are optimized to maximize the overlap between the coarse 3D human model and the visual hull. Here $S$ denotes the low-dimensional PCA coefficients of the anthropometric prior.
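The acceptance rule of Eq. (6) can be sketched as follows. This is a generic Metropolis-Hastings loop under stated assumptions, not the system's implementation: the one-dimensional toy target, the Gaussian random-walk proposal and all function names are hypothetical stand-ins for the visual-hull overlap likelihood and the proposal maps.

```python
import math
import random


def acceptance_probability(p_new, p_old, q_reverse, q_forward):
    """Eq. (6): min{1, p(x'|r) q(x_{t-1}|x') / (p(x_{t-1}|r) q(x'|x_{t-1}))}."""
    return min(1.0, (p_new * q_reverse) / (p_old * q_forward))


def metropolis_hastings(target, propose, proposal_density, x0, steps, rng):
    """Run a chain and return the visited state with the highest target value.

    target(x)              -> unnormalized posterior p(x | r)
    propose(x, rng)        -> candidate x' drawn from q(. | x)
    proposal_density(a, b) -> q(a | b), up to a constant factor
    """
    current, best = x0, x0
    for _ in range(steps):
        candidate = propose(current, rng)
        a = acceptance_probability(
            target(candidate),
            target(current),
            proposal_density(current, candidate),
            proposal_density(candidate, current),
        )
        if rng.random() < a:
            current = candidate
            if target(current) > target(best):
                best = current
    return best


# Toy usage: hypothetical 1D target peaked at 2.0 with a symmetric random walk.
def toy_target(x):
    return math.exp(-0.5 * (x - 2.0) ** 2)


def toy_propose(x, rng):
    return x + rng.gauss(0.0, 0.5)


def toy_proposal_density(a, b):
    # Unnormalized Gaussian; the normalizing constant cancels in Eq. (6).
    return math.exp(-0.5 * ((a - b) / 0.5) ** 2)


best = metropolis_hastings(toy_target, toy_propose, toy_proposal_density,
                           0.0, 2000, random.Random(0))
```

Tracking the best visited state reflects the use of the chain as a MAP search rather than as a posterior sampler. For a symmetric proposal the two $q$ terms cancel and the rule reduces to the plain Metropolis ratio.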
In order to avoid local optima, we use simulated annealing, which gradually emphasizes the global optima in the distribution being maximized, $p(x \mid r)^{1/T_i}$. The parameter $T_i$ is gradually decreased under the assumption that $p(x \mid r)^{\infty}$ mostly concentrates around the global maxima [10].
Proposal Map Computation: The proposal distribution plays a critical role in MCMC search and is assumed to be independent for the shape and pose parameters. We adopt the Metropolis algorithm with proposal maps that are not conditioned on the current state $x_{t-1}$. The proposal distribution $q(\theta)$ is obtained as a mixture of Gaussians from the bottom-up predictors (5) and is ill-suited for searching in the joint angle space. Sampling from the angular priors of joints higher in the skeletal hierarchy (such as the shoulder and femur joints) may produce larger spatial motion than for the lower joints (such as the elbow and knee joints). Optimizing simultaneously over the entire 3D pose space may cause instability and require more iterations to converge. This problem may be resolved by fitting the joints higher in the skeletal hierarchy first. We adopt a more principled approach [13] in which we sample from the spatial prior rather than the angular prior. Specifically, for the $i$th skeletal link, we sample from $p(\theta_i, \Sigma_{\theta_i}) = N(F(\theta_i), \Sigma_F)$ with $F(\theta_i) = F(\theta_i^{(p)}) * R(\theta_i) + T(\theta_i)$, where $F(\theta_i)$ is the end location of the $i$th joint link and $\theta_i^{(p)}$ is its parent joint. Sampling from $F(\theta_i)$ is not straightforward because, unlike $\theta_i$, it spans a non-linear manifold $\mathcal{M}$. In order to compute the covariance, we linearly approximate the manifold at a point by the tangent space at that point. We compute the Jacobian $J$ and use it to compute the covariance as $\Sigma_F = J_{\theta_i} \Sigma_{\theta_i} J_{\theta_i}^T$. At the $t$th iteration, sampling from the distribution $N(F(\theta_i), \Sigma_F)$ generates end-effector locations of the joints, which are used to compute the angle by solving

$$\theta_i^{(t)} = \arg\min_{\theta_i} \| F'^{(t)} - F(\theta_i) \|^2 \quad \text{s.t.} \quad \theta_i^{\min} \le \theta_i \le \theta_i^{\max} \tag{7}$$

The minimization is performed using the standard Levenberg-Marquardt optimization algorithm.
6. Detailed 3D Shape Estimation
The 3D pose and coarse shape estimated by the top-down method are used to initialize the search in the parameter space of detailed 3D human shapes. We model the 3D shape of humans using polygonal 3D mesh surfaces skinned to an underlying skeleton. We assume that the 3D mesh surface deforms only under the influence of the skeleton attached to it. The shape of the human body can vary due to both the anthropometry and the pose of the target. Anthropometric variability is modeled by the learned 3D shape models for humans. The shape deformation due to pose is obtained by first skinning the 3D mesh to the skeleton and then transforming the vertices under the influence of the associated skeletal joints.
Skinning 3D Mesh to the Skeleton: We use Linear Blend Skinning (LBS) for efficient non-rigid deformation of the skin as a function of the underlying skeleton. LBS is achieved by associating each vertex with its two nearest joints. The transformation is computed as a weighted sum of the transformations due to each of the joints, where the weights are computed as the inverse distance from the joints. Fig. 5 illustrates the computation of the transformation of vertices associated with different body segments.
  • 6. Figure 5. Linear Blend Skinning is used to deform the 3D mesh under the influence of the skeleton. (Left) Rigidly deforming human body parts causes artifacts around the joints; (Middle) Vertices are transformed using a weighted sum of the transformations due to multiple associated joints; (Right) Shape deformation with a backpack accessory attached to the torso.
Although rich in terms of representation, a global 3D human shape representation cannot model 3D shapes with disproportionately sized body parts. In order to support a rich set of human shapes, we use a combined local part-based and global optimization scheme that first searches in the local subspace of human body parts to match the observation and then constrains the whole shape using the global human shape model. Fitting body parts independently causes discontinuities along the joints and generates unrealistic shapes (see fig. 6). Constraining the shape to lie in the global shape space therefore ensures that it is a valid shape. For linear PCA-based shape models, this is done efficiently by ensuring that the PCA coefficients of the shape (when projected onto the subspace) lie within a range of variance.
Stochastic Search in Local and Global Shape Space: Our algorithm alternates the search between the parameter spaces of 3D human pose ($\theta$) and shape ($S$) to simultaneously refine the pose and fit the detailed 3D shape to the observation. The search is performed using Data Driven MCMC with the Metropolis-Hastings method, wherein the proposal map does not use the predictive distribution obtained from bottom-up methods but is instead modeled as a Gaussian distribution conditioned on the current state, $q(x' \mid x_{t-1})$, where $x_{t-1} = \{\theta_{t-1}, S_{t-1}\}$. The likelihood distribution is modeled as a symmetrical chamfer distance map [2] to match the 2D projection of the model to the observed image silhouettes from multiple sensors.
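As a rough illustration of the matching cost, the sketch below computes a symmetric chamfer distance between two small 2D point sets by brute force. This is a simplification under stated assumptions: a practical system would more likely precompute a distance map (distance transform) on the silhouette image, and averaging the two directed distances is only one of several common conventions. The function names are hypothetical.

```python
import math


def directed_chamfer(source, target):
    """Mean distance from each source point to its nearest target point."""
    total = 0.0
    for px, py in source:
        total += min(math.hypot(px - qx, py - qy) for qx, qy in target)
    return total / len(source)


def symmetric_chamfer(a, b):
    """Average of the two directed chamfer distances between point sets a and b."""
    return 0.5 * (directed_chamfer(a, b) + directed_chamfer(b, a))


# Hypothetical contour samples: projected model outline vs. observed silhouette.
model_contour = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
observed_contour = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
cost = symmetric_chamfer(model_contour, observed_contour)
```

Using both directions penalizes model contour points that lie far from the silhouette as well as silhouette points that the model fails to explain, which a one-sided distance would miss.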
For optimizing the 3D pose, we use the current 3D shape to search in the parameter space of articulated human pose. The regression function $M$ (1), which maps the coarse human shape model to the detailed shape PCA coefficients, is used to initialize the search. Plausible 3D shapes are sampled from the Gaussian distributions that the PCA-based subspace represents for each of the body parts. The search is performed by alternately fitting the 3D pose first, followed by optimization of the shape parameters of the individual body parts. At every iteration, the 3D shape of the human body is constrained using the global shape model to ensure a valid shape (see fig. 6).
Figure 6. Detailed 3D shape fitting by sampling from PCA-based shape models of various body components. (Left) Average human shape model; (Middle) Shape with each body part sampled from the parts shape model; (Right) 3D shape obtained after constraining the shape using the global shape model.
Figure 7. Accurate 3D surface reconstruction of the human body is provided for all the poses in the I3DPost [11] dataset. 3D shape fitting algorithms are evaluated by matching the fitted 3D shape (shown as red vertices) with the ground truth surface reconstruction (shown as blue vertices).
7. Experimental Evaluation
We conducted experiments on both publicly available datasets and data captured at our motion capture facility. In all our experiments, we used 4 synchronized image streams from calibrated sensors to estimate the 3D pose and shape of the human targets. 3D motion capture data was used to train our bottom-up predictors. The BME model was trained with 3 experts. For training the bottom-up methods, we used vector-quantized shape context histograms computed over both the outer contour and the internal edges of the foreground object as the inputs for regression. Fig. 8 illustrates the results of our framework on walking sequences with and without a backpack.
The I3DPost data [11] also provide accurate 3D surface reconstructions of subjects in different walking poses. We evaluate the accuracy of our shape fitting algorithms using these as ground truth. The error is computed as the sum of the distances from each surface vertex to the nearest vertex of the fitted 3D shape. Fig. 7 illustrates the technique on an example image.
  • 7. Figure 8. 3D pose and shape fitting results for different sequences. The three columns on the right show the results with the backpack accessory from a walking sequence.
7.1. Shape Fitting to Accessories
Our system also supports automatic estimation of the size of an accessory bag carried by a human. The backpack is modeled as a trapezoidal shape and is assumed to be rigidly attached to the torso, so that the translation and orientation of the backpack can be computed directly from those of the torso. The two parameters of the trapezoid (thickness and orientation of the non-perpendicular face) are iteratively estimated during the 3D shape fitting. The shape of the accessory is initialized to the mean thickness of the human torso. The framework functions as a generative classifier to identify whether a human is carrying a backpack or not: an improvement in the quality of fit for the model with the attached accessory implies the presence of a backpack. This is illustrated in fig. 9(b), where using the model with an attached accessory reduced the observation matching cost (chamfer distance) from 1.3441 to 1.043.
7.2. Human Attribute Inference Using 3D Shape Analysis
The estimated 3D shape of the human target can be used to infer a variety of human attributes that are useful for identifying potentially hostile behavior. Demographic features such as gender and ethnicity, and physical attributes such as height, weight and body appearance, can be inferred either by computing spatial statistics of different regions of the fitted 3D shape or by determining the anthropometric variations that characterize these features. Various anthropometric measurements can be inferred directly from the 3D shape fitted to the observed multi-sensor data. Fig. 9(c) shows the measurements of different body parts estimated from the 3D shapes fitted to the observations.
Gender Classification: We use linear discriminant analysis (LDA) to find the feature projections that best discriminate the shape profiles of the two gender classes.
LDA essentially learns a linear classification boundary between the two classes under the assumption that the samples from each of the two classes are normally distributed. The LDA vector can be used to classify a person's gender based on the fitted 3D shape. Similar to gender classification, age and ethnicity attributes of a person can be inferred depending on the body stature. Fig. 9(a) shows the gender classification results using LDA. Here the threshold for gender classification is set to 0, and negative LDA coefficients denote female shapes.
8. Conclusions
We have proposed an integrated approach that combines bottom-up and top-down methods for 3D pose and shape estimation of human targets from multi-view imagery. We limit the number of sensors used in our framework to 4.
  • 8. Figure 9. Human attribute inference using shape analysis. (a) Gender classification; (b) 3D shape fitting without and with a backpack, in the middle and bottom rows respectively. The observation matching costs (using chamfer distance) without and with the backpack model were 1.3441 and 1.043 respectively; (c) 3D shape estimation can be used to estimate the dimensions of various body parts.
To overcome the ambiguity and ill-constrained nature of the problem, we use efficient anthropometric priors of human shape and pose learned from the CAESAR dataset. The accurate 3D pose and shape estimated by our framework can be used for inferring attributes such as gender, age, ethnicity and body weight. Currently our framework does not use tracking, but fits pose and shape for every frame independently. Pose and surface tracking will be employed in the future to obtain smoother 3D shape deformation in a video.
Acknowledgements: We thank George Williams, Peter Birdsall and Kirill Smoleskiy for assisting us in data collection. We thank Asaad Hakeem for discussions and useful comments on the work. This work was supported by Air Force Research Lab, contract number FA8650-10-M-6094.
References
[1] B. Allen, B. Curless, and Z. Popovic. The space of human body shapes: reconstruction and parameterization from range scans. ACM SIGGRAPH, 2003.
[2] A. Balan, L. Sigal, M. Black, J. Davis, and H. Haussecker. Detailed human shape and pose from images. CVPR, 2007.
[3] A. O. Balan and M. J. Black. The naked truth: Estimating body shape under clothing. In ECCV (2), pages 15–29, 2008.
[4] C. Bregler, J. Malik, and K. Pullen. Twist based acquisition and tracking of animal and human kinematics. International Journal of Computer Vision, 56(3):179–194, 2004.
[5] Y. Chen, T.-K. Kim, and R. Cipolla. Inferring 3D shapes and deformations from single views. In ECCV (3), pages 300–313, 2010.
[6] A. C. Faul and M. E. Tipping. Analysis of sparse Bayesian learning. Proc. Neural Information Processing Systems, pages 383–389, 2001.
[7] J. Gall, B. Rosenhahn, and H.-P. Seidel. Drift-free tracking of rigid and articulated objects. In CVPR. IEEE Computer Society, 2008.
[8] J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel. Motion capture using joint skeleton tracking and surface estimation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1746–1753, 2009.
[9] J. Gall, A. Yao, and L. J. V. Gool. 2D action recognition serves 3D human pose estimation. In ECCV (3), pages 425–438, 2010.
[10] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.
[11] N. Gkalelis, H. Kim, A. Hilton, N. Nikolaidis, and I. Pitas. The I3DPost multi-view and 3D human action/interaction. In Proc. Conference on Visual Media Production, 1(1):159–168, 2009.
[12] P. Guan, A. Weiss, A. O. Balan, and M. J. Black. Estimating human shape and pose from a single image. In ICCV, pages 1381–1388. IEEE, 2009.
[13] S. Hauberg, S. Sommer, and K. S. Pedersen. Gaussian-like spatial priors for articulated tracking. ECCV, 2010.
[14] M. Lee and I. Cohen. Proposal maps driven MCMC for estimating human body pose in static images. Proc. Computer Vision and Pattern Recognition Conf., pages 334–341, 2004.
[15] T. B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2-3):90–126, 2006.
[16] L. Mündermann, S. Corazza, and T. P. Andriacchi. Accurately measuring human movement using articulated ICP with soft-joint constraints and a repository of articulated models. In CVPR. IEEE Computer Society, 2007.
[17] G. Pons-Moll, A. Baak, T. Helten, M. Müller, H.-P. Seidel, and B. Rosenhahn. Multisensor-fusion for 3D full-body human motion capture. In CVPR, pages 663–670, 2010.
[18] K. Robinette and H. Daanen. The CAESAR project: A 3-D surface anthropometry survey. Second International Conference on 3-D Imaging and Modeling, 1999.
[19] L. Sigal, A. O. Balan, and M. J. Black. Combined discriminative and generative articulated pose and non-rigid shape estimation. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, NIPS. MIT Press, 2007.
[20] C. Sminchisescu, A. Kanaujia, Z. Li, and D. N. Metaxas. Discriminative density propagation for 3D human motion estimation. In Proc. Computer Vision Pattern Recognition, 2005.
[21] C. Stoll, J. Gall, E. de Aguiar, S. Thrun, and C. Theobalt. Video-based reconstruction of animatable human characters. ACM Trans. Graph., 29(6):139, 2010.