Campus Event Calendar

Event Entry

What and Who

Filtering and Optimization Strategies for Markerless Human Motion Capture with Skeleton-based Shape Models

Juergen Gall
Max-Planck-Institut für Informatik - D4
Promotionskolloquium (PhD defense)

TBA
AG 1, AG 4, RG1, MMCI, AG 3, AG 5, SWS  
AG Audience
English

Date, Time and Location

Tuesday, 7 July 2009
14:00
60 Minutes
E1 4
019
Saarbrücken

Abstract

For more than 2000 years, people have been interested in understanding and analyzing the movements of animals and humans, an interest that has led to the development of advanced computer systems for motion capture. Although marker-based systems for motion analysis are commercially successful, capturing the performance of a human or an animal from a multi-view video sequence without markers is still a challenging task. The most popular methods for markerless human motion capture are model-based approaches that rely on a surface model of the human with an underlying skeleton. In this context, markerless motion capture seeks the pose, i.e., the position, orientation, and configuration of the human skeleton, that best explains the image data. To address this problem, we discuss two questions:

1. What are good cues for human motion capture?

Typical cues for motion capture are silhouettes, edges, color, motion, and texture. In general, tracking complex objects like humans requires integrating multiple cues, since each of these cues has inherent drawbacks. Besides selecting the cues to be combined, fusing their information sensibly is a common challenge in many computer vision tasks. Ideally, a cue should have a large impact when its extraction is reliable and a small impact when its information is likely to be erroneous. To this end, we propose an adaptive weighting scheme that combines complementary cues, namely silhouettes on the one hand and optical flow and local descriptors on the other. Whereas silhouette extraction works best for homogeneous objects, optical flow computation and local descriptors perform better on sufficiently structured objects. Besides image-based cues, we also propose a statistical prior on anatomical constraints that is independent of motion patterns.
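The adaptive weighting idea can be illustrated with a minimal Python sketch. It assumes a hypothetical reliability measure (the mean image-gradient magnitude of a patch) and a hypothetical saturation constant `scale`; the names `texture_score`, `adaptive_weights`, and `combined_error` are illustrative only and are not the weighting scheme proposed in the thesis. Homogeneous patches shift the weight toward the silhouette term, textured patches toward the optical-flow/descriptor term.

```python
import numpy as np


def texture_score(patch: np.ndarray) -> float:
    """Mean gradient magnitude of a grayscale patch (hypothetical reliability measure)."""
    gy, gx = np.gradient(patch.astype(float))
    return float(np.mean(np.hypot(gx, gy)))


def adaptive_weights(patch: np.ndarray, scale: float = 10.0) -> tuple[float, float]:
    """Return (w_silhouette, w_flow), which sum to 1.

    Homogeneous patches (low texture) favour the silhouette cue; textured patches
    favour optical flow / local descriptors. `scale` is an assumed constant.
    """
    t = texture_score(patch)
    w_flow = t / (t + scale)
    return 1.0 - w_flow, w_flow


def combined_error(e_sil: float, e_flow: float, patch: np.ndarray, scale: float = 10.0) -> float:
    """Weighted sum of a silhouette error and a flow/descriptor error."""
    w_sil, w_flow = adaptive_weights(patch, scale)
    return w_sil * e_sil + w_flow * e_flow
```

For a constant patch the gradient is zero, so the silhouette term receives all the weight; a noisy, textured patch pushes most of the weight to the flow term.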

Relying only on image features that are tracked over time does not prevent the accumulation of small errors, which results in drift away from the target object. Error accumulation becomes even more problematic with multiple moving objects because of occlusions. To solve the drift problem, we propose an analysis-by-synthesis framework that uses reference images to correct the pose. It includes occlusion handling and has been successfully applied to crash test video analysis.
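Why frame-to-frame tracking drifts, and why an absolute correction bounds the error, can be shown with a toy 1-D simulation in Python. The "pose" is a hypothetical scalar trajectory; the drifting tracker integrates noisy frame-to-frame motion estimates, while the corrected tracker additionally blends in a noisy absolute measurement. That constant-gain blend (parameter `gain`) is an assumed stand-in for comparing a synthesized view against a reference image, not the analysis-by-synthesis framework itself.

```python
import numpy as np


def simulate_tracking(n_frames=1000, step_noise=0.1, ref_noise=0.1, gain=0.5, seed=0):
    """Return (mean abs. error of drifting tracker, mean abs. error of corrected tracker)."""
    rng = np.random.default_rng(seed)
    true_pose = np.cumsum(rng.normal(0.0, 1.0, n_frames))  # hypothetical 1-D pose trajectory
    drift_est = np.zeros(n_frames)
    corrected_est = np.zeros(n_frames)
    drift_est[0] = corrected_est[0] = true_pose[0]
    for t in range(1, n_frames):
        motion = true_pose[t] - true_pose[t - 1]
        measured_motion = motion + rng.normal(0.0, step_noise)  # noisy frame-to-frame estimate
        # Drifting tracker: small per-frame errors accumulate like a random walk.
        drift_est[t] = drift_est[t - 1] + measured_motion
        # Corrected tracker: predict, then pull toward an absolute (reference-based) measurement.
        pred = corrected_est[t - 1] + measured_motion
        abs_meas = true_pose[t] + rng.normal(0.0, ref_noise)  # assumed reference comparison
        corrected_est[t] = pred + gain * (abs_meas - pred)
    return (float(np.mean(np.abs(drift_est - true_pose))),
            float(np.mean(np.abs(corrected_est - true_pose))))
```

The drifting error grows roughly with the square root of the number of frames, whereas the corrected error stays on the order of the measurement noise.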

2. Is human motion capture a filtering or an optimization problem?

Model-based human motion capture can be regarded as a filtering or an optimization problem. Local optimization gives accurate estimates but often loses track because of local optima, whereas particle filtering can recover from errors at the expense of poor accuracy, since it overestimates the noise. To overcome the drawbacks of local optimization, we introduce a novel global stochastic optimization approach for markerless human motion capture that is derived from the mathematical theory of interacting particle systems. We call the method interacting simulated annealing (ISA), since it is based on an interacting particle system that converges to the global optimum, similar to simulated annealing. It estimates the human pose without initial information, which is a challenging optimization problem in a high-dimensional space. Furthermore, we propose a tracking framework based on this optimization technique that achieves both the robustness of filtering strategies and remarkable accuracy.
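As a rough illustration of the interacting-particle idea, the following Python sketch runs a simplified annealing loop of this kind on a hypothetical 2-D toy energy with many local minima, standing in for a pose energy over a high-dimensional skeleton. Each iteration weights particles with Boltzmann-Gibbs factors exp(-β_t E(x)) under an increasing schedule β_t = (t+1)^b, resamples them (the interaction step), and diffuses them with Gaussian noise. The exponent `b`, the geometric shrinking of the diffusion width, and plain multinomial resampling are assumptions of this sketch; the method described in the thesis may use different selection and mutation kernels.

```python
import numpy as np


def toy_energy(x: np.ndarray) -> np.ndarray:
    """Hypothetical multimodal energy over the last axis; global minimum 0 at (1, -2)."""
    c = np.array([1.0, -2.0])
    return np.sum((x - c) ** 2 + (1.0 - np.cos(2.0 * np.pi * x)), axis=-1)


def interacting_simulated_annealing(energy, lower, upper, n_particles=1000, n_iters=40,
                                    b=0.7, sigma0=0.5, decay=0.9, seed=0):
    """Simplified interacting-particle annealing; returns the lowest-energy final particle."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # No initial estimate: spread particles uniformly over the search box.
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    for t in range(n_iters):
        e = energy(x)
        beta = (t + 1) ** b                      # increasing annealing schedule
        w = np.exp(-beta * (e - e.min()))        # Boltzmann-Gibbs weights, shifted to avoid underflow
        w /= w.sum()
        # Selection (interaction): resample particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        # Mutation: Gaussian diffusion whose width shrinks over the iterations.
        sigma = sigma0 * decay ** t
        x = x[idx] + rng.normal(0.0, sigma, size=x.shape)
    return x[np.argmin(energy(x))]
```

Starting from uniformly spread particles with no initial estimate, the swarm should concentrate in the basin of the global minimum at (1, -2) rather than in one of the neighbouring local minima, because selection increasingly favours the lowest-energy basin while early diffusion still lets particles hop between basins.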

To benefit from both optimization and filtering, we introduce a multi-layer framework that combines stochastic optimization, filtering, and local optimization. While the first layer relies on interacting simulated annealing, the second layer refines the estimates by filtering and local optimization, so that accuracy increases and ambiguities are resolved over time without imposing restrictions on the dynamics.
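The division of labour in a layered scheme like this can be sketched in a few lines of Python: a global stochastic layer only needs to land in the right basin, and a local layer then increases accuracy. In this hypothetical sketch, plain best-of-N random sampling stands in for interacting simulated annealing in the first layer, and finite-difference gradient descent stands in for the second layer's local optimization; the filtering part of the second layer is omitted. The toy energy and all function names are illustrative assumptions.

```python
import numpy as np


def toy_energy(x):
    """Hypothetical multimodal energy over the last axis; global minimum 0 at (1, -2)."""
    c = np.array([1.0, -2.0])
    return np.sum((x - c) ** 2 + (1.0 - np.cos(2.0 * np.pi * x)), axis=-1)


def layer1_global(energy, lower, upper, n=20000, seed=0):
    """Stand-in for the global stochastic layer: best of n uniform random samples."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, size=(n, len(lower)))
    return x[np.argmin(energy(x))]


def numerical_grad(energy, x, h=1e-5):
    """Central finite-difference gradient of a scalar energy at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (energy(x + d) - energy(x - d)) / (2.0 * h)
    return g


def layer2_local(energy, x0, step=0.02, n_iters=500):
    """Local refinement: plain gradient descent started at the layer-1 estimate."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * numerical_grad(energy, x)
    return x
```

The first layer only has to return a point in the correct basin; the descent steps of the second layer then remove the residual error that random sampling alone leaves.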

In addition, we propose a system that recovers not only the movement of the skeleton but also the possibly non-rigid temporal deformation of the 3D surface. While large-scale deformations or fast movements are captured by the skeleton pose and approximate surface skinning, true small-scale deformations or non-rigid garment motion are captured by fitting the surface to the silhouette. To make automatic processing of large data sets feasible, the skeleton-based pose estimation is split into a local one and a lower-dimensional global one by exploiting the tree structure of the skeleton.

Our experiments comprise a large variety of sequences for qualitative and quantitative evaluation of the proposed methods, including a comparison of global stochastic optimization with several other optimization and particle filtering approaches.

Contact

Thorsten Thormählen
+49 681 9325-417

Thorsten Thormählen, 06/30/2009 13:44 -- Created document.