Human motion reconstruction from video sequences with MPEG-4 compliant animation parameters.
The ability to track articulated human motion in video sequences is essential for applications ranging from biometrics and virtual reality to human-computer interfaces and surveillance. The work presented in this thesis focuses on tracking and analysing human motion in terms of MPEG-4 Body Animation Parameters, in the context of a model-based coding scheme. Model-based coding has emerged as a potential technique for very low bit-rate video compression. This study emphasises motion reconstruction rather than photorealistic human body modelling; consequently, a 3-D skeleton with 31 degrees-of-freedom was used to model the human body. Compression is achieved by analysing the input images in terms of the known 3-D model and extracting parameters that describe the relative pose of each segment. These parameters are transmitted to the decoder, which synthesises the output by transforming the default model into the correct posture. The problem comprises two main aspects: 3-D human motion capture and pose description. The goal of the 3-D human motion capture component is to generate 3-D locations of key joints on the human body without the use of special markers or sensors placed on the subject. The input sequence is acquired by three synchronised and calibrated CCD cameras. Digital image matching techniques, including cross-correlation and least squares matching, are used to find spatial correspondences between the multiple views as well as temporal correspondences in subsequent frames, with sub-pixel accuracy. The tracking algorithm automates the matching process by examining each matching result and adaptively modifying the matching parameters. Key points must be manually selected in the first frame, after which tracking proceeds without user intervention, using the recovered 3-D motion of the skeleton model to predict future states.
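As an illustration of the cross-correlation matching step mentioned above, the following is a minimal sketch, not the implementation used in this work. It assumes small grey-level patches stored as lists of rows, performs an exhaustive search, and returns only the best integer-pixel position; the sub-pixel refinement by least squares matching is not shown. The function names `ncc` and `best_match` are hypothetical.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equally sized patches (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("patches must be non-empty and equal in size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    # A patch with no grey-level variation cannot be correlated; score it 0.
    return num / den if den > 0 else 0.0


def best_match(template, image):
    """Slide the template over the image; return (score, row, col) of the best match."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)  # any real score lies in [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(template, window)
            if score > best[0]:
                best = (score, r, c)
    return best


# Illustrative data only: a tiny synthetic image and a template cut from it.
image = [
    [1, 2, 3, 4, 5, 6],
    [2, 9, 1, 7, 3, 2],
    [4, 3, 8, 2, 6, 1],
    [5, 1, 2, 9, 4, 3],
    [6, 5, 4, 3, 2, 1],
]
template = [[1, 7], [8, 2]]  # taken from rows 1-2, columns 2-3
score, row, col = best_match(template, image)
```

Because the score is normalised by the mean and variance of each patch, a uniform change in brightness or contrast between views leaves the best match position unchanged, which is one reason this measure suits multi-camera correspondence.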
Epipolar geometry is exploited to verify spatial correspondences in each frame before the 3-D locations of all joints are computed through triangulation to construct the 3-D skeleton. The pose of the skeleton is described by the MPEG-4 Body Animation Parameters. The subject's motion is reconstructed by applying the animation parameters to a simplified version of the default MPEG-4 skeleton. The tracking algorithm may also be adapted to 2-D tracking in monocular sequences; an example of 2-D tracking of facial expressions demonstrates this flexibility. Further results from tracking separate body parts demonstrate the advantage of multiple views and the benefit of camera calibration, which simplifies the generation of 3-D trajectories and the estimation of epipolar geometry. The overall system is tested on a walking sequence in which full-body motion capture is performed and all 31 degrees-of-freedom of the tracked model are extracted. Results show adequate motion reconstruction (i.e. convincing to most human observers), with slight deviations due to the lack of knowledge of the volumetric properties of the human body.
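The triangulation step can be illustrated with a small sketch under stated assumptions; it is not necessarily the formulation used in this work. It assumes that calibration has already converted each matched image point into a viewing ray (camera centre plus direction) in a common world frame, and it returns the 3-D point that minimises the sum of squared distances to all rays. The names `triangulate`, `solve3` and `det3` are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve the 3x3 linear system A x = b by Cramer's rule."""
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point is not determined")
    solution = []
    for k in range(3):
        # Replace column k of A with b.
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(Ak) / det)
    return solution


def triangulate(rays):
    """Least-squares intersection of viewing rays.

    rays: list of (centre, direction) pairs, each a sequence of 3 numbers.
    Accumulates sum(I - d d^T) X = sum((I - d d^T) c) over all rays and solves for X.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for centre, direction in rays:
        norm = math.sqrt(sum(v * v for v in direction))
        if norm == 0:
            raise ValueError("ray direction must be non-zero")
        d = [v / norm for v in direction]
        for i in range(3):
            for j in range(3):
                m = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += m
                b[i] += m * centre[j]
    return solve3(A, b)


# Illustrative data only: three camera centres observing one joint.
joint = (1.0, 2.0, 5.0)
centres = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
rays = [(c, tuple(p - q for p, q in zip(joint, c))) for c in centres]
estimate = triangulate(rays)
```

With noise-free rays the estimate coincides with the true joint position; with noisy matches the rays no longer meet, and the residual distance from the estimate to each ray offers one possible consistency check alongside the epipolar constraint.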