A structure from motion solution to head pose recovery for model-based video coding.
Heathcote, Jonathan Michael.
Current hybrid coders such as H.261/263/264 or MPEG-1/-2 cannot always offer high quality-to-compression ratios for video transfer over the (low-bandwidth) wireless channels typical of handheld devices (such as smartphones and PDAs). These devices are often used in videophone and teleconferencing scenarios, where the subjects of interest in the scene are people's faces. In such cases, an alternative coding scheme known as Model-Based Video Coding (MBVC) can be employed. MBVC systems for face scenes use geometrically and photorealistically accurate computer-graphic models to represent head-and-shoulder views of people in a scene. High compression ratios are achieved at the encoder by extracting and transmitting only the parameters which represent the explicit shape and motion changes occurring on the face in the scene. With some a priori knowledge (such as the MPEG-4 standard for facial animation parameters), the transmitted parameters can be used at the decoder to accurately animate the graphical model, so that a synthesised version of the scene (as originally seen at the encoder) can be output. The primary components for facial re-animation at the decoder are a set of local and global motion parameters extracted from the video sequence arriving at the encoder. Local motion describes the changes in facial expression occurring on the face; global motion describes the three-dimensional motion of the entire head as a rigid object. Extraction of this three-dimensional global motion is often called head tracking. This thesis focuses on the tracking of rigid head pose in a monocular video sequence. The system framework uses the recursive Structure from Motion (SfM) method of Azarbayejani and Pentland. Integral to the SfM solution are a large number of manually selected two-dimensional feature points, which are tracked throughout the sequence using an efficient image registration technique.
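The two-dimensional feature tracking described above can be illustrated with a toy registration step. The sketch below matches a template patch around each feature against a search window in the next frame; the exhaustive SSD search, the patch and search radii, and the function name are all illustrative assumptions, not the thesis's (more efficient) registration technique.

```python
import numpy as np

def track_feature(prev_frame, next_frame, pt, patch=7, search=10):
    """Track one 2-D feature by exhaustive SSD block matching.

    A toy stand-in for an efficient image-registration tracker;
    `patch` and `search` radii are illustrative values.
    """
    r, c = int(pt[0]), int(pt[1])
    tmpl = prev_frame[r - patch:r + patch + 1,
                      c - patch:c + patch + 1].astype(float)
    best, best_pt = np.inf, (r, c)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            cand = next_frame[rr - patch:rr + patch + 1,
                              cc - patch:cc + patch + 1].astype(float)
            if cand.shape != tmpl.shape:
                continue  # candidate window falls off the image
            ssd = np.sum((cand - tmpl) ** 2)  # sum of squared differences
            if ssd < best:
                best, best_pt = ssd, (rr, cc)
    return best_pt
```

A real tracker would refine this with sub-pixel registration and, as the thesis does, constrain the search window using the filter's pose prediction.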
The trajectories of the feature points are processed simultaneously by an extended Kalman filter (EKF) to stably recover the camera geometry and the rigid three-dimensional structure and pose of the head. To improve estimation accuracy and stability, adaptive estimation is harnessed within the Kalman filter by dynamically varying the noise associated with each feature measurement. A closed-loop approach is used to constrain feature tracking in each frame: the Kalman filter's estimates of the motion and structure of the face are used to predict the trajectories of the features, thereby constraining the search space for the next frame in the video sequence. Further robustness in feature tracking is achieved by integrating a linear appearance basis to accommodate variations in illumination or changes in aspect of the face. Synthetic experiments are performed for both the SfM and the feature tracking algorithms. The accuracy of the SfM solution is evaluated against synthetic ground truth, and further experimentation demonstrates the stability of the framework under significant noise corruption of the arriving measurement data. The accuracy of the pixel measurements obtained by the feature tracking algorithm is also evaluated against known ground truth, and additional experiments confirm feature tracking stability despite significant changes in target appearance. Experiments with real video sequences illustrate the robustness of the complete head tracker to partial occlusions of the face. The SfM solution (including two-dimensional tracking) runs in near real time at 12 Hz. The limits of pitch, yaw and roll (rotational) recovery are 45°, 45° and 90° respectively. Large translational recovery (especially in depth) is also demonstrated. The estimated motion trajectories are validated against (publicly available) ground-truth motion captured using a commercial magnetic orientation tracking system.
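The heart of the recursive estimator above is a Kalman measurement update whose noise covariance is varied per feature. The sketch below shows a plain *linear* update plus a hypothetical noise-inflation rule; the thesis's actual filter is extended (linearised about a nonlinear camera projection), and the function names and the residual-based weighting are assumptions for illustration only.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """One linear Kalman measurement update (simplified sketch of the
    EKF correction step; the real filter linearises a projection model)."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)                  # state correction
    P = (np.eye(len(x)) - K @ H) @ P         # covariance correction
    return x, P

def adaptive_R(residuals, base_var=1.0):
    """Hypothetical adaptive scheme: inflate each feature's measurement
    variance with its recent registration residual, so poorly tracked
    features are down-weighted by the filter."""
    return np.diag(base_var * (1.0 + np.asarray(residuals, dtype=float)))
```

Down-weighting noisy features this way is what lets the filter stay stable through partial occlusions: an occluded feature's residual grows, its variance inflates, and its influence on the pose estimate shrinks.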
Rigid reanimation of an overlaid wireframe face model is further used as a subjective visual analysis technique. Together, these results confirm the suitability of the proposed head tracker as the global (rigid) motion estimator in an MBVC system.
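The rigid reanimation step amounts to applying the recovered rotation (pitch, yaw, roll) and translation to the model's vertices. A minimal sketch follows; the Euler-angle composition order `Rz @ Ry @ Rx` is an assumption, as the abstract does not state the thesis's convention.

```python
import numpy as np

def euler_to_R(pitch, yaw, roll):
    """Rotation matrix from pitch (about X), yaw (about Y) and roll
    (about Z), in radians. Composition order is assumed, not taken
    from the thesis."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def reanimate(vertices, pitch, yaw, roll, t):
    """Apply a recovered rigid pose to an N x 3 array of wireframe
    model vertices: rotate, then translate."""
    return vertices @ euler_to_R(pitch, yaw, roll).T + t
```

Overlaying the transformed wireframe on each video frame then gives the visual check described above: if the pose estimates are accurate, the model stays registered with the head throughout the sequence.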