Markerless pose tracking of a human subject.
High-capacity wireless and fixed-line broadband services cover only a small part of South Africa's vast expanse. As a result, many rural areas, as well as deployed military units, rely on low-bandwidth communication networks, over which live video communication is impractical. Neither traditional nor advanced data compression methods can produce the payload reduction required for video at these bandwidths. A model-based vision system is used to address this problem instead. This is not video compression but rather image understanding and representation in the context of prior models of the observed object. Markerless human tracking and pose recovery are the specific interests of this research.
Markerless human pose tracking is a relatively new and growing field of image processing. Apart from low-bandwidth video communication, it has many potential areas of application, including medicine, sport, security and surveillance, and human-machine interaction. As multimedia technologies continue to improve, pose tracking systems are likely to see increasingly wide use. While a few markerless tracking devices are beginning to emerge, many commercially available motion capture systems still require a special suit fitted with markers or sensors, which makes them impractical for everyday use in arbitrary locations. Current research in computer vision and image processing places a significant focus on the development of markerless approaches to human motion capture.
This dissertation examines a complete markerless human pose tracking system that can be split into four distinct but interlinked stages: image capture, image processing, body model and optimisation. After video data is captured from multiple camera views, the processing stage extracts image cues such as silhouettes, 2-D edges and a 3-D colour volumetric reconstruction.
Following the basic principle of a model-based approach, a 24 degree-of-freedom superellipsoid body model is fitted to the observed image cue data, and an objective function measures the closeness of the match. Several optimisation approaches are examined for finding and refining the best-fitting body pose in each image frame. All are based on Stochastic Meta Descent (SMD) optimisation: SMD by itself, SMD in a hierarchical approach, SMD with pose prediction, and Smart Particle Filtering, which places SMD inside a particle filter framework. The performance of the system with each optimisation approach is tested using the HumanEva-II datasets. These datasets contain a number of different subjects performing a variety of actions while wearing ordinary clothes, together with marker-based ground-truth data obtained using a ViconPeak motion capture system. This allows the error of the predicted poses relative to the ground truth to be calculated. With its robustness to clutter and occlusion, the Smart Particle Filter approach is shown to give the best results.
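As an illustration of how a superellipsoid body segment can be compared with observed 3-D data, the sketch below uses the standard superquadric inside-outside function, which is below 1 inside the shape, equal to 1 on its surface and above 1 outside it. It is a minimal sketch and not the objective function used in this dissertation: the function names, the axis-aligned single segment and the "fraction of points outside" cost are all assumptions made for illustration.

```python
def superellipsoid_inside_outside(point, radii, eps1, eps2):
    """Standard superquadric inside-outside function F.

    F < 1: point is inside; F == 1: on the surface; F > 1: outside.
    The segment is assumed to be centred at the origin and axis-aligned.
    """
    x, y, z = point
    a1, a2, a3 = radii
    # abs() keeps the fractional powers real for negative coordinates.
    xy_term = (abs(x / a1) ** (2.0 / eps2)
               + abs(y / a2) ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + abs(z / a3) ** (2.0 / eps1)


def outside_fraction(observed_points, radii, eps1, eps2):
    """Hypothetical cost: fraction of observed 3-D points outside the segment.

    A lower value means the segment explains more of the observed volume.
    """
    if not observed_points:
        return 1.0
    outside = sum(
        1 for p in observed_points
        if superellipsoid_inside_outside(p, radii, eps1, eps2) > 1.0
    )
    return outside / len(observed_points)


# With eps1 = eps2 = 1 the superellipsoid reduces to an ordinary ellipsoid.
unit_sphere = (1.0, 1.0, 1.0)
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cost = outside_fraction(points, unit_sphere, 1.0, 1.0)
```

In a full tracker, each of the model's segments would first be transformed by the current pose parameters, and an optimiser would adjust those parameters to reduce a cost of this kind.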