|dc.description.abstract||High-capacity wireless and fixed-line broadband services have a relatively small footprint
over South Africa's vast expanse. As a result, many rural areas, as well as
military communications during deployment, rely on low-bandwidth communication networks
instead, which makes live video communication over these links impractical. Traditional
and advanced data compression methods cannot produce the payload reduction
required for video use over these bandwidths. Instead, a model-based vision system
is used to address this problem. This is not video compression but rather image understanding
and representation in the context of prior models of the observed object.
Markerless human tracking and pose recovery are the specific interests of this research.
Markerless human pose tracking is a relatively new and growing field of image processing.
It has many potential areas of application apart from low-bandwidth video
communication, including the medical field, sporting arena, security and surveillance
and human-machine interaction. As multimedia technologies continue to grow and
improve, pose tracking systems are likely to see increasingly wide use. While
a few markerless tracking devices are beginning to emerge, many currently available
commercial motion capture systems require the use of a special suit and markers or
sensors. This makes them impractical for everyday use in any location. Current
research in computer vision and image processing incorporates a significant focus on
the development of markerless approaches to human motion capture.
This dissertation examines a complete markerless human pose tracking system, which
can be split into four distinct but interlinked stages: image capture, image processing,
body model and optimisation. After video data from multiple camera views
is captured, the processing stage extracts image cues such as silhouettes, 2-D edges and
3-D colour volumetric reconstruction. Following the basic principle of a model-based
approach, a 24 degree-of-freedom superellipsoid body model is fitted to the observed
image cue data. An objective function is used to measure the closeness of this match.
A number of different optimisation approaches are examined for use in refining and
finding the best-fitting body pose for each image frame. These approaches are all based
on Stochastic Meta Descent (SMD) optimisation. Four variants are explored: SMD by itself,
SMD in a hierarchical approach, SMD with pose prediction, and Smart Particle Filtering,
in which SMD runs inside a particle filter framework.
The performance of the system with the various optimisation approaches is tested
using the HumanEva-II datasets. These datasets contain a number of different subjects
performing a variety of actions while wearing ordinary clothes. They contain marker-based
ground-truth data obtained using a ViconPeak motion capture system. This
allows a relative error measurement of the predicted poses to be calculated. With its
robustness to clutter and occlusion, the Smart Particle Filter approach is shown to give
the best results.||en