Data classification using genetic programming.
Date
2015
Abstract
Genetic programming (GP), a field of artificial intelligence, is an evolutionary algorithm
that evolves a population of trees representing programs. These programs
are used to solve problems. This dissertation investigates the use of genetic programming
for data classification. In machine learning, data classification is the process
of allocating a class label to an instance of data. A classifier is created in order to
perform these allocations. Several studies have investigated the use of GP to solve
data classification problems. These studies have shown that GP is able to create
classifiers with high classification accuracies. However, there are certain aspects
which have not previously been investigated.
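To make the idea of a tree-based classifier concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes an arithmetic tree stored as nested tuples, a hypothetical function set (`+`, `-`, `*`), terminals named `x0`, `x1`, ... for feature values, and a simple rule that maps a positive output to class 1 and anything else to class 0. Classification accuracy is used as the fitness measure; the evolutionary loop itself (selection, crossover, mutation) is omitted.

```python
import operator

# Hypothetical function set for an arithmetic-tree classifier (an assumption,
# not taken from the dissertation).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, instance):
    """Evaluate a nested-tuple tree on one instance (a list of feature values)."""
    if isinstance(tree, tuple):
        # Internal node: (operator symbol, left subtree, right subtree).
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, instance), evaluate(right, instance))
    if isinstance(tree, str):
        # Terminal such as "x0": look up the feature with that index.
        return instance[int(tree[1:])]
    # Terminal that is a numeric constant.
    return tree


def classify(tree, instance):
    """Map the tree output to a binary class label (assumed threshold at zero)."""
    return 1 if evaluate(tree, instance) > 0 else 0


def accuracy(tree, instances, labels):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(classify(tree, x) == y for x, y in zip(instances, labels))
    return correct / len(labels)


# Illustrative individual and toy data (both made up for this sketch).
tree = ("-", "x0", ("*", 2.0, "x1"))  # encodes x0 - 2.0 * x1
instances = [[5.0, 1.0], [1.0, 3.0], [4.0, 1.0], [0.0, 0.5]]
labels = [1, 0, 1, 0]
fitness = accuracy(tree, instances, labels)
```

In a full GP run, every individual in the population would be scored this way and the scores would drive selection.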
Five areas were investigated in this dissertation. The first was an investigation
into how discretisation could be incorporated into a GP algorithm. An adaptive
discretisation algorithm was proposed, and outperformed certain existing methods.
The second was a comparison of GP representations for binary data classification.
The findings indicated that, of the representations examined (arithmetic trees,
decision trees, and logical trees), decision trees performed best. The third
was to investigate the use of the encapsulation genetic operator and its effect on
data classification. The findings revealed that an improvement in both training and
test results was achieved when encapsulation was incorporated. The fourth was an
analysis of several hybridisations of a GP algorithm with a genetic algorithm
in order to evolve a population of ensembles. Four methods were proposed and
these methods outperformed certain existing GP and ensemble methods. Finally,
the fifth area was to investigate an ensemble construction method for classification.
In this approach GP evolved a single ensemble. The proposed method resulted in
an improvement in training and test accuracy when compared to the standard GP
algorithm.
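As an illustration of what an ensemble of classifiers does at prediction time, the sketch below combines member predictions by majority vote. The abstract does not state how ensemble members are combined, so majority voting is an assumption here, and the three threshold rules standing in for evolved classifiers are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, instance):
    """Return the class label predicted by the most ensemble members."""
    votes = Counter(clf(instance) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical ensemble members; in a GP setting these would be evolved trees.
# An odd number of members avoids ties in the binary case.
members = [
    lambda x: int(x[0] > 2.0),
    lambda x: int(x[1] < 1.5),
    lambda x: int(x[0] - x[1] > 0),
]

prediction_a = majority_vote(members, [5.0, 1.0])  # all three members vote 1
prediction_b = majority_vote(members, [3.0, 4.0])  # only the first member votes 1
```

An ensemble can be correct even when an individual member is wrong, which is the usual motivation for evolving ensembles rather than single classifiers.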
The methods proposed in this dissertation were tested on publicly available data
sets, and the results were statistically tested in order to determine the effectiveness
of the proposed approaches.
Description
Master of Science in Computer Science.
Keywords
Big data--Classification; Genetic programming (Computer science); Theses--Computer science.