Browsing by Author "Pillay, Nelishia."
Now showing 1 - 10 of 10
- Results Per Page
- Sort Options
Item Automated design of genetic programming of classification algorithms.(2018) Nyathi, Thambo.; Pillay, Nelishia.Over the past decades, there has been an increase in the use of evolutionary algorithms (EAs) for data mining and knowledge discovery in a wide range of application domains. Data classification, a real-world application problem is one of the areas EAs have been widely applied. Data classification has been extensively researched resulting in the development of a number of EA based classification algorithms. Genetic programming (GP) in particular has been shown to be one of the most effective EAs at inducing classifiers. It is widely accepted that the effectiveness of a parameterised algorithm like GP depends on its configuration. Currently, the design of GP classification algorithms is predominantly performed manually. Manual design follows an iterative trial and error approach which has been shown to be a menial, non-trivial time-consuming task that has a number of vulnerabilities. The research presented in this thesis is part of a large-scale initiative by the machine learning community to automate the design of machine learning techniques. The study investigates the hypothesis that automating the design of GP classification algorithms for data classification can still lead to the induction of effective classifiers. This research proposes using two evolutionary algorithms,namely,ageneticalgorithm(GA)andgrammaticalevolution(GE)toautomatethe design of GP classification algorithms. The proof-by-demonstration research methodology is used in the study to achieve the set out objectives. To that end two systems namely, a genetic algorithm system and a grammatical evolution system were implemented for automating the design of GP classification algorithms. The classification performance of the automated designed GP classifiers, i.e., GA designed GP classifiers and GE designed GP classifiers were compared to manually designed GP classifiers on real-world binary class and multiclass classification problems. The evaluation was performed on multiple domain problems obtained from the UCI machine learning repository and on two specific domains, cybersecurity and financial forecasting. The automated designed classifiers were found to outperform the manually designed GP classifiers on all the problems considered in this study. GP classifiers evolved by GE were found to be suitable for classifying binary classification problems while those evolved by a GA were found to be suitable for multiclass classification problems. Furthermore, the automated design time was found to be less than manual design time. Fitness landscape analysis of the design spaces searched by a GA and GE were carried out on all the class of problems considered in this study. Grammatical evolution found the search to be smoother on binary classification problems while the GA found multiclass problems to be less rugged than binary class problems.Item A computer mediated system for distance education.(1996) Pillay, Nelishia.; Dempster, Robert.A problem currently facing South Africa is the large number of poorly educated or uneducated people in many parts of the country. Distance education has proven to be an apt solution to this problem However, one of the numerous constraints associated with studying at a distance is insufficient communication between students and lecturers and the lack of peer interaction. The integration of Computer Mediated Communications (CMC) in the delivery of distance education courses world-wide has proved to be a means of alleviating this communication problem. The study presented in this thesis examines the technical feasibility of implementing CMC in the delivery of South African distance education courses as a solution to the communication problems experienced by distance learners in this country. For this purpose a system was developed and implemented at a South African distance education institution namely, Natal College of Education in Pietermaritzburg. Based on this implementation a technical evaluation of the feasibility of CMC in the instruction of distance education courses within a South African infrastructure was examined. As a result of this study we have been able to: • Determine the technical problems associated with the implementation of a CMC system in a South African distance education environment. • Identify possible solutions to these technical problems • Define a set of criteria, which if met by a CMC system would ensure the technical feasibility of the system as a solution to the communication problems experienced by South African distance learners. • Determine the effects of students' attitudes towards computers on their use of the CMC system. • Determine the effect of CMC on students' attitudes towards computers. • Identify any additional factors, besides technical issues, which need to be taken into account when implementing a CMC system.Item Data classification using genetic programming.(2015) Dufourq, Emmanuel.; Pillay, Nelishia.Genetic programming (GP), a field of artificial intelligence, is an evolutionary algorithm which evolves a population of trees which represent programs. These programs are used to solve problems. This dissertation investigates the use of genetic programming for data classification. In machine learning, data classification is the process of allocating a class label to an instance of data. A classifier is created in order to perform these allocations. Several studies have investigated the use of GP to solve data classification problems. These studies have shown that GP is able to create classifiers with high classification accuracies. However, there are certain aspects which have not previously been investigated. Five areas were investigated in this dissertation. The first was an investigation into how discretisation could be incorporated into a GP algorithm. An adaptive discretisation algorithm was proposed, and outperformed certain existing methods. The second was a comparison of GP representations for binary data classification. The findings indicated that from the representations examined (arithmetic trees, decision trees, and logical trees), the decision trees performed the best. The third was to investigate the use of the encapsulation genetic operator and its effect on data classification. The findings revealed that an improvement in both training and test results was achieved when encapsulation was incorporated. The fourth was an investigative analysis of several hybridisations of a GP algorithm with a genetic algorithm in order to evolve a population of ensembles. Four methods were proposed and these methods outperformed certain existing GP and ensemble methods. Finally, the fifth area was to investigate an ensemble construction method for classification. In this approach GP evolved a single ensemble. The proposed method resulted in an improvement in training and test accuracy when compared to the standard GP algorithm. The methods proposed in this dissertation were tested on publicly available data sets, and the results were statistically tested in order to determine the effectiveness of the proposed approaches.Item Evolving dynamic fitness measures for genetic programming.(2018) Ragalo, Anisa Waganda.; Pillay, Nelishia.This research proposes dynamic fitness measure genetic programming (DFMGP). DFMGP modifies the conventional genetic programming (GP) approach: rather than applying a single fitness measure individually throughout GP, a different fitness measure (or combination of fitness measures) is applied on each GP generation. A detailed review of the fitness measures used in GP is presented. The review demonstrates that different fitness measures were introduced to overcome different shortcomings, e.g. escaping local optima, reducing bloat, thereby improving on the performance of the GP algorithm. A subsequent analysis of the fitness measures shows that there is no universal “best” fitness measure; rather, different fitness measures are appropriate for different problems. The literature also anticipates that applying different fitness measures at different points of the GP problem solving process should be more effective then applying a single fitness measure throughout the algorithm. Hence the case for DFMGP. Selecting the fitness measures to apply on each GP generation is in itself a combinatorial optimization problem: the study investigates two approaches to serve this purpose, namely, a genetic algorithm and genetic programming. The genetic algorithm (GA) derives a sequence of fitness measures to be applied, while GP produces an arithmetic function combining the fitness measures. The performance of DFMGP applying the evolved fitness measure sequences and DFMGP applying the evolved fitness measure combinations is compared to the conventional GP approach on a number of benchmark and complex, real-world problems. DFMGP is found to be more effective than standard GP. The study also reveals that both the sequences and arithmetic combinations of the fitness measures are effective when applied to problem instances different from those used to derive them. Hence, the sequences and arithmetic combinations are reusable, whereby simpler problems are used for derivation, and DFMGP applying the derived fitness measures is then used to solve more complex problems. Therefore the time necessary for the derivations is reduced. An analysis of the evolved sequences and arithmetic combinations of the fitness measures shows that fitness measures applied in the preliminary DFMGP generations support exploration while those applied in later DFMGP generations support exploitation. GP search is a constant balance between exploration and exploitation, with the former being more suited to the preliminary generations, and the latter, later generations. DFMGP’s performance advantage over standard GP is therefore justified by the premise that the fitness measure used on each generation supports the more suitable search in the on-going phase of GP. DFMGP applying the fitness measure combinations derived by GP is also found to perform better than DFMGP applying the fitness measure sequences derived by the GA. The former approach facilitates combining explorative and exploitative fitness measures on some of the DFMGP generations, whereby rather than simply switching between exploration and exploitation, the fitness measure can drive the two processes to occur simultaneously when required. Hence it follows that GP searching the space of fitness measure combinations is the preferred approach to generating dynamic fitness measures for DFMGP. Overall, the study reveals the effectiveness of DFMGP when applied to benchmark and real-world problems. Future work will look at a priori detecting the properties of complex problems, such that simpler problems with similar properties can be used to derive better dynamic fitness measures for DFMGP.Item An investigation into the use of genetic programming for the induction of novice procedural programming solution algorithms in intelligent programming tutors.(2004) Pillay, Nelishia.; Sartori-Angus, Alan G.Intelligent programming tutors have proven to be an economically viable and effective means of assisting novice programmers overcome learning difficulties. However, the large-scale use of intelligent programming tutors has been impeded by the high developmental costs associated with building intelligent programming tutors. The research presented in this thesis forms part of a larger initiative aimed at reducing these costs by building a generic architecture for the development of intelligent programming tutors. One of the facilities that must be provided by the generic architecture is the automatic generation of solutions to programming problems. The study presented in the thesis examines the use of genetic programming as means of inducing solution algorithms to novice programming problems. The scope of the thesis is limited to novice procedural programming paradigm problems requiring the use of arithmetic, string manipulation, conditional, iterative and recursive programming structures. The methodology employed in the study is proof-by-demonstration. A genetic programming system for the induction of novice procedural solution algorithms was implemented and tested on randomly chosen novice procedural programming problems. The study has identified the standard and advanced genetic programming features needed for the successful generation of novice procedural solution algorithms. The outcomes of this study include the derivation of an internal representation language for representing procedural solution algorithms and a high-level programming problem specification format for describing procedural problems, in the generic architecture. One of the limitations of genetic programming is its susceptibility to converge prematurely to local optima and not find a solution in some cases. The study has identified fitness function biases against certain structural components that are needed to find a solution, as an additional cause of premature convergence in this domain. It presents an iterative structure-based algorithm as a solution to this problem. This thesis has contributed to both the fields of genetic programming and intelligent programming tutors. While genetic programming has been successfully implemented in various domains, it is usually applied to a single problem within that domain. In this study the genetic programming system must be capable of solving a number of different programming problems in different application domains. In addition to this, the study has also identified a means of overcoming premature convergence caused by fitness function biases in a genetic programming system for the induction of novice procedural programming algorithms. Furthermore, although a number of studies have addressed the student modelling and pedagogical aspects of intelligent programming tutors, none have examined the automatic generation of problem solutions as a means of reducing developmental costs. Finally, this study has contributed to the ongoing research being conducted by the artificial intelligence in education community, to test the effectiveness of using machine learning techniques in the development of different aspects of intelligent tutoring systems.Item Network intrusion detection using genetic programming.(2018) Chareka, Tatenda Herbert.; Pillay, Nelishia.Network intrusion detection is a real-world problem that involves detecting intrusions on a computer network. Detecting whether a network connection is intrusive or non-intrusive is essentially a binary classification problem. However, the type of intrusive connections can be categorised into a number of network attack classes and the task of associating an intrusion to a particular network type is multiclass classification. A number of artificial intelligence techniques have been used for network intrusion detection including Evolutionary Algorithms. This thesis investigates the application of evolutionary algorithms namely, Genetic Programming (GP), Grammatical Evolution (GE) and Multi-Expression Programming (MEP) in the network intrusion detection domain. Grammatical evolution and multi-expression programming are considered to be variants of GP. In this thesis, a comparison of the effectiveness of classifiers evolved by the three EAs within the network intrusion detection domain is performed. The comparison is performed on the publicly available KDD99 dataset. Furthermore, the effectiveness of a number of fitness functions is evaluated. From the results obtained, standard genetic programming performs better than grammatical evolution and multi-expression programming. The findings indicate that binary classifiers evolved using standard genetic programming outperformed classifiers evolved using grammatical evolution and multi-expression programming. For evolving multiclass classifiers different fitness functions used produced classifiers with different characteristics resulting in some classifiers achieving higher detection rates for specific network intrusion attacks as compared to other intrusion attacks. The findings indicate that classifiers evolved using multi-expression programming and genetic programming achieved high detection rates as compared to classifiers evolved using grammatical evolution.Item Structure based partial solution search for the examination timetabling problem.(2021) Rajah, Christopher Bradley.; Pillay, Nelishia.The aim of this work is to present a new approach, namely, Structure Based Partial Solution Search (SBPSS) to solve the Examination Timetabling Problem. The success of the Developmental Approach in this problem domain suggested that the strategy of searching the spaces of partial timetables whilst constructing them is promising and worth pursuing. This work adopts a similar strategy. Multiple timetables are incrementally constructed at the same time. The quality of the partial timetables is improved upon by searching their partial solution spaces at every iteration during construction. Another key finding from the literature survey revealed that although timetables may exhibit the same behaviour in terms of their objective values, their structures or exam schedules may be different. The challenge with this finding is to decide on which regions to pursue because some regions may not be worth investigating due to the difficulty in searching them. These problematic areas may have solutions that are not amenable to change which makes it difficult to improve them. Another reason is that the neighbourhoods of solutions in these areas may be less connected than others which may restrict the ability of the search to move to a better solution in that neighbourhood. By moving to these problematic areas of the search space the search may stagnate and waste expensive computational resources. One way to overcome this challenge is to use both structure and behaviour in the search and not only behaviour alone to guide the search. A search that is guided by structure is able to find new regions by considering the structural components of the candidate solutions which indicate which part of the search space the same candidates occupy. Another benefit to making use of a structure-based search is that it has no objective value bias because it is not guided by only the objective value. This statement is consistent with the literature survey where it is suggested that in order to achieve good performance the search should not be guided by only the objective value. The proposed method has been tested on three popular benchmark sets for examination timetabling, namely, the Carter benchmark set; the benchmark set from the International Timetabling competition in 2007 and the Yeditepe benchmark set. The SBPSS found the best solutions for two of the Carter problem instances. The SBPSS found the best solutions for four of the competition problem instances. Lastly, the SBPSS improved on the best results for all the Yeditepe problem instances.Item A study of evoluntionary perturbative hyper-heuristics for the nurse rostering problem.(2017) Rae, Christopher Stephen William Exter.; Pillay, Nelishia.Hyper-heuristics are an emerging field of study for combinatorial optimization. The aim of a hyper-heuristic is to produce good results across a set of problems rather than producing the best results. There has been little investigation of hyper-heuristics for the nurse rostering problem. The majority of hyper-heuristics for the nurse rostering problem fit into a single type of hyper-heuristic, the selection perturbative hyper-heuristic. There is no work in using evolutionary algorithms employed as selection perturbative hyper-heuristics for the nurse rostering problem. There is also no work in using the generative perturbative type of hyper-heuristic for the nurse rostering problem. The first objective of this dissertation is to investigate the selection perturbative hyper-heuristic for the nurse rostering problem and the effectiveness of employing an evolutionary algorithm (SPHH). The second objective is to investigate a generative perturbative hyper-heuristic to evolve perturbation heuristics for the nurse rostering problem using genetic programming (GPHH). The third objective is to compare the performance of SPHH and GPHH. SPHH and GPHH were evaluated using the INRC2010 benchmark data set and the results obtained were compared to available results from literature. The INRC2010 benchmark set is comprised of sprint, medium and long instance types. SPHH and GPHH produced good results for the INRC2010 benchmark data set. GPHH and SPHH were found to have different strengths and weaknesses. SPHH found better results than GPHH for the medium instances. GPHH found better results than SPHH for the long instances. SPHH produced better average results. GPHH produced results that were closer to the best known results. These results suggest future research should investigate combining SPHH and GPHH to benefit from the strengths of both perturbative hyper-heuristics.Item A study of genetic algorithms for solving the school timetabling problem.(2013) Raghavjee, Rushil.; Pillay, Nelishia.The school timetabling problem is a common optimization problem faced by many primary and secondary schools. Each school has its own set of requirements and constraints that are dependent on various factors such as the number of resources available and rules specified by the department of education for that country. There are two objectives in this study. In previous studies, genetic algorithms have only been used to solve a single type of school timetabling problem. The first objective of this study is to test the effectiveness of a genetic algorithm approach in solving more than one type of school timetabling problem. The second objective is to evaluate a genetic algorithm that uses an indirect representation (IGA) when solving the school timetabling problem. This IGA approach is then compared to the performance of a genetic algorithm that uses a direct representation (DGA). This approach has been covered in other domains such as job shop scheduling but has not been covered for the school timetabling problem. Both the DGA and IGA were tested on five school timetabling problems. Both the algorithms were initially developed based on findings in the literature. They were then improved iteratively based on their performance when tested on the problems. The processes of the genetic algorithms that were improved were the method of initial population creation, the selection methods and the genetic operators used. Both the DGA and the IGA were found to produce timetables that were competitive and in some cases superior to that of other methods such as simulated annealing and tabu search. It was found that different processes (i.e. the method of initial population creation, selection methods and genetic operators) were needed for each problem in order to produce the best results. When comparing the performance of the two approaches, the IGA outperformed the DGA for all of the tested school timetabling problems.Item A study of genetic programming and grammatical evolution for automatic object-oriented programming.(2016) Igwe, Kevin Chizoba.; Pillay, Nelishia.Manual programming is time consuming and challenging for a complex problem. For efficiency of the manual programming process, human programmers adopt the object-oriented approach to programming. Yet, manual programming is still a tedious task. Recently, interest in automatic software production has grown rapidly due to global software demands and technological advancements. This study forms part of a larger initiative on automatic programming to aid manual programming in order to meet these demands. In artificial intelligence, Genetic Programming (GP) is an evolutionary algorithm which searches a program space for a solution program. A program generated by GP is executed to yield a solution to the problem at hand. Grammatical Evolution (GE) is a variation of genetic programming. GE adopts a genotype-phenotype distinction and maps from a genotypic space to a phenotypic (program) space to produce a program. Whereas the previous work on object-oriented programming and GP has involved taking an analogy from object-oriented programming to improve the scalability of genetic programming, this dissertation aims at evaluating GP and a variation thereof, namely, GE, for automatic object-oriented programming. The first objective is to implement and test the abilities of GP to automatically generate code for object-oriented programming problems. The second objective is to implement and test the abilities of GE to automatically generate code for object-oriented programming problems. The third objective is to compare the performance of GP and GE for automatic object-oriented programming. Object-Oriented Genetic Programming (OOGP), a variation of OOGP, namely, Greedy OOGP (GOOGP), and GE approaches to automatic object-oriented programming were implemented. The approaches were tested to produce code for three object-oriented programming problems. Each of the object-oriented programming problems involves two classes, one with the driver program and the Abstract Data Type (ADT) class. The results show that both GP and GE can be used for automatic object-oriented programming. However, it was found that the ability of each of the approaches to automatically generate code for object-oriented programming problems decreases with an increase in the problem complexity. The performance of the approaches were compared and statistically tested to determine the effectiveness of each approach. The results show that GE performs better than GOOGP and OOGP.