Unsupervised feature selection for anomaly-based network intrusion detection using cluster validity indices.
Date
2016
Abstract
In recent years, there has been a rapid increase in Internet usage, which has in turn led to a
rise in malicious network activity. Network Intrusion Detection Systems (NIDS) are tools
that monitor network traffic with the purpose of rapidly and accurately detecting malicious
activity. These systems provide a time window for responding to emerging threats and
attacks aimed at exploiting vulnerabilities that arise from issues such as misconfigured
firewalls and outdated software.
Anomaly-based network intrusion detection systems construct a profile of legitimate or
normal traffic patterns using machine learning techniques, and monitor network traffic for
deviations from the profile, which are subsequently classified as threats or intrusions. Due
to the richness of information contained in network traffic, it is possible to define large
feature vectors from network packets. This often leads to redundant or irrelevant features
being used in network intrusion detection systems, which typically reduces the detection
performance of the system.
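The profile-and-deviation idea described above can be illustrated with a deliberately simplified sketch. The abstract does not specify which machine learning technique builds the profile, so the per-feature mean and standard deviation used here, the function names `build_profile` and `is_anomalous`, and the threshold of three standard deviations are all assumptions made for illustration only.

```python
import statistics

def build_profile(normal_records):
    """Summarize each feature of normal traffic by its mean and standard deviation."""
    columns = list(zip(*normal_records))
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in columns]

def is_anomalous(record, profile, threshold=3.0):
    """Flag a record if any feature deviates from the profile by more than
    `threshold` standard deviations (hypothetical decision rule)."""
    for value, (mean, std) in zip(record, profile):
        if std == 0:
            # A constant feature in normal traffic: any change is a deviation.
            if value != mean:
                return True
            continue
        if abs(value - mean) / std > threshold:
            return True
    return False

# Hypothetical records: (bytes per packet, connection duration in seconds).
normal_traffic = [[100, 0.10], [110, 0.20], [90, 0.15], [105, 0.12]]
profile = build_profile(normal_traffic)
```

Every feature added to a record contributes to this decision, which is why irrelevant or redundant features can trigger false alarms or mask real deviations.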
The purpose of feature selection is to remove unnecessary or redundant features in a feature
space, thereby improving the performance of learning algorithms and as a result the
classification accuracy. Previous approaches have performed feature selection via optimization
techniques, using the classification accuracy of the NIDS on a subset of the data
as an objective function. While this approach has been shown to improve the performance
of the system, it is unrealistic to assume that labelled training data is available in operational
networks, which precludes the use of classification accuracy as an objective function
in a practical system.
This research proposes a method for feature selection in network intrusion detection that
does not require any access to labelled data. The algorithm uses normalized cluster validity
indices as an objective function that is optimized over the search space of candidate
feature subsets via a genetic algorithm. Feature subsets produced by the algorithm are
shown to improve the classification performance of an anomaly-based network intrusion
detection system over the NSL-KDD dataset. Despite not requiring access to labelled
data, the classification performance of the proposed system approaches that of effective
feature subsets that were derived using labelled training data.
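As a rough sketch of the approach outlined above, the following code searches binary feature masks with a small genetic algorithm and scores each mask without using any labels. It is not the thesis implementation: the abstract does not name the validity indices, the clustering method, or the genetic operators, so the silhouette index rescaled to [0, 1], the plain k-means clustering, the single-point crossover, the mutation rate, and every function name below are assumptions.

```python
import math
import random

def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette index in [-1, 1]; returns -1 if fewer than two clusters."""
    if len(set(labels)) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:
            continue  # singleton cluster contributes a silhouette of 0
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def fitness(data, mask, k, seed):
    """Label-free objective: silhouette of the masked data, rescaled to [0, 1]."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    projected = [[row[i] for i in cols] for row in data]
    labels = kmeans(projected, k, random.Random(seed))
    return (silhouette(projected, labels) + 1) / 2

def select_features(data, k=2, pop_size=12, generations=15, seed=0):
    """Genetic search over feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    n = len(data[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    score = lambda mask: fitness(data, mask, k, seed)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional bit-flip mutation
                flip = rng.randrange(n)
                child[flip] = 1 - child[flip]
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return best, score(best)

def make_demo_data(seed=1, per_cluster=20):
    """Synthetic data: features 0-1 carry cluster structure, features 2-3 are noise."""
    rng = random.Random(seed)
    rows = []
    for centre in (0.0, 5.0):
        for _ in range(per_cluster):
            rows.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5),
                         rng.uniform(0, 5), rng.uniform(0, 5)])
    return rows
```

Calling `select_features(make_demo_data())` returns a four-bit mask and its score. The synthetic data stands in for real traffic records such as those in NSL-KDD, and the objective never reads a class label, which is the property the abstract emphasizes.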
Description
Master of Science in Computer Engineering. University of KwaZulu-Natal, Durban, 2016.
Keywords
Anomaly detection (Computer security), Intrusion detection systems (Computer security), Firewalls (Computer security), Computer networks, Theses -- Computer engineering, Network Intrusion Detection Systems (NIDS).