Machine Learning for Network-based Intrusion Detection Systems: an Analysis of the CIDDS-001 Dataset
With the increasing amount of reliance on digital data and computer networks by corporations and the public in general, the occurrence of cyber attacks has become a great threat to the normal functioning of our society. Intrusion detection systems seek to address this threat by preemptively detecting attacks in real time while attempting to block them or minimizing their damage. These systems can function in many ways being some of them based on artificial intelligence methods. Datasets containing both normal network traffic and cyber attacks are used for training these algorithms so that they can learn the underlying patterns of network-based data. The CIDDS-001 is one of the most used datasets for network-based intrusion detection research. Regarding this dataset, in the majority of works published so far, the Class label was used for training machine learning algorithms. However, there is another label in the CIDDS-001, AttackType, that seems very promising for this purpose and remains considerably unexplored. This work seeks to make a comparison between two machine learning models, K-Nearest Neighbours and Random Forest, which were trained with both these labels in order to ascertain whether AttackType can produce reliable results in comparison with the Class label.
READ FULL TEXT