Network-based Anomaly Detection for SCADA Systems: Traffic Generation and Modeling

Abstract: Supervisory Control and Data Acquisition (SCADA) systems control and monitor critical infrastructure in society, such as electricity transmission and distribution systems. Modern SCADA systems are increasingly adopting open standards and being connected to the Internet to enable remote control. A rise in sophisticated attacks against SCADA systems makes SCADA security a pressing issue. An Intrusion Detection System (IDS) is a security countermeasure that monitors a network and tracks unauthenticated activities inside the network. Most commercial IDSs used in general IT systems are signature-based: the IDS compares system behavior with known attack patterns. Unfortunately, recent attacks against SCADA systems exploit zero-day vulnerabilities, which signature-based IDSs cannot detect. This thesis aims to enhance SCADA system monitoring through network-based anomaly detection, which models normal behavior and finds deviations from the model. With network-based anomaly detection, it becomes possible to detect zero-day attacks. There are two main challenges for network-based anomaly detection. The first challenge is the potentially large number of false positives caused by benign traffic that deviates from the trained model merely because of noise. To address this challenge, this thesis proposes several traffic modeling approaches, based on statistics and machine learning techniques, for the regular communication patterns in SCADA traffic. The second challenge is the lack of open datasets for evaluating the proposed approaches. To address it, this thesis proposes a traffic generation framework. For traffic modeling, this thesis first categorises SCADA traffic into two groups, request-response and non-requested traffic, and studies data collected in a diverse set of protocol formats (Modbus, Siemens S7, S7+, MMS, IEC-60870-5-104). The request-response traffic is generated by a polling mechanism.
For this type of traffic, we model the inter-arrival times for each request and response pair with a statistical approach. Results presented in this thesis show that request-response traffic exists in several SCADA traffic sets collected from systems of different sizes and settings. The proposed statistical approach for request-response traffic can detect attacks that cause only subtle changes in timing. The non-requested traffic is generated by remote terminal units at predefined times or when they see significant changes in measurement values. For this type of traffic, we first use a pattern mining approach to find the timing characteristics of the data. Then, we model the suggested attributes with machine learning approaches. We test our anomaly detection model with two types of attacks: one causes persistent anomalies and the other causes only intermittent ones. Our anomaly detector exhibits a 100% detection rate with a false positive rate of at most 0.5% for the attacks with persistent anomalies. For the attacks with intermittent anomalies, we find our approach effective when anomalous patterns last for a longer period (over 30 minutes). For traffic generation, this thesis conducts a comparative analysis between network traces collected from testbeds and from a real power utility. The analysis shows that testbed traffic may be prone to overly regular patterns. This is considered to be the result of a lack of plausible human interactions within the testbed. Therefore, this thesis proposes a traffic generation framework built upon a virtual testbed. The framework provides programmable BOTs to mimic human activities such as commands from the operators and attacks.
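As an illustration of the idea of modeling inter-arrival times for polled request-response traffic, the following is a minimal sketch and not the method used in the thesis. It assumes a simple mean and standard deviation model per request-response pair with a fixed deviation threshold; the names `train`, `is_anomalous`, the pair identifier `"plc1:read"`, and the parameters `k` and `min_std` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def inter_arrival_times(timestamps):
    """Return the gaps (in seconds) between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def train(events):
    """Learn a per-pair timing model from attack-free traffic.

    events: iterable of (pair_id, timestamp) tuples, where pair_id is a
    hypothetical label for one polled request-response pair.
    Returns {pair_id: (mean_gap, std_gap)}.
    """
    by_pair = defaultdict(list)
    for pair_id, ts in events:
        by_pair[pair_id].append(ts)

    model = {}
    for pair_id, stamps in by_pair.items():
        gaps = inter_arrival_times(sorted(stamps))
        if len(gaps) >= 2:  # stdev needs at least two samples
            model[pair_id] = (mean(gaps), stdev(gaps))
    return model


def is_anomalous(model, pair_id, gap, k=3.0, min_std=1e-3):
    """Flag a gap that deviates more than k standard deviations.

    Pairs never seen in training are flagged as well. min_std (an
    assumption) avoids a zero-width band for perfectly regular polling.
    """
    if pair_id not in model:
        return True
    mu, sigma = model[pair_id]
    return abs(gap - mu) > k * max(sigma, min_std)


# Train on a polling cycle of roughly one second with small jitter.
training = [("plc1:read", t) for t in [0.0, 1.0, 2.01, 3.0, 4.02, 5.0]]
model = train(training)

print(is_anomalous(model, "plc1:read", 1.01))  # within normal jitter
print(is_anomalous(model, "plc1:read", 1.5))   # delayed poll
```

A fixed threshold like `k` trades detection sensitivity against false positives from benign jitter, which is the first challenge the abstract describes; a real detector would need a threshold calibrated on recorded traffic.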
