ASNM-CDX-2009 Dataset

The ASNM-CDX-2009 dataset (Advanced Security Network Metrics & CDX 2009 dataset) consists of ASNM features [1, 2] extracted from tcpdump captures of malicious and legitimate TCP communications with network services that are vulnerable to buffer overflow attacks. The captures are part of the CDX 2009 dataset of network traffic dumps, which was introduced in [3]. The final composition of the dataset is depicted in Table 1.



Filtering of CDX 2009 Dataset

The CDX 2009 dataset was created during a network warfare competition in which one of the goals was to generate a labeled dataset [3]. The CDX 2009 dataset is available from [here]. For the purpose of ASNM extraction, we considered:

data capture by NSA,

data capture outside of the West Point network border (in tcpdump format),

SNORT intrusion prevention log (as source of ground truth).

We focused only on buffer overflow attacks found in the SNORT log and matched them with the packets contained in the West Point network border capture. Note that buffer overflow attacks were performed on only two services: Postfix Email and Apache Web Server. However, the network infrastructure contained 4 servers with 4 vulnerable services (one per server). These services, together with the internal and external IP addresses of the servers hosting them, are listed in Table 2. Two types of IP addresses are shown in this table:

external IP addresses – correspond to SNORT log,

internal IP addresses – correspond to the tcpdump network capture outside of the West Point network border.

The specific versions of the services (and their vulnerabilities) described in [3] were not announced. We found that the SNORT log can be associated only with the data capture outside of the West Point network border, and only with a significant timestamp difference of approximately 930 days. At the time of creating the ASNM-CDX-2009 dataset and experimenting with it [3], we did not find any association between the SNORT log and the data capture performed by the National Security Agency.






In the SNORT log, we matched exactly 44 buffer overflow attacks (out of 65 in total) with the tcpdump capture. To match SNORT entries correctly, it was necessary to remap the IP addresses of the external network to the internal network (based on Table 2), because SNORT was deployed in the external network while the tcpdump capture contains entries from the internal network. The buffer overflow attacks that were matched with the data capture have their content in only two tcpdump files:

2009-04-21-07-47-35.dmp,

2009-04-21-07-47-35.dmp2.
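
The remapping and matching step described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the record layout (dictionaries with dst_ip, dst_port and time), the address pairs, the sign of the 930-day offset and the 5-second tolerance are all assumptions made for illustration. The real address pairs are those listed in Table 2.

```python
from datetime import datetime, timedelta

# Hypothetical external -> internal address map; the real pairs are in Table 2.
EXTERNAL_TO_INTERNAL = {
    "198.51.100.10": "10.0.0.10",
    "198.51.100.20": "10.0.0.20",
}

# Assumed clock offset between the SNORT log and the capture (about 930 days).
# The sign of the offset is an assumption; flip it if the log runs ahead.
CLOCK_OFFSET = timedelta(days=930)


def remap_alert(alert, ip_map=EXTERNAL_TO_INTERNAL, offset=CLOCK_OFFSET):
    """Return a copy of the alert with an internal destination IP and shifted time."""
    remapped = dict(alert)
    remapped["dst_ip"] = ip_map.get(alert["dst_ip"], alert["dst_ip"])
    remapped["time"] = alert["time"] + offset
    return remapped


def match_alert(alert, packets, tolerance=timedelta(seconds=5)):
    """Return the packets whose destination and time agree with the remapped alert."""
    target = remap_alert(alert)
    return [
        p for p in packets
        if p["dst_ip"] == target["dst_ip"]
        and p["dst_port"] == target["dst_port"]
        and abs(p["time"] - target["time"]) <= tolerance
    ]
```

An alert with no matching packets within the tolerance would count as unmatched, which corresponds to the 21 of 65 attacks that could not be located in the capture.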

Due to the enormous number of packets (approx. 4 million) across all dump files, we decided to consider only these two files, which contain 1 538 182 packets. We also noticed that network traffic density increased at the times when attacks were performed. Consequently, we reduced the packets further by keeping only a sufficiently large temporal neighborhood of each attack's occurrence. As a result, we used 204 953 packets for ASNM feature extraction. The final composition of the dataset is depicted in Table 1.
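
The temporal reduction can be sketched as follows. This is an illustrative Python example only: the size of the neighborhood is not stated here, so the 60-second margin, the function names and the (timestamp, payload) packet layout are assumptions.

```python
from bisect import bisect_left


def near_attack(t, attack_times, margin):
    """True if timestamp t lies within `margin` seconds of an attack time.

    `attack_times` must be sorted in ascending order.
    """
    i = bisect_left(attack_times, t)
    # Only the closest attack on each side of t can be within the margin.
    neighbours = attack_times[max(i - 1, 0):i + 1]
    return any(abs(t - a) <= margin for a in neighbours)


def filter_packets(packets, attack_times, margin=60.0):
    """Keep (timestamp, payload) packets that fall near any attack occurrence."""
    attack_times = sorted(attack_times)
    return [p for p in packets if near_attack(p[0], attack_times, margin)]
```

Sorting the attack times once and using binary search keeps the filter fast even for captures with more than a million packets.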

Labeling

The ASNM-CDX-2009 dataset contains 2 types of labels, listed below in order of increasing granularity:

The two-class label, denoted as label_2, indicates whether a given record represents a network buffer overflow attack or not.

The next label, denoted as label_poly, is composed of 2 parts: a) the two-class label, where legitimate and malicious communications are represented by the symbols 0 and 1, respectively, and b) the acronym of the network service. This label represents the type of communication on a particular network service.
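
The relation between the two labels can be illustrated with a short Python sketch. The exact textual encoding of label_poly is not specified here, so the underscore separator and the service acronyms used in the usage note (http, smtp) are hypothetical. Only the two-part structure (a 0/1 flag plus a service acronym) comes from the description above.

```python
def make_label_poly(is_attack, service, sep="_"):
    """Compose a label_poly-style value from a two-class flag and a service acronym.

    The separator is an assumed convention, not the dataset's documented format.
    """
    return f"{int(bool(is_attack))}{sep}{service}"


def split_label_poly(label_poly, sep="_"):
    """Recover (label_2, service) from a value built by make_label_poly."""
    flag, service = label_poly.split(sep, 1)
    return int(flag), service
```

Under these assumptions, make_label_poly(True, "http") yields "1_http", and split_label_poly("0_smtp") yields (0, "smtp"), so label_2 can always be derived from label_poly but not the other way around.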

Introducing Paper

The ASNM-CDX-2009 dataset was introduced and used in the paper [1]. Some unique experiments using this dataset were also performed in Section 6.3 of the dissertation thesis [2].

Download

The ASNM-CDX-2009 dataset in CSV format can be downloaded [here].

References

  1. HOMOLIAK Ivan, BARABAS Maros, CHMELAR Petr, DROZD Michal and HANACEK Petr.: ASNM: Advanced Security Network Metrics for Attack Vector Description. In: Proceedings of the 2013 International Conference on Security & Management. Las Vegas: Computer Science Research, Education, and Applications Press, 2013, pp. 350-358. ISBN 1-60132-259-3. Download link.

  2. HOMOLIAK Ivan.: Intrusion Detection in Network Traffic. Dissertation thesis, Brno University of Technology, Faculty of Information Technology, 2016. Download link.

  3. SANGSTER Benjamin, O'CONNOR T. J., COOK Thomas, FANELLI Robert, DEAN Erik, ADAMS William J., MORRELL Chris and CONTI Gregory.: Toward instrumenting network warfare competitions to generate labeled datasets. In: Proceedings of the 2nd Workshop on Cyber Security Experimentation and Test (CSET’09), 2009. Download link.