From classical methods to neural networks: Exploring the potential of Deep Learning in identifying obfuscated traffic

October 30, 2024
Telecom
Network traffic analysis and classification have become essential for maintaining the resilience and security of contemporary computer networks. With the rapid increase in data volumes and the growing complexity of encryption methods, the need for effective network flow classification continues to rise. By identifying, categorizing, and analyzing network traffic accurately, organizations can detect potential threats, optimize network performance, and ensure compliance with security protocols.

Traditional Methods for Network Traffic Analysis

Classification of network traffic using traditional methods involves various approaches, each with its own strengths and weaknesses. Let’s examine the main methods and their limitations when dealing with obfuscated and encrypted traffic.

1. Server Name Indication (SNI) Method

The SNI method analyzes the domain name that a client sends in plaintext when it opens a TLS session. The name is carried in the server_name extension of the ClientHello message during the TLS handshake, so this method can identify servers and services even though the subsequent traffic is encrypted.
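To make the mechanism concrete, here is a minimal sketch of pulling the host name out of a ClientHello. It assumes the whole ClientHello sits in a single, unfragmented TLS record passed in as bytes; the function name `extract_sni` is hypothetical, and a production parser would need stricter bounds checks and record reassembly.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from one TLS ClientHello record, or None."""
    try:
        # TLS record header (5 bytes): content type 0x16 means handshake.
        if record[0] != 0x16:
            return None
        pos = 5
        # Handshake header (4 bytes): message type 0x01 means ClientHello.
        if record[pos] != 0x01:
            return None
        pos += 4
        pos += 2 + 32                        # client_version + random
        pos += 1 + record[pos]               # session_id (1-byte length prefix)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len                # cipher_suites (2-byte length prefix)
        pos += 1 + record[pos]               # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                # extension 0 is server_name
                # Skip the 2-byte list length, then read name_type and name length.
                name_type, name_len = struct.unpack_from("!BH", record, pos + 2)
                if name_type != 0:           # 0 is host_name
                    return None
                start = pos + 5
                return record[start:start + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```

Any record that is not a handshake record, or a ClientHello without the extension, yields `None`, which is the case the limitations below describe.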

Limitations of the SNI Method:

  • Insufficient accuracy with port obfuscation and address translation: When IP addresses and ports are modified or obfuscated, accuracy decreases because the link between the SNI and a specific application can be disrupted.
  • Inability to identify traffic inside VPNs: When traffic passes through a VPN, the SNI field is hidden by the tunnel's encryption and is unavailable for analysis.
  • Lack of data for all protocols: Not all protocols and applications transmit data over TLS, making SNI-based analysis inapplicable to them.

2. Payload Inspection

Payload inspection involves a detailed analysis of packet contents to identify patterns and characteristics specific to a protocol or application. This method provides high accuracy in determining data types and classifying them based on content.
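As a rough illustration of pattern matching on packet contents, the sketch below checks the first bytes of a payload against three well-known protocol prefixes. The signature list and the name `classify_payload` are illustrative assumptions; real DPI engines use far larger rule sets and track state across packets.

```python
from typing import Optional

HTTP_METHODS = {b"GET", b"POST", b"HEAD", b"PUT", b"DELETE", b"OPTIONS"}

# Illustrative signatures only, checked in order.
SIGNATURES = [
    # BitTorrent peer handshake starts with byte 19 and the protocol string.
    ("BitTorrent", lambda p: p.startswith(b"\x13BitTorrent protocol")),
    # Plain HTTP requests start with a method name followed by a space.
    ("HTTP", lambda p: p.split(b" ", 1)[0] in HTTP_METHODS),
    # TLS handshake record: content type 0x16, major version 0x03.
    ("TLS", lambda p: len(p) >= 3 and p[0] == 0x16 and p[1] == 0x03),
]


def classify_payload(payload: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for label, matches in SIGNATURES:
        if matches(payload):
            return label
    return None
```

Note that the TLS rule only recognizes that a handshake is starting; it says nothing about which application is inside the encrypted session.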

Limitations of Payload Inspection:

  • Computational resource costs: Payload inspection requires significant resources due to the need to examine each packet’s content.
  • Privacy issues: Full access to packet data raises privacy concerns, especially when working with personal or corporate data.
  • Inability to analyze encrypted traffic: Encryption of traffic (TLS or VPN) makes payload inspection impossible, reducing the effectiveness of this method in modern environments where a significant portion of traffic is encrypted.

3. Statistical Machine Learning Methods

Statistical machine learning methods classify traffic based on various metrics and characteristics (such as packet sizes, frequency, and time intervals). Models can be trained on statistical data, allowing for effective identification of certain types of traffic in some cases.
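The kind of per-flow statistics such models consume can be sketched as follows. This assumes each flow is available as a non-empty list of (timestamp, size) pairs in arrival order; the function name and the chosen features are illustrative, not a description of any specific classifier.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarize one flow.

    packets: non-empty list of (timestamp_seconds, size_bytes) in arrival order.
    """
    sizes = [size for _, size in packets]
    # Inter-arrival times between consecutive packets.
    gaps = [later[0] - earlier[0] for earlier, later in zip(packets, packets[1:])]
    return {
        "packet_count": len(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }
```

A vector like this would then be fed to a conventional classifier such as a decision tree or random forest.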

Limitations of Statistical Machine Learning Methods:

  • Need for clean and labeled data: For successful operation, statistical learning models require high-quality labeled data, which is challenging to collect, especially for less common protocols.
  • Resource-intensive: This method requires significant computational resources, slowing down the analysis in cases of large data volumes.
  • Low effectiveness in the presence of traffic obfuscation: Protocols that mask their metadata or continuously change traffic patterns can complicate analysis, leading to low accuracy from statistical models.

As a result, although traditional methods may exhibit high accuracy in some cases, they face numerous limitations, making it challenging to classify modern traffic types.


Neural Network Approach to Identifying Obfuscated Network Traffic

Our research explores deep learning as a more accurate and flexible alternative to traditional methods. We implemented models based on convolutional neural networks (CNN) and the ResNet architecture, adapting them for high-precision classification of encrypted VPN and proxy traffic.
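For readers unfamiliar with ResNet, its defining idea is the skip connection: a block's input is added back to the output of its convolutions. The pure-Python sketch below shows that idea on a 1-D signal. It is not the architecture used in this study (layer counts, kernel sizes, and normalization are not given here), and all function names are hypothetical.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def conv1d_same(xs, kernel):
    """1-D cross-correlation with zero padding; assumes an odd kernel length,
    so the output has the same length as the input."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(xs) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(xs))
    ]


def residual_block(xs, kernel_a, kernel_b):
    """Compute relu(x + conv(relu(conv(x)))); the '+ x' is the skip connection."""
    hidden = relu(conv1d_same(xs, kernel_a))
    branch = conv1d_same(hidden, kernel_b)
    return relu([x + b for x, b in zip(xs, branch)])
```

With all-zero kernels the block reduces to `relu(x)`, which shows why stacking such blocks does not prevent the input signal from passing through deep networks.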

Data

The dataset consists of NetFlow v10 (IPFIX) flow records. IPFIX is a protocol that standardizes how IP flow information is transmitted from an exporter to a collector, and it is supported by vendors such as Cisco, Solera, VMware, and Citrix. The IPFIX specifications are given in RFCs 7011–7015 and RFC 5103.
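As a small illustration of the wire format, every IPFIX message begins with a 16-byte header defined in RFC 7011: version (always 10), total length, export time, sequence number, and observation domain ID. The sketch below decodes only that header; the function name and dictionary keys are illustrative, and parsing templates and data sets is out of scope.

```python
import struct
from datetime import datetime, timezone

# Five fields in network byte order: 2 + 2 + 4 + 4 + 4 = 16 bytes.
IPFIX_HEADER = struct.Struct("!HHIII")


def parse_ipfix_header(message: bytes) -> dict:
    """Decode the fixed header at the start of an IPFIX message."""
    version, length, export_time, sequence, domain_id = IPFIX_HEADER.unpack_from(message)
    if version != 10:
        raise ValueError(f"not an IPFIX message (version {version})")
    return {
        "version": version,
        "length": length,  # whole message, header included
        "export_time": datetime.fromtimestamp(export_time, tz=timezone.utc),
        "sequence_number": sequence,
        "observation_domain_id": domain_id,
    }
```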

Data Collection

Data was collected using a device with a deep packet inspection (DPI) system connected to other devices generating traffic over various VPNs. This approach captured unique IPs and ports generated by VPNs with dynamic assignments under restrictions, resulting in a rich array of unique IP and port combinations for training the neural network model.

The collected data included the following parameters:

| Data Type | Description |
|---|---|
| octet_delta_count | Incoming counter of length N x 8 bits for the number of bytes associated with the IP flow. |
| packet_delta_count | Incoming packet counter of length N x 8 bits for the number of packets associated with the IP flow. |
| protocol_identifier | IP protocol byte. |
| ip_class_of_service | IP class of service. |
| source_port | Sender's port. |
| source_ipv4 | Sender's IPv4 address. |
| destination_port | Recipient's port. |
| destination_ipv4 | Recipient's IPv4 address. |
| bgp_source_as_number | Source BGP autonomous system number (N can be 2 or 4). |
| bgp_destination_as_number | Destination BGP autonomous system number (N can be 2 or 4). |
| input_snmp | Virtual LAN identifier associated with the incoming interface. |
| output_snmp | Virtual LAN identifier associated with the outgoing interface. |
| ip_version | IP protocol version (IPv4 or IPv6). |
| post_nat_source_ipv4 | Source IPv4 address after NAT. |
| post_nat_source_port | Source port after NAT. |
| frgmt_delta_packs | Delta of fragmented packets. |
| repeat_delta_pack | Delta of retransmissions. |
| packet_deliver_time | Delay (RTT/2), ms. |
| protocol_code | Protocol code using autonomous system class for the neural network. |
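Before a neural network can use such records, they have to be turned into numeric vectors. The article does not describe how this was done, so the following is only one plausible encoding, using a subset of the fields above: address octets and ports scaled to [0, 1] and counters log-scaled. The function name `encode_flow` is hypothetical.

```python
import ipaddress
import math


def encode_flow(flow: dict) -> list:
    """Turn one flow record (keys named as in the table) into a flat numeric vector."""
    src = ipaddress.IPv4Address(flow["source_ipv4"]).packed
    dst = ipaddress.IPv4Address(flow["destination_ipv4"]).packed
    return (
        [octet / 255 for octet in src]          # 4 values
        + [octet / 255 for octet in dst]        # 4 values
        + [
            flow["source_port"] / 65535,
            flow["destination_port"] / 65535,
            flow["protocol_identifier"] / 255,
            # log1p compresses the heavy-tailed byte and packet counters.
            math.log1p(flow["octet_delta_count"]),
            math.log1p(flow["packet_delta_count"]),
        ]
    )
```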

Data Processing Before Training

The data was split into training (80%) and testing (20%) sets. Class balancing adjustments and IPFIX data labeling were applied to highlight specific classes.
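An 80/20 split is often done per class so that rare protocols appear in both sets. Whether the split here was stratified is not stated, so the sketch below is an assumption about one reasonable way to do it; `stratified_split` is a hypothetical helper built only on the standard library.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * test_fraction)
        test.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(test)
```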

Training

Two neural network architectures were trained, each with hyperparameter tuning. The protocol class ratios in the training sample were:

| Protocol | Ratio |
|---|---|
| DNS | 18.67% |
| HTTP | 1.38% |
| HTTPS | 16.27% |
| DoH | 2.66% |
| ICMP | 4.83% |
| Bittorrent | 24.73% |
| AdGuard VPN | 2.34% |
| VPN Unlimited | 12.18% |
| Psiphon 3 | 12.41% |
| Lantern | 4.53% |
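With shares this uneven, one common balancing option is to weight each class inversely to its frequency in the loss function. The exact balancing method used in this study is not specified, so the sketch below is an assumption; the ratios are copied from the table above and the helper name is hypothetical.

```python
# Class shares in percent, taken from the table above.
RATIOS = {
    "DNS": 18.67, "HTTP": 1.38, "HTTPS": 16.27, "DoH": 2.66, "ICMP": 4.83,
    "Bittorrent": 24.73, "AdGuard VPN": 2.34, "VPN Unlimited": 12.18,
    "Psiphon 3": 12.41, "Lantern": 4.53,
}


def inverse_frequency_weights(ratios):
    """weight_c = total / (num_classes * share_c).

    Rare classes get weights above 1, frequent ones below 1, and the
    share-weighted average of all weights equals 1.
    """
    total = sum(ratios.values())
    num_classes = len(ratios)
    return {name: total / (num_classes * share) for name, share in ratios.items()}
```

Under this scheme HTTP, the rarest class, receives the largest weight and Bittorrent the smallest.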

Testing

Models were evaluated on the test set using precision, recall, and F1 score metrics:

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$

$$\text{F1 Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$

where TP denotes true positives, FN false negatives, and FP false positives.
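These three formulas translate directly into code. The helper below is a minimal sketch; returning 0.0 when a denominator is zero is a convention chosen here, not something the definitions prescribe.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from raw counts for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1
```

For example, with the AdGuard VPN counts reported for ResNet in the Results section (TP = 60, FP = 5, FN = 18), the function gives an F1 score of about 0.84.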

The experiment was conducted on VPNs with a wide range of IP addresses to make the results more representative. Of the two models, ResNet classified VPN protocols more accurately overall.

Results

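Classical Convolutional Neural Network

| Protocol | TP | FP | FN | F1 Score |
|---|---|---|---|---|
| AdGuard VPN | 28 | 9 | 50 | 0.49 |
| VPN Unlimited | 3 | 3 | 22 | 0.21 |
| Psiphon 3 | 8455 | 160 | 399 | 0.97 |

ResNet Architecture

| Protocol | TP | FP | FN | F1 Score |
|---|---|---|---|---|
| AdGuard VPN | 60 | 5 | 18 | 0.84 |
| VPN Unlimited | 5 | 9 | 20 | 0.26 |
| Psiphon 3 | 8847 | 1030 | 7 | 0.95 |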

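The ResNet architecture improved the F1 score for AdGuard VPN (from 0.49 to 0.84) and for VPN Unlimited (from 0.21 to 0.26), while its score for Psiphon 3 was slightly lower (0.95 versus 0.97). Overall it identified VPN traffic more effectively and can serve as a foundation for encrypted traffic classification tasks, although protocols with few samples, such as VPN Unlimited, remain hard to detect.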

Conclusion

In this article, we examined methods for identifying obfuscated traffic, covering both classical and neural network approaches. While traditional methods provide basic capabilities, they struggle with dynamic and encrypted traffic. In our experiments, neural networks offered greater accuracy and flexibility, identifying obfuscated traffic in cases where traditional methods prove ineffective. Thus, the shift to neural network approaches marks a significant step forward in network security.
