Traffic Classification and Deep Packet Inspection

October 4, 2016
Network traffic is complex, consisting of a multitude of applications, protocols, and services. Many of these applications have unique requirements for network characteristics such as speed, latency, and jitter. Meeting these requirements is essential for applications to run quickly and reliably, and for users to be satisfied with the quality of service.

Local area networks (LANs) rarely have trouble meeting these requirements because of their high bandwidth, but the limited bandwidth of Internet access channels (WANs) requires fine-tuning.

Classification and marking

The provision of Internet access services is user-oriented, and what matters most is the user’s perception of the quality of service: the Internet should not “slow down”, applications should respond quickly to commands, files should download quickly, and voice calls should not stutter. Otherwise, the user will start looking for another service provider. Traffic management, restrictions, and priorities on the Internet access channel must therefore meet user requirements. Information about protocols and applications also allows the administrator to implement security policies to protect network users.

Traffic classification is the first step: identifying the various applications and protocols transmitted over the network. The second step is to manage, optimize, and prioritize this traffic. After classification, all packets are marked according to the protocol or application they belong to, which allows network devices to apply quality of service (QoS) policies based on these labels and flags.

Key concepts: classification – identification of applications or protocols; marking – the process of tagging packets for the application of service policies on equipment.
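To make the two concepts concrete, here is a minimal sketch of the marking step, assuming classification has already produced a traffic class. It uses DSCP, the 6-bit field in the IP header that is commonly used for QoS marking. The class names and the choice of which class gets which value are assumptions made for this example; only the code points themselves (EF = 46, AF41 = 34, CS1 = 8) are standard.

```python
# Hypothetical mapping from traffic class to DSCP code point.
# The code points are standard; the class-to-value assignment is an
# assumption for this sketch, not a recommendation.
DSCP_BY_CLASS = {
    "voice": 46,  # EF, expedited forwarding
    "video": 34,  # AF41
    "bulk": 8,    # CS1, low priority
}
DEFAULT_DSCP = 0  # best effort


def mark(traffic_class: str) -> int:
    """Return the DSCP value to write into packets of the given class."""
    return DSCP_BY_CLASS.get(traffic_class, DEFAULT_DSCP)


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    return (dscp << 2) & 0xFF
```

Network devices further along the path can then apply their policies by reading this one field instead of classifying the traffic again.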

Classification methods

There are two main methods of traffic classification:

  • Payload-based classification. This is based on fields carried in the packet, such as OSI Layer 4 ports (source, destination, or both). This method is the most common, but it does not work with encrypted and tunneled traffic.
  • Statistical classification. This is based on analyzing traffic behavior (time between packets, session duration, etc.).

A more general approach to traffic classification is based on information in the packet headers, usually the MAC address (Layer 2), the IP address (Layer 3), and the protocol used. This approach has limited capabilities because it looks only at the headers. Layer 4 methods are limited in a similar way, since not all applications use standard ports.
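The limitation of header-based and port-based classification is easy to see in a small sketch. The `FlowKey` type and the `classify_by_port` function below are hypothetical names for this example, and the port table is a tiny illustrative subset of the registered ports.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Header fields available without looking into the payload."""
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    src_port: int
    dst_port: int


# Illustrative subset of well-known ports.
WELL_KNOWN_PORTS = {
    ("tcp", 21): "ftp",
    ("tcp", 25): "smtp",
    ("udp", 53): "dns",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
}


def classify_by_port(flow: FlowKey) -> str:
    """Guess the application from the destination or source port."""
    for port in (flow.dst_port, flow.src_port):
        app = WELL_KNOWN_PORTS.get((flow.protocol, port))
        if app is not None:
            return app
    return "unknown"
```

A web server listening on port 8080, or any application that picks its ports dynamically, comes back as "unknown" here, which is exactly the gap that deeper inspection is meant to close.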

A more sophisticated classification can be achieved through deep packet inspection (DPI). This method is generally the most accurate and reliable, so let’s take a closer look at it.

Deep Packet Inspection

Deep packet inspection systems allow you to classify applications and protocols that cannot be identified at Layer 3 and Layer 4. They do this by looking at packet contents, for example URLs within packets, messenger message content, Skype voice traffic, and BitTorrent P2P packets.

The main mechanism for identifying applications in DPI is signature analysis. Each application has its own unique characteristics, which are entered into a signature database. Comparing the analyzed traffic against the signatures in the database allows the application or protocol to be accurately identified. However, since new applications appear regularly, the signature database must be kept up to date to ensure high identification accuracy.

Signatures

There are several methods of signature analysis:

  • Pattern analysis.
  • Numerical analysis.
  • Behavioral analysis.
  • Heuristic analysis.
  • Protocol/state analysis.

Pattern analysis

Some applications include specific patterns (bytes, characters, or strings) in the packet data block that can be used for identification and classification. These patterns can be located anywhere in the data block; this does not affect the identification process. However, since not every packet contains an application pattern, this method does not always work.
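Pattern analysis can be sketched as a substring search over the payload. The signature table below is a tiny illustrative assumption, not a real signature database: the BitTorrent entry is the prefix of the peer handshake, and the other two are strings that appear in HTTP/1.1 messages and SSH version banners.

```python
from typing import Optional

# Illustrative signatures only; a production database is far larger
# and is updated regularly.
SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",
    "http": b"HTTP/1.1",
    "ssh": b"SSH-2.0-",
}


def match_pattern(payload: bytes) -> Optional[str]:
    """Return the first application whose pattern occurs anywhere in the payload."""
    for app, pattern in SIGNATURES.items():
        if pattern in payload:
            return app
    return None
```

A packet in the middle of a file transfer contains none of these strings, so `match_pattern` returns `None` for it. This is why the first packets of a flow are usually the most useful ones to inspect.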

Numerical analysis

Numerical analysis studies the quantitative characteristics of packets, such as data block size, response time, and the interval between packets. For example, old versions of Skype (prior to version 2.0) were well suited to this type of analysis because the request from the client was 18 bytes in size and the response it received was 11 bytes. Since the analysis may need to be spread across several packets in a sequence, the classification decision can take longer. Having to wait for and analyze multiple packets makes this method less than ideal.
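The Skype example above can be sketched as a check on payload sizes alone. The packet representation, a `(direction, payload_length)` tuple with directions `"c2s"` and `"s2c"`, is an assumption made for this example.

```python
from typing import Iterable, Tuple

# (direction, payload_length); direction is "c2s" (client to server)
# or "s2c" (server to client). This representation is hypothetical.
Packet = Tuple[str, int]


def looks_like_old_skype(packets: Iterable[Packet]) -> bool:
    """True if the first request is 18 bytes and the first reply is 11 bytes."""
    request_len = None
    for direction, length in packets:
        if direction == "c2s" and request_len is None:
            request_len = length  # remember the first client request
        elif direction == "s2c" and request_len is not None:
            # The first server reply after the request decides the outcome.
            return request_len == 18 and length == 11
    return False  # not enough packets seen yet
```

Note that the function cannot answer until it has seen at least one packet in each direction, which illustrates why this method delays the classification decision.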

Behavioral and heuristic analysis

This method is based on the behavior of the traffic generated by a running application. While the application is running, it generates dynamic traffic that can also be identified and marked. For example, BitTorrent generates traffic with a specific sequence of packets that have the same characteristics (incoming and outgoing port, packet size, number of sessions opened per unit of time), which can be classified according to a behavioral (heuristic) model.

Behavioral and heuristic analysis are usually used together, and many antivirus programs use these methods to identify viruses and worms.
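As a rough sketch of the behavioral idea, the function below flags a host that opens many sessions to many different peers within a short time window, which is one of the traits mentioned for BitTorrent. The `Session` type and all thresholds are assumptions chosen for illustration; real systems tune such values against observed traffic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    start_time: float  # seconds since some reference point
    peer: str          # remote address


def looks_like_p2p(sessions: List[Session], window: float = 10.0,
                   min_sessions: int = 20, min_peers: int = 15) -> bool:
    """True if any time window holds many sessions to many distinct peers."""
    ordered = sorted(sessions, key=lambda s: s.start_time)
    left = 0
    for right in range(len(ordered)):
        # Shrink the window until it spans at most `window` seconds.
        while ordered[right].start_time - ordered[left].start_time > window:
            left += 1
        in_window = ordered[left:right + 1]
        peers = {s.peer for s in in_window}
        if len(in_window) >= min_sessions and len(peers) >= min_peers:
            return True
    return False
```

Nothing in this check looks at packet contents, so it keeps working when the payload is encrypted.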

Protocol/state analysis

The protocols of some applications are a sequence of specific actions. Analyzing such sequences allows you to identify the application with sufficient accuracy. For example, a file request from an FTP client (the get command, sent on the wire as RETR) must be followed by a corresponding response from the server.
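The FTP example can be sketched as a small state machine over the control connection. What an FTP client exposes as get is sent as a `RETR` command, and a successful download is answered with a 125 or 150 reply followed by 226. The `(sender, text)` input format and the function name are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def is_ftp_retrieve(lines: Iterable[Tuple[str, str]]) -> bool:
    """Follow RETR -> 125/150 -> 226 on an FTP control connection.

    `lines` holds (sender, text) pairs, where sender is "client" or "server".
    """
    state = "idle"
    for sender, text in lines:
        if state == "idle" and sender == "client" and text.upper().startswith("RETR "):
            state = "requested"        # client asked for a file
        elif state == "requested" and sender == "server" and text[:3] in ("125", "150"):
            state = "transferring"     # server is about to send data
        elif state == "transferring" and sender == "server" and text.startswith("226"):
            return True                # transfer completed as expected
    return False
```

A flow that carries the right keywords but in the wrong order never reaches the final state, which is what distinguishes this method from plain pattern matching.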

More and more applications on the Internet are starting to use traffic encryption, which creates major problems for any classification method. The DPI system cannot look inside an encrypted packet to analyze its contents, so the main methods for identifying such traffic are behavioral and heuristic analysis, but even these cannot identify all applications. The latest mechanism, which uses both of these methods simultaneously, is called clustering, and it is currently the most effective way to identify encrypted traffic.

Since none of the methods described above can classify 100% of traffic on its own, the best practice is to use all of them together.
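One simple way to combine methods is to run several classifiers over the same flow and take the first confident answer. This is only a sketch of the idea: the flow representation (a plain dictionary), the two toy classifiers, and the first-match policy are all assumptions for this example, and a real DPI engine may weigh or cross-check results instead.

```python
from typing import Callable, Optional, Sequence

# A classifier takes a flow description and returns a label or None.
Classifier = Callable[[dict], Optional[str]]


def by_pattern(flow: dict) -> Optional[str]:
    """Toy pattern check on the payload."""
    if b"BitTorrent protocol" in flow.get("payload", b""):
        return "bittorrent"
    return None


def by_port(flow: dict) -> Optional[str]:
    """Toy port lookup."""
    return {80: "http", 443: "https"}.get(flow.get("dst_port"))


def classify(flow: dict, classifiers: Sequence[Classifier]) -> str:
    """Return the first label any classifier produces, else "unknown"."""
    for classifier in classifiers:
        label = classifier(flow)
        if label is not None:
            return label
    return "unknown"
```

Placing the more specific method first matters: with `[by_pattern, by_port]`, BitTorrent traffic sent over port 80 is labeled by its content rather than by its port.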

Traffic classification with the subsequent application of quality of service policies is one of the most important tasks for any telecommunications operator. The use of modern DPI systems allows this task to be performed with maximum efficiency and productivity.