In this article we will look at two key components of network monitoring, Stingray Service Gateway (a platform based on Deep Packet Inspection technology) and the QoE (Quality of Experience) module, and examine a practical case study on finding network problems.
Stingray Service Gateway (SSG) analyzes traffic in real time. It makes it possible not only to monitor the connection establishment rate and the number of losses for TCP connections, but also to identify bottlenecks that are difficult to detect with standard monitoring tools.
SSG can be installed in the gap (Inline mode) or on a traffic mirror (Mirror mode).
Inline mode: the device is connected in the gap of active links between two routers. This is the recommended installation scheme and provides the full functionality of the system.
Mirror mode: the device receives a copy of the traffic, which limits functionality to filtering traffic to comply with legislation and collecting statistics.
The QoE analytics module is aimed at evaluating end-user quality of service based on data received from DPI. It evaluates various aspects of the user experience, such as latency, data rate, application protocol traffic patterns and packet loss rates, allowing you to react quickly to problems and optimize network performance to ensure a high level of quality of service.
The QoE module collects the following metrics:
- Round Trip Time (RTT) metrics;
- Number of retransmits (re-requests) metrics;
- Number of sessions, devices, agents, IP addresses per subscriber;
- Traffic distribution by application and transport protocols;
- Distribution of traffic by direction and AS;
- Clickstream for each subscriber (SNI, CN, URL).
How is the network monitored with SSG and QoE?
Network monitoring using SSG and QoE module technologies is a procedure that includes several steps:
1. Traffic capture and balancing
Traffic is extracted from the network core and sent to the SSG Load Balancer, which distributes the load evenly among several SSG servers. The balancer can handle up to 800 Gbps of mirrored traffic.
You can learn more about Load Balancer operation in our knowledge base.
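The article does not describe how the balancer chooses a server, so the following is only a minimal sketch of one common approach: hashing the flow 5-tuple so that both directions of a connection land on the same server. The function name `pick_server` and the sample addresses are hypothetical and are not part of the SSG product.

```python
import hashlib
from collections import Counter

def pick_server(src_ip, src_port, dst_ip, dst_port, proto, n_servers):
    """Map a flow to a server index in [0, n_servers).

    Assumption: a symmetric hash of the 5-tuple is used, so that packets
    A->B and B->A of the same connection reach the same analysis server.
    """
    # Sort the two endpoints so the key does not depend on direction.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_servers

# Synthetic flows, used only to show how evenly the hash spreads the load.
flows = [
    ("10.0.0.%d" % (i % 250 + 1), 1024 + i, "203.0.113.10", 443, "tcp")
    for i in range(10000)
]
load = Counter(pick_server(*flow, n_servers=4) for flow in flows)
print(dict(load))
```

Keeping both directions of a TCP session on one server matters for metrics such as RTT and retransmits, which need to see the full exchange.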
2. Collecting and analyzing statistics
IPFIX (NetFlow v10) statistics from each SSG server are collected and transmitted to a cluster of QoE servers, where they are accumulated and stored for a configurable retention period.
These statistics carry IPDR (Internet Protocol Detail Record) information, which includes RTT (round-trip time) and the number of retransmits.
A retransmit is a packet that is sent again after the original was lost.
This data is used to monitor and determine the quality of communication services.
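As an illustration of how such records could be turned into quality indicators, here is a small sketch that computes a retransmit percentage and a median RTT from per-flow records. The `FlowRecord` fields are hypothetical names chosen for this example; the actual IPFIX field layout exported by SSG is not given in this article.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class FlowRecord:
    subscriber: str     # hypothetical subscriber identifier
    packets: int        # total TCP packets observed in the flow
    retransmits: int    # packets identified as retransmissions
    rtt_ms: float       # round-trip time estimate for the flow

def summarize(records):
    """Return the overall retransmit percentage and the median RTT."""
    total_packets = sum(r.packets for r in records)
    total_retx = sum(r.retransmits for r in records)
    pct = 100.0 * total_retx / total_packets if total_packets else 0.0
    rtts = [r.rtt_ms for r in records]
    return {
        "retransmit_pct": pct,
        "median_rtt_ms": median(rtts) if rtts else None,
    }

# Synthetic records for illustration only.
records = [
    FlowRecord("sub-1", 1000, 10, 20.0),
    FlowRecord("sub-1", 500, 20, 35.0),
    FlowRecord("sub-2", 500, 0, 18.0),
]
summary = summarize(records)
print(summary)
```

The percentage is weighted by packet count, so a short flow with many retransmits does not dominate the result.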
After the statistics are collected and analyzed, reports are generated and the results are visualized so that network administrators can interpret them and make decisions easily. The data makes it possible to promptly identify and solve network problems, optimize network operation and improve the level of service for end users.
The results of monitoring and analyzing statistics can also be used to forecast the load on the network and plan its scaling in the future.
Testing QoE efficiency – a practical case study
Let’s consider a real scenario of statistics monitoring with the QoE module.
Description of QoE connection
Traffic from the customer’s virtual infrastructure was mirrored to a port on the bare-metal server running SSG.
The only task of the SSG was to collect NetFlow v10 statistics with custom fields, including RTT and retransmit information for TCP sessions. These statistics were delivered to a standalone virtual machine on which the QoE Stor statistics collection module was deployed.
Tests performed
1. Checking the correspondence between the percentage of retransmits and the actual configured losses on the host
For the test, Linux tc (netem) was configured on a host to artificially drop 30% of the packets on the interface. Note that a netem qdisc attached as root acts on packets leaving the interface.
    # tc qdisc add dev eth0 root netem loss 30%
    # tc qdisc show dev eth0
    qdisc netem 8003: root refcnt 2 limit 1000 loss 30%
The QoE analytics reported a 30% retransmit rate, matching the configured loss, and the RTT values matched as well.
These results indicate correct operation of the QoE analytics module.
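Why should a 30% drop rate show up as roughly 30% retransmits? The sketch below simulates it under simplifying assumptions that are not stated in the article: losses are independent, every lost packet is retransmitted until it gets through, and the retransmit rate is measured as the share of retransmissions among all transmissions. Under those assumptions each lost transmission causes exactly one retransmission, so the share converges to the loss probability.

```python
import random

def simulate_retransmit_share(loss_prob, n_packets, seed=0):
    """Share of retransmissions among all transmissions for a given loss rate."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    transmissions = 0
    for _ in range(n_packets):
        # Keep sending the same packet until one copy is delivered.
        while True:
            transmissions += 1
            if rng.random() >= loss_prob:
                break
    # Every transmission beyond the first one per packet is a retransmission.
    retransmissions = transmissions - n_packets
    return retransmissions / transmissions

share = simulate_retransmit_share(0.30, 200_000)
print(f"retransmit share: {share:.3f}")  # expected to be close to 0.30
```

Real TCP behaves less cleanly (lost ACKs, spurious and fast retransmits), so on a live network the measured figure can deviate from the configured loss.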
By default, statistics are aggregated over 15-minute intervals. This interval can be reduced to 1 minute, but the volume of accumulated data will then increase.
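The trade-off can be sketched with a simple bucketing function. This is an illustration of time-window aggregation in general, not the QoE module's actual implementation; the sample format `(timestamp, packets, retransmits)` is an assumption made for the example.

```python
from collections import defaultdict

def aggregate(samples, window_s):
    """Sum packets and retransmits per time window.

    samples: iterable of (unix_ts, packets, retransmits) tuples.
    Returns {window_start_ts: (packets, retransmits)}.
    """
    buckets = defaultdict(lambda: [0, 0])
    for ts, packets, retx in samples:
        start = ts - ts % window_s  # align the timestamp to its window start
        buckets[start][0] += packets
        buckets[start][1] += retx
    return {start: tuple(total) for start, total in sorted(buckets.items())}

# One synthetic sample every 10 seconds for one hour.
samples = [(t, 100, 3) for t in range(0, 3600, 10)]
rows_15min = aggregate(samples, 15 * 60)
rows_1min = aggregate(samples, 60)
print(len(rows_15min), len(rows_1min))
```

For the same hour of traffic, the 1-minute window yields 15 times as many stored rows as the 15-minute window, in exchange for finer time resolution.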
2. Improving connectivity with the host by changing the traffic route
One of the hosts was experiencing retransmits on a TCP session.
At 13:00, outgoing traffic for the problematic prefix was rerouted to a different path. After the switch, the retransmits disappeared and RTT improved. Both changes were confirmed by the statistics obtained from QoE.
Let’s take a visual look at the traffic path changes:
[Figure: traffic path “BEFORE” the route change]
[Figure: traffic path “AFTER” the route change]
These results also indicate the correct operation of the QoE analytics module and the ability to detect connectivity problems on the Internet.
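A before/after comparison of this kind can be automated. The sketch below splits a series of samples at the switch time and averages the retransmit percentage and RTT on each side. The sample values and the date are synthetic placeholders for illustration; they are not the measurements from this case study.

```python
from datetime import datetime
from statistics import mean

def compare_around(samples, switch_time):
    """samples: list of (timestamp, retransmit_pct, rtt_ms) tuples.

    Returns (before, after) summaries; a side with no samples is None.
    """
    def summary(part):
        if not part:
            return None
        return {
            "retransmit_pct": mean([pct for _, pct, _ in part]),
            "rtt_ms": mean([rtt for _, _, rtt in part]),
        }
    before = [s for s in samples if s[0] < switch_time]
    after = [s for s in samples if s[0] >= switch_time]
    return summary(before), summary(after)

# Synthetic values, not taken from the article.
samples = [
    (datetime(2024, 1, 1, 12, 0), 4.0, 180.0),
    (datetime(2024, 1, 1, 12, 30), 5.0, 190.0),
    (datetime(2024, 1, 1, 13, 0), 0.0, 95.0),
    (datetime(2024, 1, 1, 13, 30), 0.0, 90.0),
]
before, after = compare_around(samples, datetime(2024, 1, 1, 13, 0))
print(before, after)
```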
Conclusions
The tested functionality makes it possible to identify bottlenecks in the network and can serve as a monitoring tool for timely detection of problem areas and proactive work on them.
Using the API, SSG can be integrated into an existing monitoring system to track the moments when connectivity on the network degrades.
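The API itself is not described in this article, so the following sketch leaves the transport out entirely and only shows the kind of check a monitoring system might run on metrics once it has them. The row format, the function name `find_degraded` and the threshold values are all assumptions made for illustration.

```python
def find_degraded(rows, max_retransmit_pct=5.0, max_rtt_ms=150.0):
    """Return alert messages for rows that exceed either threshold.

    rows: list of dicts with hypothetical keys "prefix",
    "retransmit_pct" and "rtt_ms".
    """
    alerts = []
    for row in rows:
        reasons = []
        if row["retransmit_pct"] > max_retransmit_pct:
            reasons.append(f"retransmits {row['retransmit_pct']:.1f}%")
        if row["rtt_ms"] > max_rtt_ms:
            reasons.append(f"RTT {row['rtt_ms']:.0f} ms")
        if reasons:
            alerts.append(f"{row['prefix']}: " + ", ".join(reasons))
    return alerts

# Synthetic rows using documentation address ranges.
rows = [
    {"prefix": "198.51.100.0/24", "retransmit_pct": 12.0, "rtt_ms": 210.0},
    {"prefix": "203.0.113.0/24", "retransmit_pct": 0.4, "rtt_ms": 35.0},
]
alerts = find_degraded(rows)
print(alerts)
```

In practice the thresholds would be tuned per network, and the resulting alerts forwarded to whatever notification channel the monitoring system already uses.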