Performance Analysis
====================

There are many potential causes for performance issues. In this section we
will guide you through some options. The first part will cover basic steps and
introduce some helpful tools. The second part will cover more in-depth
explanations and corner cases.

System Load
-----------

The first step should be to check the system load. Run a top tool like **htop**
to get an overview of the system load and to see whether the traffic
distribution is a bottleneck. For example, if only a small number of CPU cores
constantly hit 100% while others don't, this could point to bad traffic
distribution or to elephant flows, as in the screenshot where one core peaks
due to a single big elephant flow.

.. image:: analysis/htopelephantflow.png

If all cores are at peak load the system might be too slow for the traffic load
or it might be misconfigured. Also keep an eye on the memory usage: if it is
too high and the system needs to swap, performance will degrade severely.
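
For example, a quick way to check the per-core load and whether the system is
swapping is sketched below. It assumes the **sysstat** package (which provides
**mpstat**) is installed; **htop** shows the same information interactively.

::

    # per-core CPU utilization, 5 samples at 1 second intervals
    mpstat -P ALL 1 5

    # memory and swap usage; significant swap usage is a warning sign
    free -h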

The load gives you a first indication of where to start debugging; the
specific parts are described in more detail in the second part.

Logfiles
--------

The next step is to check all the log files, with a focus on **stats.log**
and **suricata.log**, for any obvious issues. The most obvious indicator is the
**capture.kernel_drops** value, which ideally would not show up at all and
should stay below 1% of the **capture.kernel_packets** value, as high drop
rates can lead to a reduced amount of events and alerts.
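
A quick way to check the drop counters is to look at the most recent entries
in **stats.log** (the path below assumes the default log directory, adjust it
to your setup). If the unix socket is enabled, **suricatasc** can dump the
counters of the running instance as well.

::

    # last reported capture counters
    grep -E 'capture\.kernel_(packets|drops)' /var/log/suricata/stats.log | tail -n 4

    # alternatively, query the running instance via the unix socket
    sudo suricatasc -c dump-counters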

If **memcap** is seen in the stats, the memcap values in the configuration
could be increased. This can result in higher memory usage and should be taken
into account when the settings are changed.
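
Which memcap to raise depends on the counter seen in the stats. As a sketch,
the relevant values in **suricata.yaml** look like the following; the numbers
are purely illustrative, not recommendations.

::

    flow:
      memcap: 128mb

    stream:
      memcap: 64mb
      reassembly:
        memcap: 256mb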

Don't forget to check the system logs as well; even a quick **dmesg** run can
show potential issues.
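
For example:

::

    # kernel messages with human readable timestamps, most recent last
    sudo dmesg -T | tail -n 50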

Suricata Load
-------------

Besides the system load, another indicator for potential performance issues is
the load of Suricata itself. A helpful tool for that is **perf**, which helps
to spot performance issues. Make sure it is installed together with the debug
symbols for Suricata, otherwise the output won't be very helpful. This output
is also helpful when you report performance issues, as the Suricata development
team can use it to narrow down possible causes.

::

    sudo perf top -p $(pidof suricata)

If you see specific function calls at the top in red it's a hint that those are
the bottlenecks. For example, if you see **IPOnlyMatchPacket** it can be either
a result of high drop rates or of incomplete flows, which result in decreased
performance. To look into the performance issues on a specific thread you can
pass **-t TID** to perf top. In other cases functions at the top can hint that
a specific protocol parser is used a lot; you can then either try to debug a
performance bug or try to filter the related traffic.
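
To find the thread IDs of the individual Suricata threads you can use **ps**,
as sketched below; replace **$TID** with the ID of the thread you want to
inspect.

::

    # list Suricata threads; the SPID column holds the thread ID
    ps -T -p $(pidof suricata)

    # profile a single thread, e.g. a worker pinned at 100% CPU
    sudo perf top -t $TID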

.. image:: analysis/perftop.png

In general, play around with the different configuration options that Suricata
provides, with a focus on the options described in
:doc:`high-performance-config`.

Traffic
-------

In most cases where the hardware is fast enough to handle the traffic but the
drop rate is still high, the cause lies in specific traffic issues.

Basics
^^^^^^

Some of the basic checks are:

- Check if the traffic is bidirectional; if it's mostly unidirectional you're
  missing relevant parts of the flow (see the **tshark** example at the
  bottom). Another indicator could be a big discrepancy between the SYN and
  SYN-ACK as well as the RST counters in the Suricata stats.

- Check for encapsulated traffic. While GRE, MPLS etc. are supported, they
  could also lead to performance issues, especially if there are several
  layers of encapsulation.

- Use tools like **iftop** to spot elephant flows. Flows with a rate of over
  1Gbit/s for a long time can keep one CPU core peaked at 100% and increase
  the drop rate, while it might not even make sense to dig deep into this
  traffic (see the examples after this list).

- Another approach to narrow down issues is the usage of a **bpf filter**. For
  example, filter out all HTTPS traffic with **not port 443** to exclude
  traffic that might be problematic, or only look into one specific port with
  **port 25** if you expect issues with a specific protocol (see the examples
  after this list). See :doc:`ignoring-traffic` for more details.

- If VLAN is used it might help to disable **vlan.use-for-tracking** in
  scenarios where only one direction of the flow has the VLAN tag.
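
As a sketch of the **iftop** and **bpf filter** points above, the commands
below show the top talkers on the capture interface and run Suricata with a
capture filter that excludes HTTPS traffic. The interface is a placeholder,
and passing the BPF expression on the command line assumes a capture method
that accepts it (see :doc:`ignoring-traffic`).

::

    # show the top talkers without name or port resolution
    sudo iftop -i $INTERFACE -nNP

    # run Suricata with a capture filter that excludes HTTPS traffic
    sudo suricata -c /etc/suricata/suricata.yaml -i $INTERFACE 'not port 443'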

Advanced
^^^^^^^^

There are several advanced steps and corner cases when it comes to a deep dive
into the traffic.

If VLAN QinQ (IEEE 802.1ad) is used, be very cautious if you use **cluster_qm**
in combination with Intel drivers and the AF_PACKET runmode. While the RFC
expects ethertype 0x8100 and 0x88A8 in this case (see
https://en.wikipedia.org/wiki/IEEE_802.1ad), most implementations only add
0x8100 on each layer. If the outer VLAN tag is the same but the inner VLAN
tags differ, the traffic will still end up in the same queue in **cluster_qm**
mode. This was observed with the i40e driver up to 2.8.20 and firmware
versions up to 7.00; feel free to report if newer versions have fixed this
(see https://suricata-ids.org/support/).
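
To quickly check how the traffic is actually tagged, printing the link-level
headers with **tcpdump** shows the outer and inner VLAN tags; a small sketch:

::

    # -e prints the link-level header including the 802.1Q/802.1ad tags
    sudo tcpdump -i $INTERFACE -e -nn -c 20 vlan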

If you want to use **tshark** to get an overview of the traffic direction, use
this command:

::

    sudo tshark -i $INTERFACE -q -z conv,ip -a duration:10

The output will show all flows seen within 10 seconds. If one direction shows
0, you have unidirectional traffic and are, for example, not seeing the ACK
packets. Since Suricata works on flows, this has a rather big impact on
visibility. Focus on fixing the unidirectional traffic; if that is not
possible at all, you can enable **async-oneside** in the **stream**
configuration settings.
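
The corresponding snippet in **suricata.yaml** looks like this:

::

    stream:
      async-oneside: true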

Check for other unusual or complex protocols that aren't supported very well.
You can try to filter those out to see if that has any impact on the
performance. In this example we filter Cisco Fabric Path (ethertype 0x8903)
with the bpf filter **not ether proto 0x8903**, as it's assumed to be a
performance issue (see https://redmine.openinfosecfoundation.org/issues/3637).

Elephant Flows
^^^^^^^^^^^^^^

The so-called elephant flows or traffic spikes are quite difficult to deal
with. In most cases those are big file transfers or backup traffic, and it's
not feasible to decode the whole traffic. From a network security monitoring
perspective it's often enough to log the metadata of such a flow and inspect
the packets at the beginning of the flow, but not the whole flow.

If you can spot specific flows as described above, try to filter those. The
easiest solution would be a bpf filter, but that would still result in a
performance impact. Ideally you can filter such traffic even sooner, on the
driver or NIC level (see eBPF/XDP), or even before it reaches the system where
Suricata is running. Some commercial packet brokers support such filtering,
where it's called **Flow Shunting** or **Flow Slicing**.
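
As a sketch, a BPF filter that excludes one known backup flow could look like
the following; the addresses and port are placeholders, and the filter can be
passed on the command line or via a file with **-F**.

::

    not (host 192.0.2.10 and host 192.0.2.20 and port 873)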

Rules
-----

The ruleset plays an important role not only in the detection but also in the
performance of Suricata. Thus it's recommended to look into the impact of the
enabled rules as well.

If you run into performance issues and struggle to narrow them down, start by
running Suricata without any rules enabled and use the tools explained in the
first part again. Keep in mind that even without signatures enabled Suricata
still does most of the decoding and traffic analysis, so a fair amount of load
should still be seen. If the load is still very high, drops are seen, and the
hardware should be capable of dealing with such traffic loads, dig deeper to
see if there is any specific traffic issue (see above) or report the
performance issue so it can be investigated (see
https://suricata-ids.org/support/).
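
One way to run without any signatures is to point **-S** (exclusive signature
file loading) at an empty file, for example:

::

    # /dev/null acts as an empty exclusive rule file, so no signatures load
    sudo suricata -c /etc/suricata/suricata.yaml -i $INTERFACE -S /dev/null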

Suricata also provides several traffic-related signatures in the **rules**
folder that can be enabled for testing to spot specific traffic issues. Start
with **decoder-events.rules**, **stream-events.rules** and
**app-layer-events.rules**.

It can also be helpful to use :doc:`rule-profiling` and/or
:doc:`packet-profiling` to find problematic rules or traffic patterns. This is
achieved by compiling Suricata with **--enable-profiling**, but keep in mind
that this has an impact on performance and should only be used for
troubleshooting.
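
Suricata uses autotools, so profiling support is enabled at configure time; a
sketch of the rebuild:

::

    ./configure --enable-profiling
    make
    sudo make install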