Performance Analysis
====================

There are many potential causes for performance issues. In this section we
will guide you through some options. The first part will cover basic steps and
introduce some helpful tools. The second part will cover more in-depth
explanations and corner cases.

System Load
-----------

The first step should be to check the system load. Run a top tool like **htop**
to get an overview of the system load and to see if there is a bottleneck in
the traffic distribution. For example, if only a small number of CPU cores hit
100% all the time while others don't, it could be related to bad traffic
distribution or elephant flows, as in the screenshot where one core peaks due
to a single big elephant flow.

.. image:: analysis/htopelephantflow.png

If all cores are at peak load the system might be too slow for the traffic load
or it might be misconfigured. Also keep an eye on memory usage: if the actual
memory usage is too high and the system needs to swap, it will result in very
poor performance.

The load will give you a first indication of where to start debugging in the
specific areas we describe in more detail in the second part.

Logfiles
--------

The next step is to check all the log files, with a focus on **stats.log** and
**suricata.log**, for any obvious issues. The most obvious indicator is the
**capture.kernel_drops** value, which ideally would not show up at all and
should stay below 1% of the **capture.kernel_packets** value, as high drop
rates can lead to a reduced number of events and alerts.

If **memcap** is seen in the stats, the memcap values in the configuration
could be increased. This can result in higher memory usage and should be taken
into account when the settings are changed.

Don't forget to check any system logs as well; even a **dmesg** run can show
potential issues.
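
To quantify this, the drop percentage can be computed from the two counters. A
minimal sketch, assuming the counter values have already been pulled from
**stats.log** (the numbers below are made up purely for illustration):

```shell
# Counter values as they would appear in stats.log
# (made-up numbers, purely for illustration)
kernel_packets=1000000
kernel_drops=15000

# Drop rate in percent; staying below ~1% is the goal
drop_pct=$(awk -v d="$kernel_drops" -v p="$kernel_packets" \
    'BEGIN { printf "%.2f", d * 100 / p }')
echo "drop rate: ${drop_pct}%"
```

In a live setup you would read the two counters from the most recent block of
**stats.log** instead of hardcoding them.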

Suricata Load
-------------

Besides the system load, another indicator of potential performance issues is
the load of Suricata itself. A helpful tool for spotting such issues is
**perf**. Make sure you have it installed, and also install the debug symbols
for Suricata, or the output won't be very helpful. This output is also helpful
when you report performance issues, as the Suricata development team can use
it to narrow down possible causes.

::

    sudo perf top -p $(pidof suricata)

If you see specific function calls at the top in red, it's a hint that those
are the bottlenecks. For example, if you see **IPOnlyMatchPacket** it can be
the result of either high drop rates or incomplete flows, which decrease
performance. To look into performance issues on a specific thread you can pass
**-t TID** to perf top. In other cases you may see functions that hint that a
specific protocol parser is used a lot; you can then either try to debug a
performance bug or try to filter the related traffic.

.. image:: analysis/perftop.png

In general, try to play around with the different configuration options that
Suricata provides, with a focus on the options described in
:doc:`high-performance-config`.

Traffic
-------

In most cases where the hardware is fast enough to handle the traffic but the
drop rate is still high, the cause lies in specific traffic issues.

Basics
^^^^^^

Some of the basic checks are:

- Check if the traffic is bidirectional; if it's mostly unidirectional you're
  missing relevant parts of the flow (see the **tshark** example at the
  bottom). Another indicator could be a big discrepancy between the SYN and
  SYN-ACK as well as RST counters in the Suricata stats.

- Check for encapsulated traffic; while GRE, MPLS etc. are supported, they
  could also lead to performance issues, especially if there are several
  layers of encapsulation.

- Use tools like **iftop** to spot elephant flows. Flows with a rate of over
  1 Gbit/s for a long time can cause one CPU core to peak at 100% all the
  time and increase the drop rate, while it might not make sense to dig deep
  into this traffic.

- Another approach to narrowing down issues is the use of a **bpf filter**.
  For example, filter out all HTTPS traffic with **not port 443** to exclude
  traffic that might be problematic, or look only at one specific port, e.g.
  **port 25**, if you expect issues with a specific protocol. See
  :doc:`ignoring-traffic` for more details.

- If VLAN is used it might help to disable **vlan.use-for-tracking** in
  scenarios where only one direction of the flow has the VLAN tag.

Advanced
^^^^^^^^

There are several advanced steps and corner cases when it comes to a deep dive
into the traffic.

If VLAN QinQ (IEEE 802.1ad) is used, be very cautious if you use
**cluster_qm** in combination with Intel drivers and the AF_PACKET runmode.
While the RFC expects ethertypes 0x8100 and 0x88A8 in this case (see
https://en.wikipedia.org/wiki/IEEE_802.1ad), most implementations only add
0x8100 on each layer. If the first seen layer has the same VLAN tag but the
inner layers have different VLAN tags, the traffic will still end up in the
same queue in **cluster_qm** mode. This was observed with the i40e driver up
to 2.8.20 and firmware versions up to 7.00; feel free to report whether newer
versions have fixed this (see https://suricata-ids.org/support/).

If you want to use **tshark** to get an overview of the traffic direction, use
this command:

::

    sudo tshark -i $INTERFACE -q -z conv,ip -a duration:10

The output will show you all flows within 10 seconds; if you see 0 for one
direction you have unidirectional traffic and thus, for example, don't see the
ACK packets.
Since Suricata is trying to work on flows, this has a rather big impact on
visibility. Focus on fixing the unidirectional traffic. If that's not possible
at all, you can enable **async-oneside** in the **stream** configuration
section.

Check for other unusual or complex protocols that aren't supported very well.
You can try to filter those out to see if it has any impact on performance. In
this example we filter Cisco Fabric Path (ethertype 0x8903) with the bpf
filter **not ether proto 0x8903**, as it's assumed to be a performance issue
(see https://redmine.openinfosecfoundation.org/issues/3637).

Elephant Flows
^^^^^^^^^^^^^^

The so-called elephant flows or traffic spikes are quite difficult to deal
with. In most cases these are big file transfers or backup traffic, and it's
not feasible to decode the whole traffic. From a network security monitoring
perspective it's often enough to log the metadata of such a flow and inspect
packets at the beginning of the flow, but not the whole flow.

If you can spot specific flows as described above, try to filter those. The
easiest solution would be a bpf filter, but that would still have a
performance impact. Ideally you can filter such traffic even earlier, at the
driver or NIC level (see eBPF/XDP), or even before it reaches the system where
Suricata is running. Some commercial packet brokers support such filtering,
where it's called **Flow Shunting** or **Flow Slicing**.

Rules
-----

The ruleset plays an important role not only in detection but also in the
performance capability of Suricata. It's therefore recommended to look into
the impact of the enabled rules as well.

If you run into performance issues and struggle to narrow them down, start by
running Suricata without any rules enabled and use the tools explained in the
first part again.
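
One way to get such a baseline is an empty rule set. A sketch of the relevant
part of **suricata.yaml** (the key names follow the default configuration; the
path shown is illustrative and should be adjusted to your installation):

```yaml
# Baseline run: load no signature files at all.
# default-rule-path is illustrative; adjust to your installation.
default-rule-path: /etc/suricata/rules
rule-files: []
```

With this in place, any remaining load and drops come from decoding and flow
tracking alone, which makes it easier to separate rule cost from traffic
issues.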
Keep in mind that even without signatures enabled Suricata still does most of
the decoding and traffic analysis, so a fair amount of load should still be
seen. If the load is still very high, drops are seen, and the hardware should
be capable of dealing with such traffic loads, you should dig deeper to see if
there is any specific traffic issue (see above) or report the performance
issue so it can be investigated (see https://suricata-ids.org/support/).

Suricata also provides several specific traffic-related signatures in the
rules folder that could be enabled for testing to spot specific traffic
issues. These are found in the **rules** directory; you should start with
**decoder-events.rules**, **stream-events.rules** and
**app-layer-events.rules**.

It can also be helpful to use :doc:`rule-profiling` and/or
:doc:`packet-profiling` to find problematic rules or traffic patterns. This is
achieved by compiling Suricata with **--enable-profiling**, but keep in mind
that this has an impact on performance and should only be used for
troubleshooting.
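
If Suricata was built with **--enable-profiling**, the rule profiling output
can be tuned in **suricata.yaml**. A sketch based on the keys from the default
configuration (the filename and limit are illustrative):

```yaml
profiling:
  # Per-rule statistics, dumped when Suricata exits
  rules:
    enabled: yes
    filename: rule_perf.log
    append: yes
    # Sort the report by average ticks spent per check
    sort: avgticks
    # Only report the top 10 rules
    limit: 10
```

The resulting report points at the most expensive signatures, which is usually
the fastest way to find a problematic rule.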