Loggly’s anomaly detection allows you to find significant changes in event frequency. Anomalies often indicate new problems that require attention, or they can confirm that you fixed a pre-existing problem. For example, you may want to see if there is a big increase in errors after a new code deployment.
Accessing Anomaly Detection
You can access this tool on the search page by selecting the Trends tab on the toolbar, and then selecting Anomalies in the dropdown menu for chart type.
You can also access this tool from the Field Explorer. First, select a field to display its values. At the top of the values pane there is a set of field actions. Select “Find Anomalies” from the field actions list.
Using Anomaly Detection
The anomalies trend chart allows you to pick a field to analyze. It then shows you which values of that field have increased or decreased in frequency by comparing your current search time range with a background time range. It also identifies the field values with the most significant changes and brings them to the top of the list.
The gray part of the bar shows the expected count for your current time range, calculated from the average count over the background time range. The actual values are plotted as deltas on the expected bar, and they are colored to show increases in green and decreases in red.
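As a rough illustration of how an expected count and its delta could be derived, the sketch below scales a background count down to the length of the current time range. The function names and the numbers are hypothetical, and the simple rate scaling is an assumption; Loggly's exact calculation is not described here.

```python
from datetime import timedelta


def expected_count(background_count, background_range, current_range):
    """Scale the background count to the length of the current range.

    Assumption: "average count" means a constant event rate over the
    background range. This is an illustrative simplification.
    """
    return (
        background_count
        * current_range.total_seconds()
        / background_range.total_seconds()
    )


def delta(actual_count, expected):
    """Positive values are increases, negative values are decreases."""
    return actual_count - expected


# Hypothetical numbers: 2,400 events with status 503 over the last day,
# 250 of which arrived in the last hour.
expected = expected_count(2400, timedelta(days=1), timedelta(hours=1))
change = delta(250, expected)
print(f"expected={expected:.0f} actual=250 delta={change:+.0f}")
```

In this made-up case the bar for 503 would show an expected count of about 100 with an increase of about 150 plotted on top of it.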
In the example screenshot above, we’re analyzing the Apache status code field. We see that in the last hour the 200 and 500 codes have decreased, whereas the 503 code has increased. The increased 503 code indicates that the server became unavailable. This would prevent the viewers from seeing the website. If this were a popular web store, it would be losing a significant amount of revenue.
The background time range, also called the compare against time range, is used to calculate the expected count. The expected count is then compared with the actual count, which is based on your current search time range. The default background range is one day, but you can select other ranges in the dropdown. Selecting a different range can be useful if your data has irregularities or cyclical patterns that you want to take into account.
You can also split or group the values by another field. In the example below, we are grouping by log level. This helps us see which Java classes have the biggest changes in errors or info messages. We can see that the error rate on the inventoryService is down, and info messages are up. That indicates we recently resolved a problem.
There are also a variety of sort options. The default is significance, which picks the values with the biggest changes that also had larger counts overall. You can also sort by the percent difference between the actual and expected counts, by the actual count in your current time range, or by the expected count from your background time range.
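The difference between these sort orders can be sketched with a few hypothetical rows. The percent difference follows directly from the actual and expected counts. The significance score below is only an assumed stand-in (the change divided by the square root of the expected count), chosen because it favors large changes on values with larger counts; the formula Loggly actually uses is not documented here.

```python
import math

# Hypothetical rows: one per field value, with actual and expected counts.
rows = [
    {"value": "200", "actual": 900, "expected": 1000.0},
    {"value": "500", "actual": 5, "expected": 10.0},
    {"value": "503", "actual": 250, "expected": 100.0},
]


def percent_difference(row):
    """Signed percent difference between actual and expected counts.

    Assumes the expected count is greater than zero.
    """
    return (row["actual"] - row["expected"]) / row["expected"] * 100


def significance(row):
    """Assumed stand-in score, not Loggly's formula.

    Dividing by sqrt(expected) keeps small-count values from dominating.
    """
    return abs(row["actual"] - row["expected"]) / math.sqrt(row["expected"])


by_percent = sorted(rows, key=lambda r: abs(percent_difference(r)), reverse=True)
by_significance = sorted(rows, key=significance, reverse=True)
by_actual = sorted(rows, key=lambda r: r["actual"], reverse=True)
by_expected = sorted(rows, key=lambda r: r["expected"], reverse=True)

print([r["value"] for r in by_percent])
print([r["value"] for r in by_significance])
```

With these numbers, the 500 code has a larger percent change than the 200 code, but the assumed significance score ranks 200 higher because its change involves far more events.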
The settings menu is shown as a gear icon. It allows you to control how many bars are displayed, whether to use a log scale, and whether to show the legend.
Common Error Messages
Time range out of bounds – the search period is not contained within the compare against, or background, time range.
Cardinality too high – the split by field has too many values. Currently we can only split by a field with fewer than 25 unique values.
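The two conditions behind these messages can be expressed as simple checks. The sketch below is illustrative only: the function names, the constant, and the sample data are hypothetical, and it is not how Loggly implements the validation.

```python
from datetime import datetime, timedelta

# From the text above: the split by field needs fewer than 25 unique values.
SPLIT_VALUE_LIMIT = 25


def search_within_background(search_start, search_end, bg_start, bg_end):
    """True when the search period lies entirely inside the background range."""
    return bg_start <= search_start and search_end <= bg_end


def cardinality_ok(split_values, limit=SPLIT_VALUE_LIMIT):
    """True when the split by field has fewer than `limit` unique values."""
    return len(set(split_values)) < limit


# Hypothetical example: a one-hour search inside a one-day background range.
now = datetime(2024, 1, 2, 12, 0, 0)
in_bounds = search_within_background(
    now - timedelta(hours=1), now, now - timedelta(days=1), now
)

# A search that starts before the background range would be out of bounds.
out_of_bounds = not search_within_background(
    now - timedelta(days=2), now, now - timedelta(days=1), now
)

log_levels = ["INFO", "ERROR", "WARN", "INFO", "DEBUG"]
print(in_bounds, out_of_bounds, cardinality_ok(log_levels))
```

A field such as log level passes the cardinality check easily, while a field such as a session ID, with a distinct value on almost every event, would not.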