Ultimate Guide to Logging

Your open-source resource for understanding, analyzing, and troubleshooting system logs

Troubleshoot with Apache Logs

There are several ways for a website to break, and when they do, you’ll want to have the information needed to quickly bring it back online. If you or your users experience an error, you may want to find out the following.

  • What page the user was on
  • What the status code was (i.e., 404)
  • How much load the server was under at the time
  • How long it took the page to load
  • Any unusual traffic patterns
  • What browser (or user agent) the user had

In this section, we’ll show how you can use your Apache logs to answer these questions and troubleshoot your websites faster.

Are There Too Many Errors?

Apache logs have errors from two sources: the error log and error status codes in the access log. HTTP status codes that are errors are 400 or above (see the overview in the What To Log In Apache section). Most often people want to see the count of errors, especially in proportion to the overall traffic. If the proportion of errors is too high, you know something is wrong. First, you have to centralize the logs from all relevant hosts. Then, if you are using Unix tools, you can run this type of command to parse out the status code, count them, and then sort them descending by count.

$ cat access.log | cut -d ' ' -f 9 | sort | uniq -c | sort -nr
     941 200
     292 500
     290 404
     50 401
     20 400

Log management systems can give you these counts in a single click, so you don’t have to worry about parsing and counting them yourself. Instead of the above work, it should give quick summaries and visualizations of the error counts in many different formats, such as tables or pie charts. This screenshot was generated from the SolarWinds® Loggly® dynamic field explorer and chart creator. Here we can quickly see a large percentage of 403 and 404 responses, which is too many errors.

Frequency of HTTP status codes shown in the Loggly dynamic field explorer, and as a pie chart.

What is Causing 404s?

A 404 error is defined as a missing file or resource. Looking at the request URI will tell you which one it is. You can then check your deployment to make sure a file hasn’t accidentally been deleted or removed. If it has, you can add it back in and redeploy the site. If you’re using unix command line tools, you can use grep to find the 404s, then cut to extract the URL, then uniq, and sort to summarize the list.

$ grep " 404 " access.log | cut -d ' ' -f 7 | sort | uniq -c | sort -nr
     17 /manager/html
      3 /robots.txt
      3 /phpMyAdmin/scripts/setup.php
      3 //myadmin/scripts/setup.php
      3 /favicon.ico

Log management systems will automatically parse the Apache log lines, allowing you to search or filter down on 404 errors, and then summarize a count of the results. In the example below, you can see this as a table or bar chart.

URLs resulting in an HTTP 404 status code as shown in the Loggly dynamic field explorer and as a bar chart.

If you’re running a live site, a feature will often work fine for some browsers but not others. It can be helpful to see breakdown of errors by browser. For example, if you know you’re dealing with a problem that occurs only with Internet Explorer, you will prioritize it differently. You can also focus on that browser when troubleshooting. You can get this summary of errors using the Loggly dynamic field explorer. Just filter your logs on a specific status code (404 or 500, for example) and click on the userAgent field. If you are running an API, it’s helpful to see which client libraries may have issues. The Apache logs can aid in troubleshooting issues with client libraries or agents, and even show you which are most popular. When you’re using unix command line tools, you can extract the top user agents using this type of command:

$ cat access.log.1 | cut -d '"' -f 6 | sort | uniq -c | sort -nr
     51 -
     47 ZmEu
     31 () { :;};/usr/bin/perl -e 'print 
     14 Apache/2.2.22 (Ubuntu) (internal dummy connection)
     11 Mozilla/5.0

Log management systems will automatically centralize, parse, and analyze these counts for you. You can show them in a table or bar chart format. You can see the top user agent here is ZmEu, which is a vulnerability scanner looking for weaknesses in PHP. If we were concerned, a quick solution would be to block that IP in our firewall.

List and frequency of user agents shown in the dynamic field explorer, and as a bar chart.

Site Loading Too Slowly?

Many users won’t tolerate even a minor slowdown; they’ll get frustrated and complain, or stop using your service. You should continuously monitor response times to make sure your server is running fast and that its performance is consistent over time. You can do this by tracking the response time captured in your Apache logs. If you have a REST API, the same approach can be used to track the performance of API calls, in terms of both speed and consistency. This can help you track and meet SLAs for internal teams or apps that depend on your service. When you’re using unix command line tools, you can extract the request time field, and then use a tool like awk to calculate the average. In this example, the average is 362k microseconds, or 0.362 seconds.

ubuntu@ip-172-31-11-241:/var/log/apache2$ cat access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -nr
     31 91.142.209.68
     29 94.102.49.11
     28 173.236.125.26
     19 222.74.212.77
     14 127.0.0.1
362935

It’s helpful to visualize performance as a time series. With many log management systems, you can set this up as a time series chart, which can display statistics like average and maximum over any time window. This helps you see if the issue was a temporary spike. In the example below, you can see the maximum time in blue and average time in pink. Other options for splitting chart include by host, so you can see if one host is trending slower than others, or by URI to show your slowest pages or API calls.

Timeline of the average and maximum request time.

If you have an internal SLA or a maximum response time you want your responses to be served in, you can use a numeric range search to find responses over that threshold. You can build a regular expression to find these in grep, or search on it directly if you are using a log management system. For example, here is what the search would look like in Loggly:

apache.requestTimeMillis:>500

Too Much Load From One Source?

When your site is under a heavy load, you should know whether the load is from real users or something else like:

  • A configuration or system problem
  • A client app or bot hitting your site too fast
  • A denial of service attack

It’s pretty straightforward to find an answer from your Apache fields.

  • The useragent field can give you a hint as to whether an app or bot is hitting your site. Many simple bots label themselves as such.
  • The remoteaddr field can tell you if specific IPs are generating a significant proportion of traffic.

An IP address you don’t recognize may be a client with a problem or an attacker. If you don’t want to allow this type of use, you might want to consider blocking or rate limiting this particular IP address. To get this information from unix command line tools, you can use a command like this to extract the first field which is the remote IP address:

ubuntu@ip-172-31-11-241:/var/log/apache2$ cat access.log.1 | cut -d ' ' -f 1 | sort | uniq -c | sort -nr
     31 91.142.219.68
     29 94.102.49.11
     28 173.236.125.26
     19 222.74.212.77
     14 127.0.0.1

A great way to visualize if you’re getting too many requests from one source is a pie chart. Here, you can see that one IP is generating more than half of the site’s traffic, which is unusual.

Requests according to client IP address as shown in the Loggly Dynamic Field Explorer and as a pie chart.

Unusual Traffic Patterns?

You’ll want to keep a monitor up and running on your site to look for unusual behavior that could be a security problem, or even potential attacks. For example, game developers need to look out for people who are trying to cheat or interfere with other players. In this scenario, you want to try to identify behavior patterns that users wouldn’t exhibit in real life. There are two big approaches to finding unusual events: top-down and bottom-up. For top-down, look at high level metrics to see if there are unusual traffic patterns that could compromise your site (such as too many requests from a particular IP). You can watch these on a dashboard or set alerts when critical thresholds are reached.

Dashboard of Apache performance metrics, status, traffic, and errors. © 2019 SolarWinds, Inc. All rights reserved.

For a bottom-up analysis, start by subtracting the legitimate traffic you already know about. Drill down to just the errors. Then within the errors, look at each one to determine the cause. Oftentimes, 80% of the errors are caused by a small number of known problems, so subtract those from your search. Now you can easily see unusual things like odd user agents or URLs that aren’t legitimate. Make sure your site is secure against each of these vulnerabilities. For instance, above we saw an example of the ZmEu user agent hunting for PHP vulnerabilities. We should ensure each of the URLs are returning errors so the scanner is blocked.

$ grep ZmEu access.log.1 | cut -d ' ' -f 9 | sort | uniq -c
     47 404

Log management systems will make this type of analysis as easy as clicking on the ZmEu user agent to drill down on it, then displaying a summary of status codes.

Drilling down to logs related to a specific user agent in Loggly.