Logging: The Ultimate Guide

Your open-source resource for understanding, analyzing, and troubleshooting system logs

curated by Loggly


Troubleshoot with Apache Logs

Here are common questions people want to answer using the Apache logs:

Are There Too Many Errors?

Apache logs capture errors from two sources: the error log and error status codes in the access log. HTTP status codes of 400 or above indicate errors (see the overview in the What To Log In Apache section). Most often people want to see the count of errors, especially in proportion to the overall traffic. If the proportion of errors is too high, you know that something is wrong. First, you have to centralize the logs from all relevant hosts. Then, if you are using Unix tools, you can run this type of command to parse out the status codes, count them, and sort them in descending order by count.
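As a sketch of such a pipeline: with the default combined log format, the status code is the ninth whitespace-separated field. The sample log lines and the access.log filename below are purely illustrative; if your LogFormat differs, adjust the field number.

```shell
# Illustrative sample lines in the default "combined" format,
# where the HTTP status code is the ninth whitespace-separated field.
cat > access.log <<'EOF'
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /missing.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:55:38 +0000] "POST /api/v1/items HTTP/1.1" 500 120 "-" "curl/7.68.0"
198.51.100.7 - - [10/Oct/2023:13:55:39 +0000] "GET /missing.html HTTP/1.1" 404 209 "-" "curl/7.68.0"
EOF

# Parse out the status code, count each one, and sort descending by count.
awk '{print $9}' access.log | sort | uniq -c | sort -rn
```

On the sample data this prints 404 first with a count of 2, putting the most frequent status at the top.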

Log management systems can give you these counts in a single click so you don't have to worry about parsing and counting them yourself. Instead of the above work, they give you quick summaries and visualizations of the error counts in many different formats, such as a table or pie chart. These screenshots were generated from Loggly's trend view. Here you can quickly see that 40% of responses are 400 or higher, which is too many errors.


What is Causing 404s?

A 404 error indicates a missing file or resource. Looking at the request URI will tell you which one it is. You can then check your deployment to make sure the file hasn't accidentally been deleted. If it has, you can add it back and redeploy the site. If you're using Unix command-line tools, you can use grep to find the 404s, then cut to extract the URL, then uniq and sort to summarize the list.
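A minimal sketch of that grep/cut/uniq/sort pipeline, again assuming the default combined format (request path in the seventh field, status in the ninth); the sample lines are illustrative:

```shell
# Illustrative sample lines in the combined format.
cat > access.log <<'EOF'
203.0.113.5 - - [10/Oct/2023:14:02:11 +0000] "GET /old-page.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:14:02:12 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:14:02:13 +0000] "GET /old-page.html HTTP/1.1" 404 209 "-" "curl/7.68.0"
EOF

# grep matches the status-code position (the quote before 404 avoids
# matching a 404-byte response size), cut extracts the URL, and
# sort/uniq summarize the list.
grep '" 404 ' access.log | cut -d' ' -f7 | sort | uniq -c | sort -rn
```

On the sample data, /old-page.html surfaces at the top with a count of 2, pointing you straight at the missing file.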

Log management systems will automatically parse the Apache log lines, allow you to search or filter down on 404 errors, and then summarize a count of the results. In the example below, you can see this as a table or bar chart.


If you're running a live site, a feature will often work fine in some browsers but not others, so it's helpful to see a breakdown of errors by browser. If you know you're dealing with a problem that occurs only in Internet Explorer, you will prioritize it differently, and you can focus on that browser when troubleshooting. You can get this summary of errors using Loggly Dynamic Field Explorer: just filter your logs on a specific status code (e.g., 404 or 500) and click on the userAgent field.

If you are running an API, it's helpful to see which client libraries may have issues. The Apache logs can aid in troubleshooting issues with client libraries or agents, and even show you which are most popular. When you're using Unix command-line tools, you can extract the top user agents using this type of command:
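One way to do it: in the combined format the user agent is the sixth double-quote-delimited field, so splitting on quotes with awk isolates it cleanly. The sample lines are illustrative (they mimic the ZmEu scanner discussed next):

```shell
# Illustrative sample lines; the user agent is the last quoted field.
cat > access.log <<'EOF'
198.51.100.23 - - [10/Oct/2023:14:10:01 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 209 "-" "ZmEu"
198.51.100.23 - - [10/Oct/2023:14:10:02 +0000] "GET /pma/scripts/setup.php HTTP/1.1" 404 209 "-" "ZmEu"
198.51.100.23 - - [10/Oct/2023:14:10:03 +0000] "GET /myadmin/scripts/setup.php HTTP/1.1" 404 209 "-" "ZmEu"
203.0.113.5 - - [10/Oct/2023:14:10:04 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
EOF

# Split on double quotes so field 6 is the user agent,
# then count and rank the agents.
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head
```

On the sample data, ZmEu tops the list with 3 requests.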

Log management systems will automatically centralize, parse, and analyze these counts for you. You can show them in a table or bar chart format. You can see the top user agent here is ZmEu which is a vulnerability scanner looking for weaknesses in PHP. PHP is not installed on this server so we are safe. If we were concerned a quick solution would be to block that IP in our firewall.


Site Loading Too Slowly?

Many users won't tolerate even a minor slowdown; they'll get frustrated and complain or stop using your service. You should continuously monitor response times to make sure your server is running fast and that its performance is consistent over time. You can do this by tracking the response time captured in your Apache logs. If you have a REST API, the same approach can be used to track the performance of API calls, in terms of both speed and consistency. This can help you track and meet SLAs for internal teams or apps that depend on your service. When you're using Unix command-line tools, you can extract the request time field and then use a tool like awk to calculate the average. In this example, the average is 362,000 microseconds, or 0.362 seconds.
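A sketch of the awk averaging, assuming a custom LogFormat that appends the response time in microseconds (%D) as the last field; the sample lines and values are illustrative:

```shell
# Illustrative sample lines with %D (response time in microseconds)
# appended as the last field -- an assumed custom LogFormat.
cat > access.log <<'EOF'
203.0.113.5 - - [10/Oct/2023:14:20:01 +0000] "GET /index.html HTTP/1.1" 200 2326 400000
198.51.100.7 - - [10/Oct/2023:14:20:02 +0000] "GET /index.html HTTP/1.1" 200 2326 324000
EOF

# Sum the last field of every line and divide by the line count.
awk '{ total += $NF; count++ } END { if (count) printf "%.0f\n", total/count }' access.log
# -> 362000
```

Here (400000 + 324000) / 2 = 362,000 microseconds, i.e., 0.362 seconds.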

It's helpful to visualize performance as a time series. With many log management systems, you can set this up as a time series chart, which can display statistics like average and maximum over any time window. This helps you see if the issue was a temporary spike. In the example below, you can see the maximum time in dark blue and the average time in green. Other options for splitting the chart include by host, so you can see if one host is trending slower than others, or by URI, to show your slowest pages or API calls.


If you have an internal SLA or a maximum time you want your responses to be served in, you can use a numeric range search to find responses over that threshold. You can build a regular expression to find these with grep, or if you are using a log management system you can search on it directly. For example, here is what the search would look like in Loggly:
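With grep, one rough approach is to match on digit count: assuming the response time in microseconds (%D) is appended as the last field, a 1-second threshold means 1,000,000 microseconds, i.e., at least seven digits. The sample lines and the 1-second SLA are illustrative assumptions:

```shell
# Illustrative sample lines with the response time in microseconds
# as the last field (an assumed custom LogFormat with %D appended).
cat > access.log <<'EOF'
203.0.113.5 - - [10/Oct/2023:14:25:01 +0000] "GET /fast.html HTTP/1.1" 200 512 98000
198.51.100.7 - - [10/Oct/2023:14:25:02 +0000] "GET /slow.html HTTP/1.1" 200 512 2400000
EOF

# Match lines whose last field has 7+ digits, i.e., responses over 1 second.
grep -E ' [0-9]{7,}$' access.log
```

On the sample data, only the /slow.html request (2.4 seconds) is returned.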

Too Much Load From One Source?

When your site is under a heavy load, you should know whether the load is from real users or something else like:

  • A configuration or system problem
  • A client app or bot hitting your site too fast
  • A denial of service attack

It’s pretty straightforward to find an answer from your Apache fields.

  • The userAgent field can give you a hint as to whether an app or bot is hitting your site. A lot of simple bots label themselves as such.
  • The remoteAddr can tell you if specific IPs are generating a significant proportion of traffic.

An IP address you don't recognize might be a client with a problem or an attacker. If you don't want to allow this type of use, consider blocking or rate limiting that particular IP address. To get this information from Unix command-line tools, you can use a command like this to extract the first field, which is the remote IP address:
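For example, since the remote IP is the first whitespace-separated field in the combined format, cut plus sort/uniq gives a per-address request count (the sample lines are illustrative):

```shell
# Illustrative sample lines; the remote IP is the first field.
cat > access.log <<'EOF'
198.51.100.7 - - [10/Oct/2023:14:30:01 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.68.0"
198.51.100.7 - - [10/Oct/2023:14:30:02 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.68.0"
198.51.100.7 - - [10/Oct/2023:14:30:03 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.68.0"
203.0.113.5 - - [10/Oct/2023:14:30:04 +0000] "GET /about.html HTTP/1.1" 200 1042 "-" "Mozilla/5.0"
EOF

# Extract the remote IP, then count and rank requests per address.
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn
```

On the sample data, 198.51.100.7 tops the list with 3 of the 4 requests, the kind of skew worth investigating.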

A great way to visualize if you’re getting too many requests from one source is as a pie chart. Here you can see that one IP is generating more than half of the site’s traffic, which is unusual.


Unusual Traffic Patterns?

You’ll want to keep a monitor up and running on your site to look for unusual behavior that could be a security problem or even potential attacks. For example, game developers need to look out for people who are trying to cheat or interfere with other players. In this scenario, you would want to try to identify behavior patterns that users wouldn’t exhibit in real life. There are two big approaches to finding unusual events: top down and bottom up. For top down, look at high level metrics to see if there are unusual traffic patterns that could compromise your site such as too many requests from a particular IP. You can watch these on a dashboard or set alerts when critical thresholds are reached.


For a bottom-up analysis, start by subtracting the legitimate traffic you already know about. Drill down to just the errors. Then within the errors look at each one to determine the cause. Oftentimes 80% of the errors are caused by a small number of known problems, so subtract those from your search. Now you can easily see unusual things like odd user agents or URLs that aren't legitimate. Make sure your site is secure against each of these vulnerabilities. For example, above we saw the ZmEu user agent hunting for PHP vulnerabilities. We should make sure each of these URLs returns an error so the scanner is blocked.

Log management systems will make this type of analysis as easy as clicking on the ZmEu user agent to drill down on it, then displaying a summary of status codes.


This guide will help software developers and system administrators become experts at using logs to better run their systems. This is a vendor-neutral, community effort featuring examples from a variety of solutions.
