Ultimate Guide to Logging

Your open-source resource for understanding, analyzing, and troubleshooting system logs

Parsing Apache Logs

Apache reports extensive data about your website, server, and users, but the trick is extracting that data from logs. Parsing Apache logs converts the raw text produced by Apache into fields that can be indexed, searched, and analyzed. This makes it easier to oversee Apache, drill down into specific problems, or look at broader trends.

This section shows how to parse Apache logs using common command line tools, as well as log management solutions. When demonstrating these solutions, we’ll use the example of parsing out the HTTP status code (500) from the following log message:

10.185.248.71 - - [09/Jan/2015:19:12:06 +0000] 808840 "GET /inventoryService/inventory/purchaseItem?userId=20253471&itemId=23434300 HTTP/1.1" 500 17 "-" "Apache-HttpClient/4.2.6 (java 1.5)"

Using Unix Command Line Tools

You can use Unix command line tools to parse out fields like the status code. Some people prefer a tool like grep, which extracts data based on regular expressions. The example below extracts patterns of three digits surrounded by spaces. The spaces prevent matches against spurious data, such as part of a timestamp. The results will mostly be status codes but could include false positives.

$ grep -o " [0-9]{3} " /var/log/apache2/access.log

Which outputs:

 404 
 404 
 200 
 200 
 200 
...
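The raw matches can be fed straight into other Unix tools for a quick tally of how often each status code appears. The sketch below builds a small hypothetical sample file (the path and log lines are illustrative) and then counts the extracted codes:

```shell
# Create a small hypothetical sample log for illustration
cat > /tmp/access_sample.log <<'EOF'
10.185.248.71 - - [09/Jan/2015:19:12:06 +0000] "GET /a HTTP/1.1" 500 17 "-" "curl/7.0"
10.185.248.72 - - [09/Jan/2015:19:12:07 +0000] "GET /b HTTP/1.1" 404 17 "-" "curl/7.0"
10.185.248.73 - - [09/Jan/2015:19:12:08 +0000] "GET /c HTTP/1.1" 404 17 "-" "curl/7.0"
EOF

# Extract the status codes, then count each distinct code,
# listing the most frequent first
grep -oE " [0-9]{3} " /tmp/access_sample.log | sort | uniq -c | sort -rn
```

The sort before uniq -c is required because uniq only collapses adjacent duplicate lines; the final sort -rn orders the counts from highest to lowest.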

Another approach is to use cat, which prints each line, and pipe the output to the cut command to take the ninth block of characters delimited by spaces. In the default combined log format, the ninth field is the status code, so this produces fewer false positives as long as you don't change the Apache log format. (Custom formats shift the position: the example line at the start of this section carries an extra numeric field after the timestamp, which moves the status code to the tenth field.)

$ cat access.log | cut -d ' ' -f 9

This output looks similar:

200
404
404
404
404
...
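A related option is awk, which splits each line on runs of whitespace and can extract and filter in a single pass, with no cat needed. This is a sketch against a hypothetical sample file in the default combined format (path and log lines are illustrative):

```shell
# Hypothetical sample in the default combined log format
# (the status code is the ninth whitespace-delimited field)
cat > /tmp/combined_sample.log <<'EOF'
10.0.0.1 - - [09/Jan/2015:19:12:06 +0000] "GET /a HTTP/1.1" 200 17 "-" "curl/7.0"
10.0.0.2 - - [09/Jan/2015:19:12:07 +0000] "GET /b HTTP/1.1" 404 17 "-" "curl/7.0"
EOF

# Print every status code
awk '{print $9}' /tmp/combined_sample.log

# Extract and filter in one pass: status code and path
# for error responses only
awk '$9 >= 400 {print $9, $7}' /tmp/combined_sample.log
```

Because awk compares $9 numerically against 400, the second command keeps only client and server errors, something cut alone cannot do.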

Using Log Management Systems

Good log management systems can parse Apache logs; some do it automatically, while others require configuration. Tools like Logstash and Fluentd require configuration for parsing. Logstash uses Grok filters, which are named regular expressions that extract each field from the log line. Grok also ships with a library of patterns covering many common formats, but you'll have to find the right one for your logs. Here is an example configuration for Logstash.

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
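Note that the stock %{COMBINEDAPACHELOG} pattern will not match the sample line at the top of this section, because that line carries an extra numeric field (808840) between the timestamp and the request. A custom pattern assembled from Grok's standard building blocks could handle it; this is only a sketch, and the duration field name is an assumption about what that number represents:

```
filter {
  grok {
    match => {
      "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] %{NUMBER:duration} \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}"
    }
  }
}
```

With this filter in place, the status code from the sample line would land in the response field, ready for searching and aggregation.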

Services like SolarWinds® Loggly® can automatically recognize and parse Apache logs. They’ll do it without any configuration if you use one of the common formats. Here is what you’ll see in their expanded event view. Each field has been parsed out and labeled. It’s now ready for the next step, which is analysis!

A parsed Apache log sent to SolarWinds Loggly via syslog.