Ultimate Guide to Logging

Your open-source resource for understanding, analyzing, and troubleshooting system logs

Parsing Apache Logs

Apache reports extensive data about your website, server, and users, but the trick is extracting that data from logs. Parsing Apache logs converts the raw text produced by Apache into fields that can be indexed, searched, and analyzed. This makes it easier to oversee Apache, drill down into specific problems, or look at broader trends.

This section shows how to parse Apache logs using common command line tools, as well as log management solutions. When demonstrating these solutions, we’ll use the example of parsing out the HTTP status code (500) from the following log message:

10.185.248.71 - - [09/Jan/2015:19:12:06 +0000] 808840 "GET /inventoryService/inventory/purchaseItem?userId=20253471&itemId=23434300 HTTP/1.1" 500 17 "-" "Apache-HttpClient/4.2.6 (java 1.5)"

Using Unix Command Line Tools

You can use Unix command line tools to parse out fields like the status code. Some people prefer a tool like grep, which extracts data based on regular expressions. The example below extracts patterns of three digits surrounded by spaces. The spaces prevent matches against spurious data, such as part of a timestamp. The results will mostly be status codes but could include false positives.

$ grep -o " [0-9]{3} " /var/log/apache2/access.log

Which outputs:

 404 
 404 
 200 
 200 
 200 
...
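The raw matches can be fed straight into other Unix tools for a quick tally of how often each status code appears. The sketch below builds a small hypothetical sample file (the path and log lines are illustrative) and then counts the extracted codes:

```shell
# Create a small hypothetical sample log for illustration
cat > /tmp/access_sample.log <<'EOF'
10.185.248.71 - - [09/Jan/2015:19:12:06 +0000] "GET /a HTTP/1.1" 500 17 "-" "curl/7.0"
10.185.248.72 - - [09/Jan/2015:19:12:07 +0000] "GET /b HTTP/1.1" 404 17 "-" "curl/7.0"
10.185.248.73 - - [09/Jan/2015:19:12:08 +0000] "GET /c HTTP/1.1" 404 17 "-" "curl/7.0"
EOF

# Extract the status codes, then count each distinct code,
# listing the most frequent first
grep -oE " [0-9]{3} " /tmp/access_sample.log | sort | uniq -c | sort -rn
```

The sort before uniq -c is required because uniq only collapses adjacent duplicate lines; the final sort -rn orders the counts from highest to lowest.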

Another approach is to use cat, which prints each line, and pipe the output to the cut command to take the ninth block of characters delimited by spaces. In the default combined log format, the ninth field is the status code, so this produces fewer false positives as long as you don't change the Apache log format. (Custom formats shift the position: the example line at the start of this section carries an extra numeric field after the timestamp, which moves the status code to the tenth field.)

$ cat access.log | cut -d ' ' -f 9

This output looks similar:

200
404
404
404
404
...
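A related option is awk, which splits each line on runs of whitespace and can extract and filter in a single pass, with no cat needed. This is a sketch against a hypothetical sample file in the default combined format (path and log lines are illustrative):

```shell
# Hypothetical sample in the default combined log format
# (the status code is the ninth whitespace-delimited field)
cat > /tmp/combined_sample.log <<'EOF'
10.0.0.1 - - [09/Jan/2015:19:12:06 +0000] "GET /a HTTP/1.1" 200 17 "-" "curl/7.0"
10.0.0.2 - - [09/Jan/2015:19:12:07 +0000] "GET /b HTTP/1.1" 404 17 "-" "curl/7.0"
EOF

# Print every status code
awk '{print $9}' /tmp/combined_sample.log

# Extract and filter in one pass: status code and path
# for error responses only
awk '$9 >= 400 {print $9, $7}' /tmp/combined_sample.log
```

Because awk compares $9 numerically against 400, the second command keeps only client and server errors, something cut alone cannot do.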

Using Log Management Systems

Good log management systems can parse Apache logs; some do it automatically, while others require configuration. Tools like Logstash and Fluentd require configuration for parsing. Logstash uses Grok filters, which are named regular expressions that extract each field from the log line. Grok also ships with a library of patterns covering many common formats, but you'll have to find the right one for your logs. Here is an example configuration for Logstash.

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
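Note that the stock %{COMBINEDAPACHELOG} pattern will not match the sample line at the top of this section, because that line carries an extra numeric field (808840) between the timestamp and the request. A custom pattern assembled from Grok's standard building blocks could handle it; this is only a sketch, and the duration field name is an assumption about what that number represents:

```
filter {
  grok {
    match => {
      "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] %{NUMBER:duration} \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NUMBER:bytes} %{QS:referrer} %{QS:agent}"
    }
  }
}
```

With this filter in place, the status code from the sample line would land in the response field, ready for searching and aggregation.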

Services like SolarWinds® Loggly® can automatically recognize and parse Apache logs. They’ll do it without any configuration if you use one of the common formats. Here is what you’ll see in their expanded event view. Each field has been parsed out and labeled. It’s now ready for the next step, which is analysis!

A parsed Apache log sent to SolarWinds Loggly via syslog.