Ultimate Guide to Logging

Your open-source resource for understanding, analyzing, and troubleshooting system logs

Parsing Java Logs

Extracting data from log files can be tricky, but it can also provide insight into the performance and usability of your application. There are a number of utilities for digesting and presenting log data in the form of lists, tables, charts, and graphs. These include basic text processing tools such as grep or sed, to more advanced desktop and web tools that can parse almost any log format.

This section explores some of these utilities and how you can use them to extract more data from your logs. Although we’ll demonstrate the use of command-line utilities, we recommend using a log management solution such as SolarWinds® Loggly® to make log parsing and analysis much easier.

Parsing JSON Logs

The JSON format makes it easy to extract data from log files, since the data’s already stored in a structured format. Each log entry includes the date and time of the entry, the name of the Logger that recorded the entry, and many other useful elements.

jq

One popular open source parser, jq, makes traversing JSON files simple and straightforward. In addition to a command-line utility, jq provides a web interface for testing commands. The following example reads a file called “myLog.json” and returns the Logger name and message for any entries that contain “Exception” in the message.

$ jq '.[] | if contains({message: "Exception"}) then .logger + " -" + .message else empty end' myLog.json

Extracting exception details from multiple JSON log messages. © 2019 the jq development team. All rights reserved.

You can then pipe the results into command-line tools such as grep and sed to format the results. You can also group and sort the results using the utilities sort and uniq. sort arranges the output, and uniq provides a count for the number of identical exceptions. For example, we can sort our results by Exception type in alphabetical order.

$ jq '.[] | if contains({message: "Exception"}) then .loggerName + "-" + .message else empty end' myLog.json | grep -o "w*-|w*Exception" | sed 'N; s/-n/ /' | sort -r | uniq -c

1 JavaLoggingBasics FileNotFoundException
2 JavaLoggingBasics RuntimeException

sort, uniq, and advanced grep options are covered in more detail in the Parsing Multiline Stack Traces section.

Log Management Tools

Log management tools process and store log data, making it easier to query, sort, and filter events. Instead of having to parse and provide the log data yourself using commands, the log management tool parses the log data for you and provide the means to search or extract logs using simple commands.

For example, imagine you want to find out which exception types are occurring most frequently. Using a log management service like SolarWinds® Loggly®, you can query your logs for messages containing “Exception”, then plot those logs in a pie chart to display their ratio. For example, we created the following chart in Loggly. Note there are more PSQLExceptions than FileNotFoundExceptions. Perhaps this will help us prioritize which bug to fix first?

Creating a pie chart of exceptions from log data. © 2019 SolarWinds, Inc. All rights reserved.

By logging additional fields such as the calling method, you can fine-tune your ability to query and traverse logs. You can find more information on log management tools in the Centralizing Java Logs section of the guide.

Parsing XML Logs

Like JSON, XML makes it easy to store and parse logs due to its structured format. The following is an example of an XML log created using java.util.logging:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE log SYSTEM "logger.dtd">
<log>
  <record>
    <date>2015-04-03T16:04:44</date>
    <millis>1428091484804</millis>
    <sequence>0</sequence>
    <logger>MyClass</logger>
    <level>SEVERE</level>
    <class>MyClass</class>
    <method>main</method>
    <thread>1</thread>
    <message>An exception occurred.</message>
  </record>
</log>

Log4j and Logback support exporting to XML via the XMLLayout Layout. To export Log4j logs as XML, set an XMLLayout as the Appender Layout in your Log4j configuration file. For example, we can write to an XML file using a FileAppender.

<File name="FileAppender" fileName="myLog.log">
  <XmlLayout complete=”true” />
</File>

While java.util.logging stores event details as elements, Log4j instead stores them as attributes. The same event recorded in Log4j generates the following XML.

<?xml version="1.0" encoding="UTF-8"?>
<Events xmlns="<a href="http://logging.apache.org/log4j/2.0/events">http://logging.apache.org/Log4j/2.0/events</a>">
  <Event xmlns="http://logging.apache.org/Log4j/2.0/events" ti  meMillis="1428091484804" thread="main" level="ERROR" loggerN  ame="MyClass" endOfBatch="false" loggerFqcn="org.apache.logging.Log4j.spi.AbstractLogger">
    <Message>An exception occurred</Message>
  </Event>
</Events>

Command-Line Tools for Parsing XML Logs

Command-line interface (or CLI) tools can also parse Java logs, but it can be challenging depending on the log format. Many command-line tools process individual lines of data at a time, whereas many formats spread a single log event across multiple lines.

For structured formats such as JSON and XML, you may be able to compress logs to a single line by enabling an attribute in your Layout configuration. For example, setting the XMLLayout compact attribute to true in Log4j will print each log on a separate line. JSONLayout also supports this attribute.

Alternatively, you can use a utility such as xml2 to convert your multiline XML logs. xml2 is a popular open source library for parsing XML files. Passing an XML log file to xml2 returns a list of nodes organized by their level in the XML document.

$ xml2 < myLog.xml
 
/log/record/date=2015-04-03T16:04:44
/log/record/millis=1428091484804
/log/record/sequence=0
/log/record/logger=MyClass
/log/record/level=SEVERE
/log/record/class=MyClass /log/record/method=main
/log/record/thread=1 /log/record/message=An exception occurred

xml2 works equally well when with parsing Log4j logs. Note that attributes begin with “@”.

/Events/@xmlns=http://logging.apache.org/Log4j/2.0/events
/Events/Event/@xmlns=http://logging.apache.org/Log4j/2.0/events
/Events/Event/@timeMillis=1428502398194
/Events/Event/@thread=main /Events/Event/@level=ERROR
/Events/Event/@loggerName=MyClass /Events/Event/@endOfBatch=false
/Events/Event/@loggerFqcn=org.apache.logging.Log4j.spi.AbstractLogger
/Events/Event/Message=An exception occurred

We can reduce this further by using the sed command to remove the repeating text at the beginning of each line. sed, or Stream EDitor, is an open-source utility for manipulating text. In this example, we’ll use sed to replace each instance of /log/record/ with an empty string. The field to replace and its replacement value are delimited by a colon, which is defined by the s parameter. The g parameter indicates that we’re performing this replacement for all instances of /log/record/, not just the first.

$ xml2 < myLog.xml | sed 's:/log/record/::g'

date=2015-04-06T14:42:40
millis=1428345760991
sequence=0
logger=MyClass
level=SEVERE c
lass=MyClass
method=main
thread=1
message=An exception occurred

While this is easier to work with, the data’s still split across multiple lines. We can fix this using tr, a Unix command for replacing individual characters in a block of text. Unlike sed, which works on only one line at a time, tr replaces characters across multiple lines. We can use tr to remove new-line characters by piping the outputs of xml2 and sed to tr:

$ xml2 < myLog.xml | sed 's:/log/record/::g' | tr 'n' ' '

date=2015-04-03T16:04:44 millis=1428091484804 sequence=0 logger=MyClass level=SEVERE class=MyClass method=main thread=1 message=An exception occurred

If xml2 displays only one entry, try setting the XMLLayout complete attribute to true in your logging framework’s configuration. If your log file contains multiple log entries, you may need to use an additional sed command to add a new line between entries, otherwise the entire log file will appear on a single line. With java.util.logging, XMLFormatter places an empty /log/record between events. You can replace that empty record with a new line by adding | sed ‘s:/log/record :n:g to the end of the command.

We’ve essentially compressed an entire XML document into a single line, with each attribute identified by its shorthand name. This makes it easier to use tools like grep to search log files based on the results of one or more fields. This also makes it easier to parse the file, since it consists entirely of key-value pairs.

Desktop Log Viewers for Parsing Logs

Desktop log viewers provide a visual interface for viewing and parsing log files. The benefit of these desktop log viewers is they can automatically parse and present log files in a way that makes them easy to sort, search, and index. These tools also make it possible to filter entries based on certain criteria or display trends in the form of graphs or charts.

Compatibility with a certain tool depends on which Layout was used to create your logs. For example, JSON logs can be used by any number of JSON-compatible tools, while a PatternLayout may require a more specific tool. Make sure to read the documentation and supported formats for each tool.

Some of the more popular log viewers include:

  • Apache Chainsaw, which was developed by members of the Log4j development community
  • OtrosLogViewer, an open-source viewer that supports standard and custom log formats from java.util.logging and Log4j
  • LogMX, a feature-rich viewer with support for live tailing logs, exporting logs, and regular expression searches

Viewing an error log in OtrosLogViewer. © 2015 OtrosLogViewer development team. All rights reserved.

Parsing Multiline Stack Traces

Stack traces add a layer of complexity to log files by splitting individual events across multiple lines. Many log parsing solutions interpret new lines as separators between events, causing stack traces to appear as multiple separate events rather than a single event. For a brief overview of exceptions, see Logging Uncaught Exceptions.

For example, the following code generates a stack trace by attempting to divide by zero:

import java.io.*;
import java.util.logging.*;
...
try {
  int i = 1 / 0;
}
catch (Exception ex) {
  logger.log(Level.SEVERE, "Exception: ", ex);
}

This results in the following output:

May 16, 2019 1:12:00 PM MyClass main
SEVERE: Exception:
java.lang.ArithmeticException: / by zero
  at MyClass.main(MyClass.java:17)

The first line contains a timestamp, the name of the class, and the name of the thread that caused the event. The second line contains the log level and message supplied to the Logger. From the third line on, we’re given the stack trace generated at the time of the exception. In order to extract data from this event, we need to treat each and every line as part of the same event.

The most effective way to handle stack traces is to log in a structured format such as XML or JSON. However, if you are reading unstructured logs from a file, you can import those files into syslog using the rsyslog imfile module. imfile has a paragraph read mode that automatically detects individual log events in a file based on their spacing from each other. This can be used to identify stack traces, which are typically indented after the first line.

Once the log and stack trace are merged into a single event, you can begin parsing the rest of the log entry.

Parsing Multiline Stack Traces Using Regular Expressions

Regular expressions (often called regex or regexp) are patterns used to match one or more characters in a string. Regular expressions are supported by countless programming and scripting languages, applications, and utilities. You can find more information on regular expressions, including guides and tutorials, at Regular-Expressions.info.

Command-line tools such as grep allow you to search and parse files using regular expressions. Using grep, we can extract log data that matches the format of a stack trace by using a regular expression.

The following example searches each line in the myLog.log file for an exception. Let’s break down this grep command: the -e flag tells grep to match a pattern based on the regular expression provided. Exception: tells grep to match any files that contain the string “Exception:”. [[:space:]] at tells grep to return any lines that begin with whitespace followed by the word “at”. | tells grep to match on either condition, which lets it return lines that begin with “Exception” or “at.”

grep -e "Exception:|[[:space:]]at" ~/myLog.log

SEVERE: Exception:
java.lang.ArithmeticException: / by zero
  at MyClass.main(MyClass.java:17)

The “Exception:” string in the regular expression is just an example. Any message logged by your logger can be used here. This also works for longer stack traces, such as the following FileNotFoundException:

SEVERE: Exception:
java.io.FileNotFoundException: notMyLog.log (No such file or directory)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.<init>(FileInputStream.java:146)
  at java.io.FileInputStream.<init>(FileInputStream.java:101)
  at java.io.FileReader.<init>(FileReader.java:58)
  at MyClass.main(MyClass.java:20)

Similar to the example shown in Structured Stack Trace Logs, you can use the grep advanced options to analyze stack traces. While the previous example used positive lookaround, this example uses negative lookaround to find instances of “Exception” that aren’t preceded by a space. Again, this could change depending on how your log messages are structured.

$ grep -oP "(?<! )w+Exception" ~/myLog.log | uniq -c | sort -r

1 FileNotFoundException
1 ArithmeticException

Parsing Multiline Stack Traces Using Log Management Tools

Several log management tools are designed to handle stack traces seamlessly.

Logstash/Grok

Logstash is an open-source data processing pipeline capable of ingesting, parsing, converting, and exporting log data. At the heart of Logstash’s parsing capabilities is grok, which uses regex to split log events into individual tokens. Logstash provides around 120 grok patterns supporting some of the most common log formats.

In addition to grok, parsing stack traces requires the multiline plugin, which merges log entries spread across multiple lines into a single event. An alternative is to use rsyslog with the imfile module, which also supports the ability to read multiline logs as single events before passing them onto syslog.

The following is an example of multiline configuration that treats stack traces as part of a single line.

codec => multiline {
  pattern => "^[a-zA-Z]{3} [0-9]{2}"
  what => "next"
  negate => true
 }

The pattern parameter is a regular expression that tells the filter how to separate log entries. By default, the java.util.logging SimpleFormatter records log entries starting with a three-letter month followed by the day. Each time the multiline plugin comes across this pattern, it creates a new event. The what parameter tells Logstash how the line containing the pattern relates to its neighboring lines; in this case, we pair it with the next line in the log file. Finally, the negate parameter tells Logstash to treat lines that don’t match the pattern as part of the same event.

Put together, this configuration tells Logstash to look for lines that contain the pattern, and treat any of the following lines that don’t match the pattern as part of the same event.

Now that we’ve merged our stack trace into a single event, we need to tell grok how to parse it. Grok works by comparing the text of each event against predefined patterns. Logstash stores the log event in the message field, so we’ll tell grok to search for matches based on that field:

filter {
  grok {
    match => [ “message”, “(?m)%{MONTH:month} %{MONTHDAY:day}, %{YEAR:year} %{TIME:time} %{WORD:time_of_day} %{WORD:class} %{WORD:thread}n{%WORD:level}: %{DATA:msg}n{%GREEDYDATA:stacktrace}”
]
  }
}

This looks confusing, but it’s actually very straightforward. We’re breaking down each part of the event into an individual token and assigning each token a name. For example, %{MONTH:month} searches for a string that matches the pattern for a month name, then assigns the field the name “month.” Using plain regular expressions, the same search would look like this:

b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)b

The result is the same, but grok’s predefined patterns make the process much easier and much less verbose. This same process applies to each of the following fields based on where they appear in the log event. The (?m) modifier at the beginning tells grok to treat the log as a multiline string.

Once the event is parsed, we can use Logstash to export it to JSON, resulting in a much more structured event.

{
  "month": "Apr",
  "day": "09",
  "year": "2015",
  "time": "1:12:00",
  "time_of_day": "PM",
  "class": "MyClass",
  "thread": "main",
  "level": "SEVERE",
  "msg": "Exception: ",
 "stacktrace": "java.lang.ArithmeticException: / by zero at MyClass.main(MyClass.java:17)n"
 }

Much of the key information has been extracted, but the stack trace itself still contains a lot of data. Adding a few more grok filters will help break down the stack trace even further, allowing us to extract the exception type and location.

Note that Logstash might include the original log message as part of the JSON output. To suppress it, add remove_field => [ “message” ] to the grok filter.

SolarWinds Loggly

Loggly uses predefined filters to automatically detect and parse stack traces into individual tokens, similar to Logstash and grok. The benefit is Loggly does this automatically as it ingests logs from your application.

Compare the same ArithmeticException from above to the following screenshot, which shows the log as it appears in Loggly after being automatically parsed and indexed.

Viewing a stack trace in SolarWinds Loggly. © 2019 SolarWinds, Inc. All rights reserved.

Additional Resources

Command Line Tools

A Beginner’s Guide to Grep (Open Source For You) – Guide to using grep

Getting Started with Logstash (Elastic) – Guide to using Logstash

GUI Tools

LogMX (LightySoft)

OtrosLogViewer Tutorial (Otros Systems)

Regular Expressions

Regular-Expressions.info (Jan Goyvaerts) – Regular expression tutorials and examples

Using Grep & Regular Expressions to Search for Text Patterns in Linux (DigitalOcean) – Guide to using regular expressions in grep

Grep Tutorial (Regular-Expressions.info) – Introduction to using grep for regular expressions