Logging: The Ultimate Guide

Your open-source resource for understanding, analyzing, and troubleshooting system logs.

Curated by Loggly


Parsing Java Logs

Extracting data from log files can be tricky, but it can also give you insight into the performance and usability of your application. There are a number of utilities for digesting and presenting log data in the form of lists, tables, charts, and graphs. This section explores some of these utilities and shows how you can use them to extract more data from your logs.

This section explains how to parse logs using both graphical and command-line tools.

Parsing XML Logs

The XML format makes it easy to extract data from log files, since the data is already stored in a structured format. Each log entry includes the date and time of the entry, the name of the Logger that recorded the entry, and many other useful elements.

The following is an example of an XML log created using java.util.logging:
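Reconstructed here is a representative record of the kind java.util.logging’s XMLFormatter produces (the timestamp, logger name, and message are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE log SYSTEM "logger.dtd">
<log>
<record>
  <date>2024-03-15T10:23:45</date>
  <millis>1710498225000</millis>
  <sequence>0</sequence>
  <logger>MyClass</logger>
  <level>SEVERE</level>
  <class>MyClass</class>
  <method>main</method>
  <thread>1</thread>
  <message>An exception occurred.</message>
</record>
</log>
```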

Log4j and Logback support exporting to XML via the XMLLayout Layout. To export Log4j logs as XML, set an XMLLayout as the Appender’s Layout in your Log4j configuration file:
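A minimal sketch of a Log4j 2 configuration with an XMLLayout attached to a File Appender (the appender name and file name are illustrative):

```xml
<Configuration>
  <Appenders>
    <File name="xmlFile" fileName="myLog.xml">
      <XMLLayout complete="true" />
    </File>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="xmlFile" />
    </Root>
  </Loggers>
</Configuration>
```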

With Log4j, event details are stored as attributes instead of elements. The same event recorded in Log4j results in the following output:
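A representative Log4j 2 XMLLayout record might look like this (attribute values are illustrative):

```xml
<Event xmlns="http://logging.apache.org/log4j/2.0/events"
       timeMillis="1710498225000" thread="main" level="ERROR"
       loggerName="MyClass" endOfBatch="false">
  <Message>An exception occurred.</Message>
</Event>
```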

GUI Tools for Parsing Java Logs

There are several open source GUI tools for viewing and parsing log files. One of the more popular Java log viewers is OtrosLogViewer. OtrosLogViewer imports standard and custom log formats from java.util.logging and Log4j. Other popular log analysis tools include LogMX, Splunk, and Graylog.

Screenshot from OtrosLogViewer

The benefit of GUI tools is that they can automatically parse and present log files in a way that makes them easy to sort, search, and index. These tools also make it possible to filter entries based on certain criteria or display trends in the form of graphs or charts.

Some tools might not be compatible with certain Layouts, in which case you need to specify a custom format. Details on how to do that will depend on the tool used. For more information, see the documentation for your preferred tool.

Command-Line Tools for Parsing XML Logs

Parsing XML files is possible using command-line tools, but it can be difficult. Many command-line tools process one line of data at a time, whereas XML typically spreads data across multiple lines. To parse logs via the command line, you can set the XMLLayout’s compact attribute to true, use another Layout such as JSONLayout, or use a utility such as xml2.

xml2 is a popular open source library for parsing XML files. Passing an XML log file to xml2 results in a list of nodes organized by their level in the XML document:
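For example, run against an XMLFormatter-style log, xml2 produces output along these lines (file name, paths, and values are illustrative):

```
$ xml2 < myLog.xml
/log/record/date=2024-03-15T10:23:45
/log/record/millis=1710498225000
/log/record/logger=MyClass
/log/record/level=SEVERE
/log/record/class=MyClass
/log/record/method=main
/log/record/message=An exception occurred.
```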

xml2 works equally well when parsing Log4j logs. Note that attributes begin with @:
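An illustrative sketch, assuming a Log4j 2 XMLLayout log (values are illustrative):

```
$ xml2 < myLog.xml
/Event/@timeMillis=1710498225000
/Event/@thread=main
/Event/@level=ERROR
/Event/@loggerName=MyClass
/Event/Message=An exception occurred.
```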

We can use the sed command to reduce this output by removing the repeated text at the beginning of each line. sed, or Stream EDitor, is an open source utility for manipulating text. In the command below, sed replaces each instance of /log/record/ with an empty string. The pattern to replace and its replacement value are separated by colons; the character immediately following the s command defines the delimiter. The g flag indicates that we’re performing this replacement for all instances of /log/record/ on each line, not just the first.
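A self-contained sketch of the sed step (the printf lines stand in for xml2 output so the example runs on its own):

```shell
# Strip the repeated /log/record/ prefix from xml2-style output.
# (The input lines are illustrative.)
printf '%s\n' \
  '/log/record/logger=MyClass' \
  '/log/record/level=SEVERE' \
  '/log/record/message=An exception occurred.' \
  | sed 's:/log/record/::g'
# → logger=MyClass
# → level=SEVERE
# → message=An exception occurred.
```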

While this is easier to work with, the data is still split across multiple lines. tr is a Unix command for replacing individual characters in a block of text. Unlike sed, which works on only one line at a time, tr replaces characters across multiple lines. In this case, you can use tr to remove new-line characters by piping the outputs of xml2 and sed to tr.
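A self-contained sketch of the tr step (the printf lines stand in for the cleaned-up sed output):

```shell
# Join the cleaned-up lines into a single line by replacing
# newlines with spaces (input lines are illustrative).
printf '%s\n' 'logger=MyClass' 'level=SEVERE' 'message=An exception occurred.' \
  | tr '\n' ' '
# prints everything on one line, separated by spaces
```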

If xml2 displays only one entry, try changing the XMLLayout’s complete attribute to true in your logging framework’s configuration. If your log file contains multiple log entries, you may need an additional sed command to add a new line between entries; otherwise, the entire log will appear on a single line. With java.util.logging, xml2 outputs an empty /log/record line between events. You can replace that empty record with a new line by adding | sed 's:/log/record :\n:g' to the end of the command.
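A sketch of that final step using GNU sed, which supports \n in replacements (the input string simulates the joined xml2 output for two events):

```shell
# Split the joined output back into one line per event by replacing
# the empty /log/record separator with a newline.
printf '%s' 'logger=MyClass level=SEVERE /log/record logger=MyClass level=INFO ' \
  | sed 's:/log/record :\n:g'
```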

We’ve essentially compressed an entire XML document into a single line, with each attribute identified by its shorthand name. This makes it easier to use tools like grep to search log files based on the results of one or more fields.

Parsing JSON Logs

Like XML logs, JSON logs are difficult to parse using command-line tools. However, there are several utilities designed to work specifically with JSON.

jq

One popular open source parser, jq, makes traversing JSON files simple and straightforward. In addition to its command-line component, jq provides a web interface for testing commands. The following example reads the myLog.json file and returns the Logger name and message content for entries that contain “Exception” in the message:
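A self-contained sketch (the sample entry is piped in inline so the example runs on its own; with a real log, you would pass myLog.json as the file argument; the field names logger and message are illustrative and depend on your JSON layout):

```shell
# Keep only entries whose message mentions "Exception", and return
# just the logger name and message fields.
echo '{"logger":"MyClass","message":"java.io.FileNotFoundException: myFile.txt (No such file or directory)"}' \
  | jq 'select(.message | contains("Exception")) | {logger, message}'
```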

json log view

You can then pipe the results into command-line tools such as grep and sed to format the results. You can also group and sort the results using the utilities sort and uniq. sort arranges the output, and uniq provides a count for the number of identical exceptions.
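One way to sketch this pipeline (the printf lines simulate jq’s extracted messages so the example is self-contained; with a real log you would pipe jq’s output into sort and uniq):

```shell
# Group identical exception messages and count each group.
printf '%s\n' \
  'java.io.FileNotFoundException: myFile.txt (No such file or directory)' \
  'java.lang.ArithmeticException: / by zero' \
  'java.io.FileNotFoundException: myFile.txt (No such file or directory)' \
  | sort | uniq -c | sort -rn
```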

The previous command produces the following results:
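Illustrative results (the counts and messages depend on your log):

```
      2 java.io.FileNotFoundException: myFile.txt (No such file or directory)
      1 java.lang.ArithmeticException: / by zero
```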

sort, uniq, and advanced grep options are covered in more detail elsewhere in this guide.

Log Management Tools

Log management tools create repositories of log data, allowing you to group, filter, and query events more easily. Rather than having to extract data using commands and regular expressions, you can search the contents of your logs using simple queries.

For example, imagine you want to find out which exception types are occurring most frequently. You can query your logs for messages that contain “Exception”; then plot those logs in a pie chart to display their ratio. Given the following log entries:
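Illustrative SimpleFormatter-style entries (timestamps, class names, and messages are invented for this example):

```
Mar 15, 2024 10:23:45 AM MyClass main
SEVERE: java.io.FileNotFoundException: myFile.txt (No such file or directory)
Mar 15, 2024 10:24:02 AM MyClass main
SEVERE: java.lang.ArithmeticException: / by zero
Mar 15, 2024 10:25:13 AM MyClass main
SEVERE: java.io.FileNotFoundException: myConfig.xml (No such file or directory)
```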

We can create the following chart using a service like Loggly. It shows that we get about twice as many FileNotFoundExceptions as ArithmeticExceptions. Perhaps this will help us prioritize which bug to fix first?

exceptions pie chart

By logging additional fields such as the calling method, you can fine-tune your ability to query and traverse logs. You can find more information on log management tools in the Centralizing Java Logs section of the guide.

Parsing Multi-Line Stack Traces

Stack traces add a layer of complexity to log files, since they involve multiple lines related to a single element (the exception that generated the stack trace). For a brief overview of exceptions, see Logging Uncaught Exceptions.

For example, the following code logs the result of trying to divide by zero:
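A minimal sketch using java.util.logging (the class name and log message are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class MyClass {
    private static final Logger logger =
            Logger.getLogger(MyClass.class.getName());

    public static void main(String[] args) {
        try {
            // Deliberately divide by zero to trigger an ArithmeticException.
            int result = 10 / 0;
        } catch (ArithmeticException ex) {
            // Log the exception along with a message; the framework
            // appends the stack trace to the log entry.
            logger.log(Level.SEVERE, "An exception occurred.", ex);
        }
    }
}
```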

This results in the following output:
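With the default SimpleFormatter, the output would look something like this (the timestamp and line number are illustrative):

```
Mar 15, 2024 10:23:45 AM MyClass main
SEVERE: An exception occurred.
java.lang.ArithmeticException: / by zero
	at MyClass.main(MyClass.java:10)
```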

On the first line, we’re given a timestamp, the name of the class, and the name of the method in which the event occurred. On the second line, we’re given the log level and the message supplied to the Logger. From the third line on, we’re given the stack trace generated at the time of the exception. In order to extract data from this event, we need to treat each line as part of a single event. Certain Appenders, such as Spidertracks’ loggly-log4j Appender, will automatically group multi-line logs. Alternatively, you can export logs from a file to syslog using the rsyslog imfile module. imfile has a paragraph read mode that automatically detects individual log events in a file based on the spacing between them.

Once we’ve merged the individual lines into a single event, we can begin extracting data from the log entry.

Structured Stack Trace Logs

As mentioned in the Parsing XML Logs section, logging in a format such as XML or JSON makes it easier to extract data from log entries. The same exception from above is shown here in XML form using java.util.logging.XMLFormatter:
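A representative XMLFormatter record for this exception (timestamps and line numbers are illustrative):

```xml
<record>
  <date>2024-03-15T10:23:45</date>
  <millis>1710498225000</millis>
  <sequence>0</sequence>
  <logger>MyClass</logger>
  <level>SEVERE</level>
  <class>MyClass</class>
  <method>main</method>
  <thread>1</thread>
  <message>An exception occurred.</message>
  <exception>
    <message>java.lang.ArithmeticException: / by zero</message>
    <frame>
      <class>MyClass</class>
      <method>main</method>
      <line>10</line>
    </frame>
  </exception>
</record>
```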

Rather than using complicated Layouts or patterns to extract data, key information about the stack trace is already available as individual elements.

You can use xml2 to extract data from the stack trace, for instance, to list the most common exception classes. The process is similar to Parsing JSON Logs, except the syntax is slightly different. With jq, we were able to construct our output by specifically adding the logger and message fields. With xml2, we’ll have to use grep to filter out the exception class.

The following example uses a regular expression technique called lookaround. Lookaround returns a match based on surrounding characters, rather than the string itself. Lookahead matches a pattern based on the characters that follow it, while lookbehind matches based on the characters that precede it. For instance, to return the first “a” in “Java,” you can use the lookahead a(?=v) to find a letter “a” followed by a “v.” Likewise, you could use the lookbehind (?<=v)a to find an “a” that’s preceded by a “v.” In this case, we’ll look for the text that follows the string “exception/message=.”

Lookaround is an advanced regex construct. To use it in grep, add the -P flag to enable Perl regular expressions.

This results in the following output:
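A self-contained sketch (the printf lines simulate xml2 output; the -P flag requires GNU grep):

```shell
# Extract the exception class name using a lookbehind for
# "exception/message=" (input lines are illustrative xml2 output).
printf '%s\n' \
  '/log/record/message=An exception occurred.' \
  '/log/record/exception/message=java.lang.ArithmeticException: / by zero' \
  | grep -oP '(?<=exception/message=)[^:]+'
# → java.lang.ArithmeticException
```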

Parsing Multi-Line Stack Traces Using Regular Expressions

Regular expressions (often called regex or regexp) are patterns used to match one or more characters in a string. Regular expressions are supported by countless programming and scripting languages, applications, and utilities. You can find more information on regular expressions, including guides and tutorials, at Regular-Expressions.info.

Command-line tools such as grep allow you to search and parse files using regular expressions. Using grep, we can extract log data that matches the format of a stack trace by using a regular expression.

The following example searches each line in the myLog.log file for an exception. Let’s break down this grep command: the -E flag tells grep to use extended regular expressions. Exception: tells grep to match any lines that contain the string “Exception:.” [[:space:]]at tells grep to return any lines that begin with whitespace followed by the word “at.” The | operator tells grep to match on either condition, which lets it return lines that contain “Exception:” or begin with “at.”
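A self-contained sketch (the printf lines stand in for myLog.log, and the log lines are illustrative):

```shell
# Match either the exception message line or indented stack-frame lines.
printf '%s\n' \
  'SEVERE: An exception occurred.' \
  'java.lang.ArithmeticException: / by zero' \
  '	at MyClass.main(MyClass.java:10)' \
  'INFO: An unrelated entry' \
  | grep -E 'Exception:|[[:space:]]at'
# → java.lang.ArithmeticException: / by zero
# → 	at MyClass.main(MyClass.java:10)
```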

The “Exception:” string in the regular expression is just an example. Any message logged by your logger can be used here. This also works for longer stack traces, such as the following FileNotFoundException:
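For instance (the frames and line numbers shown are illustrative):

```
SEVERE: An exception occurred.
java.io.FileNotFoundException: myFile.txt (No such file or directory)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileReader.<init>(FileReader.java:72)
	at MyClass.main(MyClass.java:9)
```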

Similar to the example shown in Structured Stack Trace Logs, you can use grep’s advanced options to analyze stack traces. While the previous example used positive lookaround, this example uses negative lookaround to find instances of “Exception” that aren’t preceded by a space. Again, this could change depending on how your log messages are structured.
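A self-contained sketch of the negative lookbehind (input lines are illustrative; -P requires GNU grep):

```shell
# Match "Exception" only when it is NOT preceded by a space, i.e., when
# it is part of a class name such as ArithmeticException.
printf '%s\n' \
  'SEVERE: An Exception occurred.' \
  'java.lang.ArithmeticException: / by zero' \
  | grep -P '(?<! )Exception'
# → java.lang.ArithmeticException: / by zero
```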

Parsing Multi-Line Stack Traces Using External Tools

Several Java log parsers are designed to handle stack traces seamlessly.

Logstash/Grok

Logstash is a suite of tools for collecting, parsing, and exporting log data. At the heart of Logstash’s parsing capabilities is grok. Grok is essentially a wrapper that simplifies regular expressions when matching against long strings of data. Grok provides over 120 regular expression patterns for multiple languages.

In addition to the grok filter, parsing stack traces requires the multi-line filter, which treats log entries spread across multiple lines as a single event. An alternative is to use rsyslog with the imfile module, which recognizes multi-line logs as individual events before forwarding them to a syslog server.

The following code is an example of a Logstash multi-line filter configuration:
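A sketch of such a configuration, assuming SimpleFormatter-style entries that start with a three-letter month and day:

```
filter {
  multiline {
    pattern => "^%{MONTH} %{MONTHDAY}"
    negate => true
    what => "previous"
  }
}
```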

The pattern parameter is a regular expression that tells the filter how to separate log entries. By default, java.util.logging’s SimpleFormatter records log entries starting with a three-letter month followed by the day. Each time the multi-line filter comes across this pattern, it creates a new event. The what parameter tells Logstash how to treat adjacent lines, and the negate parameter tells Logstash to treat lines that don’t match the pattern as part of the same event, rather than as a new event. Put together, this configuration tells Logstash to treat a pattern match as the start of a new event and to fold the following non-matching lines into that same event.

Now that Logstash knows how to interpret the log file, we need to tell grok how to parse it. Grok works by matching pre-defined patterns against blocks of text. Logstash stores the log data in the message field, so we’ll tell grok to search for matches based on that field:
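A sketch of such a grok filter, assuming SimpleFormatter-style entries (the field names and the exact pattern are illustrative and would need adjusting to your log format):

```
filter {
  grok {
    match => { "message" => "(?m)%{MONTH:month} %{MONTHDAY:day}, %{YEAR:year} %{TIME:time} %{WORD:ampm} %{JAVACLASS:class} %{WORD:method}\n%{LOGLEVEL:level}: %{DATA:message}\n%{GREEDYDATA:stacktrace}" }
  }
}
```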

This looks confusing, but it’s actually very straightforward. We’re breaking down each part of the log entry into an individual token and assigning each token a field name. For example, %{MONTH:month} searches for a field that matches the pattern for a month name and assigns the field the name “month.” Using plain regular expressions, the same search would look like this:
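Grok’s MONTH pattern expands to (approximately) the following named capture:

```
(?<month>\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b)
```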

This same process applies to each of the following fields based on where they appear in the log event. The (?m) modifier at the beginning tells grok to treat the log as a multi-line string.

Exporting the event to JSON results in a much more structured log entry:
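The result might look something like this (field names and values are illustrative and depend on the grok pattern used):

```json
{
  "month": "Mar",
  "day": "15",
  "year": "2024",
  "time": "10:23:45",
  "class": "MyClass",
  "method": "main",
  "level": "SEVERE",
  "message": "An exception occurred.",
  "stacktrace": "java.lang.ArithmeticException: / by zero\n\tat MyClass.main(MyClass.java:10)"
}
```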

Much of the key information has been extracted, but the stack trace itself still contains a lot of data. Adding a few more grok filters will help break down the stack trace even further, allowing us to extract the exception type and location.

Note that Logstash might include the original log message as part of the JSON output. To suppress it, add remove_field => [ "message" ] to the grok filter.

Other Log Management Tools

Log management tools such as Loggly and Splunk can automatically parse partial and multi-line stack traces. These tools often use pre-defined filters to detect and break down log entries into distinct tokens, similar to Logstash and grok. The benefit is that this is done seamlessly and automatically as part of the import process.

Compare the same ArithmeticException from above with the output below, which has been automatically parsed and imported into a log management tool:

java arithmetic exception

Additional Resources

Command Line Tools

A Beginner’s Guide to Grep (Open Source For You) – Guide to using grep

Getting Started with Logstash (Elastic) – Guide to using Logstash

GUI Tools

LogMX (LightySoft)

OtrosLogViewer Tutorial (Otros Systems)

Regular Expressions

Regular-Expressions.info (Jan Goyvaerts) – Regular expression tutorials and examples

Using Grep & Regular Expressions to Search for Text Patterns in Linux (DigitalOcean) – Guide to using regular expressions in grep

Written & Contributed by

Andre

Tony

This guide will help software developers and system administrators become experts at using logs to better run their systems. This is a vendor-neutral, community effort featuring examples from a variety of solutions.
