Automated Parsing Log Types

Loggly will automatically parse many types of data for you, including Apache, Nginx, JSON, and more. This allows you to use advanced features like statistical analysis on value fields, faceted search, and filters. Even if we don’t have automated parsing available for your log type, you will still be able to log and do full text search over your logs. As you’re searching through your data, you’ll probably notice that we’ve added a field called “logType” to your data. If we couldn’t classify and break down your data into separate fields, all searches would be full text, which would not allow you to take full advantage of Loggly search. In this section we’ll break down the topic of log types: what they are, what we support, and what to do if you send us logs in a currently unsupported format.

Recognized Log Formats

Before we get started, we should note that we won’t reinvent the wheel here. Most generally recognized log formats have extensive documentation of their own, and where appropriate we will point you to those resources for in-depth information about a specific log format. Our purpose here is to get you familiar with how we think about logs at Loggly.

In addition, we are constantly working to integrate more supported log types into our ecosystem. In fact, if you send us a log format we don’t currently recognize, we’d like you to tell us about it.

Some log events may actually be classified as more than one log type. For example, if Apache logs are sent over syslog, they’ll show up with both the apache and syslog log types. We recognize the following classifications of log formats:

Apache


If you send us Apache or Nginx logs, we will extract the following standard Apache log variables, as defined in the mod_log_config documentation:

%a - RemoteIPOrHost
%A - LocalIPOrHost
%b or %B - Size
%D - RequestUs (microseconds)
%h - RemoteIPOrHost
%k - KeepAliveRequests
%l - RemoteLogname
%r - Request
%>s - HttpStatusCode
%t - eventTime
%T - ServiceTimeSeconds
%u - RemoteUser
%U - UrlPath
%v - VirtualHost
%X - ConnectionStatus
%{Referer}i - Referer
%{User-agent}i - UserAgent
%{UNIQUE_ID}e - UniqueId
%{X-Forwarded-For}i - XForwardedFor
%{Host}i - Host

We have a number of pre-canned formats that we can guarantee will work, but because the Apache log format may be defined however you like, your particular format may not match any of these. The formats we do support are:

%h %l %u %t %D \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
%h %l %u %t \"%r\" %>s %b
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %v %h %T %A %>s %T
%h %l %u %t \"%r\" %b \"%{Referer}i\" \"%{User-Agent}i\" %v %h %T %A %>s %T
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{UNIQUE_ID}e %D X-Forwarded-For=%{X-Forwarded-For}i Host=%{Host}i
%{Host}i/%v \"%{X-Forwarded-For}i/%h\" %t \"%r\" %>s %b %D %X %k \"%{Referer}i\" \"%{User-Agent}i\"
%v:%p %h %l %u %t \"%r\" %>s %D \"%{Referer}i\" \"%{User-Agent}i\"
%h %l %u %t \"%r\" %>s %D \"%{Referer}i\" \"%{User-Agent}i\"
%h %l %u %t \"%r\" %>s %D
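To use one of these formats, define it with LogFormat in your Apache configuration and point a CustomLog at it. A minimal sketch (the log path and the format nickname are illustrative):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined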

We also support JSON logging from Apache. This might be a good option if we don’t support a field you need, because you can log any fields in any order. Here is an example of the default Apache format expressed as JSON.

LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\" }”

Amazon CloudFront

We automatically parse different types of AWS CloudFront logs. We will extract the standard CloudFront variables, as defined in the CloudFront access logs documentation.

We parse the following types of CloudFront logs:

    1. Web Distribution Log File Format
date-time,x-edge-location,sc-bytes,c-ip,cs-method,cs(Host),cs-uri-stem,sc-status,cs(Referer),cs(User-Agent),cs-uri-query,cs(Cookie),x-edge-result-type,x-edge-request-id,x-host-header,cs-protocol,cs-bytes,time-taken,x-forwarded-for,ssl-protocol,ssl-cipher,x-edge-response-result-type
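An illustrative tab-separated line in this format (values adapted from the example in the AWS documentation, not from a real distribution):

2014-05-23 01:13:11 FRA2 182 192.0.2.10 GET d111111abcdef8.cloudfront.net /view/my/file.html 200 www.displaymyfiles.com Mozilla/4.0%20(compatible;%20MSIE%205.0b1;%20Mac_PowerPC) - zip=98101 RefreshHit MRVMF7KydIvxMWfJIglgwHQwZsbG2IhRJ07sn9AkKUFSHS9EXAMPLE== d111111abcdef8.cloudfront.net http 2390 0.001 - - - RefreshHit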


    2. RTMP Distribution Log File Format
date-time,x-edge-location,c-ip,x-event,sc-bytes,x-cf-status,x-cf-client-id,cs-uri-stem,cs-uri-query,c-referrer,x-page-url,c-user-agent,x-sname,x-sname-query,x-file-ext,x-sid


Amazon ELB

We automatically parse different types of AWS ELB access logs. We will extract the standard ELB variables, as defined in the ELB access logs documentation.

timestamp, elb_name, client_ip, client_port, backend_ip, backend_port, request_processing_time, backend_processing_time, response_processing_time, elb_status_code, backend_status_code, received_bytes, sent_bytes, request, user_agent, ssl_cipher, ssl_protocol
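An illustrative HTTP listener entry in this format (values adapted from the example in the AWS documentation):

2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -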


Custom Parsing

There are many existing tools that can translate custom log formats into JSON and then send them to Loggly.
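As a minimal sketch (not one of the official integrations; the TOKEN placeholder matches the HTTP example later in this guide), you could reshape a custom log line into JSON yourself and POST it to Loggly’s HTTP endpoint:

echo '{"level":"info", "user":"jon", "responseTimeMs":42}' | curl -s -H "content-type:application/json" -d @- http://logs-01.loggly.com/inputs/TOKEN/tag/custom/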

Heroku Logs

Heroku logs are sent to Loggly as syslog. Here are the fields provided and where they map in syslog. More details are available in Heroku’s documentation.

  • Timestamp – The date and time recorded at the time the log line was produced by the dyno or component.
  • syslog.appName – This is the source of the logs. All of your app’s dynos (web dynos, background workers, cron) have an appName of app. All of Heroku’s system components (HTTP router, dyno manager) have an appName of heroku.
  • syslog.procid – This is the name of the dyno or component that wrote this log line.
  • Message – The content of the log line.
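For example, a line from Heroku’s HTTP router arrives with syslog.appName set to heroku and syslog.procid set to router, with a message roughly like this (illustrative values):

at=info method=GET path="/posts" host=myapp.herokuapp.com fwd="203.0.113.7" dyno=web.1 connect=1ms service=18ms status=200 bytes=975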


HTTP Headers

Loggly automatically parses HTTP headers out of the HTTP request. They can be viewed as the http fields in the field explorer. Here is an example of an HTTP request: we will parse out the content-type header as contentType and the X-Forwarded-For header as clientHost.

curl -H "content-type:text/plain" -H "X-Forwarded-For:203.156.135.1" -d "Hello4" http://logs-01.loggly.com/inputs/TOKEN/tag/http/

Here is how it looks in Loggly’s expanded events view.

Java

Log4j:
We automatically parse Java logs that follow this conversion pattern layout in Log4j. We extract the timestamp, method, fully qualified class name, thread, and log level. We have instructions on how to configure Log4j with Loggly. Field definitions can be found in the Log4j documentation. The first field in the conversion pattern is the syslog appname; in this case it’s set to java. If you are using an HTTP appender, you do not need to add an appname.

java %d{"ISO8601"}{GMT} %p %t %c %M - %m%n
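A line produced by this pattern looks roughly like the following (class, thread, and message are illustrative; Log4j renders ISO8601 as a space-separated date and time with a comma before the milliseconds):

java 2014-05-21 21:40:26,712 INFO main com.example.HelloLoggly sayHello - Hello from Log4j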

Here is an example of what the Java logtype looks like in Loggly’s expanded event view. You can see each of the Java fields as well as the unparsed message. The syslog header will only be included if you send logs over syslog.


Logback:
We also automatically parse Java logs that follow this conversion pattern layout. We extract the timestamp, method, fully qualified class name, thread, and log level. We have instructions on how to configure Logback with Loggly. Field definitions can be found in the Logback documentation.

%d{"ISO8601", UTC}  %p %t %c %M - %m%n

Here is an example of what the Java logtype looks like in Loggly’s expanded event view. You can see each of the Java fields as well as the unparsed message.


We automatically parse Java Logback exceptions and show the message and stack trace in individual fields.

Servlets:

If you send us Log4j logs from a Servlet, we will parse them if they are of the following format:

<Timestamp> <Log4J Priority> <Category> - <msg_id> <status> <responseTimeMs>

Example:

23:51:49 INFO  com.cx.Core2Common.servlet.PostLogHandler  - 6oA4sQHUQYiAOLEB1KGIEg: RESPONSE 200 661

Java stack trace:

We can extract the class name and file line number from a partial Java stack trace. We also support multi-line stack traces through our syslog collectors if they are sent in a single event. However, the default SyslogAppender splits them into multiple events, so you would need to override it to send them as a single event.

Example:

at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:621)

Java Garbage Collector:
We will parse out the heap size and other statistics from Java garbage collector logs. You can read more in our “How-to: Analyzing JVM garbage collection using Loggly”.

Here is an example log line:

2014-05-21 22:51:07.967 UTC 2014-05-21T22:51:07.967+0000: 763119.351: [GC [PSYoungGen: 1383698K->8412K(1387072K)] 2422839K->1047651K(4183296K), 0.0237820 secs] [Times: user=0.14 sys=0.00, real=0.02 secs]

Here is what it looks like in Loggly’s expanded events view. The syslog header will only be included if you send over syslog.


JSON

We *highly* recommend sending your data in JSON format. It’s going to give you the highest precision of event parsing, which in turn gives you access to all of the cool features that we provide in our analytics toolset.

If you send us JSON, we will extract it provided it is valid JSON and it’s the final part of the message you send. This means you can send us serialized JSON using your normal logging frameworks, and we will find the JSON no matter what precedes it in the event. For example, if you use log4j, your standard log4j headers will precede the JSON in your logged message, but we’ll still be able to extract the JSON. If you’re logging JSON directly to us, then it is (by definition) the final part of the message you send, so all will be well.
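For example, an event like this (with an illustrative Log4j prefix) would still have its trailing JSON extracted and indexed:

2014-05-21 22:51:07,967 INFO main com.example.Orders - {"orderId": 1234, "status": "shipped"}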

To ensure that your JSON is valid, we recommend running a few of your events through a JSON validation tool, e.g. JSON Lint.

JSON Timestamp Support

In order for us to recognize the timestamp that you send in with the event, please follow these guidelines:

  • The only timestamp format accepted is ISO 8601 (e.g. 2013-10-11T22:14:15.003Z).
  • We support fractional seconds up to 6 digits, per RFC5424.
  • The timestamp must be a top-level JSON field called “Timestamp”, “timestamp”, or “eventTime”.

Here’s an example of a top-level JSON field, called timestamp. It is not a child of another object, and it has no children:

{
  "timestamp": "2013-10-11T22:14:15.003123Z",
  "travel": {
    "airplane": "jumbo",
    "mileage": 2034
  }
}

JSON Schema

Be aware that even JSON needs to follow a schema when it’s indexed: once a field has been indexed with one type, it can’t be retyped. Here’s an example of a field called “travel”, typed as an object:

{
  "travel": {
    "airplane": "jumbo",
    "mileage": 2034
  }
}

If “travel” is later retyped as a string, as shown below, the field will not be indexed:

{
  "travel": "none"
}

JSON Field Names

We will index the JSON you send exactly as you send it with one exception: names that contain spaces or dots will be rewritten with those characters replaced by underscores.

This change is required because our search infrastructure doesn’t support either of these characters in field names.

An example:

{
  "a": 1,
  "b c": 2,
  "d.e": 3,
  "f": {
    "g.h": 4,
    "g": {
      "h": 5
    }
  }
}

will be rewritten to:

{
  "a": 1,
  "b_c": 2,
  "d_e": 3,
  "f": {
    "g_h": 4,
    "g": {
      "h": 5
    }
  }
}

One of the main reasons we do this is to ensure that dot-notation navigation is unambiguous. In the example above, because we changed the name “g.h” to “g_h”, we can now unambiguously distinguish between f.g_h and f.g.h.

A search for JSON data will look like this:

json.f.g.h:5

Linux System

We parse some system commands like cron jobs and extract the command string, user, and command executable name.

Here is an example log line:

2014-05-21 23:02:01.000 UTC (jon) CMD (/home/jon/sarmon.sh > /dev/null 2>&1)

Here is how it looks in Loggly’s expanded events view. The syslog header will only be included if you send events over syslog.


MongoDB

We parse out the module and timestamp from MongoDB logs. Here is an example log line from MongoDB:

2014-10-10T05:10:13.778-0400 [clientcursormon] mem (MB) res:28 virt:458

We will parse out the timestamp as “2014-10-10T05:10:13.778-0400” and the module as “clientcursormon”.
Here is how it looks in Loggly’s expanded events view.

MySQL

Loggly supports automated parsing for MySQL logs. We parse out the rowsExamined, lockTime, rowsSent and queryTime from MySQL logs sent via syslog. Here is an example log line from MySQL:

# Query_time: 0.000042  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0

Here is how it looks in Loggly’s expanded events view. The syslog header is also included.

Nginx

Loggly works out of the box with the standard nginx format without compression.

log_format nginx '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $bytes_sent '
                       '"$http_referer" "$http_user_agent"';

access_log /spool/logs/nginx-access.log nginx buffer=32k;

For custom formats, Nginx uses a log format similar to Apache’s, but there are some differences. In particular, Nginx uses variable names instead of single letters in the configuration, and the request time is measured in microseconds in Apache (%D) but in seconds with millisecond resolution in Nginx ($request_time). If you don’t see a format you need here, you can also send JSON (see the Apache section for an example, and the sketch after the list below).
Currently supported Nginx custom formats are:

'$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent"';

'$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent';

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" - $request_time X-Forwarded-For=$http_x_forwarded_for Host=$host';

'$remote_addr - $remote_user [$time_local] $request_time "$request" $status $bytes_sent "$http_referer" "$http_user_agent"';

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" - $request_time X-Forwarded-For=$http_x_forwarded_for Host=$host $request_id';
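As mentioned above, you can also emit JSON directly from Nginx. A minimal sketch (the format name and field names are illustrative, mirroring the Apache JSON example; on Nginx 1.11.8 or later you can add escape=json after the format name for correct quoting):

log_format json_combined '{ "time":"$time_iso8601", "remoteIP":"$remote_addr", "request":"$request", "status":"$status", "bytes":"$body_bytes_sent", "referer":"$http_referer", "userAgent":"$http_user_agent" }';

access_log /var/log/nginx/access.log json_combined;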


Node.js

We automatically parse Node.js exceptions, extracting both the exception fields and other JSON fields.

PAM

Loggly supports automated parsing for PAM logs. We parse out the timestamp, host name, application name, user, and session action from PAM logs sent via syslog. Here is an example log line from PAM:

pam_unix(cron:session): session opened for user ubuntu by (uid=0)

Here is how it looks in Loggly’s expanded events view. The syslog header is also included.


PHP

Loggly supports automated parsing for PHP logs. We parse out the method, level, timestamp and message from PHP logs sent via syslog. Here is an example log line from PHP:

PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20090626/msql.so' - /usr/lib/php5/20090626/msql.so: cannot open shared object file: No such file or directory in Unknown on line 0

Here is how it looks in Loggly’s expanded events view. The syslog header is also included.


We also automatically parse PHP exceptions and extract the exception message and stack trace.

Python

We automatically parse Python and Django exceptions, extracting both syslog and exception fields.

Rails

We parse out the process, output format and method from Rails logs sent via syslog. Here is an example log line from Rails:

Processing by PagesController#npe as HTML

Here is how it looks in Loggly’s expanded events view. The syslog header is also included.
We also automatically parse Rails exception logs and show the backtrace in a dedicated field.

Syslog

If you use syslog, we can guarantee that your syslog headers and structured data will be extracted, independent of the actual message contents, provided you use our standard syslog configuration. These configs are based on the syslog protocol (RFC5424), and we will extract the following fields (using the RFC names):

PRIVAL, Facility, Severity, VERSION, TIMESTAMP, HOSTNAME, APP-NAME, PROCID, MSGID
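For example, a well-formed RFC5424 event looks like this (illustrative values; PRIVAL 34 encodes facility 4 and severity 2, followed by VERSION 1 and the remaining header fields, with “-” standing in for empty structured data):

<34>1 2014-05-21T22:14:15.003Z myhost myapp 2154 ID47 - Hello Loggly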

Our standard config also allows you to define tags in the STRUCTURED-DATA part of the message. As an example, if your structured data looks like this:

[01234567-89ab-cdef-0123-456789abcdef@41058 tag="foo" tag=bah]

we will extract “foo” and “bah” as tags, which you can then use to refine searches (for example, tag:foo) and which you can also use in the Trends tab.

Notice that we support unquoted values, even though the RFC says these values must be quoted.

What if my log type isn’t supported?

If you’d like to request a log type, please submit a request.
