
Automated Parsing Log Types

Loggly will automatically parse many types of data for you, including Apache, Nginx, JSON, and more. This allows you to use advanced features like statistical analysis on value fields, faceted search, filters, and more. Even if we don’t have automated parsing available for your log type, you will still be able to log and do full-text search over your logs. As you search through your data, you’ll probably notice that we’ve added a field called “logType” to your events. If we couldn’t classify and break down your data into separate fields, all searches would be full text, which would not allow you to take full advantage of Loggly search. In this section we’ll break down the topic of log types: what they are, what we support, and what to do if you send us logs in a currently unsupported format.

Recognized Log Formats

Before we get started, we should note that we won’t reinvent the wheel here. Most generally recognized log formats have extensive documentation of their own, and where appropriate we will point you to those resources for in-depth information about a specific format. Our purpose here is to get you familiar with how we think about logs.

In addition, we are constantly working to integrate more supported log types into our ecosystem. In fact, if you send us a log format we don’t currently recognize, we’ll want you to tell us about it.

Some log events may actually be classified as more than one log type. For example, if Apache logs are sent over syslog, they’ll show up as logtype apache and syslog. We recognize the following classifications of log formats:

Apache


If you send us Apache or Nginx logs, we will extract the following standard Apache log variables, as defined in the mod_log_config documentation:

%a - RemoteIPOrHost
%A - LocalIPOrHost
%b or %B - Size
%D - RequestUs (microseconds)
%h - RemoteIPOrHost
%k - KeepAliveRequests
%l - RemoteLogname
%r - Request
%>s - HttpStatusCode
%t - eventTime
%T - ServiceTimeSeconds
%u - RemoteUser
%U - UrlPath
%v - VirtualHost
%X - ConnectionStatus
%{Referer}i - Referer
%{User-agent}i - UserAgent
%{UNIQUE_ID}e - UniqueId
%{X-Forwarded-For}i - XForwardedFor
%{Host}i - Host

We have a number of pre-canned formats that we can guarantee will work, but because the Apache log format can be defined however you like, your particular format may not match any of them. The formats we do support are:

%h %l %u %t %D "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
%h %l %u %t "%r" %>s %b
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %v %h %T %A %>s %T
%h %l %u %t "%r" %b "%{Referer}i" "%{User-Agent}i" %v %h %T %A %>s %T
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %{UNIQUE_ID}e %D X-Forwarded-For=%{X-Forwarded-For}i Host=%{Host}i
%{Host}i/%v "%{X-Forwarded-For}i/%h" %t "%r" %>s %b %D %X %k "%{Referer}i" "%{User-Agent}i"
%v:%p %h %l %u %t "%r" %>s %D "%{Referer}i" "%{User-Agent}i"
%h %l %u %t "%r" %>s %D "%{Referer}i" "%{User-Agent}i"
%h %l %u %t "%r" %>s %D

We also support JSON logging from Apache. This can be a good option if we don’t support a field you need, because you can log any fields in any order. Here is an example for the default Apache format in JSON; note that the quotes inside the JSON must be escaped in your Apache configuration:

LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\" }"

Amazon CloudFront

We automatically parse different types of AWS CloudFront logs. We will extract the standard CloudFront variables, as defined in the CloudFront access logs documentation.

We parse the following types of CloudFront logs:

Web distribution access logs:

date-time,x-edge-location,sc-bytes,c-ip,cs-method,cs(Host),cs-uri-stem,sc-status,cs(Referer),cs(User-Agent),cs-uri-query,cs(Cookie),x-edge-result-type,x-edge-request-id,x-host-header,cs-protocol,cs-bytes,time-taken,x-forwarded-for,ssl-protocol,ssl-cipher,x-edge-response-result-type

RTMP distribution access logs:

date-time,x-edge-location,c-ip,x-event,sc-bytes,x-cf-status,x-cf-client-id,cs-uri-stem,cs-uri-query,c-referrer,x-page-url,c-user-agent,x-sname,x-sname-query,x-file-ext,x-sid

Amazon ELB

We automatically parse different types of AWS ELB access logs. We will extract the standard ELB variables, as defined in the ELB access logs documentation.

timestamp, elb_name, client_ip, client_port, backend_ip, backend_port, request_processing_time, backend_processing_time, response_processing_time, elb_status_code, backend_status_code, received_bytes, sent_bytes, request, user_agent, ssl_cipher, ssl_protocol

Custom Parsing

There are many existing tools that can translate custom log formats into JSON before sending them to Loggly.
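For example, a custom format can be converted to JSON with a few lines of Python before shipping. The format and field names in this sketch are made up for illustration:

import json
import re

# Hypothetical custom format: "2024-01-01T00:00:00Z WARN disk low"
LINE_RE = re.compile(r"(?P<timestamp>\S+) (?P<level>\S+) (?P<message>.*)")

def to_json(line):
    match = LINE_RE.match(line)
    # Fall back to a plain message if the line doesn't match the format.
    fields = match.groupdict() if match else {"message": line}
    return json.dumps(fields)

print(to_json("2024-01-01T00:00:00Z WARN disk low"))
# {"timestamp": "2024-01-01T00:00:00Z", "level": "WARN", "message": "disk low"}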

HTTP Headers

Loggly automatically parses out the HTTP headers from the HTTP request. They can be viewed as the HTTP fields in the field explorer. Here is an example of an HTTP request; we will parse out the headers content-type as contentType and X-Forwarded-For as clientHost.

curl -H "content-type:text/plain" -H "X-Forwarded-For:203.156.135.1" -d "Hello4" http://logs-01.loggly.com/inputs/TOKEN/tag/http/

Here is how it looks in Loggly’s expanded events view.

[Screenshot: parsed HTTP headers in the expanded events view]
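If you prefer to send the same request from code rather than curl, here is a minimal sketch using only Python’s standard library (TOKEN is a placeholder for your customer token, as in the curl example above):

import urllib.request

req = urllib.request.Request(
    "http://logs-01.loggly.com/inputs/TOKEN/tag/http/",
    data=b"Hello4",
    headers={
        "Content-Type": "text/plain",        # parsed as contentType
        "X-Forwarded-For": "203.156.135.1",  # parsed as clientHost
    },
)
urllib.request.urlopen(req)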

Java

Log4j:
We automatically parse Java logs that follow this conversion pattern layout in Log4j. We extract the timestamp, method, fully qualified class name, thread, and log level. We have instructions on how to configure Log4j with Loggly. Field definitions can be found in the Log4j documentation. The first field in the conversion pattern is the syslog appname; in this case it’s set to java. If you are using an HTTP appender, you do not need to add an appname.

java %d{"ISO8601"}{GMT} %p %t %c %M - %m%n

Logback:
We also automatically parse Java logs that follow this conversion pattern layout. We extract the timestamp, method, fully qualified class name, thread, and log level. We have instructions on how to configure Logback with Loggly. Field definitions can be found in the Logback documentation.

%d{"ISO8601", UTC}  %p %t %c %M - %m%n

Here is an example of what the Java logtype looks like in Loggly’s expanded event view. You can see each of the Java fields as well as the unparsed message.

[Screenshot: Java logtype in the expanded event view]

We automatically parse Java Logback exceptions and show the message and stack trace in individual fields.

Servlets:

If you send us Log4j logs from a servlet in the following format:

<Timestamp> <Log4J Priority> <Category> - <msg_id> <status> <responseTimeMs>

Example:

23:51:49 INFO  com.cx.Core2Common.servlet.PostLogHandler  - 6oA4sQHUQYiAOLEB1KGIEg: RESPONSE 200 661

Java stack trace:

We can extract the class name and file line number from a partial Java stack trace. We also support multi-line stack traces through our syslog collectors if they are sent in a single event. However, the default SyslogAppender splits them into multiple events, so you’d need to override it to send them in a single event.

Example:

at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:621)

Java Garbage Collector:
We will parse out the heap size and other statistics from Java garbage collector logs. You can read more in our “How-to: Analyzing JVM garbage collection using Loggly”.

Here is an example log line:

2014-05-21 22:51:07.967 UTC 2014-05-21T22:51:07.967+0000: 763119.351: [GC [PSYoungGen: 1383698K->8412K(1387072K)] 2422839K->1047651K(4183296K), 0.0237820 secs] [Times: user=0.14 sys=0.00, real=0.02 secs]

Here is what it looks like in Loggly’s expanded events view. The syslog header will only be included if you send over syslog.

[Screenshot: parsed garbage collector log in the expanded events view]

JSON

We *highly* recommend sending your data in JSON format. It’s going to give you the highest precision of event parsing, which in turn gives you access to all of the cool features that we provide in our analytics toolset.

If you send us JSON, we will extract it provided it is valid JSON and it’s the final part of the message you send. This means you can send us serialized JSON using your normal logging frameworks, and we will find the JSON no matter what precedes it in the event. For example, if you use log4j, your standard log4j headers will precede the JSON in your logged message, but we’ll still be able to extract the JSON. If you’re logging JSON directly to us, then it is (by definition) the final part of the message you send, so all will be well.

To ensure that your JSON is valid, we recommend running a few of your events through a JSON validation tool, e.g., JSON Lint.
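As a programmatic alternative, serializing events with a real JSON library guarantees validity before they leave your application. A minimal sketch in Python using only the standard library (the endpoint token and tag are placeholders, as in the curl example above):

import json
import urllib.request

# Placeholder endpoint: substitute your own customer token and tag.
LOGGLY_URL = "http://logs-01.loggly.com/inputs/TOKEN/tag/http/"

def send_event(event):
    # json.dumps() only emits valid JSON, so serializing with it is a
    # cheap way to guarantee the payload parses as JSON on arrival.
    payload = json.dumps(event).encode("utf-8")
    req = urllib.request.Request(
        LOGGLY_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 on success

send_event({"level": "info", "message": "Hello from Python"})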

JSON Timestamp Support

In order for us to recognize the timestamp that you send in with the event, please follow these guidelines:

  • The only timestamp format accepted is ISO 8601 (e.g., 2013-10-11T22:14:15.003Z).
  • We support fractional seconds up to 6 digits (microsecond precision), per RFC 5424.
  • The timestamp must be a top-level JSON field named "Timestamp", "timestamp", or "eventTime".

Here’s an example of a top-level JSON field, called timestamp. It is not a child of another object, and it has no children:

{
    "timestamp": "2013-10-11T22:14:15.003123Z",
    "travel": {
        "airplane": "jumbo",
        "mileage": 2034
    }
}
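One way to generate a compliant timestamp is sketched below in Python, using only the standard library; the surrounding event mirrors the example above:

from datetime import datetime, timezone

# ISO 8601 with microsecond precision and a "Z" suffix, matching the
# accepted format (e.g. 2013-10-11T22:14:15.003123Z).
def event_timestamp():
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

event = {
    "timestamp": event_timestamp(),  # must be a top-level field
    "travel": {"airplane": "jumbo", "mileage": 2034},
}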

JSON Schema

Be aware that even JSON needs to follow a schema when it’s indexed. For example, once a field has been indexed as one type, such as an object or an integer, it can’t be retyped. Here’s an example of an object called “travel”:

{
    "travel": {
        "airplane": "jumbo",
        "mileage": 2034
    }
}

If a later event retypes “travel” as a string, as shown below, the field will not be indexed:

{
    "travel": "none"
}
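If your application sometimes emits a field as an object and sometimes as a scalar, normalizing its shape before logging avoids the retyping conflict. An illustrative sketch (the "note" child field is hypothetical, not something we require):

def normalize_travel(value):
    # Keep "travel" an object in every event so its indexed type never
    # changes; wrap scalar values in a hypothetical child field instead.
    if isinstance(value, dict):
        return value
    return {"note": str(value)}

normalize_travel("none")          # {"note": "none"}
normalize_travel({"mileage": 5})  # unchanged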

JSON Field Names

We will index the JSON you send exactly as you send it with one exception: names that contain spaces or dots will be rewritten with those characters replaced by underscores.

This change is required because our search infrastructure doesn’t support either of these characters in field names.

An example:

{
    "a": 1,
    "b c": 2,
    "d.e": 3,
    "f": {
        "g.h": 4,
        "g": {
            "h": 5
        }
    }
}

will be rewritten to:

{
    "a": 1,
    "b_c": 2,
    "d_e": 3,
    "f": {
        "g_h": 4,
        "g": {
            "h": 5
        }
    }
}

One of the main reasons we do this is to ensure that dot-notation navigation is unambiguous. In the example above, because we changed the name “g.h” to “g_h”, we can now unambiguously distinguish between f.g_h and f.g.h.

A search for JSON data will look like this:

json.f.g.h:5
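The rewriting rule is easy to reproduce on your side if you want to predict field names ahead of time. An illustrative Python sketch (not our actual implementation):

def sanitize_keys(obj):
    # Replace spaces and dots in field names with underscores, recursing
    # into nested objects and arrays, per the rule described above.
    if isinstance(obj, dict):
        return {k.replace(" ", "_").replace(".", "_"): sanitize_keys(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize_keys(item) for item in obj]
    return obj

sanitize_keys({"b c": 2, "f": {"g.h": 4}})  # {"b_c": 2, "f": {"g_h": 4}}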

Linux System

We parse some system commands like cron jobs and extract the command string, user, and command executable name.

Here is an example log line:

2014-05-21 23:02:01.000 UTC (jon) CMD (/home/jon/sarmon.sh > /dev/null 2>&1)

Here is how it looks in Loggly’s expanded events view. The syslog header will only be included if you send events over syslog.

[Screenshot: parsed cron log in the expanded events view]

MySQL

Loggly supports automated parsing for MySQL logs. We parse out the rowsExamined, lockTime, rowsSent, and queryTime from MySQL logs sent via syslog.

Here is how it looks in Loggly’s expanded events view. The syslog header is also included.
[Screenshot: parsed MySQL log in the expanded events view]

Nginx

Loggly works out of the box with the standard Nginx format without compression. Please make sure the appname contains the word ‘nginx’; otherwise, the logs may be incorrectly recognized as another log type.

log_format nginx '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $bytes_sent '
                       '"$http_referer" "$http_user_agent"';

access_log /spool/logs/nginx-access.log nginx buffer=32k;

For custom formats, Nginx’s log format is similar to Apache’s, but there are some differences. In particular, Nginx uses variable names instead of letters in the configuration, and the request time is measured in microseconds in Apache (%D) but reported in seconds with millisecond resolution in Nginx ($request_time). If you don’t see a format you need here, you can also send JSON (see the Apache section for an example).
Currently supported Nginx custom formats are:

'$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent"';

'$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent';

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" - $request_time X-Forwarded-For=$http_x_forwarded_for Host=$host';

'$remote_addr - $remote_user [$time_local] $request_time "$request" $status $bytes_sent "$http_referer" "$http_user_agent"';

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" - $request_time X-Forwarded-For=$http_x_forwarded_for Host=$host $request_id';


See an example of the parsed output below.

[Screenshot: parsed Nginx log in the expanded events view]

Node.js

We automatically parse Node.js data, including exceptions and other JSON fields.

[Screenshot: parsed Node.js exception in the expanded events view]

PAM

Loggly supports automated parsing for PAM logs. We parse out the timestamp, host name, application name, user, and session action from PAM logs sent via syslog. Here is an example log line from PAM:

pam_unix(cron:session): session opened for user ubuntu by (uid=0)

PHP

Loggly supports automated parsing for PHP logs. We parse out the method, level, timestamp, and message from PHP logs sent via syslog. Here is how it looks in Loggly’s expanded events view. The syslog header is also included.

[Screenshot: parsed PHP log in the expanded events view]

We also automatically parse PHP exceptions and extract the exception messages and stack traces.

Rails

We parse out the process, output format, and method from Rails logs sent via syslog.

We also automatically parse Rails exception logs and show the backtrace in a dedicated field.

Syslog

If you use syslog, we can guarantee that your syslog headers and structured data will be extracted, independently of the actual message contents, provided you use our standard syslog configuration. These configs are based on the Syslog Protocol (RFC 5424), and we will extract the following fields (using RFC names):

PRIVAL, Facility, Severity, VERSION, TIMESTAMP, HOSTNAME, APP-NAME, PROCID, MSGID

Our standard config also allows you to define tags in the STRUCTURED-DATA part of the message. As an example, if your structured data looks like this:

[01234567-89ab-cdef-0123-456789abcdef@41058 tag="foo" tag=bah]

we will extract “foo” and “bah” as tags, which you can then use to refine searches (for example, tag:foo) and which you can also use in the Charts tab.

Note that we support unquoted values, even though the RFC says these values must be quoted.
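To make the message anatomy concrete, here is a hedged sketch that hand-assembles an RFC 5424 message with tags in the STRUCTURED-DATA part and sends it over TCP. In practice you would normally use our standard syslog configuration instead; the hostname, PRI value, and endpoint details below are illustrative assumptions, and TOKEN stands in for your customer token:

import socket
from datetime import datetime, timezone

# Placeholders: substitute the endpoint, port, and customer token from
# your own Loggly syslog configuration.
HOST, PORT, TOKEN = "logs-01.loggly.com", 514, "TOKEN"

def rfc5424_event(appname, msg, tags):
    # Layout: <PRIVAL>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    timestamp = datetime.now(timezone.utc).isoformat()
    sd_tags = " ".join('tag="%s"' % t for t in tags)
    sd = "[%s@41058 %s]" % (TOKEN, sd_tags)  # SD-ID mirrors the example above
    return ("<14>1 %s myhost %s - - %s %s\n"
            % (timestamp, appname, sd, msg)).encode("utf-8")

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(rfc5424_event("myapp", "hello world", ["foo", "bah"]))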

Windows

We can automatically parse the fields for the Windows Event Log. We have added support for the native key-value pair format that nxlog creates when it sends Windows logs.

For example, consider this event:

<13>1 2017-10-25T20:24:47.651000+01:00 i-0928d5725cbe8c59a IIS - - [45043289-61f2-4a42-bf27-d366041b1668@41058 tag="windows" tag="ami-2662935f" tag="HostingWebserverMaster"] [ EventReceivedTime="2017-10-25 20:24:48" SourceModuleName="iis_advanced" SourceModuleType="im_file" Date="2017-10-25" Time="19:24:47.651" ServerIp="10.37.0.132" Host="stdunstans.fireflycloud.net" Url="/Templates/pixel.gif" Path="C:\IIS Sites\stdunstans.fireflycloud.net\www\Templates\pixel.gif" Status="200" TimeTakenInMS="0" UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063" ClientIP="90.194.93.48" Referer="http://stdunstans.fireflycloud.net/set-tasks/8943" ELBIP="10.81.168.76" Win32Status="0" Method="HEAD"] {"EventReceivedTime":"2017-10-25 20:24:48","SourceModuleName":"iis_advanced","SourceModuleType":"im_file","Date":"2017-10-25","Time":"19:24:47.651","ServerIp":"10.37.0.132","Host":"stdunstans.fireflycloud.net","Url":"/Templates/pixel.gif","Query":null,"Path":"C:\\IIS Sites\\stdunstans.fireflycloud.net\\www\\Templates\\pixel.gif","Status":200,"TimeTakenInMS":0,"UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063","ClientIP":"90.194.93.48","Referer":"http://stdunstans.fireflycloud.net/set-tasks/8943","ELBIP":"10.81.168.76","Win32Status":0,"Method":"HEAD","EventTime":"2017-10-25 20:24:47","SourceName":"IIS"}

This is what a Parsed Event will look like:

  "windows":{  
    "Path":"C:IIS Sitesstdunstans.fireflycloud.netwwwTemplatespixel.gif",
    "Status":"200",
    "SourceModuleType":"im_file",
    "TimeTakenInMS":"0",
    "ServerIp":"10.37.0.132",
    "Referer":"http://stdunstans.fireflycloud.net/set-tasks/8943",
    "Time":"19:24:47.651",
    "Host":"stdunstans.fireflycloud.net",
    "Win32Status":"0",
    "Method":"HEAD",
    "ClientIP":"90.194.93.48",
    "ELBIP":"10.81.168.76",
    "EventReceivedTime":"2017-10-25 20:24:48",
    "Date":"2017-10-25",
    "Url":"/Templates/pixel.gif",
    "SourceModuleName":"iis_advanced",
    "UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063"
  }

Python

We automatically parse the fields when you send us Python logs. We extract the errorType, exceptionMessage, message, and stacktrace. Consider the following two examples:

list index out of range
Traceback (most recent call last):
  File "/root/jatin/harshil/views.py", line 28, in index
    A[5]
IndexError: list index out of range

Booking exception: City: Bora Bora, French Polynesia Hotel: Le Meridien
Traceback (most recent call last):
  File "/root/jatin/harshil/views.py", line 16, in index
    raise KeyError
KeyError

The first example will be parsed as:

  • message -> list index out of range
  • stacktrace -> File "/root/jatin/harshil/views.py", line 28, in index#012 A[5]
  • errorType -> IndexError
  • exceptionMessage -> list index out of range


The second example will be parsed as:

  • message -> Booking exception: City: Bora Bora, French Polynesia Hotel: Le Meridien
  • stacktrace -> File "/root/jatin/harshil/views.py", line 16, in index#012 raise KeyError
  • errorType -> KeyError
  • exceptionMessage -> N/A

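To emit events shaped like these examples from your own code, the standard logging module is usually enough; here is a minimal sketch (the handler or transport configuration needed to reach Loggly is up to you):

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("myapp")

try:
    A = []
    A[5]  # raises IndexError, as in the first example above
except IndexError:
    # logging.exception() appends the full traceback to the message,
    # which is the shape the parser splits into the message, stacktrace,
    # errorType, and exceptionMessage fields.
    log.exception("list index out of range")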

MongoDB 3.x

We automatically parse the fields for MongoDB 3.x. We extract the timestamp, severity, component, module, command, and commandDetails. Consider the following examples:

2017-09-13T09:52:43.207+0000 I COMMAND  [conn21] command test.$cmd command: delete { delete: "customers", deletes: [ { q: { first_name: "test8" }, limit: 0.0 } ], ordered: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:25 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_command 0ms

2017-09-27T11:41:13.299+0530 I NETWORK  [initandlisten] waiting for connections on port 27017

Under the current rules, the above examples will match the following:

  • timestamp -> 2017-09-13T09:52:43.207+0000, severity -> I, component -> COMMAND, module -> conn21, command -> delete, commandDetails -> { delete: "customers", deletes: [ { q: { first_name: "test8" }, limit: 0.0 } ], ordered: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:25 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } }
  • timestamp -> 2017-09-27T11:41:13.299+0530, severity -> I, component -> NETWORK, module -> initandlisten

Heroku

We support Heroku events through an HTTP drain. Matches will occur when the syslog.appName field is equal to “heroku”.

  • If the syslog.procId field is “router” then it will be identified as a router event and the key-value pairs will be extracted (see example below).
  • If the syslog.procId field is a dyno name then the event will be seen as a heroku log type.

For example:

350 <158>1 2017-09-27T23:37:27.018826+00:00 host heroku router - at=info method=GET path="/assets/analytics/home-analytics-36cbb0bbd4caddefda4ac54f186b083d3e4e732e1a6d4d35ff42dd26fbacab86.js" host=www.bookwitty.com request_id=9b5368cd-0909-4da8-8bec-a055fd48fd97 fwd="90.214.230.12" dyno=web.1 connect=1ms service=3ms status=200 bytes=585 protocol=https

101 <45>1 2017-09-27T23:37:27.018826+00:00 host app web.1 - Stopping remaining processes with SIGKILL

Router Example

  • logtype -> syslog, heroku
  • syslog.appName -> heroku
  • syslog.procId -> router
  • heroku
      • dyno -> router
      • source -> heroku
      • heroku fields -> key/value pairs extracted from the example above

Generic Example

  • logtype -> syslog, heroku
  • syslog.appName -> app
  • syslog.procId -> web.1
  • heroku
      • dyno -> web.1
      • source -> app

What if my log type isn’t supported?

If you’d like to request a log type, please submit a request.
