Bringing Stream Filtering into Loggly with Fluentd

 

Kiyoto Tamura is Director of Developer Relations at Treasure Data, where he focuses on open source initiatives. Follow him on Twitter: @kiyototamura

Introduction

There are many ways to filter your log events before you send them to Loggly, including rsyslog and Fluentd. In this post, I will talk about how to do filtering using Fluentd, which is an open source data collector.

What Fluentd Does

Fluentd is designed to solve what is often called the “MxN” problem: With M data sources and N data outputs, how can you manage the complexity of routing and processing data? What if a particular data stream needs to be filtered while another is enriched? What if you want to filter your logs before you ship them to a log management platform like Loggly?


Fluentd’s solution is its plugin architecture, which provides interfaces for adding custom inputs and outputs so that ops engineers and developers can tailor Fluentd to their own needs. Fluentd has more than 300 plugins today, making it very versatile. From sending data to Amazon Web Services to collecting Docker container metrics, the only limits are the user’s imagination and a bit of scripting.

Loggly, being a leading cloud-based log management platform, is one of the popular destinations for log data, and the community has contributed a plugin to stream data into Loggly.

In the rest of this post, I assume that you already have a Loggly account (go here for a free trial otherwise) and have access to an OS X or Linux machine. The example I give is on Ubuntu Precise, but Fluentd also works on Debian, CentOS/RHEL, and OS X.

Filtering Use Case: Send Only Errors from Web Server Access Logs

Web server access logs are a great source of operational intelligence. If your website is experiencing an issue, searching for Internal Server Errors in access.log in order to nail down the problematic URLs is a good first step.


If you have a large site that generates hundreds of gigabytes of logs per day or more, it can be more efficient to send only the errors to your log management service. Errors are likely a small percentage of your overall volume, yet they are often what you care about most. We’ll show you how to configure Fluentd with a filter that sends only errors in the 4xx-5xx range.
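To get a feel for how much volume such a filter would remove, you can count the 4xx/5xx lines in an existing access log first. The following is only a rough sketch: the function name error_share is made up for this example, and it assumes the default combined log format, in which the status code is the 9th whitespace-separated field.

```shell
# Hypothetical helper: report how many requests in an access log are 4xx/5xx.
# Assumes the combined log format, where the status code is field 9.
error_share() {
  awk '
    { total++ }
    $9 ~ /^[45][0-9][0-9]$/ { errors++ }
    END {
      if (total == 0) { print "no requests found"; exit }
      printf "%d of %d requests (%.1f%%) are 4xx/5xx\n", errors, total, 100 * errors / total
    }
  ' "$1"
}

# Usage (the path is the one used later in this post):
#   error_share /var/log/nginx/access.log
```

If your log format is customized, the status code may sit in a different field, so adjust $9 accordingly.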

Installing Fluentd

We are using td-agent, the Fluentd package built and maintained by Treasure Data. You can see this page for various download options. Here is the installation command for Ubuntu Precise.

curl -L http://toolbelt.treasuredata.com/sh/install-ubuntu-precise.sh | sh

Installing the Loggly Output Plugin

Next, let’s install the output plugin for Loggly with the following command.

 sudo /usr/lib/fluent/ruby/bin/fluent-gem install fluent-plugin-loggly

Configuring Fluentd

Now, let’s configure Fluentd to send Nginx access logs to Loggly. (Apache httpd is also supported: just replace the “path” value with “/var/log/apache2/access.log” and use “format apache2”.)

Open “/etc/td-agent/td-agent.conf” and change the configuration to the following (td-agent.conf contains some boilerplate. Feel free to study it, but the configuration below should work as-is).

<source>
  type tail
  format nginx
  path /var/log/nginx/access.log
  pos_file /etc/td-agent/nginx_access.pos
  tag unfiltered.access
</source>

<match unfiltered.access>
  type rewrite_tag_filter
  rewriterule1 code ^4\d\d$ access.4xx
  rewriterule2 code ^5\d\d$ access.5xx
</match>

<match access.{4xx,5xx}>
  type loggly
  # Replace xxx with your own token.
  loggly_url https://logs-01.loggly.com/inputs/xxx-xxxx-xxxx-xxxxx-xxxxxxxxxx
  # Upload events every 10 seconds. Configurable.
  flush_interval 10s
</match>

Here is a quick walkthrough of the configuration:

  1. The <source>…</source> block tells Fluentd to tail the Nginx access log, parse each line into a structured record, and route it with the tag “unfiltered.access”.
  2. The <match unfiltered.access>…</match> block matches events with the “unfiltered.access” tag. If the “code” field has the form “4xx” or “5xx”, it re-routes the event with the new tag access.4xx or access.5xx, respectively. Events with any other status code match neither rule and are not forwarded.
  3. The last <match> block sends events with the tags access.4xx or access.5xx to Loggly.
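
The routing decision in steps 2 and 3 can be sketched in a few lines of shell. This is only an illustration of what the two rewrite rules select, not how the plugin is implemented; the function name route_tag is made up for this example.

```shell
# Hypothetical helper: mimic the two rewrite rules from the configuration.
# Prints the new tag for a status code, or prints nothing and returns 1
# when neither rule matches (such events never reach the Loggly output).
route_tag() {
  case "$1" in
    4[0-9][0-9]) echo "access.4xx" ;;  # same idea as ^4\d\d$
    5[0-9][0-9]) echo "access.5xx" ;;  # same idea as ^5\d\d$
    *) return 1 ;;
  esac
}

# Usage:
#   route_tag 404   # prints access.4xx
#   route_tag 503   # prints access.5xx
#   route_tag 200   # prints nothing
```

Because the patterns are anchored to the whole value, a code such as 4040 matches neither rule.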

There is one last step before starting Fluentd: td-agent runs as its own user, so we must make sure it can read /var/log/nginx/access.log:

sudo chmod a+r /var/log/nginx/access.log

Finally, start Fluentd:

 sudo /etc/init.d/td-agent start
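
If nothing shows up in Loggly after a minute or so, td-agent’s own log is a good place to look. The sketch below pulls out warnings and errors; the function name agent_problems is made up for this example, and the default path assumes td-agent’s standard log location, so pass a different path if your setup differs.

```shell
# Hypothetical helper: show warnings and errors from td-agent's own log.
# Assumes the default td-agent log location; pass another path as $1 if needed.
agent_problems() {
  grep -E '\[(warn|error|fatal)\]' "${1:-/var/log/td-agent/td-agent.log}"
}

# Usage:
#   agent_problems
#   agent_problems /path/to/another/td-agent.log
```

A “Permission denied” entry here usually points back to the file permission step above.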

Analyzing the Data in Loggly

Loggly can automatically parse and analyze your Apache logs, which helps you quickly discover what is causing your errors and then fix them. To see your logs on Loggly’s search screen, go to https://<your_loggly_name>.loggly.com/search. Search for logtype:apache and you should see just the log lines with 4xx and 5xx HTTP status codes. You can use Loggly’s trend analysis on these logs and create dashboard widgets that, for example, display your top errors.

Here are examples of trend charts you can create. You can see the percentage breakdown of status codes in a pie chart. You can see the top IPs that are generating these errors, what user agents they are coming from, average response times, and more.


Fluentd can do a wide range of data aggregation and filtering and supports many protocols. Learn more about it on its website or check out its GitHub repo.

 

