Writing Amazon ELB Access Logs to Loggly Using AWS Lambda
Update to this blog post July 2017:
Loggly can automatically retrieve new log files added to your S3 bucket(s). This includes Amazon Elastic Load Balancer (ELB) access logs. You can read more here.
Problem: Send Amazon ELB Logs to Loggly
Recently, we started using Loggly to manage logs coming from our Amazon Web Services infrastructure. One of the reasons we chose Loggly was its ability to ingest JSON-formatted logs via an easy-to-use RESTful interface. This capability proved especially useful when we were looking for a way to push Amazon Elastic Load Balancer (ELB) access logs into Loggly.
Most of our production servers are set up to log directly to Loggly. But we had a few legacy Windows servers sitting behind an Amazon ELB handling RESTful API requests. We wanted to log the requests hitting these servers along with a few other details such as HTTP response and client IP address. Rather than configure Loggly on each legacy server, we wanted to use the access logs generated by the Elastic Load Balancer.
Unfortunately, there is no way to configure the ELB to point to anything other than S3, and there is no way to configure Loggly to directly ingest the ELB access logs. We also did not want to run any additional EC2 instances for the sole purpose of pushing logs from S3 to Loggly. What we did want was a standalone service to push the ELB logs from S3 to Loggly with as little touch as possible.
Enter AWS Lambda
I decided to solve this problem using a new Amazon Web Service called Lambda. AWS Lambda is a compute service that runs your Node.js code in response to events, such as a new object being written to S3. You can learn more about AWS Lambda here.
AWS Lambda, along with Loggly’s RESTful API, turned out to be a simple and trouble-free way to continuously push our ELB logs into Loggly. The Lambda code I wrote to do this is called elb2loggly and can be found on GitHub.
(Side note: I was happy to have a real-world problem to solve while trying out AWS Lambda.)
How It Works
Using elb2loggly to push ELB logs from S3 is pretty easy to set up if you are familiar with AWS. At a high level, you follow these two steps:
- Configure the AWS ELBs to log to an S3 bucket. I configured them to push the logs using a five-minute interval. This minimizes the time it takes for the logs to show up in Loggly and also reduces the number of logs that need to be pushed with each Lambda invocation.
- Upload the elb2loggly Lambda code to AWS and configure it to be called when objects are placed in the S3 bucket. Once things are set up, the elb2loggly Lambda code will be notified each time a new log is written to S3. It will fetch that log file, convert it to JSON, and push it to Loggly.
The Node.js Lambda script (elb2loggly.js) is invoked whenever a new object is placed in the S3 bucket being watched. The name of the S3 object is passed into the Lambda script in an event object. The script fetches the log file from S3 and transforms each entry from its CSV format into a JSON format.
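To make the event handling concrete, here is a minimal sketch of pulling the bucket and object names out of an S3 notification event. The function name s3ObjectsFromEvent is hypothetical and this is not the actual elb2loggly code; it only assumes the standard S3 event layout (a Records array with s3.bucket.name and s3.object.key).

```javascript
// Hypothetical helper (not taken from elb2loggly): list the S3 objects
// referenced by an S3 notification event passed to a Lambda function.
function s3ObjectsFromEvent(event) {
  return (event.Records || []).map((record) => ({
    bucket: record.s3.bucket.name,
    // S3 URL-encodes object keys in notifications and encodes spaces as '+',
    // so decode the key before using it to fetch the object.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  }));
}

// Example with a made-up event:
const sampleEvent = {
  Records: [
    { s3: { bucket: { name: 'my-elb-logs' }, object: { key: 'AWSLogs/elb+log%3A1.log' } } },
  ],
};
console.log(s3ObjectsFromEvent(sampleEvent));
```

A handler would then fetch each listed object from S3 and hand its contents to the parsing step.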
During the transformation process, I found that two fields in the ELB access logs needed further manipulation. For some reason, Amazon decided not to split the client IP and client port into their own columns of the CSV file, and also decided to put both the request method and the URL into one quoted string. All of the other column names map directly from the CSV file to a JSON field of the same name.
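The two special cases can be sketched as follows. This is an illustration under assumptions, not the code in elb2loggly: the function names and the output field names (client_ip, client_port, request_method, request_url, request_protocol) are made up for the example, and the sample values are invented.

```javascript
// Hypothetical helper: split a combined "ip:port" value into two fields.
function splitClientPort(value) {
  const idx = value.lastIndexOf(':');
  if (idx === -1) {
    return { client_ip: value, client_port: null };
  }
  return {
    client_ip: value.slice(0, idx),
    client_port: Number(value.slice(idx + 1)),
  };
}

// Hypothetical helper: split the quoted request string
// ("METHOD URL PROTOCOL") into its parts.
function splitRequest(value) {
  const [method, url, protocol] = value.split(' ');
  return { request_method: method, request_url: url, request_protocol: protocol };
}

console.log(splitClientPort('192.168.131.39:2817'));
console.log(splitRequest('GET http://www.example.com:80/ HTTP/1.1'));
```

Using lastIndexOf for the port separator is a design choice that keeps the split working even if the address part itself contains colons.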
Finally, elb2loggly pushes the newly formatted JSON logs to Loggly using the bulk RESTful endpoint: https://logs-01.loggly.com/bulk/. The Loggly token used in this upload request is configured by adding a tag to the S3 bucket that contains the ELB access logs. The name of the tag needs to be “loggly-customer-token,” with your Loggly customer token as its value. You can also optionally set a tag called “loggly-tag” on the S3 bucket if you want your logs to be tagged in Loggly (e.g., “Production” or “Staging”).
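As a rough sketch of this step, the snippet below reads the two bucket tags and builds the bulk request. It is not the elb2loggly implementation: the function names are hypothetical, the tagSet argument assumes the list of Key/Value pairs that S3 returns for bucket tags, and the /tag/ path segment is my understanding of how Loggly's bulk endpoint accepts tags. The bulk body is one JSON event per line.

```javascript
const LOGGLY_BULK_BASE = 'https://logs-01.loggly.com/bulk/';

// Hypothetical helper: read the Loggly settings from an S3 bucket tag set,
// assumed to look like [{ Key: '...', Value: '...' }, ...].
function logglyConfigFromTags(tagSet) {
  const tags = Object.fromEntries(tagSet.map((t) => [t.Key, t.Value]));
  const token = tags['loggly-customer-token'];
  if (!token) {
    throw new Error('Bucket is missing the loggly-customer-token tag');
  }
  // The loggly-tag tag is optional.
  return { token, tag: tags['loggly-tag'] || null };
}

// Hypothetical helper: build the URL and newline-delimited JSON body
// for one bulk upload. Sending the request is left to the caller.
function buildBulkRequest(config, events) {
  let url = LOGGLY_BULK_BASE + encodeURIComponent(config.token);
  if (config.tag) {
    url += '/tag/' + encodeURIComponent(config.tag);
  }
  const body = events.map((e) => JSON.stringify(e)).join('\n');
  return { url, body };
}

const config = logglyConfigFromTags([
  { Key: 'loggly-customer-token', Value: 'YOUR-TOKEN' },
  { Key: 'loggly-tag', Value: 'Production' },
]);
console.log(buildBulkRequest(config, [{ elb: 'my-elb', elb_status_code: '200' }]));
```

Keeping the token in a bucket tag means the same Lambda code can serve several buckets, each with its own Loggly account or tag.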
One thing to keep in mind with this solution is that the Lambda function processes each log file line by line. A very busy site could generate more entries than the Lambda function can process within its limits. You may need to keep an eye on your CloudWatch logs and tune the Lambda memory or timeout settings to handle the volume.
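Besides tuning memory and timeouts, one possible way to keep each upload small is to split the converted lines into size-bounded batches before sending them. This is only a sketch of that idea, not something elb2loggly is known to do; chunkLines and its maxBytes parameter are hypothetical, and the right limit for your account is something to check in the Loggly documentation.

```javascript
// Hypothetical helper: group lines into batches whose combined size
// (including one newline per line) stays within maxBytes.
// A single line larger than maxBytes ends up in a batch of its own.
function chunkLines(lines, maxBytes) {
  const batches = [];
  let current = [];
  let size = 0;
  for (const line of lines) {
    const lineBytes = Buffer.byteLength(line, 'utf8') + 1; // +1 for the newline
    if (current.length > 0 && size + lineBytes > maxBytes) {
      batches.push(current);
      current = [];
      size = 0;
    }
    current.push(line);
    size += lineBytes;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}

console.log(chunkLines(['aaaa', 'bbbb', 'cc'], 10));
```

Each batch can then be joined with newlines and posted as its own bulk request.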
Enjoy My Script!
I hope that you can make good use of my script for your own ELB access logs. The script is on GitHub. If you would like to make any improvements, please send me a pull request and I will incorporate them.
You’ll also find information in the Loggly documentation.
Chris Boscolo is Vice President of Product Development at Optimum Energy. In this role, he doesn’t often have the opportunity to write software but occasionally finds projects that help his team operate Optimum Energy’s OptiCx Platform, a cloud-based platform that delivers energy optimization and management integrating on-site operational modules, machine learning, web and mobile apps.