A few weeks ago I wrote up how to implement simple alerting with Loggly and PagerDuty. This week I’m covering how to do something very similar with the new version of Amazon’s CloudWatch, which Amazon recently released.
Amazon doesn’t rely on a monitoring agent to collect CloudWatch (CW) metrics, so it takes literally a few clicks in the AWS console to start using it. Data is collected by Amazon’s pre-instrumented hypervisor and then forwarded to the CW service, where you can select, graph and alert on it.
With the latest release, Amazon added new endpoints to the CW API that allow a user to send in custom metrics. These can be combined with the hypervisor-based metrics to build complex alerts and drive Auto Scaling for applications running on EC2.
It’s that new functionality that I’ll be using to send data from Loggly to CloudWatch.
As always, the code for this post is parked on Loggly’s GitHub account. The cloudwatch.py file contains the signing bits required for talking to Amazon’s API endpoints, plus some basic code for posting to the PutMetricData method. You don’t need the boto library for this, but it won’t hurt if you already have it installed.
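For a sense of what those signing bits involve: the CloudWatch query API of this era accepted AWS Signature Version 2, where the request parameters are sorted, percent-encoded, joined into a canonical string and signed with HMAC-SHA256. The sketch below is not the code in cloudwatch.py; the helper names `sign_v2` and `put_metric_params` are hypothetical, and it assumes the `monitoring.amazonaws.com` endpoint, API version `2010-08-01` and a `GET` request. Current AWS documentation recommends Signature Version 4, so treat this as an illustration of the idea rather than a drop-in client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _encode(value):
    # RFC 3986 encoding: only A-Z, a-z, 0-9, '-', '_', '.', '~' stay literal.
    return quote(str(value), safe="-_.~")


def sign_v2(params, secret_key, host="monitoring.amazonaws.com",
            method="GET", path="/"):
    """Return (params plus a 'Signature' entry, the string that was signed)."""
    # Canonical query string: parameters sorted by name, names and values encoded.
    canonical = "&".join(
        "%s=%s" % (_encode(k), _encode(params[k])) for k in sorted(params)
    )
    string_to_sign = "\n".join([method, host.lower(), path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signed = dict(params)  # leave the caller's dict untouched
    signed["Signature"] = base64.b64encode(digest).decode("ascii")
    return signed, string_to_sign


def put_metric_params(access_key, namespace, name, value, timestamp):
    """Query parameters for a single-datum PutMetricData call (hypothetical helper)."""
    return {
        "Action": "PutMetricData",
        "Version": "2010-08-01",
        "AWSAccessKeyId": access_key,
        "SignatureVersion": "2",
        "SignatureMethod": "HmacSHA256",
        "Timestamp": timestamp,
        "Namespace": namespace,
        "MetricData.member.1.MetricName": name,
        "MetricData.member.1.Value": value,
        "MetricData.member.1.Unit": "Count",
    }


if __name__ == "__main__":
    params = put_metric_params("AKIDEXAMPLE", "Loggly", "WebEventCount", 42,
                               "2011-05-20T12:00:00Z")
    signed, sts = sign_v2(params, "SECRET")
    print(sts)
    print(signed["Signature"])
```

The signed dictionary would then be sent as the query string of a request to `https://monitoring.amazonaws.com/`; the credentials above are placeholders.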
The detailed setup instructions are on the GitHub project page. Basically, all you need to do to get this running is get syslog-ng forwarding your web logs to Loggly, configure your Loggly credentials, and then enter your AWS access key ID and secret access key in the code.
You’ll also need a few libraries from the Cheese Shop (PyPI) installed, including httplib2, simplejson and hoover, the Loggly Python library.
import cloudwatch
import httplib2, simplejson
import hoover
from hoover import utils

# init our connection to cloudwatch
cw = cloudwatch.connection('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')

# init our connection to loggly
hoover.authorize('geekceo', 'kordless', 'password')

# cloudwatch namespace
namespace = 'Loggly'

# get back the number of events for the default input,
# counted over the five minutes ending one minute ago
geekceo = hoover.utils.get_input_by_name('default')
num_results = geekceo.facets(q='*', starttime='NOW-6MINUTES',
                             endtime='NOW-1MINUTE', buckets=1)['data'].items()

# push it to cloudwatch
cw.putData(namespace, "WebEventCount", num_results)
Save this as main.py and set up a cron job that runs it every five minutes, preferably on an instance you are monitoring:
*/5 * * * * python ~/loggly-watch/main.py
The code above runs a simple search on Loggly for all events sent to the default input on your account. If all you are sending to that input is combined_access formatted log lines, you’ll end up with hit counts sampled every 5 minutes. Each run counts the five minutes ending one minute ago, which gives Loggly time to finish indexing the most recent events.
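The reason for the odd-looking `NOW-6MINUTES` to `NOW-1MINUTE` window is that it is exactly as long as the cron interval, so consecutive runs count back-to-back five-minute slices with no gaps or overlaps. The sketch below (helper names hypothetical) computes those windows for a few runs, assuming cron fires on schedule and Loggly's notion of `NOW` matches the cron host's clock; whether an event landing exactly on a boundary is counted once depends on Loggly's inclusive or exclusive range semantics, which it doesn't model.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=5)  # cron schedule: */5 * * * *
LAG = timedelta(minutes=1)       # give Loggly a minute to index new events


def search_window(run_time):
    """(start, end) equivalent to starttime='NOW-6MINUTES', endtime='NOW-1MINUTE'."""
    return run_time - INTERVAL - LAG, run_time - LAG


def windows(first_run, count):
    """Search windows for `count` consecutive cron runs starting at `first_run`."""
    return [search_window(first_run + i * INTERVAL) for i in range(count)]


if __name__ == "__main__":
    for start, end in windows(datetime(2011, 5, 20, 12, 0), 3):
        print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
```

Running it prints 11:54 -> 11:59, 11:59 -> 12:04 and 12:04 -> 12:09: each window starts where the previous one ended.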
The result is pretty impressive for so little work. You can even build combined graphs that overlay these metrics with the ones delivered by the AWS hypervisor.
Once the metrics are flowing in, you can set alarms to trigger if they go over (or under) a certain threshold. In the screenshot below I’m monitoring for the term ‘exception’ coming in from my crappy blog, which is hosted on App Engine and logs through my App Engine async logging library.
The screenshot above shows where CW triggered an alarm for exceptions, then cleared it on its own once the exception count fell back below the threshold of 4.
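The alarm-then-clear cycle in that screenshot is ordinary threshold behavior. Here is a deliberately simplified model of it, assuming the alarm fires when the count is greater than 4 for a given number of consecutive evaluation periods. Real CloudWatch alarms also depend on the period length, the chosen statistic and missing-data handling, none of which this hypothetical `alarm_states` helper tries to reproduce.

```python
def alarm_states(datapoints, threshold, evaluation_periods=1):
    """Simplified alarm model: ALARM when the most recent `evaluation_periods`
    datapoints all exceed `threshold`, OK otherwise, and INSUFFICIENT_DATA
    until enough datapoints have arrived."""
    states = []
    for i in range(len(datapoints)):
        recent = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        if len(recent) < evaluation_periods:
            states.append("INSUFFICIENT_DATA")
        elif all(value > threshold for value in recent):
            states.append("ALARM")
        else:
            states.append("OK")
    return states


if __name__ == "__main__":
    # exception counts per period: a spike above 4, then back down
    print(alarm_states([1, 2, 6, 7, 3], threshold=4))
```

With one evaluation period, the spike to 6 and 7 puts the alarm into ALARM and the drop to 3 returns it to OK. Raising `evaluation_periods` trades faster detection for fewer alarms on one-off spikes.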
Monitor the Monitor
With Loggly and CloudWatch alerting, there are a whole host of monitoring and correlation use cases you can tackle with just a little bit of hacking. You can even alarm on the cronjob itself to ensure your monitoring is functioning and healthy. Here’s how.
Start by making sure your local syslog instance is sending data to Loggly, then change your cron job to pipe its output to logger:
*/5 * * * * python /home/kord/code/loggly-watch/main.py 2>&1 | logger -t cloudwatch-cron
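If you would rather not rely on the pipe, the script could write its completion marker to syslog itself with Python's standard `syslog` module (Unix only). This is a sketch, assuming the local syslog daemon forwards to Loggly as described above; `mark_run_finished` is a hypothetical name, and it logs under the same `cloudwatch-cron` tag that `logger -t` uses.

```python
import syslog

MARKER = "cloudwatch finished run"


def mark_run_finished(tag="cloudwatch-cron"):
    """Send the completion marker to the local syslog daemon under `tag`."""
    syslog.openlog(tag, 0, syslog.LOG_USER)
    syslog.syslog(syslog.LOG_INFO, MARKER)
    syslog.closelog()
    return MARKER


if __name__ == "__main__":
    mark_run_finished()
```

One trade-off to weigh: the `2>&1 | logger` pipe also ships tracebacks and other stderr output to Loggly, which this direct approach would not capture.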
Next, add a search to the same main.py that cron is running, this time looking for the log line left by a successful run of the cron job that runs the search (that’s so meta it hurts):
# check to ensure the cronjob has completed successfully;
# 'titus' is the Loggly input receiving this machine's syslog
geekceo = hoover.utils.get_input_by_name('titus')
num_results = geekceo.facets(q='cloudwatch finished run', starttime='NOW-6MINUTES',
                             endtime='NOW-1MINUTE', buckets=1)['data'].items()

# push it to cloudwatch
cw.putData("Loggly", "CronMonitor", num_results)

# goes to stdout, through logger and on to Loggly, where the next run will find it
print('cloudwatch finished run')
Note: I’m keeping this example purposely simple. In practice you’ll probably want to make the check a little more sophisticated by verifying that the response from the Loggly server is valid and that each search ran successfully.
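One way to start on that validation is to turn each facets response into a single number, or `None` when it doesn't look right, and only push to CloudWatch when you get a number back. The sketch assumes the response is a dict shaped like `{'data': {'<bucket>': <count>, ...}}`, which is what the `['data'].items()` call in main.py implies; that shape and the `event_count` helper are assumptions, not documented hoover behavior.

```python
def event_count(response):
    """Total event count from a facets response, or None if it looks invalid.

    Assumes a dict like {'data': {'<bucket timestamp>': <count>, ...}}.
    """
    if not isinstance(response, dict):
        return None
    data = response.get("data")
    # With buckets=1 we expect exactly one bucket, so an empty dict is suspicious.
    if not isinstance(data, dict) or not data:
        return None
    total = 0
    for _bucket, count in data.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(count, bool) or not isinstance(count, (int, float)):
            return None
        total += count
    return total


if __name__ == "__main__":
    print(event_count({"data": {"2011-05-20T12:00:00Z": 17}}))
    print(event_count({"error": "authentication failed"}))
```

In main.py you would skip the `cw.putData` call (and the final `print`) whenever this returns `None`, so a broken search shows up as a missing CronMonitor datapoint instead of a misleading value.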
Finally, create an alarm that triggers if the CronMonitor value is less than 1 over a 10-minute period.
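If you prefer to script that alarm rather than click through the console, the settings map onto the parameters of CloudWatch's PutMetricAlarm call. The sketch below only builds the keyword arguments, using boto3's parameter names; the alarm name is hypothetical, and it assumes a `Sum` statistic over a single 600-second period. There is one catch worth handling: if cron stops entirely, nothing pushes CronMonitor at all, so the metric goes missing rather than reading 0. `TreatMissingData`, a later CloudWatch addition, lets you count that gap as a breach.

```python
def cron_monitor_alarm(namespace="Loggly", metric="CronMonitor"):
    """Keyword arguments for a PutMetricAlarm request (boto3 parameter names)."""
    return {
        "AlarmName": "loggly-cron-monitor",  # hypothetical name
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": "Sum",
        "Period": 600,                       # one 10-minute period
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # No datapoints at all (cron dead) should also raise the alarm.
        "TreatMissingData": "breaching",
    }


if __name__ == "__main__":
    # With credentials configured, this could be passed to
    # boto3.client("cloudwatch").put_metric_alarm(**params); not called here.
    params = cron_monitor_alarm()
    print(params["ComparisonOperator"], params["Threshold"])
```

To get paged, you would also add an `AlarmActions` entry pointing at an SNS topic that notifies PagerDuty or email.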