Twitter for Computers: Using Loggly for Micrologging Machine Data

 

Micrologging, explained.

Trying to explain what Loggly does to non-technical people is always a bit of a challenge, but one analogy that seems to work pretty well is to say that Loggly is “Twitter for computers”. The analogy isn’t perfect, but the idea that we are a central clearing house for “tweets” from computers is pretty easy to understand. Obviously, we do quite a bit more with this data than just let you search it (as showcased with second generation use cases), so to (ab)use an old quote from Bill Gates: “Content Is King”. And “small” content can be surprisingly useful, especially in real-time.

This post is all about “small content”, and how to use nothing more than standard *nix tools to generate it.

Since I mentioned Twitter, here is a tweet from last week…

logger `cat /proc/loadavg|awk '{split($4,p,"/");print"{\"load\":{\"1m\":"$1",\"5m\":"$2",\"15m\":"$3",\"run\":"p[1]",\"tot\":"p[2]"}}"}'`

Here is what that has morphed into in the last week, in my crontab:

# log load average every minute
* * * * * logger `awk '{split($4,p,"/");print"{\"what\":\"loadavg\",\"1min\":"$1",\"5min\":"$2",\"15min\":"$3",\"run\":"p[1]",\"tot\":"p[2]"}"}'</proc/loadavg`

For anyone familiar with cron, syslog, awk, and /proc/loadavg, this should be pretty self-explanatory. For those who are not (there is a quick way to test the chain by hand right after this list):

  • cron runs the “logger” command every minute
    • logger writes whatever you send it to syslog
  • the back-tick’ed “awk” command reads /proc/loadavg and generates a JSON equivalent of that data
    • this JSON is the input to logger, so gets sent to syslog
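
One easy way to convince yourself the whole chain works is to send a test event by hand and then look for it in the local syslog file. This is just a sketch: the file is /var/log/syslog on Debian-ish systems and /var/log/messages on many others, and “test” is a throwaway value I picked for the example.

$ logger '{"what":"test","msg":"hello"}'
$ grep '"what":"test"' /var/log/syslog | tail -1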

Here is what “raw” /proc/loadavg looks like:

$ cat /proc/loadavg
0.00 0.01 0.05 1/69 30655

and here is what the generated output from this would look like:

{"what":"loadavg","1min":0.00,"5min":0.01,"15min":0.05,"run":1,"tot":69}

A nice, small, simple piece of JSON, packing a lot of useful information. Assuming my syslog is set up to forward to Loggly, this one-liner lets me see how each of my systems is doing, at a very high level, for every minute it’s running.
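
For completeness, the “set up to forward” part is usually a one-liner on the rsyslog side. This is only a sketch: @@ means TCP (a single @ means UDP), and logs.example.com:514 is a placeholder for whatever endpoint your Loggly account tells you to use.

# forward everything to a remote collector (sketch)
*.* @@logs.example.com:514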

 

Load average in the last hour

 

Now, /proc is full of lots of other useful information, and this trick of having cron feed that data through your favorite scripting language makes it really easy to capture, with barely any effort.

Here are some more examples, to demonstrate how easy this is:

# log vmstat every minute
* * * * * logger `awk 'BEGIN{ORS="";print "{\"what\":\"vmstat\""}{print ",\""$1"\":" $2}END{print "}"}'</proc/vmstat`

# log meminfo every minute
* * * * * logger `awk 'BEGIN{ORS="";print "{\"what\":\"meminfo\""}{print ",\"" substr($1,1,length($1)-1) "\":" $2}END{print"}"}'</proc/meminfo`

# log net/dev every minute, one event per interface
* * * * * cat /proc/net/dev | awk '/:/{print "{\"what\":\"net/dev\",\"interface\":\""substr($1,1,length($1)-1)"\",\"rx\":{\"bytes\":"$2",\"pkts\":"$3",\"errs\":"$4",\"drop\":"$5"},\"tx\":{\"bytes\":"$10",\"pkts\":"$11",\"errs\":"$12",\"drop\":"$13"}} "}' | logger
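
Since it is easy to fumble a quote or comma in these awk strings, a handy sanity check (just a sketch, assuming Python is installed) is to run the generated string through a JSON parser before cron starts spamming syslog with broken events:

$ awk 'BEGIN{ORS="";print "{\"what\":\"meminfo\""}{print ",\"" substr($1,1,length($1)-1) "\":" $2}END{print"}"}'</proc/meminfo | python -m json.tool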

Simply by adding these lines to my crontab, I now have a robust and lightweight way to get system information into my Loggly account. Once there, I can graph it, search it, and compare it to the other logs I have for each system, to see whether (for example) an error or warning in one of my apps correlates with a blip in any of these system measurements.

You may have noticed that in each case, I added a “what” field in the generated JSON. This field lets me filter down to the specific type of data I’m interested in, just by adding (for example)

json.what:loadavg

to my query, or using the Filter UI in the Search page. Because I use the same field name (“what”) for each of these variants, I have a single mechanism that lets me slice and dice all of the data that is generated this way.
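
The same pattern covers the other variants from above, for example:

json.what:meminfo
json.what:df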

/proc is a goldmine for interesting data, but the beauty of *nix is that there are eleventy squillion little tools that you can use to see what is going on on your machines. For example:

# log df every minute, one line per filesystem
* * * * * df -k|awk '/\//{print "{\"what\":\"df\",\"filesystem\":\""$1"\",\"used\":"$3",\"avail\":"$4",\"used_pcnt\":"substr($5,1,length($5)-1)",\"mounted\":\""$6"\"}"}' | logger
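
The data does not even have to come from /proc, or from a file at all. Here is one more sketch in the same spirit, counting logged-in users (“users” is just a name I made up for the “what” field):

# log the number of logged-in users every minute
* * * * * logger "{\"what\":\"users\",\"count\":`who | wc -l`}"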

By now, this pattern should look pretty familiar :-)

All of these things could, of course, be thrown into a shell script, and you could just cron that script. Either approach works, so use whichever you like.
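
If you go the script route, the whole thing might look something like this (the /usr/local/bin/microlog.sh path and the choice of sources are just examples):

#!/bin/sh
# /usr/local/bin/microlog.sh (example path): emit one JSON event per data source

# load average
awk '{split($4,p,"/");print"{\"what\":\"loadavg\",\"1min\":"$1",\"5min\":"$2",\"15min\":"$3",\"run\":"p[1]",\"tot\":"p[2]"}"}' </proc/loadavg | logger

# memory
awk 'BEGIN{ORS="";print "{\"what\":\"meminfo\""}{print ",\"" substr($1,1,length($1)-1) "\":" $2}END{print"}\n"}' </proc/meminfo | logger

with the crontab shrinking to a single line:

# run the whole collection once a minute
* * * * * /usr/local/bin/microlog.sh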

There is one trap for young players with this approach that can mess things up pretty badly: the vmstat message is kind of big (a little over 2k bytes), and I had to increase the max message size in my rsyslog.conf to make sure it was handled as a single event:

# increase max message size
$MaxMessageSize 65000

If I was using syslog-ng, this would have looked like

# increase max message size
options { log_msg_size(65000); };
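
To see how close your own boxes get to that limit, just byte-count the generated message; this reuses the vmstat one-liner from above, with wc doing the counting:

$ awk 'BEGIN{ORS="";print "{\"what\":\"vmstat\""}{print ",\""$1"\":" $2}END{print "}"}'</proc/vmstat | wc -c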

Finally, for those times when you just want to see something a little less serious in your logs…

# why so serious?
* * * * * logger {\"when\":`date +\%s`, \"fortune\":\"`/usr/games/fortune | tr '"' "'"`\"}

Telling your fortune with Loggly

