A Logging Library for Django – How We Log at Loggly

In my last blog entry, I showed you how you can enable logging in Django 1.2. Now we are going to look at the logging library that we built for Loggly to simplify the task of logging in our own Django application, the Loggly Web interface.

Here is how we log from within our application:

That’s it. The above code creates the following log entry:

The logging call expects a dict of key-value pairs. This enforces key-value based log entries, which make it easy for consumers to understand what a specific value means; without a key, a value is more or less useless. In the example above, note that I only provided two keys: object and action. However, the log entry contains a number of other data items. Our logging library adds those items to the log entries automatically, so the developer does not have to include them explicitly.

It is probably time to show you Loggly’s logging library:

Note that this is only an extract. Download the entire library if you want to use it in your own code. Here are some important things the code does:

  • Lines 17 to 29: This part of the code inspects the call stack to check whether there is an HTTP request object somewhere. The request object contains the username for the session, and that is what we automatically extract. This frees the user from manually adding that information to the logging call. Automation is good!
  • Lines 26 and 27: We are using UNIQUE_IDs in Apache. In order to track a request from the Apache logs down into our application, we include that same ID in our Django logs. This is a huge win for associating Apache logs with our application logs.
  • Lines 32 to 34: All the dict entries are added as ‘key=value’ pairs to the log entry. So you can log any key you want.
  • Lines 39 to 47: These are the calls that you use in your code. Note that you can add a user field, which overrides the username from the request. In some cases that is necessary and useful.

Let us know if you are using our library. I would love to hear back from you. I will post another blog entry later, where I will be talking about how to patch Django itself to do some more logging. We will be looking at how the authentication methods can be extended.

The links:
Django 1.2 Logging Patch
Loggly Logging Library

How to Enable Logging in Django 1.2

On Monday Django 1.2 was released. At Loggly, we highly anticipated this release; we have been running version 1.2 since the early alpha releases. During the alpha period, Simon Willison released a patch and a proposal/ticket for enabling logging in Django applications. It looked like the patch would be included in the main Django branch. Can you imagine how disappointed we were when we realized that the patch didn’t make it into the final release?
Well, what we ended up doing was to patch Django with Simon’s patch. It took me a little while to update the patch to work with the final 1.2 release. If you want to use the patch, download it here. You then need to patch your Django install with it. So, download the Django 1.2 tarball and do the following:

This will include the patch into your Django distro and install it. As a next step, you configure your application for logging by adding a snippet similar to the following in your settings.py file:

From there you use the following code snippet to log from your application:

You can find more information about how to use the logging libraries in the Python documentation. It also seems like Django 1.3 will have logging built in. From what I can tell, it looks similar to Simon’s approach, but it’s not quite the same. I’ll keep you posted here!

In my next blog post I will show you how we implemented a logging library for Loggly so that we can very easily log from anywhere within our application.

Django 1.2 Logging Patch

Suffering SaaSitash

Dave Rosenberg posted an opinion about cloud based logging yesterday on his Software, Interrupted blog. Dave starts out by mentioning Gartner predicted IT would spend more money on private cloud than public cloud through 2012. Here’s the exact quote from Gartner:

“Despite the economies of scale offered by public cloud providers, private cloud services will prevail for the foreseeable future while public cloud offerings mature, according to Gartner, Inc. Through 2012, IT organizations will spend more money on private cloud computing investments than on offerings from public cloud providers.”

This statement is a bit like NASA doing a press release announcing the moon is continuing to orbit the earth. Wow! The moon, still here next year? That’s awesome. Of course IT is going to spend more money on virtualization for the next few years. The success of the private cloud can be attributed to the fact that virtualization has been around for a good while now, and is finally being pressed into mainstream use behind the firewall. Shoot, I think I was running Wine on some of my Linux boxes back in the mid-90s, which means virtualization has been commercialized for at least 15 years. The idea of virtualizing an OS goes back well into the 60s. Come to think of it, so do I.

The public cloud, specifically IaaS and SaaS, is a grouping of emerging technologies. We’re just now starting to figure out how to wield it correctly for new business models. Poking holes in it at this point is simply rabble rousing by companies whose business models are threatened by it and people who don’t understand it or have a use for it.

It’s a Complicance

Guy Churchward tries to make some good points in his talk with Dave, but at the end of the day, LogLogic is mainly an appliance vendor, and not only do they have big-time COGS to worry about, they also have to figure out how exactly a cloud customer is going to deploy their box on Amazon’s EC2 service. (Hint: They aren’t.) While you might be able to send logs back out of the cloud to an appliance behind the firewall, it’s unlikely to make economic sense to do so in the long term.

While there is a valid point in calling out cloud concerns, security itself is ALWAYS a concern, regardless of whether you run in the cloud or in your own datacenter. Frankly, with Loggly I’m likely better at storing and securing your logs than you are by yourself in your own data center, mostly due to the fact I’m under pressure by multiple people like you to provide a service which is expected at the outset to be secure. It’s no different than the pressure that Google has on them for securing your email, SalesForce for securing your leads, or Amazon securing your credit card info. We’re all culpable here for the security of your data.

Additionally, not all that cloudy data is created equal. A lot of the companies running in the cloud today are web-based app companies, and the data they generate is oftentimes very public in nature and not at all affected by compliance concerns. Do you think some user on Flickr cares if I stole all their comments? What about getting access to all those juicy tweets of mine? Oh wait, those are already in the Library of Congress. Never mind, false alarm!

When IT Rains IT Pours

Log file data is already one of the largest sets of data on the planet. Logging alone in the public cloud is going to be absolutely staggering over the next few years. These trends are being driven by people switching to SaaS-based applications whose infrastructure either requires the elastic capabilities only the public cloud can provide, or whose price point can’t be matched by private cloud offerings.

The elastic nature of these infrastructures means the logs they generate need to be collected and stored in a centralized location before the box that generated them disappears. There are many types of logs which are valuable to a company for understanding their business, and not so valuable for those data-thieving ruffians everyone keeps talking about.

While the security access data or net-flow information from public cloud vendors might alleviate the concerns of some consumers, I think there are much higher value adds to these offerings by being able to power availability and analytics services around a company’s application via a log file storage platform.

While the private cloud may continue to orbit peacefully for the next few years, the use of it for web-based services will decay eventually, and it’ll be relegated to the more mundane stuff like storing my dental records and tracking my orders over on RadiatorBarn.com.

BTW, I’m still waiting on my radiator, Burton.

Visualizing your Data in the Cloud with Loggly and HighCharts

A short while into writing code for the Loggly interface we decided that we needed some eye candy. Given my background in visualization, I was keen on providing our users with an experience that helps them understand their data in an intuitive way.

Over the last few years I’ve been looking into a ton of visualization libraries for the Web. In the past, if you had asked me what library to use for generating charts on your Web site, I would have said, “Use Flash”. While there are a number of interesting Flash libraries out there, the landscape has shifted significantly in the last year. Everyone is moving to JavaScript. After some research, I opted to use a JavaScript charting library called HighCharts. I tried a bunch of other canvas-based libraries, but let me tell you without hesitation, HighCharts rocks.

I am going to show you how we are using HighCharts and how I implemented zooming to dynamically reload more event data on the fly. With most charting libraries, if you keep zooming in on a chart, it will not progressively load more detailed data. At detailed zoom levels you end up with a small range of data in your graph. Basically, if you view a day’s data first and then zoom into a specific minute, you would only see one data point.
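To make the effect concrete, here is a rough sketch of the arithmetic. It assumes the API splits the requested time range evenly into the number of buckets asked for (24 in the listing below); bucketWidthMs is a hypothetical helper name used only for illustration.

```javascript
// Width of one time bucket in milliseconds, assuming the requested
// range is divided evenly into `buckets` slices.
function bucketWidthMs(startMs, endMs, buckets) {
    return (endMs - startMs) / buckets;
}

// A full day in 24 buckets: one data point per hour.
var dayBucket = bucketWidthMs(Date.UTC(2010, 4, 20), Date.UTC(2010, 4, 21), 24);

// A single minute in 24 buckets: one data point every 2.5 seconds.
var minuteBucket = bucketWidthMs(Date.UTC(2010, 4, 20, 14, 30), Date.UTC(2010, 4, 20, 14, 31), 24);

console.log(dayBucket / 60000);   // 60 (minutes per bucket)
console.log(minuteBucket / 1000); // 2.5 (seconds per bucket)
```

Without a reload, the one-minute view falls inside a single hourly bucket; re-querying for just that minute is what yields 24 finer-grained points.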

To start, here’s the JavaScript I use to display a chart:

var parse_date = function(data) {
    var result = [];
    $.each(data, function(key, value) {
        var re = /(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(?:\.(\d+))?/;
        var date = re.exec(key);
        if (date[7] === undefined) { date[7] = 0; }  // no fractional part in the timestamp
        var real_date = Date.UTC(date[1], parseInt(date[2], 10) - 1, date[3], date[4], date[5], date[6], date[7]);  // radix 10 so "08" is not read as octal; months are zero-based
        result.push([real_date, value]);
    });
    return result;
}

chart = new Highcharts.Chart({
    credits: { enabled: false },
    chart: {
        renderTo: 'activity',
        defaultSeriesType: 'area',
        margin: [10, 20, 40, 55],
        zoomType: "x",
        events: {
            selection: function(event) {
                // change the time frame to be searched
                var start = Highcharts.dateFormat('%Y-%m-%dT%H:%M:%SZ', event.xAxis[0].min);
                var end = Highcharts.dateFormat('%Y-%m-%dT%H:%M:%SZ', event.xAxis[0].max);
                $.ajax({ type: "GET", url: "http://subdomain.loggly.com/api/search/?"
                    + "q=inputname:logglyapp&starttime=" + start + "&endtime=" + end
                    + "&facets=True&buckets=24",
                    success: function(data) {
                        chart.xAxis[0].setExtremes();
                        chart.series[0].setData(parse_date(data));
                        // fix the reset zoom button
                        $('.highcharts-toolbar').click(reset_zoom);
                    },
                    error: function(req, text, error) {
                        $("#err").html("Reload error!");
                    }
                });
            }
        }
    },
    xAxis: { title: { text: 'Time' }, type: 'datetime' },
    yAxis: { title: { text: '# Events' }, min:0,
        plotLines: [{ value: 0, width: 1, color: '#808080' }]
    },
    tooltip: { formatter: function() {
            return Highcharts.dateFormat('%B %e %Y %H:%M:%S', this.x) + '<br/>'
                + this.y + ' Events';
        }
    },
    plotOptions: { area: { dataParser: parse_date } },
    series: [{
        id: 1,
        name: 'search',
        dataURL: 'http://subdomain.loggly.com/api/search/' + '?q=inputname:logglyapp&facets=True'
    }],
    title: { text: 'traffic last 24 hours' }
});

var reset_zoom = function() {
    // requery for the original data:
    $.ajax({ type: "GET", url: "http://subdomain.loggly.com/api/search/"
        + "?q=inputname:logglyapp&facets=True",
        success: function(data) {
            chart.toolbar.remove('zoom');
            chart.xAxis[0].setExtremes();
            chart.get(1).setData(parse_date(data));
        },
        error: function(req, text, error) {
            $("#err").html("Loading error!");
        }
    });
};

Let’s have a quick look at the code. There are two things I want to communicate here: 1. The code I used to display a HighCharts graph and 2. The way I am using Loggly’s APIs to query the data.

I mentioned the special zooming that I implemented. Take a look at lines 20 to 39. This is the function that handles zooming, and it is where I am reloading the more detailed data. I set the new start and end dates (lines 23 and 24) and then I am querying the Loggly API with the new timeframe (lines 25 to 27). Upon success – this is important – I am using the chart.series[0].setData() method to set the new data for the chart. The next statement overrides the default button or link that lets the user zoom out again (line 32). Note: because you are implementing your own zoom, the default “reset zoom” button from HighCharts will not work anymore, and you have to override it with your own function to reset the chart.
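The start and end values in the selection handler are millisecond timestamps that have to be turned into strings for the query. The listing uses Highcharts.dateFormat for that; outside of HighCharts, a plain JavaScript equivalent might look like the sketch below. formatUtcTimestamp is a hypothetical helper name, not part of HighCharts or the Loggly API.

```javascript
// Formats a millisecond UTC timestamp as YYYY-MM-DDTHH:MM:SSZ, the same
// shape the listing produces with '%Y-%m-%dT%H:%M:%SZ'.
function formatUtcTimestamp(ms) {
    // toISOString() yields e.g. 2010-05-20T14:30:05.000Z; drop the milliseconds.
    return new Date(ms).toISOString().replace(/\.\d{3}Z$/, 'Z');
}

var startTime = formatUtcTimestamp(Date.UTC(2010, 4, 20, 14, 30, 5));
console.log(startTime); // 2010-05-20T14:30:05Z
```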

The function dealing with the reset functionality is on lines 59 to 72. It does nothing more than query the Loggly API for the original data (I am passing no time parameters) and set the data just like the previous call. The other thing you have to do is on line 64, where you need to remove the HighCharts default “reset zoom” link, and then reset the extremes (line 65).

Moving on, we’ll briefly discuss the way I’m using the Loggly API. If you’d like to use it, you need an account with us. We are currently in private beta, so you will need us to give you access to the beta program. Email us if you want an account to play around with! Back to the code. Make sure you replace subdomain in the URLs with your actual subdomain. With that out of the way, you can query the API by simply making a GET request to /api/search. You pass the q parameter with your query. In my example I am getting all the data from my input with the name logglyapp. To get timeline data, you’ll need to pass the parameter facets=True into the call. This will give you counts for time buckets.
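As a minimal sketch of how such a request URL could be assembled, the helper below combines the parameters mentioned in this post (q, facets, starttime, endtime, buckets). buildSearchUrl is a hypothetical name, and URL-encoding the values is an assumption on my part; the listing above concatenates them unencoded.

```javascript
// Hypothetical helper: assembles a search URL from the parameters
// named in the post. Not part of any Loggly client library.
function buildSearchUrl(subdomain, query, options) {
    options = options || {};
    // facets=True asks for counts per time bucket.
    var params = ['q=' + encodeURIComponent(query), 'facets=True'];
    if (options.starttime) { params.push('starttime=' + encodeURIComponent(options.starttime)); }
    if (options.endtime) { params.push('endtime=' + encodeURIComponent(options.endtime)); }
    if (options.buckets) { params.push('buckets=' + options.buckets); }
    return 'http://' + subdomain + '.loggly.com/api/search/?' + params.join('&');
}

// Replace 'subdomain' with your actual subdomain.
var searchUrl = buildSearchUrl('subdomain', 'inputname:logglyapp', {
    starttime: '2010-05-20T00:00:00Z',
    endtime: '2010-05-20T01:00:00Z',
    buckets: 24
});
console.log(searchUrl);
```

Leaving out the options gives the unbounded query that the reset function uses.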

To make everything work together, you need one more piece: the parse_date function. You need this part because the Loggly API returns the data with human-readable timestamps, and HighCharts wants timestamps as milliseconds since the Unix epoch in UTC. The function on lines 1 to 11 takes care of converting the time for you. Just copy it.
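If you want to try the conversion without jQuery or a browser, a standalone sketch could look like this. toUtcMillis and toSeriesData are hypothetical names, and the response shape (timestamps as keys, event counts as values) is inferred from how parse_date iterates over the data. Unlike the listing, this version treats the digits after the decimal point as a fraction of a second, which is an assumption about the API's timestamp precision.

```javascript
// Converts a timestamp such as 2010-05-20T14:30:00 or 2010-05-20T14:30:00.250
// into milliseconds since the Unix epoch (UTC).
function toUtcMillis(timestamp) {
    var m = /(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(?:\.(\d+))?/.exec(timestamp);
    if (m === null) {
        throw new Error('Unrecognized timestamp: ' + timestamp);
    }
    // A missing fraction counts as 0 ms; otherwise read it as a fraction of a second.
    var millis = m[7] === undefined ? 0 : Math.round(parseFloat('0.' + m[7]) * 1000);
    // Months are zero-based in Date.UTC.
    return Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6], millis);
}

// Turns the assumed { timestamp: count } object into [x, y] pairs sorted by time.
function toSeriesData(buckets) {
    return Object.keys(buckets).map(function(key) {
        return [toUtcMillis(key), buckets[key]];
    }).sort(function(a, b) { return a[0] - b[0]; });
}

var points = toSeriesData({
    '2010-05-20T15:00:00': 7,
    '2010-05-20T14:00:00': 12
});
console.log(points[0][1]); // 12, because the 14:00 bucket sorts first
```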

I hope this was useful. Let us know if you are having trouble with any of this. We are looking forward to hearing about your graphing endeavors.

If you look at my del.icio.us feed, you’ll find a bunch more visualization and charting links.
