<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Loggly &#187; Code</title>
	<atom:link href="http://www.loggly.com/category/code/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.loggly.com</link>
	<description>Log Management in the Cloud</description>
	<lastBuildDate>Wed, 18 Aug 2010 01:38:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Our Solr system</title>
		<link>http://www.loggly.com/2010/08/our-solr-system/</link>
		<comments>http://www.loggly.com/2010/08/our-solr-system/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 02:42:32 +0000</pubDate>
		<dc:creator>jon</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[0MQ]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[sharding]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=1245</guid>
		<description><![CDATA[I was one of three speakers at the Lucene/Solr meetup last month, co-sponsored by salesforce and Lucid Imagination. I don&#8217;t know how anyone at salesforce with a window gets any work done, considering the view &#8211; take a look at Grant&#8217;s photo to see what I mean. Thanks to Bill from salesforce for hosting, and the [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="prettyPhoto" href="http://www.loggly.com/wp-content/uploads/2010/08/MG_5170.jpg"><img class="alignright size-full wp-image-1256" title="_MG_5170_medium" src="http://www.loggly.com/wp-content/uploads/2010/08/MG_5170_medium1.jpg" alt="Jon Gifford" width="240" height="160" /></a>I was one of three speakers at the <a href="http://www.meetup.com/SFBay-Lucene-Solr-Meetup/calendar/14124466/">Lucene/Solr meetup</a> last month, co-sponsored by salesforce and Lucid Imagination. I don&#8217;t know how anyone at salesforce with a window gets any work done, considering the view &#8211; take a look at <a href="http://www.lucidimagination.com/blog/2010/07/29/sf-bay-area-july-lucene-meetup-highlights/">Grant&#8217;s photo</a> to see what I mean. Thanks to Bill from salesforce for hosting, and the guys at Lucid for organizing things. You can check out the two other talks <a href="http://www.lucidimagination.com/Community/meetups">here</a>, as well as talks from previous meetups.</p>
<p><strong>UPDATE: I&#8217;ll be doing a slightly expanded version of this talk at </strong><a href="http://lucenerevolution.org/"><strong>Lucene Revolution</strong></a><strong> in Boston on October 8th, incorporating some of the stuff I talk about below.</strong></p>
<p>I got a few interesting questions and comments after the talk, so I thought I&#8217;d expand a bit on what was in <a href="http://www.loggly.com/wp-content/uploads/2010/08/Lucene-Solr-Meetup-July-2010-short.pdf">my slides</a>, which were perhaps a little dense.</p>
<h4 id="toc-log-search-is-highly-skewed">&#8220;Log Search is highly skewed&#8221;</h4>
<p>In the talk, I said that the most important search data is the most recent. When you have a problem, you&#8217;re far more likely to care about what happened in the last few minutes or hours or days than what happened a month ago. That&#8217;s not to say that you&#8217;ll never need to search older data, just that most of the time, you won&#8217;t.</p>
<p>After the talk, though, it became obvious that I should also have said that our users are likely to use search in a way that is also pretty skewed when compared to &#8220;normal&#8221; search products. Basically, we expect that most people will use the system somewhat sporadically, but that when they do, it&#8217;s likely to be a pretty intensive session of bug hunting. So instead of a fairly continuous search load, we get random spikes for a small subset of all the data we have in Solr. This is actually good for us, because we don&#8217;t need to keep all of the shards for all of our customers &#8220;hot&#8221; in Solr. When a customer shows up, we can warm their data quickly, and let Solr and the filesystem cache do their thing to deal with shards that haven&#8217;t been used for a while.</p>
<p>The most important point here is that the overall system is going to be spending the vast majority of its resources on indexing, rather than searching. I can&#8217;t give you numbers, but if we end up spending anything more than about 5-10% of our cycles on search, I&#8217;ll be very surprised. This is not your typical consumer search product.</p>
<h4 id="toc-0mq">0MQ</h4>
<p>I talked a bit about <a href="http://www.zeromq.org/">0MQ</a>, and said that we chose it primarily because it&#8217;s fast and lightweight, even though it&#8217;s possible that we could lose data if things break. I clarified this a bit in a comment on <a href="http://www.ultrasaurus.com/sarahblog/2010/07/lucenesolr-meetup-july-28/">Sarah Allen&#8217;s blog</a> because I want to make sure the message is that 0MQ is awesome, not that it loses data. Here&#8217;s the guts of what I said&#8230;</p>
<blockquote><p>I wanted to clarify one point in your writeup, though, to make sure people don’t get the wrong idea about 0MQ. Yes, our implementation of 0MQ has a potential “leak”, where we can lose messages, but its a very uncommon case, and the impact is small. Specifically, if one of the solr nodes dies hard, we potentially lose any events that were sent to it in the last batch (0MQ batches to minimize comms overhead). In steady state, 0MQ is rock solid, 100% reliable, and faaaaaast.</p>
<p>Pieter (at iMatix) and I are currently discussing ways to solve the hard death problem, and I don’t anticipate it being a problem very long. As I said in the talk, 0MQ is unbelievably cool – if you haven’t got a project that needs it, make one up!</p></blockquote>
<p>We sponsored some work to get the SWAP functionality in version 2 of 0MQ, and I&#8217;ve been blown away by the guys at iMatix &#8211; they really want 0MQ to work, and work well. My  throw-away comment prompted an email from Pieter asking for more details, and, as I said to Sarah above, we&#8217;re already looking at how to fix it.</p>
<p>Oh, and in case you&#8217;re wondering how fast a one-armed paper-hanger is, take a look at what  <a href="http://www.word-detective.com/080805.html">The Word Detective</a> says about it (scroll down till you see the &#8220;You missed a spot&#8221; section). Maybe I should have used &#8220;flat out like a lizard drinking&#8221; instead?</p>
<h4 id="toc-sharding">Sharding</h4>
<p>The way we create shards by indexing, then merging,  then merging again and again and again raised a few questions that are worth repeating&#8230;</p>
<p>To recap, we build small (5 minute) shards on our hot indexers. When we stop adding events to them, they get merged with older shards until we hit another size limit (30 minutes). They then get merged with even older shards, until we hit the next time limit (4 hours). And so on up the chain until they cap out at a week long. Along the way, we push indexes from box to box, to balance the load on the system as a whole.</p>
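<p>As a rough illustration of the tiers described above, here is a minimal Python sketch that works out which time window an event would fall into at each tier. It is not our production code: the function names are made up for this example, aligning windows to the Unix epoch is an assumption, and only the tier lengths named above are included.</p>

```python
from datetime import datetime, timedelta, timezone

# Tier lengths named in the post. Any tiers between 4 hours and a week are
# not stated there, so they are left out of this sketch.
TIERS = [
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=4),
    timedelta(weeks=1),
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_window(ts, tier):
    """Return the (start, end) of the tier-sized window containing ts.

    Assumption: windows are aligned to multiples of the tier length
    counted from the Unix epoch.
    """
    offset = (ts - EPOCH) % tier
    start = ts - offset
    return start, start + tier


def windows_for(ts):
    """List the window that would hold ts at every tier, smallest first."""
    return [shard_window(ts, tier) for tier in TIERS]
```

<p>Calling windows_for() with an event timestamp gives the 5 minute window first and the week-long window last, which mirrors the order in which a shard would be merged upwards.</p>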
<p>The first question is fairly obvious: <em>Why?</em></p>
<p>At first glance, it seems like we&#8217;re just creating work for ourselves. Surely we could just build the shards and use them as is, right? The problem is that we would have a lot of 5 minute shards floating around the system, and we already know that Solr starts getting cranky when you run a lot of cores in a single instance. So, why don&#8217;t we just build bigger shards? The issue there is that with the version of Solr we&#8217;re using, we have to reopen the index to make new data available, and we currently do that every 10 seconds (hence the &#8220;NRT + SolrCloud = Our Nirvana&#8221; in my slides). Since we have to do this, we&#8217;d end up with too many segments in the hot index, or (if we&#8217;re not careful with our merge factor) a lot of automatic merging, which would leave the hot index unavailable for updates for too long for my liking. So, we got pushed into this approach by something that I&#8217;m hoping will soon be a thing of the past. I&#8217;m really looking forward to <a href="http://lucenerevolution.org/Presentation-Abstracts-Day2#realtime-busch">Michael Busch&#8217;s talk</a> at <a href="http://lucenerevolution.org">Lucene Revolution</a> which promises to remove the &#8220;N&#8221; from NRT. I&#8217;m not sure what is better than nirvana, but I&#8217;m hoping to find out soon.</p>
<p>We may have been forced into doing things this way, but there is a lot of value in the model we have. In some ways, we&#8217;re taking over a part of Lucene (merging) that has been absolutely  invaluable, but can sometimes be a little difficult to control. We now have complete control over when and where indexes get merged. I probably should point out that we deliberately don&#8217;t do any merging on the 5 minute shards, and that we&#8217;re careful with the merge parameters on the larger shards to make the merges that do happen as efficient as possible.  The model also gives us a very simple index naming scheme based on time, which means we always know exactly where to find data for a time-constrained query. More on this in a bit&#8230;</p>
<p>The next question (from the meetup) was <em>what is the overhead of all this merging?</em></p>
<p>Rather than give numbers, it&#8217;s worth thinking about whether we&#8217;re actually doing anything more than Lucene already does when you start building big indexes. I think the answer to that is that we&#8217;re actually just exposing and taking over the automatic behaviour, rather than doing something &#8220;extra&#8221;. So I think the real overhead is close to zero. Compared to building a bunch of shards in parallel using Hadoop, we&#8217;re certainly doing more work, but most of the Hadoop based systems I&#8217;ve looked at are geared more towards building indexes from a large existing corpus, rather than dealing with a real time stream.</p>
<p>My final comment on this is that since it&#8217;s all completely configurable, we&#8217;re not locked into any of the times I&#8217;ve mentioned above. Maybe when we move to NRT, or RT, we can bump the hot shard size up to hours or days, assuming that we&#8217;re still in control of merging. We shall see&#8230;</p>
<h4 id="toc-constructive-laziness">Constructive Laziness</h4>
<p>Circling back to the first section, where I talked about how skewed we expect our search to be, the time-based shards give us a very clean way to limit the impact of our search requests. Since we can constrain a search to a specific time period, it&#8217;s easy for us to identify which indexes we need to hit to satisfy the search. Our ideal search is for something in the last few minutes, which can be entirely served out of one or two of the five minute shards. We may have gigabytes or (hopefully) terabytes of index data for the same customer sitting around on our system, but if we can satisfy their request by hitting two small, heavily cached cores, then we&#8217;re in great shape. I wonder if life will be so kind to us?</p>
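<p>To make the shard selection idea concrete, here is a small Python sketch. The Shard record and the function name are hypothetical and only for illustration. It assumes each shard knows the time range it covers, and it simply keeps the shards whose range overlaps the time range of the query.</p>

```python
from collections import namedtuple

# Hypothetical shard record: times are epoch seconds, end is exclusive.
Shard = namedtuple("Shard", ["name", "start", "end"])


def shards_for_query(shards, query_start, query_end):
    """Return the shards whose time range overlaps [query_start, query_end)."""
    return [
        s for s in shards
        if query_end > s.start and s.end > query_start
    ]
```

<p>A query covering only the last few minutes would match just one or two of the five minute shards here, no matter how many older shards exist for the same customer.</p>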
<h4 id="toc-random-aside-synchronicity">Random aside: Synchronicity</h4>
<p>Every now and then, things just come together in strange ways. A couple of weeks ago, Kord and I talked with Diego and Santiago from <a href="http://www.flaptor.com/index.php">Flaptor</a>, who are working on <a href="http://indextank.com/">IndexTank</a>. Diego and I were at LookSmart together years and years and years ago, but that&#8217;s not the synchronicity. As we were talking, Diego said they were working on a &#8220;Nebulizer&#8221; which does automatic distribution of their index in the cloud. The day before the meeting, I&#8217;d pulled all of the code that deals with this in our system into a class named &#8220;TheDecider&#8221; (I&#8217;m still wrestling with a way to make misunderestimate() a useful method in this class). That evening I went to a NoSQL meetup, and met someone who is also working on the equivalent for their system. Maybe there is something in the air?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/08/our-solr-system/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Cross-Domain AJAX Calls To Query Loggly&#8217;s APIs</title>
		<link>http://www.loggly.com/2010/05/cross-domain-ajax-calls-to-query-logglys-apis/</link>
		<comments>http://www.loggly.com/2010/05/cross-domain-ajax-calls-to-query-logglys-apis/#comments</comments>
		<pubDate>Thu, 27 May 2010 17:42:33 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[json]]></category>
		<category><![CDATA[jsonp]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[restful]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=1055</guid>
		<description><![CDATA[Last week I started playing some more with the Logging APIs from Loggly. For the first time I started embedding AJAX calls to the API into a Web application running on an external domain. Well, guess what happened? The browser barked at me telling me that I couldn&#8217;t execute a cross-domain AJAX call. I guess [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I started playing some more with the <a href="http://wiki.loggly.com/api-documentation">Logging APIs</a> from Loggly. For the first time I started embedding AJAX calls to the API into a Web application running on an external domain. Well, guess what happened? The browser barked at me telling me that I couldn&#8217;t execute a <a href="">cross-domain AJAX</a> call. I guess from a security perspective, that makes a lot of sense. However, I started thinking about how I could overcome this problem. The one way that I could have done it was not to use AJAX, but write some code server-side that would fetch the information format the Loggly API and then present it back to my Web application. I could even expose the information as an end point on the same domain that I then query from my application (see Figure).</p>
<p><a href="http://www.loggly.com/wp-content/uploads/2010/05/jsonp.png"><img src="http://www.loggly.com/wp-content/uploads/2010/05/jsonp.png" alt="" title="jsonp" width="400" class="wp-image-1058" /></a></p>
<p>Well, this seemed wrong. Why design a really nice, RESTful API if developers who want to use it have to build a server-side wrapper first? This didn&#8217;t make sense to me. So I kept digging. Fortunately, I found the solution. It&#8217;s called <b><a href="http://ajaxian.com/archives/jsonp-json-with-padding">JSONP</a></b> (JSON with Padding). Here is how it works and how you can leverage it in your own applications.</p>
<p>Let&#8217;s assume I am building an application at labs.loggly.com that will access the API located at loggly.loggly.com. With jQuery, my AJAX call looks as follows:</p>
<pre style="display: none;" name="code" class="javascript">$.ajax({url: "http://loggly.loggly.com/api/search/?q=ntp", username: "guest", password: "loggly", ...})</pre>
<p>Now, if you do this, you will get the cross-domain error. However, if you just slightly change your call to include an extra parameter, it will succeed:</p>
<pre style="display: none;" name="code" class="javascript">$.ajax({url: "http://loggly.loggly.com/api/search/?q=ntp",
    username: 'guest', password: 'loggly',
    dataType:'jsonp',
    success: function(data) {
        flare = data['data'];
    },
    error: function(XMLHttpRequest, textStatus, errorThrown) {
        alert(textStatus+" - "+errorThrown);
    }
})</pre>
<p>Note the newly added dataType parameter. That&#8217;s it? Yes, that&#8217;s it. It will work like a charm. No more cross-domain security issues. Basically, two things happen. First, the AJAX request that is executed carries one extra query parameter: <b>&#038;callback=?</b>, where the question mark is some string that jQuery randomly generates. Second, on the Loggly side, if the callback parameter is present, Loggly does not return the plain JSON element that you would expect, but wraps it in a function call. Something like:</p>
<pre style="display: none;" name="code" class="javascript">jsonp12312312({data:{"May-20-2010 12:13:45": 2}, numFound: 1})</pre>
<p>The next thing that happens is that when your browser gets the answer back like this, it will try to execute the function called <b>jsonp12312312</b>. jQuery handles that for you internally by creating a function hook with that name that points to the success function provided to the AJAX call.</p>
<p>That&#8217;s really it. We are looking forward to seeing your applications that are using the Loggly APIs!</p>
<p>By the way, Loggly is using <a href="http://bitbucket.org/jespern/django-piston/wiki/Home">Django Piston</a> for handling the APIs. The library automatically handles JSONP responses when a parameter called &#8220;callback&#8221; is present! </p>
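<p>If you are curious what the server side of JSONP boils down to, here is a minimal Python sketch of the idea. This is not the Django Piston code, and the function name is made up for this example. It serializes the payload as JSON and, when a callback name is supplied, wraps the JSON in a call to that function. Restricting the callback to a plain JavaScript identifier is an extra precaution I am assuming here.</p>

```python
import json
import re

# Assumption: only plain JavaScript identifiers are accepted as callbacks.
CALLBACK_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def render_response(payload, callback=None):
    """Serialize payload as JSON, or as JSONP when a callback name is given."""
    body = json.dumps(payload)
    if callback is None:
        return body
    if not CALLBACK_RE.match(callback):
        raise ValueError("invalid callback name: %r" % callback)
    # JSONP: the browser evaluates this as a call to the named function.
    return "%s(%s)" % (callback, body)
```

<p>Without a callback the caller gets plain JSON back; with one, the same data arrives wrapped in a function call that the browser can execute.</p>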
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/05/cross-domain-ajax-calls-to-query-logglys-apis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Logging Library for Django &#8211; How We Log at Loggly</title>
		<link>http://www.loggly.com/2010/05/a-logging-library-for-django-how-we-log-at-loggly/</link>
		<comments>http://www.loggly.com/2010/05/a-logging-library-for-django-how-we-log-at-loggly/#comments</comments>
		<pubDate>Sat, 22 May 2010 18:25:54 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Log Management]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=1048</guid>
		<description><![CDATA[In my last blog entry, I showed you how you can enable logging in Django 1.2. Now we are going to look at the logging library that we built for Loggly to simplify the task of logging in our own Django application, the Loggly Web interface.
Here is how we log from within our application:
from loggly.logging [...]]]></description>
			<content:encoded><![CDATA[<p>In my last blog entry, I showed you how you can <a href="http://www.loggly.com/2010/05/how-to-enable-logging-in-django-1-2">enable logging in Django 1.2</a>. Now we are going to look at the logging library that we built for Loggly to simplify the task of logging in our own Django application, the Loggly Web interface.</p>
<p>Here is how we log from within our application:</p>
<pre style="display: none;" name="code" class="python">from loggly.logging import *

error({'object':'input','action':'create'})</pre>
<p>That&#8217;s it. The above code creates the following log entry:</p>
<pre style="display: none;" name="code" class="python">Mar 18 15:34:03 app loggly: severity=ERROR,user=logdog_zrlram,request_id=
08BaswoAAQgAADVDG3IAAAAD,object=input,action=create,status=failure</pre>
<p>The logging call expects a dict of key-value pairs. This is to enforce key-value based log entries that make it easy for consumers to understand what a specific value means. Without the inclusion of a key, a value is more or less useless. In the example above, note that I only provided two keys: object, and action. However, the log entry contains a number of other data items. Those items are automatically added to the log entries by our logging library without burdening the developer to explicitly include them.</p>
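<p>To show why key-value entries are convenient for consumers, here is a short Python sketch that turns the message part of such an entry back into a dict. The function name is hypothetical, and the sketch assumes that neither keys nor values contain commas.</p>

```python
def parse_kv(message):
    """Split 'k1=v1,k2=v2' into a dict.

    Assumption: keys and values contain no commas.
    """
    fields = {}
    for pair in message.split(","):
        # partition() splits on the first '=' only, so values may contain '='
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return fields
```

<p>Feeding it the part of the entry after the syslog prefix gives you every field by name, without having to guess what a given position in the line means.</p>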
<p>It is probably time to show you <a href="">Loggly&#8217;s logging library</a>:</p>
<pre style="display: none;" name="code" class="python">import logging
import inspect

DEFAULT_LOGGER = 'loggly_web'
logs = None

def logHelper(rest=None, request=None):

    global logs
    output = list()

    # get the logger
    if not logs:
        logs = logging.getLogger(DEFAULT_LOGGER)

    # Loop through all the stack frames until you find the request
    stack = inspect.stack()
    for frame in stack:
        if 'request' in frame[0].f_locals:
            request = frame[0].f_locals['request']
            if request is None:
                continue
            # there is a request object
            if hasattr(request,'user') and hasattr(request.user,'username') and len(request.user.username)>0:
                output.append("user="+str(request.user.username).strip())
            if hasattr(request,'META') and 'UNIQUE_ID' in request.META:
                output.append("request_id="+str(request.META['UNIQUE_ID']).strip())
            # we found the request object. Get out of here
            break

    # getting input dictionary and appending
    if rest:
        for key in rest:
            output.append("%s=%s" % (str(key.strip()), str(rest[key]).strip()))

    ret = ",".join(map(str, output))
    return ret

def info(rest=None, user=None):

    msg = logHelper(rest)  # the request is discovered via stack inspection
    logs.info(msg)

def error(rest=None, user=None):

    msg = logHelper(rest)  # the request is discovered via stack inspection
    logs.error(msg)</pre>
<p>Note that this is only an extract. <a href="/wp-content/uploads/2010/04/loggly_logging.py">Download</a> the entire library if you want to use it in your own code. Here are some important things  the code does:</p>
<ul>
<li>line 17 to 29: This part of the code inspects the call stack to check whether there is an HTTP request object somewhere. The request object contains the username for the session and that is what we automatically extract. This frees the user from manually adding that information to the logging call. Automation is good!</li>
<li>line 26 and 27: We are using <a href="http://httpd.apache.org/docs/2.0/mod/mod_unique_id.html">UNIQUE_IDs</a> in Apache. In order to track a request from the Apache logs down into our application, we include that same ID into our Django logs. This is a huge win for associating Apache logs with our application logs.</li>
<li>line 32 to 34: All the dict entries are added as &#8216;key=value&#8217; pairs to the log entry. So you can log any key you want.</li>
<li>line 39 to 47: These are the calls that you use in your code. Note that you can add a user field, which overwrites the username from the request. In some cases that is necessary and useful.</li>
</ul>
<p>Let us know if you are using our library. I would love to hear back from you. I will post another blog entry later, where I will be talking about how to patch Django itself to do some more logging. We will be looking at how the authentication methods can be extended.</p>
<p>The links:<br />
<a href="/wp-content/uploads/2010/04/django_logging_1.2.patch">Django 1.2 Logging Patch</a><br />
<a href="/wp-content/uploads/2010/04/loggly_logging.py">Loggly Logging Library</a> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/05/a-logging-library-for-django-how-we-log-at-loggly/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How to Enable Logging in Django 1.2</title>
		<link>http://www.loggly.com/2010/05/how-to-enable-logging-in-django-1-2/</link>
		<comments>http://www.loggly.com/2010/05/how-to-enable-logging-in-django-1-2/#comments</comments>
		<pubDate>Thu, 20 May 2010 18:35:01 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Log Management]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=1011</guid>
		<description><![CDATA[On Monday Django 1.2 was released. At Loggly, we highly anticipated this release. We have been running version 1.2 since the early Alpha releases. During the Alpha times, Simon Willison released a patch and proposal/ticket of how to enable logging in Django applications. It looked like the patch would get included in the main Django [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.loggly.com/wp-content/uploads/2010/05/django-logo-positive.png" alt="Django 1.2" width="250" style="float:right"/>On Monday <a href="http://www.djangoproject.com/weblog/2010/may/17/12/">Django 1.2</a> was released. At Loggly, we highly anticipated this release. We have been running version 1.2 since the early Alpha releases. During the Alpha times, Simon Willison released a <A href="http://github.com/simonw/django/commit/b5227e1ac1d70b1f936ee69d6e347d8148df461e">patch</a> and <a href="http://code.djangoproject.com/ticket/12012">proposal/ticket</a> of how to enable logging in Django applications. It looked like the patch would get included in the main Django branch. Can you imagine how disappointed we were when the we realized that the patch didn&#8217;t make it into the final release?<br />
Well, what we ended up doing was to patch Django with Simon&#8217;s patch. Took me a little while to update the patch to work with the final 1.2 release. If you want to use the patch, <a href="/wp-content/uploads/2010/04/django_logging_1.2.patch">download</a> it here. What you need to do then is patch your Django install with this. So, download the <a href="http://www.djangoproject.com/download/">Django 1.2</a> tar ball and do the following:</p>
<pre style="display: none;" name="code" class="python">tar -xzf Django-1.2.tar.gz
cd Django-1.2
patch -p0 < /tmp/django_logging_1.2.patch
python ./setup.py install</pre>
<p>This will include the patch into your Django distro and install it. As a next step, you configure your application for logging by adding a snippet similar to the following in your settings.py file:</p>
<pre style="display: none;" name="code" class="python">LOGGING = {
    'loggly_feature': {
        'handler': 'logging.handlers.SysLogHandler',
        'address': ('localhost', 514),
        'facility': 'local3',
        'level': 'INFO',
        'format': 'loggly: %(message)s',        # does not include the severity. it's features, baby!
    },
}</pre>
<p>From there you use the following code snippet to log out of your application:</p>
<pre style="display: none;" name="code" class="python">import logging
logs = logging.getLogger('loggly_feature')
logs.error("message")</pre>
<p>You can find more information about how to use the <a href="http://docs.python.org/library/logging.html">logging libraries</a> in the Python documentation. It also seems like Django 1.3 will have <a href="http://code.djangoproject.com/ticket/12012">logging</a> built in. From what I can tell, it looks similar to Simon's approach, but it's not quite the same. I'll keep you posted here!</p>
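<p>For comparison, here is roughly what the same setup looks like with nothing but the standard Python logging module and no Django patch. This is a sketch under assumptions: the build_logger helper is made up for this example, and the demo writes to an in-memory stream instead of syslog so that it runs anywhere.</p>

```python
import io
import logging


def build_logger(handler, name="loggly_feature"):
    """Attach handler to the named logger using the level and format above."""
    handler.setFormatter(logging.Formatter("loggly: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Demo with an in-memory stream so the sketch runs without a syslog daemon.
stream = io.StringIO()
logs = build_logger(logging.StreamHandler(stream))
logs.error("message")
captured = stream.getvalue()
```

<p>To send the entries to syslog instead, pass a logging.handlers.SysLogHandler with address=('localhost', 514) and facility='local3' as the handler.</p>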
<p>In my next blog post I will show you how we implemented a logging library for Loggly so that we can very easily log from anywhere within our application.</p>
<p><a href="/wp-content/uploads/2010/04/django_logging_1.2.patch">Django 1.2 Logging Patch</a> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/05/how-to-enable-logging-in-django-1-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Securing your Web Application with httponly cookies OR How Apache.org and Atlassian could have been secured</title>
		<link>http://www.loggly.com/2010/04/securing-your-web-application-with-httponly-cookies-or-how-apache-org-and-atlassian-could-have-been-secured/</link>
		<comments>http://www.loggly.com/2010/04/securing-your-web-application-with-httponly-cookies-or-how-apache-org-and-atlassian-could-have-been-secured/#comments</comments>
		<pubDate>Wed, 14 Apr 2010 22:47:35 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=762</guid>
		<description><![CDATA[The other day I was reading about the Apache and Atlassian hack. Max wrote a really nice summary of how that attack could have been prevented. One of the points he raised was that they should have used HTTPONLY cookies. 
I then realized that we might have the same problem with Loggly. After some traffic [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://farm1.static.flickr.com/165/390654468_afa1bc5249.jpg" alt="Attack" style="float:right" width=250/>
<p>The other day I was reading about the <a href="http://blog.skeptikal.org/2010/04/apacheorg-hacked-atlassian-fail.html">Apache and Atlassian hack</a>. Max wrote a really nice summary of <a href="http://avatraxiom.livejournal.com/102080.html">how that attack could have been prevented</a>. One of the points he raised was that they should have used <a href="http://www.codinghorror.com/blog/2008/08/protecting-your-cookies-httponly.html">HTTPONLY</a> cookies. </p>
<p>I then realized that we might have the same problem with Loggly. After some traffic dumping of our Web sessions, I realized that Django didn&#8217;t support httponly cookies. A quick google search revealed that someone wrote a djangosnippet to add <a href="http://www.djangosnippets.org/snippets/1983/">httponly cookies</a>. I had to slightly rewrite it, so here is the code I am using:</p>
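<pre name=code class=python>from django.conf import settings

class cookie_httponly:
    def process_response(self, request, response):
        # Fall back to Django's default session cookie name if none is set
        scn = settings.SESSION_COOKIE_NAME or 'sessionid'
        if scn in response.cookies:
            # Ask browsers to hide the session cookie from JavaScript
            response.cookies[scn]['httponly'] = True
        return response</pre>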
<p>Don&#8217;t forget to add the middleware right before the SessionMiddleware. If you are using Python 2.6 or higher, you are done. Unfortunately, we are running Python 2.5, which does not support the httponly flag on cookies. A quick patch solved that problem as well:</p>
<pre name=code class=bash>--- /usr/lib/python2.5/Cookie.py   (revision 66233)
+++ /usr/lib/python2.5/Cookie.py   (working copy)
@@ -408,6 +408,9 @@
     # For historical reasons, these attributes are also reserved:
     #   expires
     #
+    # This is an extension from Microsoft:
+    #   httponly
+    #
     # This dictionary provides a mapping from the lowercase
     # variant on the left to the appropriate traditional
     # formatting on the right.
@@ -417,6 +420,7 @@
                    "domain"      : "Domain",
                    "max-age" : "Max-Age",
                    "secure"      : "secure",
+                   "httponly"  : "httponly",
                    "version" : "Version",
                    }

@@ -499,6 +503,8 @@
                 RA("%s=%d" % (self._reserved[K], V))
             elif K == "secure":
                 RA(str(self._reserved[K]))
+            elif K == "httponly":
+                RA(str(self._reserved[K]))
             else:
                 RA("%s=%s" % (self._reserved[K], V))</pre>
<p>Loggly is now more secure against XSS attacks!</p>
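<p>If you want to check the flag outside of Django, the following sketch sets it with the cookie module from the Python standard library. It is written for Python 3, where the module is called http.cookies, and the cookie value is a placeholder.</p>

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sessionid"] = "abc123"          # placeholder session value
cookie["sessionid"]["httponly"] = True  # mark the cookie as HttpOnly

# The rendered Set-Cookie header now carries the httponly attribute.
header = cookie.output()
```

<p>Printing header shows the Set-Cookie line that would be sent to the browser, with the httponly attribute appended after the cookie value.</p>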
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/04/securing-your-web-application-with-httponly-cookies-or-how-apache-org-and-atlassian-could-have-been-secured/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Visualizing your Data in the Cloud with Loggly and HighCharts</title>
		<link>http://www.loggly.com/2010/03/visualizing-your-data-in-the-cloud-with-loggly-and-highcharts/</link>
		<comments>http://www.loggly.com/2010/03/visualizing-your-data-in-the-cloud-with-loggly-and-highcharts/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 22:39:30 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Log Management]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[chart]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[loggly]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=694</guid>
		<description><![CDATA[A short while into writing code for the Loggly interface we decided that we needed some eye candy. Given my background in visualization, I was keen on providing our users with an experience that helps them understand their data in an intuitive way.
Over the last few years I&#8217;ve been looking into a ton of visualization [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.loggly.com/wp-content/uploads/2010/03/Screen-shot-2010-03-26-at-Mar-26-6.32.05-PM.png"><img width=680 title="Loggly Cloud Data" src="http://www.loggly.com/wp-content/uploads/2010/03/Screen-shot-2010-03-26-at-Mar-26-6.32.05-PM.png" alt="" /></a></p>
<p>A short while into writing code for the Loggly interface we decided that we needed some eye candy. Given my <a href="http://secviz.org">background</a> in visualization, I was keen on providing our users with an experience that helps them understand their data in an intuitive way.</p>
<p>Over the last few years I&#8217;ve been looking into a ton of visualization libraries for the Web.  In the past, if you had asked me what library to use for generating charts on your Web site, I would have said, &#8220;Use Flash&#8221;.  While there are a number of interesting Flash libraries out there, the landscape has shifted significantly in the last year.  Everyone is moving to JavaScript. After some research, I opted to use a <a href="http://www.datavisualization.ch/tools/13-javascript-libraries-for-visualizations">JavaScript charting library</a> called <a href="http://highcharts.com/">HighCharts</a>.   I tried a bunch of other canvas-based libraries, but let me tell you without hesitation, HighCharts rocks.</p>
<p>I am going to show you how we are using HighCharts and how I implemented <em><strong>zooming to dynamically reload more event data</strong></em> on the fly.  Out of the box, a charting library does not load more detailed data as you keep zooming in on a chart, so at detailed zoom levels you end up with only a handful of data points in your graph.  For example, if you view a day&#8217;s data first and then zoom into a specific minute, you would see only one data point.</p>
<p>To start, here&#8217;s the JavaScript I use to display a chart:</p>
<pre name="code" class="javascript">var parse_date = function(data) {
    var result = [];
    $.each(data, function(key, value) {
        var re = new RegExp(/(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(?:\.(\d+))?/);
        var date = re.exec(key);
        if (date[7] === undefined) {date[7]=0;}
        var real_date = Date.UTC(date[1], parseInt(date[2], 10)-1,date[3],date[4],date[5],date[6],date[7]); // radix 10 so "08" and "09" are not read as octal
        result.push([real_date, value]);
    });
    return result;
}

chart = new Highcharts.Chart({
    credits: { enabled: false },
    chart: {
        renderTo: 'activity',
        defaultSeriesType: 'area',
        margin: [10, 20, 40, 55],
        zoomType: "x",
            events: {
                selection: function(event) {
                    // change the time frame to be searched
                    var start = Highcharts.dateFormat('%Y-%m-%dT%H:%M:%SZ', event.xAxis[0].min);
                    var end = Highcharts.dateFormat('%Y-%m-%dT%H:%M:%SZ', event.xAxis[0].max);
                    $.ajax({ type: "GET", url: "http://subdomain.loggly.com/api/search/?"
                        + "q=inputname:logglyapp&#038;starttime="+start+"&#038;endtime="+end
                        + "&#038;facets=True&#038;buckets=24",
                        success: function(data) {
                             chart.xAxis[0].setExtremes();
                             chart.series[0].setData(parse_date(data));
                             // fix the reset zoom button
                             $('.highcharts-toolbar').click(reset_zoom);
                        },
                        error: function(req, text, error) {
                            $("#err").html("Reload error!");
                        }
                    });
                }
        }
    },
    xAxis: { title: { text: 'Time' }, type: 'datetime' },
    yAxis: { title: { text: '# Events' }, min:0,
        plotLines: [{ value: 0, width: 1, color: '#808080' }]
    },
    tooltip: { formatter: function() {
            return Highcharts.dateFormat('%B %e %Y %H:%M:%S', this.x) + '<br/>'+
            '<b>'+this.y+' Events</b>' }},
    plotOptions: {
        area: {
            dataParser: parse_date
        }
    },
    series: [{ id: 1, name: 'search',
        dataURL: 'http://subdomain.loggly.com/api/search/'
            + '?q=inputname:logglyapp&#038;facets=True'}],
    title: { text: 'traffic last 24 hours' }
});

var reset_zoom = function() {
    // requery for the original data:
    $.ajax({ type: "GET", url: "http://subdomain.loggly.com/api/search/"
        + "?q=inputname:logglyapp&#038;facets=True",
        success: function(data) {
           chart.toolbar.remove('zoom');
           chart.xAxis[0].setExtremes();
           chart.get(1).setData(parse_date(data));
        },
        error: function(req, text, error) {
            $("#err").html("Loading error!");
        }
    });
}
</pre>
<p>Let&#8217;s have a quick look at the code. There are two things I want to communicate here: 1. the code I used to display a HighCharts graph, and 2. the way I am using <a href="http://wiki.loggly.com/api-documentation">Loggly&#8217;s APIs</a> to query the data.</p>
<p>I mentioned the special zooming that I implemented. Take a look at lines 20 to 39.  This is the function that handles zooming, and it is where I reload the more detailed data.  I set the new start and end dates (lines 23 and 24) and then query the Loggly API with the new timeframe (lines 25 to 27). Upon success &#8211; this is important &#8211; I use the <b>chart.series[0].setData()</b> method to set the new data for the chart. The next statement rebinds the default link that lets the user zoom out again (line 32). Note: because you are implementing your own zoom, the default &#8220;reset zoom&#8221; button from HighCharts will not work anymore, and you have to override it with your own function to reset the chart.</p>
<p>The function dealing with the reset functionality is on lines 59 to 72. It does nothing more than query the Loggly API for the original data (I am passing no time parameters) and set the data just like the previous call. The other thing you have to do is remove the HighCharts default &#8220;reset zoom&#8221; link (line 64) and reset the extremes (line 65).</p>
<p>Moving on, we&#8217;ll briefly discuss the way I&#8217;m using the <B>Loggly API</B>.  If you&#8217;d like to use it, you need an account with us. We are currently in private beta, so you will need us to give you access to the beta program. <a href="mailto:support@loggly.com?subject=BETA+request+for+API+usage">Email us</a> if you want an account to play around with! Back to the code. Make sure you replace <i>subdomain</i> in the URLs with your actual subdomain. With that out of the way, you can query the API by simply making a GET request to /api/search. You pass the <b>q</b> parameter with your query. In my example I am getting all the data from my input with the name <i>logglyapp</i>. To get timeline data, you&#8217;ll need to pass the parameter <b>facets=True</b> into the call. This will give you counts for time buckets.</p>
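<p>If you would rather build the same request outside the browser, here is a minimal sketch in Python 3. It only assembles the URL; the parameter names are the ones used in the JavaScript above, while the function name and its defaults are my own assumptions, not part of the Loggly API:</p>

```python
from urllib.parse import urlencode


def build_search_url(subdomain, query, start=None, end=None, buckets=None):
    """Build a Loggly search URL that asks for facet (time bucket) counts."""
    params = [("q", query), ("facets", "True")]
    # Time parameters are optional; leaving them out asks for the default range.
    if start is not None:
        params.append(("starttime", start))
    if end is not None:
        params.append(("endtime", end))
    if buckets is not None:
        params.append(("buckets", str(buckets)))
    return "http://%s.loggly.com/api/search/?%s" % (subdomain, urlencode(params))


url = build_search_url("subdomain", "inputname:logglyapp",
                       start="2010-03-26T00:00:00Z",
                       end="2010-03-26T01:00:00Z", buckets=24)
print(url)
```

<p>Note that <code>urlencode</code> percent-encodes the colons, which the server should decode back to the same values.</p>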
<p>To make everything work together, you need one more piece: the <b>parse_date</b> function.  You need this part because the Loggly API returns the data with human-readable timestamps, and HighCharts wants timestamps as milliseconds since the epoch in UTC. The function on lines 1 to 11 takes care of converting the time for you. Just copy it.</p>
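<p>The same conversion can be sketched in Python 3, for example if you want to prepare the data on the server. This is an illustration rather than code we run: the function names are hypothetical, and I am assuming the facet data arrives as a mapping from timestamp to count, as the JavaScript above suggests:</p>

```python
import re
from calendar import timegm

TIMESTAMP = re.compile(r"(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(?:\.(\d+))?")


def to_epoch_millis(timestamp):
    """Convert an ISO-style UTC timestamp into milliseconds since the epoch."""
    match = TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError("unrecognized timestamp: %r" % timestamp)
    year, month, day, hour, minute, second = [int(g) for g in match.groups()[:6]]
    # The fractional part is optional; treat a missing one as zero milliseconds.
    millis = int((match.group(7) or "0")[:3].ljust(3, "0"))
    seconds = timegm((year, month, day, hour, minute, second, 0, 0, 0))
    return seconds * 1000 + millis


def parse_facets(data):
    """Turn {timestamp: count} into sorted [epoch_millis, count] pairs."""
    return sorted([to_epoch_millis(key), value] for key, value in data.items())
```

<p>Unlike the JavaScript version, this sketch pads or truncates the fractional part to exactly three digits, so &#8220;.5&#8221; is read as 500 milliseconds.</p>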
<p>I hope this was useful. Let us know if you are having trouble with any of this. We are looking forward to hearing about your graphing endeavors.</p>
<p>If you look at my <a href="http://del.icio.us/zrlram/visualization">del.icio.us</a> feed, you&#8217;ll find a bunch more visualization and charting links.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/03/visualizing-your-data-in-the-cloud-with-loggly-and-highcharts/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fixing Client IPs in Apache Logs with Amazon Load Balancers</title>
		<link>http://www.loggly.com/2010/03/fixing-client-ips-in-apache-logs-with-amazon-load-balancers/</link>
		<comments>http://www.loggly.com/2010/03/fixing-client-ips-in-apache-logs-with-amazon-load-balancers/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 23:09:07 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Log Management]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=681</guid>
		<description><![CDATA[If you are running your Web servers behind a load balancer, you have probably noticed that your logs contain the load balancer&#8217;s IP address as the client IP, which is kind of annoying. There is an Apache module called RPAF, which fixes exactly this issue. Once you have it downloaded and installed, you configure it [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.loggly.com/wp-content/uploads/2010/03/images.jpg" alt="" title="Balancing Act" width="124" height="93" class="alignright size-full wp-image-683" />
<p>If you are running your Web servers behind a load balancer, you have probably noticed that your logs contain the load balancer&#8217;s IP address as the client IP, which is kind of annoying. There is an Apache module called <a href="http://stderr.net/apache/rpaf/">RPAF</a>, which fixes exactly this issue. Once you have it downloaded and installed, you configure it as follows:</p>
<pre name="code" class="bash:nocontrols">RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1 10.0.0.1
RPAFheader X-Forwarded-For</pre>
<p>The module takes the IP address transmitted in the X-Forwarded-For header and puts it into the request as the client IP.</p>
<p>The not so nice part is that the RPAFproxy_ips statement requires you to list the IP addresses of your load balancers. If you have only a fixed set of them, all is fine. But here comes the catch. If you are deployed on Amazon AWS and you are using their load balancer (LB), then this solution is going to frustrate you quite a bit. You will notice that the LB&#8217;s IP address changes constantly, so you keep adding IP addresses to the configuration statement. After about 10 IP addresses, I got sick of that and started looking at the source code of RPAF to solve this problem once and for all. Here is what I did:</p>
<p>On line 139 of mod_rpaf-2.0.c, I added a <B>return 1;</B> statement. This will tell the is_in_array() function to always assume that the request is coming from a load balancer, without checking the configured list of IP addresses. The rest of the RPAF code is robust enough to only replace the client IP when an X-Forwarded-For header is actually set. After the change, do a <b>make install-2.0</b> and you are in business.</p>
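<p>To make the effect of the patch easier to see, here is a rough model of the decision in Python 3. It is not a port of mod_rpaf: the function and the <code>trust_any_proxy</code> switch are hypothetical, and taking the last address in the header is my assumption about which entry is used:</p>

```python
def resolve_client_ip(remote_addr, headers, proxy_ips, trust_any_proxy=False):
    """Pick the address to log for a request that may have passed a proxy."""
    forwarded = headers.get("X-Forwarded-For", "").strip()
    # Without the header there is nothing to substitute.
    if not forwarded:
        return remote_addr
    # Stock behavior: only rewrite when the peer is a configured proxy.
    # trust_any_proxy=True models the "return 1;" patch.
    if not (trust_any_proxy or remote_addr in proxy_ips):
        return remote_addr
    # Assumption: use the last address in the list, the one appended
    # by the proxy closest to the web server.
    return forwarded.split(",")[-1].strip()
```

<p>With the stock check, a request from a load balancer address that is not in the list keeps the balancer&#8217;s IP. With the switch turned on, the forwarded address is used no matter which peer delivered the request.</p>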
<p>Happy logging!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/03/fixing-client-ips-in-apache-logs-with-amazon-load-balancers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to use RightScale APIs with Python</title>
		<link>http://www.loggly.com/2010/03/rightscale-apis-with-python/</link>
		<comments>http://www.loggly.com/2010/03/rightscale-apis-with-python/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 17:55:10 +0000</pubDate>
		<dc:creator>Raffy</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[rightscale]]></category>

		<guid isPermaLink="false">http://www.loggly.org/?p=468</guid>
		<description><![CDATA[I have been quiet for long enough on this blog. It&#8217;s time for me to share some things that I learned in the last few months while I was working on Loggly&#8217;s Application layer. Lately, I spent some quality time with Django and consequentially Python.
What I want to focus on today is our integration with [...]]]></description>
			<content:encoded><![CDATA[<p>I have been quiet for long enough on this blog. It&#8217;s time for me to share some things that I learned in the last few months while I was working on Loggly&#8217;s Application layer. Lately, I spent some quality time with <a href="http://www.djangoproject.com">Django</a> and consequentially Python.</p>
<p>What I want to focus on today is our integration with <a href="http://www.rightscale.com">RightScale</a>. At Loggly, we use RightScale to manage our AWS instances. Loggly runs three types of servers. (Well, I am simplifying). We have a <em>proxy</em> tier which receives your log messages. The proxy tier, which is basically a bank of machines, forwards the messages to the indexing <em>back end</em> that runs Solr. The third group of machines are the Web or <em>application</em> servers. When a new proxy box comes online, the RightScale management interface knows about the box. I had to know about thse proxies on the application tier (i.e., within Django) as well. How do you do that?</p>
<p>The first solution would be to have the proxies register with Django as soon as they come online. But what happens when they go down or are taken offline? Keeping track of that seems complicated. Another solution would be to periodically poll the proxies from Django. Not very nice either.</p>
<p>My solution is much more elegant. RightScale has two features that helped me out. The first one is <strong>machine tags</strong>. Each proxy server is labeled as such. (See <a href="http://support.rightscale.com/12-Guides/RightScale_Methodologies/Tagging">Machine Tagging</a>). Secondly, I am using the <a href="http://support.rightscale.com/12-Guides/03-RightScale_API/">RightScale API</a> to figure out how many proxies I have and what their IPs are. (As a side note, the RightScale APIs are in Beta right now. There might be changes or improvements coming down the pipe.)</p>
<p>I struggled for quite a bit with using the RightScale APIs from Python. Here are some things that I learned the hard way and that you might find helpful:</p>
<p>Using the API to query all your machines in a specific deployment:</p>
<pre name="code" class="bash:nocontrols">curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] \
https://my.rightscale.com/api/acct/[account]/deployments/[deployment_number]</pre>
<p>Note how you have to add the extra header to request version 1.0 of the API.</p>
<p>Here is how you get all the machines that have a specific tag. Note the structure of my tag! I set <strong>role:proxy=true</strong>. You need to use this hierarchical model!</p>
<pre name="code" class="bash:nocontrols">curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] -d'resource_type=server' \
-d 'tags[]=role:proxy' https://my.rightscale.com/api/acct/[account]/tags/search.js</pre>
<p>Want JSON output instead of XML? Add &#8220;&amp;format=js&#8221; at the end of your request!</p>
<p>Now, from the response, you would think you could just use that HREF to query an individual server. Wrong. That doesn&#8217;t work. You have to add &#8220;<strong>/settings</strong>&#8221; in order to make that work:</p>
<pre name="code" class="bash:nocontrols">curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] \
https://my.rightscale.com/api/acct/[account]/instances/[instance_id]/status</pre>
<p>Here is how you set a tag on a server. (Note: if you change the tag in the user interface for a running server, it will not take effect; the tag will only be there when you start a new server of that type. The API call, by contrast, lets you set a tag on a running machine.)</p>
<pre name="code" class="bash:nocontrols">curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] \
-d 'resource_href=https://my.rightscale.com/api/acct/[account]/servers/[server_id]' \
-d 'tags[]=role:proxy=true' https://my.rightscale.com/api/acct/[account]/tags/set
</pre>
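<p>For reference, the same tag request can be prepared in Python 3 before handing it to whatever HTTP client you use. This is only a sketch: the helper name is hypothetical, and the account and server numbers are placeholders. The URL layout and field names are taken from the curl call above:</p>

```python
from urllib.parse import urlencode

API_ROOT = "https://my.rightscale.com/api/acct"


def build_tag_request(account, server_id, tags):
    """Return (url, form_body) for setting tags on one server."""
    resource_href = "%s/%s/servers/%s" % (API_ROOT, account, server_id)
    fields = [("resource_href", resource_href)]
    # Each tag is sent as its own repeated "tags[]" form field.
    fields.extend(("tags[]", tag) for tag in tags)
    return "%s/%s/tags/set" % (API_ROOT, account), urlencode(fields)


url, body = build_tag_request("1234", "5678", ["role:proxy=true"])
print(url)
print(body)
```

<p>The body is form-encoded, so the brackets, colon and equals sign in the tag are percent-encoded rather than sent raw as curl does.</p>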
<p>The part I struggled with most was how to call the API from within Python. It turns out that httplib2 expects the Web server to respond slightly differently than the RightScale server does. If you use the following code, the request will not be authenticated:</p>
<pre name="code" class="python">import httplib2

headers = {'X-API-VERSION': '1.0'}
h = httplib2.Http()
h.add_credentials(user, password)
response, content = h.request(url, headers=headers)</pre>
<p>httplib2 first connects to the Web server without sending the credentials. Only when the server challenges the client to authenticate does it send the authentication headers. And this is precisely what RightScale is not doing. Therefore, you have to do the following in order to include the authentication headers in the very first request:</p>
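<pre name="code" class="python">import base64
import httplib2

h = httplib2.Http()
# b64encode, unlike encodestring, never inserts newlines into long credentials
base64string = base64.b64encode('%s:%s' % (user, password))
headers = {'X-API-VERSION': '1.0'}
headers['Authorization'] = "Basic %s" % base64string
response, content = h.request(url, headers=headers)</pre>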
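<p>If you are on Python 3, strings and bytes are separate types, so the header has to be built slightly differently. Here is a minimal sketch; the helper name is my own, and the example credentials are placeholders:</p>

```python
import base64


def basic_auth_headers(user, password, api_version="1.0"):
    """Build headers that send HTTP Basic credentials on the first request."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    # b64encode never inserts line breaks, so long credentials stay on one line.
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "X-API-VERSION": api_version,
        "Authorization": "Basic %s" % token,
    }


headers = basic_auth_headers("user@domain.com", "secret")
print(headers["Authorization"])
```

<p>The resulting dictionary can be passed as the <code>headers</code> argument of <code>h.request()</code>, exactly like in the snippet above.</p>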
<p>Credentials are an interesting topic. I ended up creating a separate user in the RightScale interface that I am using for the APIs. Don&#8217;t be fooled though. These credentials still let that user log into the Web interface. I hope that RightScale will add a capability such that I can have a user that can only use the API.</p>
<p>I hope this helps you get off the ground a bit quicker when using RightScale. Let me know how it goes. You can also find me on Twitter: <a href="http://twitter.com/zrlram">@zrlram</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2010/03/rightscale-apis-with-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Django Middleware Munging</title>
		<link>http://www.loggly.com/2009/12/django-middleware-munging/</link>
		<comments>http://www.loggly.com/2009/12/django-middleware-munging/#comments</comments>
		<pubDate>Fri, 04 Dec 2009 21:07:57 +0000</pubDate>
		<dc:creator>Kord</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=293</guid>
		<description><![CDATA[We&#8217;ve been hitting the code pretty hard of late at Loggly, and the beta is really starting to take shape on the development servers.  There&#8217;s lots to do, of course, so we&#8217;ve taken to using Unfuddle to track tickets, host our repository for code commits.  Later on we&#8217;ll use Unfuddle&#8217;s APIs to help track customer&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been hitting the code pretty hard of late at Loggly, and the beta is really starting to take shape on the development servers.  There&#8217;s lots to do, of course, so we&#8217;ve taken to using <a href="http://www.unfuddle.com/">Unfuddle</a> to track tickets, host our repository for code commits.  Later on we&#8217;ll use <a href="http://unfuddle.com/docs/api">Unfuddle&#8217;s APIs</a> to help track customer&#8217;s feature requests and tickets.  Here&#8217;s a screen cap of our latest commit timeline:</p>
<p><a href="http://www.loggly.com/wp-content/uploads/2009/12/commits.png"><img class="alignnone size-full wp-image-294" title="commits" src="http://www.loggly.com/wp-content/uploads/2009/12/commits.png" alt="commits" width="600"/></a></p>
<p>One of the things you&#8217;ll notice when you use Unfuddle is the presence of a subdomain in the URLs you use on the site.  Our subdomain is &#8216;loggly&#8217; on Unfuddle, and we  log into our project area by going to <a href="http://loggly.unfuddle.com/">http://loggly.unfuddle.com/</a> (no, you can&#8217;t check our code out).  This type of customer segmentation allows for multiple unique usernames per customer, but doesn&#8217;t require a unique username site-wide.  For <a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/32838dd5f6bb9cd8">non-SEO sections</a> of the site, this is a perfect solution.</p>
<p>We are taking a similar approach with Loggly, where a user will sign up for an account and define a unique customer identifier (we&#8217;re kicking around calling this a &#8220;mill&#8221;), which will then be mapped to a subdomain on the system.  So, for example, if Foobar, Inc. were to sign up for a Loggly account, they would access the site via <strong>http://foobar.loggly.com/</strong>, and then could create any number of user/pass combinations they wanted to access their company&#8217;s log resources.</p>
<p>The only problem with this approach is that we use <a href="http://djangoproject.com/">Django</a>, and their <a href="http://docs.djangoproject.com/en/dev/topics/auth/">built in auth system</a> (which is fantastic, BTW) doesn&#8217;t really have facilities for this type of functionality.  While we could certainly hack the Django auth system by writing our own multi-tenant auth module, it would take away from more pressing issues &#8211; like launching the beta!</p>
<h3 id="toc-enter-the-middleware-solution">Enter the Middleware Solution</h3>
<p>One way to solve this is by munging the subdomain and username together, which provides a unique system-wide username.  If, for example, you were to log in as <strong>steve</strong> under <strong>foobar.loggly.com</strong>, then we&#8217;d stick them together to be something like &#8220;foobar_steve&#8221;.  Obviously we can&#8217;t have everyone remembering this long monstrosity for their username, so we&#8217;ll need to munge the subdomain off the URL and the username the user types in to get the correct combination to send off to the auth system.</p>
<p>Thankfully Django provides a <a href="http://docs.djangoproject.com/en/dev/topics/http/middleware/">super-easy way to add middleware</a> to a project.  By injecting a small piece of code into the request from the user&#8217;s browser, we are able to do our on-the-fly transformation before the auth system takes over.  Nobody is the wiser because we can modify the display name code in the <a href="http://docs.djangoproject.com/en/dev/topics/auth/#storing-additional-information-about-users">profile model</a> to show the &#8220;normal&#8221; username to the user.  Here&#8217;s what the result looks like:</p>
<pre name="code" class="python:nocontrols:nogutter">
# settings.py
...
MIDDLEWARE_CLASSES = (
    'loggly.profile.MungeMiddle.MungeForMillMiddleware',
    ...
)
...

# MungeMiddle.py
class MungeForMillMiddleware(object):
    def process_request(self, request):
        # Only touch requests that carry a username (i.e. login posts).
        if 'username' in request.POST:
            # request.POST is immutable, so work on a mutable copy.
            data = request.POST.copy()
            # Prefix the username with the subdomain taken from the Host header.
            subdomain = request.META['HTTP_HOST'].split('.')[0]
            data['username'] = "%s_%s" % (subdomain, data['username'])
            request.POST = data
</pre>
<p>When a request comes in, we pull out the POST data and make a copy of it with <strong>.copy()</strong>.  We then munge up the username with the subdomain out of HTTP_HOST, and then set the POST data to forward on to the rest of the stack.  We don&#8217;t do this for all requests, just ones with the username set, so it&#8217;s lightweight enough for production use.  We end up sticking the shortened version of the username into the <a href="http://docs.djangoproject.com/en/dev/topics/auth/#storing-additional-information-about-users">profile table</a>, and use it for display. </p>
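<p>The munging itself is easy to pull out into a plain function and test without Django. The following Python 3 sketch does that. The function name is hypothetical, and stripping a port and lowercasing the subdomain are extra assumptions that the middleware above does not make:</p>

```python
def munge_username(http_host, username):
    """Combine the subdomain from the Host header with the typed username."""
    # Drop an optional port, then take the leftmost label as the subdomain.
    hostname = http_host.split(":")[0]
    subdomain = hostname.split(".")[0].lower()
    return "%s_%s" % (subdomain, username)


print(munge_username("foobar.loggly.com", "steve"))
```

<p>Keeping the logic in one small function like this means the middleware only has to decide when to call it.</p>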
<p>So there you have it.  A 5 minute fix for a 5 hour problem.  I&#8217;m sure there are more elegant solutions to doing subdomain segmentation with Django&#8217;s out-of-the-box auth system, but frankly we don&#8217;t have time to stop and code them up.  We&#8217;re bent on getting our beta out as soon as possible, and if it requires hacks like these to do it, then so be it!  Release early, release often.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2009/12/django-middleware-munging/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Coolcam for Your Log Files</title>
		<link>http://www.loggly.com/2009/09/coolcam-for-your-logfiles/</link>
		<comments>http://www.loggly.com/2009/09/coolcam-for-your-logfiles/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 17:02:09 +0000</pubDate>
		<dc:creator>Kord</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://www.loggly.com/?p=198</guid>
		<description><![CDATA[I&#8217;ve been talking about coolcams for years as a way to help quickly show off a product&#8217;s features.  Coolcams aren&#8217;t meant to be useful &#8211; they exist simply to entertain and engage your audience.  It&#8217;s like an elevator pitch for your product demo.

Last year I did a coolcam mashup based around Poly9&#8217;s Flash [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been talking about <a href="http://thedailywtf.com/Articles/The-Cool-Cam.aspx">coolcams</a> for years as a way to help quickly show off a product&#8217;s features.  Coolcams aren&#8217;t meant to be useful &#8211; they exist simply to entertain and engage your audience.  It&#8217;s like an elevator pitch for your product demo.</p>
<p><a href="http://www.flickr.com/photos/glimmer/2429711287/"><img src="http://www.loggly.com/wp-content/uploads/2009/09/2429711287_a25fd6a0e3.jpg" alt="2429711287_a25fd6a0e3" title="2429711287_a25fd6a0e3" width="500" height="333" class="alignnone size-full wp-image-215" /></a></p>
<p>Last year I did a coolcam mashup based around <a href="http://globe.poly9.com/">Poly9&#8217;s Flash Globe</a> and events from my web server&#8217;s log files.  I never did get around to publishing the code, but the URL was accessed hundreds of times by the sales guys I worked with.  If it was ever down or broken, I&#8217;d get an email from them within minutes.  As it turns out, a lot of them used the globe to start conversations with their customers.</p>
<p><strong>Loggly Globe</strong><br />
Flash forward to present day.  I&#8217;ve completely rewritten the code and put it up on Loggly&#8217;s site to share with everyone.  The way <a href="http://www.loggly.com:8001/static/globe.html">the globe</a> works is pretty simple.  Using the <a href="http://webpy.org/">web.py framework</a>, it starts a web server which does two things.  First, it serves up the HTML to your browser, which includes the Poly9 globe object and the jQuery library.  Second, it serves up a JSON object to the page which is parsed and sent to the globe object.  The code that serves up the JSON object does a few magical things for you:</p>
<ul>
<li>tails your web access log file for visits</li>
<li>parses out the ip address, timestamp, etc. from the log event</li>
<li>takes the ip address and does a geoip lookup on it</li>
<li>removes duplicate visits from a single ip address</li>
<li>wraps the whole thing up in a JSON object</li>
</ul>
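<p>The last three steps in the list above can be sketched in a few lines of Python 3. This is not the code from globe.py: the function names and JSON field names are hypothetical, and the geoip lookup is replaced by a plain dictionary so the example stays self-contained:</p>

```python
import json


def lookup_location(ip, geo_table):
    """Stand-in for a geoip lookup; returns (lat, lon) or None."""
    return geo_table.get(ip)


def build_globe_payload(visits, geo_table):
    """Turn parsed visits into the JSON string sent to the page."""
    seen = set()
    points = []
    for visit in visits:
        ip = visit["host"]
        # Keep only the first visit from each address.
        if ip in seen:
            continue
        seen.add(ip)
        location = lookup_location(ip, geo_table)
        # Skip addresses the lookup cannot place on the globe.
        if location is None:
            continue
        points.append({"ip": ip, "time": visit["time"],
                       "lat": location[0], "lon": location[1]})
    return json.dumps({"visits": points})
```

<p>A real deployment would swap the dictionary for an actual geoip database and feed the function from the tailed log file.</p>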
<p>While Loggly Globe is hard coded to parse our logs, it should be fairly easy to use it yourself.  To get started, <a href="http://www.loggly.com/wp-content/uploads/2009/09/globe_1.0.tar.gz">download the tarball for Globe 1.0</a>, and then extract it somewhere on your server:</p>
<blockquote><p>kord@loggly&gt; <strong>tar xvfz globe_1.0.tar.gz</strong></p></blockquote>
<p>You may need a couple of Python libraries installed.  Assuming you have easy_install installed you can run:</p>
<blockquote><p>kord@loggly&gt; <strong>easy_install web.py</strong><br />
&#8230;<br />
kord@loggly&gt; <strong>easy_install httplib2</strong></p></blockquote>
<p>You&#8217;ll want to edit the <strong>globe.py</strong> file and modify the location of the Apache log file to point to your local log file.  You&#8217;ll also want to edit the regular expression extractions to match your log file format.  Here&#8217;s a line out of our logs for reference, and the corresponding extractions, most of which were pulled from <a href="http://seehuhn.de/blog/52">Random Encounter</a>.  Make any changes you need to match the regex up with your logs.</p>
<blockquote><p><code>75.101.142.96 - - [11/Sep/2009:09:19:17 -0700] "GET / HTTP/1.1" 200 6196 "-" "collectd/4.4.2" 195546</code></p>
<p><code>parts = [<br />
&nbsp;&nbsp;&nbsp;r'(?P&lt;host&gt;\S+)',                    # host %h<br />
&nbsp;&nbsp;&nbsp;r'\S+',                          # ident %l (unused)<br />
&nbsp;&nbsp;&nbsp;r'(?P&lt;user&gt;\S+)',                    # user %u<br />
&nbsp;&nbsp;&nbsp;r'\[(?P&lt;time&gt;.+)\]',                 # time %t<br />
&nbsp;&nbsp;&nbsp;r'"(?P&lt;request&gt;.+)"',               # request "%r"<br />
&nbsp;&nbsp;&nbsp;r'(?P&lt;status&gt;[0-9]+)',             # status %&gt;s<br />
&nbsp;&nbsp;&nbsp;r'(?P&lt;size&gt;\S+)',                     # size %b (careful, can be '-')<br />
&nbsp;&nbsp;&nbsp;r'"(?P&lt;referer&gt;.*)"',                 # referer "%{Referer}i"<br />
&nbsp;&nbsp;&nbsp;r'"(?P&lt;agent&gt;.*)"',                   # user agent "%{User-agent}i"<br />
&nbsp;&nbsp;&nbsp;r'([0-9]+)',               # stuff at end<br />
]</code></p></blockquote>
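<p>To check an expression against your own log format before editing globe.py, you can try it on a single line first. The Python 3 sketch below uses plain positional groups plus a list of field names instead of named groups. The field names, including &#8220;extra&#8221; for the trailing number, are my own labels and not necessarily the ones used in globe.py:</p>

```python
import re

FIELDS = ["host", "ident", "user", "time", "request",
          "status", "size", "referer", "agent", "extra"]

PATTERNS = [
    r'(\S+)',       # host %h
    r'(\S+)',       # ident %l
    r'(\S+)',       # user %u
    r'\[(.+)\]',    # time %t
    r'"(.+)"',      # request "%r"
    r'([0-9]+)',    # status
    r'(\S+)',       # size %b (can be '-')
    r'"(.*)"',      # referer
    r'"(.*)"',      # user agent
    r'([0-9]+)',    # trailing number
]

LINE = re.compile(r'\s+'.join(PATTERNS) + r'\s*$')


def parse_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


sample = ('75.101.142.96 - - [11/Sep/2009:09:19:17 -0700] '
          '"GET / HTTP/1.1" 200 6196 "-" "collectd/4.4.2" 195546')
print(parse_line(sample))
```

<p>If <code>parse_line</code> returns <code>None</code> for your lines, the expression still needs adjusting for your log format.</p>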
<p>Now you'll want to start the server.   You can specify a port number to listen on if you want:</p>
<blockquote><p>kord@loggly&gt; <strong>cd globe</strong><br />
kord@loggly:/globe&gt; <strong>python globe.py 8001</strong></p>
<p>http://0.0.0.0:8001/</p></blockquote>
<p>Try hitting <strong>http://yourserver:8001/json</strong> and see if you get a response back.  Here's an example of what you should see: <a href="http://www.loggly.com:8001/json">http://www.loggly.com:8001/json</a>.  Here's the <a href="http://www.loggly.com:8001/static/globe.html">demo</a> again, if you just want to skip to the good stuff.  Additional work could be done to integrate the code into a lighttpd or Apache install to make it more permanent.  You can read more about doing that on <a href="http://webpy.org/cookbook">Web.py's cookbook page</a>.</p>
<p>Once we get the beta launched, you'll be able to make mashups like these with your own log files.  We're looking forward to doing more coolcams like this with Loggly!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.loggly.com/2009/09/coolcam-for-your-logfiles/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
