Securing your Web Application with httponly cookies OR How Apache.org and Atlassian could have been secured

Attack

The other day I was reading about the Apache and Atlassian hack. Max wrote a really nice summary of how that attack could have been prevented. One of the points he raised was that they should have used HTTPONLY cookies.

I then realized that we might have the same problem with Loggly. After some traffic dumping of our Web sessions, I realized that Django didn’t support httponly cookies. A quick google search revealed that someone wrote a djangosnippet to add httponly cookies. I had to slightly rewrite it, so here is the code I am using:

class cookie_httponly:
    def process_response(self, request, response):
        scn = settings.SESSION_COOKIE_NAME or 'sessionid'
        if response.cookies.has_key(scn):
            response.cookies[scn]['httponly'] = True
        return response

Don’t forget to add the middleware right before the SessionMiddleware. If you are using Python 2.6 or higher, you are done. Unfortunately, we are running Python 2.5, which does not support the httponly flag on cookies. A quick patch solved that problem as well:

--- /usr/lib/python2.5/Cookie.py   (revision 66233)
+++ /usr/lib/python2.5/Cookie.py   (working copy)
@@ -408,6 +408,9 @@
     # For historical reasons, these attributes are also reserved:
     #   expires
     #
+    # This is an extension from Microsoft:
+    #   httponly
+    #
     # This dictionary provides a mapping from the lowercase
     # variant on the left to the appropriate traditional
     # formatting on the right.
@@ -417,6 +420,7 @@
                    "domain"      : "Domain",
                    "max-age" : "Max-Age",
                    "secure"      : "secure",
+                   "httponly"  : "httponly",
                    "version" : "Version",
                    }

@@ -499,6 +503,8 @@
                 RA("%s=%d" % (self._reserved[K], V))
             elif K == "secure":
                 RA(str(self._reserved[K]))
+            elif K == "httponly":
+                RA(str(self._reserved[K]))
             else:
                 RA("%s=%s" % (self._reserved[K], V))

Loggly is now more secure against XSS attacks!

2 Comments

Visualizing your Data in the Cloud with Loggly and HighCharts

A short while into writing code for the Loggly interface we decided that we needed some eye candy. Given my background in visualization, I was keen on providing our users with an experience that helps them understand their data in an intuitive way.

Over the last few years I’ve been looking into a ton of visualization libraries for the Web. In the past, if you had asked me what library to use for generating charts on your Web site, I would have said, “Use Flash”. While there are a number of interesting Flash libraries out there, the landscape has shifted significantly in the last year. Everyone is moving to JavaScript. After some research, I opted to use a JavaScript charting library called HighCharts. I tried a bunch of other canvas-based libraries, but let me tell you without hesitation, HighCharts rocks.

I am going to show you how we are using HighCharts and how I implemented zooming to dynamically reload more event data on the fly. With any charting library, if you keep zooming in on a chart, it will not progressively load more detailed data. At detailed zoom levels you end up with a small range of data in your graph. Basically if you view a day’s data first, and then zoom into a specific minute, you would only see one data point.

To start, here’s the JavaScript I use to display a chart:

var parse_date = function(data) {
    var result = [];
    $.each(data, function(key, value) {
        var re = new RegExp(/(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(?:\.(\d+))?/);
        var date = re.exec(key);
        if (date[7] == undefined) {date[7]=0;}
        var real_date = Date.UTC(date[1], parseInt(date[2])-1,date[3],date[4],date[5],date[6],date[7]);
        result.push([real_date, value]);
    });
    return result;
}

chart = new Highcharts.Chart({
    credits: { enabled: false },
    chart: {
        renderTo: 'activity',
        defaultSeriesType: 'area',
        margin: [10, 20, 40, 55],
        zoomType: "x",
            events: {
                selection: function(event) {
                    // change the time frame to be searched
                    var start = Highcharts.dateFormat('%Y-%m-%dT%H:%M:%SZ', event.xAxis[0].min);
                    var end = Highcharts.dateFormat('%Y-%m-%dT%H:%M:%SZ', event.xAxis[0].max);
                    $.ajax({ type: "GET", url: "http://subdomain.loggly.com/api/search/?" \
                        + "q=inputname:logglyapp&starttime="+start+"&endtime="+end \
                        + "&facets=True&buckets=24",
                        success: function(data) {
                             chart.xAxis[0].setExtremes();
                             chart.series[0].setData(parse_date(data));
                             // fix the reset zoom button
                             $('.highcharts-toolbar').click(resetZoom);
                        },
                        error: function(req, text, error) {
                            $("#err").html("Reload error!");
                        }
                    });
                }
        }
    },
    xAxis: { title: { text: 'Time' }, type: 'datetime' },
    yAxis: { title: { text: '# Events' }, min:0,
        plotLines: [{ value: 0, width: 1, color: '#808080' }]
    },
    tooltip: { formatter: function() {
            return Highcharts.dateFormat('%B %e %Y %H:%M:%S', this.x) + '
'+ ''+this.y+' Events' }}, plotOptions: { area: { dataParser: parse_date, } }, series: [{ id: 1, name: 'search', dataURL: 'http://subdomain.loggly.com/api/search/' + '?q=inputname:logglyapp&facets=True'}], title: { text: 'traffic last 24 hours' } }); var reset_zoom = function() { // requery for the original data: $.ajax({ type: "GET", url: "http://subdomain.loggly.com/api/search/" + "?q=inputname:logglyapp&facets=True", success: function(data) { chart.toolbar.remove('zoom'); chart.xAxis[0].setExtremes(); chart.get(1).setData(parse_date(data)); }, error: function(req, text, error) { $("#err").html("Loading error!"); } }); } });

Let’s have a quick look at the code. There are two things I want to communicate here: 1. The code I used to display a HightChart graph and 2. The way I am using Loggly’s APIs to query the data.

I mentioned the special zooming that I implemented. Take a look at lines 20 to 39. This is the function that handles zooming, and it is where I am reloading the more detailed data. I set the new start and end dates (lines 23 and 24) and then I am querying the Loggly API with the new timeframe (lines 25 to 27). Upon success – this is important – I am using the chart.series[0].setData() method to set the new data for the chart. The next line overwrites the default button or a link that lets the user zoom out again (lines 32). Note: because you are implementing your own zoom, the default “reset zoom” button from HighCharts will not work anymore and you have to implement your overwrite it with your own function to reset the chart.

The function dealing with the reset functionality is on lines 59 to 72. It does nothing else than query the Loggly API for the original data (I am passing no time parameters) and setting the data just like the previous call. The other thing you have to do is in lines 64 where you need to remove the HighCharts default “reset zoom” link and reset the extremes (line 65).

Moving on, we’ll briefly discuss the way I’m using the Loggly API. If you’d like to use it, you need an account with us. We are currently in private beta, therefore you will need us to give you access to the beta program in order to do so. Email if you want an account to play around with! Back to the code. Make sure you replace the with your actual subdomain. Now that this is out of the way, you can query the API by simply making a GET request to: /api/search. You pass the q parameter with your query. In my example I am getting all the data from my input with the name logglyapp. To get timeline data, you’ll need to pass the parameter facets=True into the call. This will give you counts for time buckets.

To make everything work together, you need one more piece: the date_parse function. You need this part because the Loggly API returns the data with real human readable timestamps and HighCharts wants UTC encoded timestamps. The function on lines 1 to 11 takes care of converting the time for you. Just copy it.

I hope this was useful. Let us know if you are having trouble with any of this. We are looking forward hearing about your graphing endeavors.

If you look at my del.icio.us feed, you’ll find a bunch more visualization and charting links.

4 Comments

Fixing Client IPs in Apache Logs with Amazon Load Balancers

If you are running your Web servers behind a load balancer, you have probably noticed that your logs contain the load balancer’s IP address as the client IP, which is kind of annoying. There is an Apache module called RPAF, which fixes exactly this issue. Once you have it downloaded and installed, you configure it as follows:

RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1 10.0.0.1
RPAFheader X-Forwarded-For

The module works such that it takes the IP address that is being transmitted in the X-FORWARD-FOR header and sticks it into the request as the ClientIP.

The not so nice part is that the RPAFproxy_ips statement lets you define the IP addresses of your load balancer. If you have only a set amount of them, all is fine. But here comes the catch. If you are deployed on Amazon AWS and you are using their load balancer (LB), then this solution is going to frustrate you quite a bit. You will notice that the LB’s IP address changes constantly and you keep adding IP addresses to the configuration statement. After about 10 IP addresses, I got sick of that and I started looking at the source code of RPAF to solve this problem once and for all. Here is what I did:

On line 139 of mod_rpaf-2.0.c, I added a return 1; statement. This will tell the is_in_array() function to always assume that the request is coming from a load balancer, without checking the configured list of IP addresses. The rest of the RPAF code is robust enough to only replace the client ip when an X-FORWARD-FOR header is actually set. After the change, do a make install-2.0 and you are in business.

Happy logging!

2 Comments

How to use RightScale APIs with Python

I have been quiet for long enough on this blog. It’s time for me to share some things that I learned in the last few months while I was working on Loggly’s Application layer. Lately, I spent some quality time with Django and consequentially Python.

What I want to focus on today is our integration with RightScale. At Loggly, we use RightScale to manage our AWS instances. Loggly runs three types of servers. (Well, I am simplifying). We have a proxy tier which receives your log messages. The proxy tier, which is basically a bank of machines, forwards the messages to the indexing back end that runs Solr. The third group of machines are the Web or application servers. When a new proxy box comes online, the RightScale management interface knows about the box. I had to know about thse proxies on the application tier (i.e., within Django) as well. How do you do that?

The first solution would be to have the proxies register with Django, as soon as they get online. What happens though when they go down or are taken offline? Seems complicated to keep track of that. Another solution would be to periodically poll the proxies from Django. Not very nice either.

My solution is much more elegant. RightScale has two features that helped me out. The first one is machine tags. Each proxy server is labeled as such. (See Machine Tagging). Secondly, I am using the RightScale API to figure out how many proxies I have and what their IPs are. (As a side note, the RightScale APIs are in Beta right now. There might be changes or improvements coming down the pipe.)

I struggled for quite a bit with using the RightScale APIs out of Python. Here are some things that I learned the hard way and you might find helpful:

Using the API to query all your machines in a specific deployment:

curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] \

https://my.rightscale.com/api/acct/[account]/deployments/[deployment_number]

Note how you have to add the extra header to request version 1.0 of the API.

Here is how you get all the machines that have a specific tag. Note the structure of my tag! I set role:proxy=true. You need to use this hierarchical model!

curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] -d'resource_type=server' \
-d 'tags[]=role:proxy' https://my.rightscale.com/api/acct/[account]/tags/search.js

Want JSON output instead of XML, add “&format=js” at the end of your request!

Now, from the response, you would think you could just use that HREF to query an individual server. Wrong. That doesn’t work. You have to add “/settings” in order to make that work:

curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] \

https://my.rightscale.com/api/acct/20184/instances/[instance_id]/status

Here is how you set a tag on a server: (Note: If you change the tag in the user interface for a running server, it will not take effect. Only if you start a new server of that type, will the tag be there. Unlike the API call, where you can set a tag on a running machine).

curl -H 'X-API-VERSION: 1.0' -u [user@domain.com]:[password] \
-d 'resource_href=https://my.rightscale.com/api/acct/[account]/servers/[server_id]' \
-d tags[]=role:proxy=true https://my.rightscale.com/api/acct/[account]/tags/set

The part I struggled with most was how to call the API from within Python. Turns out httplib2 expects the Web server to respond slightly different than the RightScale server is. If you are using the following code, you will not be able to connect:

h = httplib2.Http()
h.add_credentials(user,password)
response, content = h.request(url, headers=headers)

httplib2 will connect to the Web server without sending the credentials. Only if the server challenges the client to use auth, it will then send the authentication headers. And this is precisely what RightScale is not doing. Therefore, you have to do the following in order to include the authentication headers in the first request already:

h = httplib2.Http()
import base64
base64string = base64.encodestring('%s:%s' % (user, password))[:-1]
headers['Authorization'] = "Basic %s" % base64string
response, content = h.request(url, headers=headers)

Credentials are an interesting topic. I ended up creating a separate user in the RightScale interface that I am using for the APIs. Don’t be fooled though. These credentials still let that user log into the Web interface. I hope that RightScale will add a capability such that I can have a user that can only use the API.

I hope this helps you getting off the ground a bit quicker when using RightScale. Let me know how it goes. You can also find me on Twitter: @zrlram

0 Comments