How to Solve Privacy, Security and Performance In Logging
In the series so far, we’ve covered a fair bit of ground. We’ve raised questions about how much to log, and what not to log. We’ve started to see how logging can be implemented in Ruby, PHP, and Python, and how to build a flexible approach that accounts for a range of needs, such as environments, information priority, filtering, formatting, and processing.
So far, it’s been a good balance of theory and practice. Here in Part Three, I want to look at three more questions that bear directly on two crucial areas: privacy and security. We’ll walk through key logging decisions, such as which protocol to use, whether to encrypt messages, and how these choices affect performance. I’ll include examples in each section and wrap up with an example that logs memory usage.
I’m going to use PHP’s excellent Monolog library as an example, but the concepts apply to any language. If you’re not familiar with it, Monolog is a feature-rich library that provides all the support we saw in the Ruby and Python examples, including:
- Sending messages with an accompanying priority
- Adding multiple handlers which can log messages if they’re at a set level or above
- Formatters, which help make the information written easier to read and digest later
- Processors, which automatically inject a range of extra content into the messages written
It can also log to sockets, inboxes, databases, and a range of web services; send mail through PHP’s native mail handler and notifications to Pushover, Mandrill, HipChat, Flowdock, and Slack; and integrate with logging services such as Loggly, Rollbar, New Relic, Sentry, and Graylog. I strongly encourage you to check it out.
Let’s start by setting up the basic foundation for logging in Monolog, and then I’ll dive into more advanced concepts. First, I’ll replicate the Python logging example from Part Two of my blog series.
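The setup described in the next paragraph can be sketched in Monolog roughly as follows. This is a minimal sketch, assuming Monolog 2.x installed with Composer (so that `vendor/autoload.php` exists); the channel name `my-app`, the `app.log` path, and the exact line format are placeholders, not values carried over from Part Two.

```php
<?php
// Assumes Monolog 2.x installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Monolog\Formatter\LineFormatter;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\SyslogHandler;
use Monolog\Logger;

// Create a logger for one channel; "my-app" is a placeholder name.
$logger = new Logger('my-app');

// Stream (file) handler: records everything from DEBUG upwards.
$fileHandler = new StreamHandler(__DIR__ . '/app.log', Logger::DEBUG);

// Syslog handler: forwards only INFO and above to the system's Syslog daemon.
$syslogHandler = new SyslogHandler('my-app', LOG_USER, Logger::INFO);

// Each entry carries the timestamp, channel, level name, and message.
$formatter = new LineFormatter("[%datetime%] %channel%.%level_name%: %message%\n");
$fileHandler->setFormatter($formatter);
$syslogHandler->setFormatter($formatter);

$logger->pushHandler($fileHandler);
$logger->pushHandler($syslogHandler);

$logger->debug('Only written to app.log');
$logger->info('Written to app.log and sent to Syslog');
```

Because handlers bubble by default, the INFO message reaches both handlers, while the DEBUG message is below the Syslog handler’s threshold and only lands in the file.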
The code starts by creating a new logger object and adding a stream (or file) handler with the minimum priority level set to DEBUG. It then adds a Syslog handler, so that messages at INFO or above are sent to the system’s Syslog daemon as well as to the log file. Because the Syslog handler is set to the INFO priority level, the stream handler will log more information than the Syslog handler. We then add a formatter, so that each entry written contains the message’s timestamp, channel, level, and text. So with the help of Monolog, PHP is on par with Ruby’s and Python’s built-in logging capabilities. Now let’s consider the three new questions and look at how you can address them with Monolog.
What Protocol Are You Using?
Have you considered which protocols you’re using to log information, or have you not really given it much thought? In the first example, we added a Syslog handler to the logger’s list of handlers. That handler passes messages to the local Syslog daemon through PHP’s native syslog() function, and if they then travel over TCP, their delivery is all but guaranteed. But is that the right choice in every situation? Consider the following two questions for a moment:
- Do you need to guarantee the delivery of debug messages?
- Can you afford to lose messages with a priority of ERROR or above?
Have you considered using the UDP protocol instead? If you’re not familiar with UDP, it’s faster than TCP because it skips connection setup and delivery acknowledgements, but it doesn’t guarantee that a message will be delivered. Here’s a quick summary of the differences from Cyberciti.biz:
| TCP | UDP |
|---|---|
| Messages are guaranteed to arrive | Messages aren’t guaranteed to arrive and may get lost along the way |
| Messages arrive in a set order | Messages can arrive in any order |
| TCP requires overhead to ensure that all the parts of the message arrive and are put together in the intended order. | UDP has no packet ordering and doesn’t track connections. Effectively, it’s fire and forget. |
| Data is sent as a stream | Data packets are sent individually |
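To make “fire and forget” concrete, here’s a sketch that sends a single Syslog-style message over UDP using only PHP’s built-in stream functions, with no Monolog involved. The `my-app` tag, the helper names, and the port used in the check are hypothetical, and real Syslog messages usually also carry a timestamp and hostname, which this minimal version omits.

```php
<?php
// Plain PHP, no Monolog: send one Syslog-style message over UDP.
// Helper names and the "my-app" tag are hypothetical.

// Syslog priority value: facility code * 8 + severity (RFC 5424).
function syslogPriority(int $facility, int $severity): int
{
    return $facility * 8 + $severity;
}

// Returns true once the datagram has been handed to the operating system.
// UDP sends no acknowledgement, so this says nothing about whether a
// Syslog server actually received the message.
function sendUdpLog(string $host, int $port, string $message): bool
{
    $socket = @stream_socket_client("udp://{$host}:{$port}", $errno, $errstr);
    if ($socket === false) {
        return false; // The socket could not even be created.
    }

    // Facility 1 (user-level) and severity 6 (informational) give "<14>".
    $datagram = sprintf('<%d>my-app: %s', syslogPriority(1, 6), $message);
    $written = fwrite($socket, $datagram);
    fclose($socket);

    return $written === strlen($datagram);
}
```

A call such as `sendUdpLog('127.0.0.1', 514, 'User signed in')` typically reports success whether or not a Syslog server is listening, because the send only hands the datagram to the operating system. That is the trade-off in the table: less overhead, but no delivery guarantee.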
In addition to SyslogHandler, Monolog provides a SyslogUdpHandler, which sends messages over UDP instead. In the example code, replace the original SyslogHandler with the UDP version, configured to log to a Syslog server listening on localhost, on port 514.
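Here’s a sketch of that replacement, assuming Monolog 2.x via Composer and PHP’s sockets extension (which Monolog’s UDP socket code uses). It keeps the file handler from the earlier setup and swaps the Syslog handler for `SyslogUdpHandler`, pointed at 127.0.0.1 rather than the hostname `localhost`; the `my-app` channel is a placeholder.

```php
<?php
// Assumes Monolog 2.x via Composer and PHP's sockets extension.
require __DIR__ . '/vendor/autoload.php';

use Monolog\Handler\StreamHandler;
use Monolog\Handler\SyslogUdpHandler;
use Monolog\Logger;

$logger = new Logger('my-app');
$logger->pushHandler(new StreamHandler(__DIR__ . '/app.log', Logger::DEBUG));

// Replaces SyslogHandler: INFO and above go over UDP to a Syslog server
// on 127.0.0.1, port 514 (the standard Syslog port). Delivery isn't guaranteed.
$udpHandler = new SyslogUdpHandler('127.0.0.1', 514, LOG_USER, Logger::INFO);
$logger->pushHandler($udpHandler);

$logger->info('Sent over UDP; no acknowledgement is expected');
```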
Should You Send Messages Encrypted or as Plain Text?
In Part One, I asked the following questions:
- Would you include social security numbers, credit card numbers, or passwords?
- Would you include database credentials?
- Would you include remote API credentials?
If you do log information like this, how do you obfuscate the data you send? Do you strip out certain parts, or avoid sending those logs in the first place? And is it even an option for you to choose what you do and don’t send?
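One way to obfuscate data before it leaves your application is to mask sensitive fields in the log context. The sketch below is plain PHP and assumes, hypothetically, that sensitive values live under keys such as `password` or `card_number`; the key list and the helper names are illustrative, so adapt them to your own data.

```php
<?php
// Plain PHP: mask sensitive values before they reach any log handler.
// The key list below is a hypothetical example; adapt it to your data.
const SENSITIVE_KEYS = ['password', 'ssn', 'card_number', 'api_key', 'db_password'];

// Replaces the value of any sensitive key, at any depth, with a marker.
function redactContext(array $context): array
{
    foreach ($context as $key => $value) {
        if (is_string($key) && in_array(strtolower($key), SENSITIVE_KEYS, true)) {
            $context[$key] = '[REDACTED]';
        } elseif (is_array($value)) {
            // Recurse into nested arrays, such as request payloads.
            $context[$key] = redactContext($value);
        }
    }

    return $context;
}

// Keeps only the last four digits of a card number.
function maskCardNumber(string $cardNumber): string
{
    $digits = preg_replace('/\D/', '', $cardNumber);
    $visible = substr($digits, -4);

    return str_repeat('*', strlen($digits) - strlen($visible)) . $visible;
}
```

With Monolog 2.x, you could apply this to every record by registering a processor, for example `$logger->pushProcessor(function (array $record): array { $record['context'] = redactContext($record['context']); return $record; });`. Masking by key name only catches what you anticipate, so it complements, rather than replaces, deciding not to log such data at all.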
What if you believe some information is absolutely essential for debugging errors in a reasonable amount of time? Have you considered encrypting the communications channel, or using handlers and services that support encrypted transmission? Perhaps you thought it wasn’t possible.
Let’s see how we can adjust the code to send log messages to Loggly over HTTPS with the LogglyHandler, which handles encrypted connections.
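A minimal sketch of the Loggly setup, assuming Monolog 2.x via Composer, PHP’s curl extension (which `LogglyHandler` requires), and your customer token in an environment variable named `LOGGLY_TOKEN`; that variable name and the sample message are placeholders.

```php
<?php
// Assumes Monolog 2.x (Composer), the curl extension, and a Loggly token in
// the LOGGLY_TOKEN environment variable (a placeholder name).
require __DIR__ . '/vendor/autoload.php';

use Monolog\Formatter\LogglyFormatter;
use Monolog\Handler\LogglyHandler;
use Monolog\Logger;

$token = getenv('LOGGLY_TOKEN');

$logger = new Logger('my-app');

// LogglyHandler posts records to Loggly over HTTPS, so they're encrypted
// in transit; LogglyFormatter shapes each record as JSON for Loggly.
$logglyHandler = new LogglyHandler($token === false ? '' : $token, Logger::INFO);
$logglyHandler->setFormatter(new LogglyFormatter());
$logger->pushHandler($logglyHandler);

// Only send when a token is actually configured.
if ($token !== false && $token !== '') {
    $logger->error('Payment gateway timed out', ['order_id' => 1234]);
}
```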
You can either add the code above as a supplemental handler or use it as the only handler. It uses the LogglyHandler class, along with the LogglyFormatter, to send and format the log records. If you’re not sure where to find your token, check Loggly’s developer documentation. Several other services supported by Monolog, such as Mandrill, SwiftMailer, and Flowdock, also support encrypted connections.
What’s the Performance Overhead of Security and Privacy Choices?
This leads to the next question. Plain text is insecure, but it requires less overhead, as there’s no need to encrypt messages before transmission and decrypt them afterwards. Depending on your hardware infrastructure and your application’s needs, can your application handle the processing overhead that secure, encrypted transmission imposes? Perhaps this isn’t a concern for you, and the difference between the two is negligible. If you’re in that position, you may be able to move forward without giving performance much thought, but I think it’s still worth considering. Bear in mind that switching from HTTP to HTTPS isn’t always trivial, depending on how your site’s constructed, and that performance will drop, even if only marginally, particularly while each encrypted connection is being set up. However, if your application sends its logs asynchronously, or at least partly so, that may offset a fair share of the potential slowdown.
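Rather than guessing at the overhead, you can measure it. The sketch below is a hypothetical helper, not part of Monolog, that times a callable with `hrtime()` (PHP 7.3+) and returns the average duration in microseconds.

```php
<?php
// Hypothetical helper (not part of Monolog): average duration of a callable
// in microseconds. Requires PHP 7.3+ for hrtime().
function averageMicroseconds(callable $operation, int $iterations = 100): float
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('iterations must be at least 1');
    }

    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $operation();
    }
    $elapsedNs = hrtime(true) - $start;

    return $elapsedNs / $iterations / 1000; // nanoseconds -> microseconds
}
```

For example, you might compare `averageMicroseconds(function () use ($fileLogger) { $fileLogger->info('ping'); })` with the same call against a logger that uses an HTTPS handler (both loggers are hypothetical here). If the encrypted handler turns out to be expensive, one option is Monolog’s `BufferHandler`: wrapped around `LogglyHandler`, it holds records until the script finishes and then passes them on as one batch, which `LogglyHandler` sends to Loggly’s bulk endpoint, so the connection cost is paid once per request rather than once per message.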
Example for Logging Memory Usage
Let’s look at one final example, albeit a somewhat contrived one. We’re going to add two processors, which inject extra information into each message logged: the memory usage and memory peak usage processors, which record the memory in use at the time of logging.
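A sketch of this, assuming Monolog 2.x via Composer. To keep it easy to inspect locally, it writes to a file named `memory.log` (a placeholder) instead of Loggly; pushing the same two processors onto a logger with the `LogglyHandler` should put the same fields into your Loggly events.

```php
<?php
// Assumes Monolog 2.x via Composer; a file handler stands in for Loggly.
require __DIR__ . '/vendor/autoload.php';

use Monolog\Handler\StreamHandler;
use Monolog\Logger;
use Monolog\Processor\MemoryPeakUsageProcessor;
use Monolog\Processor\MemoryUsageProcessor;

$logger = new Logger('my-app');
$logger->pushHandler(new StreamHandler(__DIR__ . '/memory.log', Logger::DEBUG));

// Each processor adds a field to the record's "extra" array:
// memory_usage (current) and memory_peak_usage (peak so far).
$logger->pushProcessor(new MemoryUsageProcessor());
$logger->pushProcessor(new MemoryPeakUsageProcessor());

$data = range(1, 100000); // allocate some memory so the numbers move
$logger->info('Report generated', ['rows' => count($data)]);
```

By default, both processors format their values as human-readable sizes (in B, KB, or MB), and the default line format appends the `extra` fields to each entry.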
Here’s an example of what you’ll see in the Loggly dashboard:
Now, along with the information recorded, you can get an idea of the memory cost at that point in time. To be fair, this is a simplistic example; you should also use dedicated profiling and monitoring tools, above and beyond the logging service itself.
Logging is a highly valuable, even essential, aspect of any modern application, whether it’s web-based or a more traditional native application. Modern languages and logging libraries provide us with a veritable cornucopia of choices, as we’ve seen in this series so far.
But just because we can do something, should we? When logging, are you consistent in your privacy and security model? Do we truly think about the consequences of the choices we make, both immediately and over the longer term? Do we think about the protocols we use, and whether they’re the best fit? Do you always log over TCP, or would UDP better suit your needs?
Do you consider whether the information is encrypted, or is it all sent in plain text? What might happen if some of the log requests are intercepted? Finally, what are the performance impacts of the approaches you take? Are they significant, and how can they be reduced?
I encourage you to take time to revisit the choices you and your team have made in your application(s). Is it time to make some changes? In the next, and final, part of this series, we’ll consider another logging scenario and wrap up our discussion of the outstanding questions we’ve raised so far. See you then.