Four key considerations to guide your logging approach in PHP, Python, or Ruby
Matthew Setter is a freelance technical writer helping businesses create documentation that developers need to really use their platforms to the full. He’s also the editor of Master Zend Framework, where you can learn everything there is to know about the Zend Framework. Follow him on Twitter: @settermjd.
Logging Is the Foundation for Solving Operational Problems
Software applications are often complex pieces of instrumentation and architecture. They range from single-purpose scripts, such as one that periodically cleans up clicks on an affiliate ad network, to high-usage APIs for e-commerce, big data, or higher education.
But whatever they are, things can and do go wrong, whether as a result of a logic error in the application or something outside of its control. These can include such things as a full hard disk, corrupt memory, or the inability to connect to a remote service that is critical to the running of the application.
At times like these, it’s handy to have a record of what was going on both before and after the event, as well as what the actual event itself was. That way, systems administrators and developers alike are well equipped to track down the source of the issue and work to avoid it happening again.
Without information about the event, how can anyone know what happened and how to fix it? It’s safe to say that logging is a must.
Four Key Questions Should Guide Your Logging Efforts
But before we jump into actually logging data, we need to stop and consider four key questions. These are:
- How much is too much?
- How much is too little?
- What information should be logged?
- What information should be obfuscated or even avoided?
These questions will help guide us in our logging efforts and also provide a sanity (read: reality) check over the longer term.
What Information Should be Logged?
Before you start logging, look at your application, your business, your organization and consider what the needs are. How much do you need to know? What are the essential processes and functionality? What kinds and forms of information will best help your developers and support staff address issues quickly and proactively?
In addition to logging the problematic aspects of the application, it’s also good to log when things go well, even if it’s only a simple two-word message saying that a particular section of code ran without a problem. So try to keep a balance of positives and negatives when deciding on a logging strategy.
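As a rough illustration of balancing positive and negative messages, here is a minimal Python sketch using the standard library's logging module. The logger name "orders" and the process_order function are hypothetical, invented only for this example.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

# "orders" is a hypothetical logger name used only for illustration.
logger = logging.getLogger("orders")


def process_order(order_id, amount):
    """Hypothetical operation that logs both the failure and the success path."""
    if amount <= 0:
        # The negative case: record what went wrong and with which inputs.
        logger.error("order rejected: id=%s amount=%s", order_id, amount)
        return False
    # The positive case: a short message confirming the step completed.
    logger.info("order processed: id=%s", order_id)
    return True


process_order(1001, 25.0)  # logged at INFO
process_order(1002, 0)     # logged at ERROR
```

Because the success message is logged at INFO and the failure at ERROR, the positive messages can be filtered out later by raising the level, without touching the code.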
How Much is too Little? How Much is too Much?
Just how much information do you need? How much is enough? If we log anything and everything we’ll end up with what’s called white noise. If you’re not familiar with the term, in this context it means “meaningless or distracting commotion, hubbub, or chatter”.
In a logging context it’s the situation where you have so much information to wade through, information without rhyme, reason, or sufficient context, that it can make what you do have next to useless.
So before you open the floodgates and log anything and everything, consider carefully what you’re going to store. For instance:
- Do you need all that data?
- Does it have a structure which can be parsed and indexed efficiently?
- Are you just logging for the sake of it?
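One way to give log data a structure that can be parsed and indexed is to emit each event as a single JSON object. The sketch below assumes Python's standard logging module; JsonFormatter, build_logger, and the "context" field are names made up for this example, not part of any library.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        context = getattr(record, "context", None)
        if context is not None:
            payload["context"] = context
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Return a logger writing JSON lines to `stream` (stderr when None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


log = build_logger("checkout")
log.info("payment accepted", extra={"context": {"order_id": 42}})
```

Each line of output is then a self-contained record with named fields, which is easier for a log service to index than free-form text.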
Be deliberate about how much information you log. Whilst you don’t need to know everything, you do need data, so take care to review what gets stored. Jon Gifford wrote in some depth on this back in January.
In the post he made a strong case for ensuring that you’re logging throughout your applications at all levels, and shared eight recommendations for doing it properly. Here’s a sample:
- Instrumentation is NOT a substitute for profiling, and vice versa
- Treat instrumentation as an ongoing, iterative process where you start by logging everything at a high level and then add deeper instrumentation
- If possible, always log enough context so that you get the complete picture of what happened from a single log event
- Flying slower is better than flying blind
What are your thoughts on the best practices? Can you suggest some more?
What Information Should be Obfuscated or Avoided?
When you use a logging service that isn’t hosted on-premise, the data is effectively “in the cloud”. Whilst cloud-style services offer a lot of benefits, convenience being one of the most-enjoyed ones, your data is not under your sole control. And even when you house data on-premise, it can be vulnerable to inappropriate access.
I don’t want to frighten you, but what happens if someone other than staff at your company gains access to that information? What if someone from your company shares, whether intentionally or otherwise, some or all of your log data? Are you prepared for that eventuality?
Would you include social security numbers, credit card numbers, or passwords? Would you include database credentials? Would you include remote API credentials? How much information about your database schema would you include?
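If sensitive values could end up in log messages, one option is to scrub them before they are written. The following is only a sketch under stated assumptions: the two regular expressions, the redact function, and the RedactingFilter class are illustrative names invented here, and the patterns are far from exhaustive. Real applications need patterns matched to their own data.

```python
import logging
import re

# Illustrative patterns only (assumptions, not a complete rule set).
SECRET_RE = re.compile(r"\b(password|api_key|token)=\S+", re.IGNORECASE)
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits


def redact(text):
    """Mask key=value secrets first, then anything resembling a card number."""
    text = SECRET_RE.sub(lambda m: m.group(1) + "=[REDACTED]", text)
    return CARD_RE.sub("[REDACTED CARD]", text)


class RedactingFilter(logging.Filter):
    """Hypothetical filter that rewrites each message before it is emitted."""

    def filter(self, record):
        # Render the final message, scrub it, and drop the raw arguments.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the (now scrubbed) record


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("payments")  # hypothetical logger name
logger.addHandler(handler)
logger.warning("login failed password=%s", "hunter2")
```

Attaching the filter to the handler, rather than to a single logger, means every record passing through that handler is scrubbed, whichever logger produced it.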
Getting Started with Logging
In an upcoming three-part series, I’ll walk you through some best practice logging techniques, with special emphasis on three of the most popular scripting languages available today: PHP, Python, and Ruby.
The reason for this collective approach is that, whilst the syntax differs between the three languages, their similarities often outweigh their differences from a 5,000-foot view. Plus, as they’re the most popular languages, I don’t want one community to feel left out or disadvantaged by their language of choice not being covered.
We’re going to look at how best to use them to log data in applications, with a particular emphasis, naturally, on interacting with Loggly. To do this, I’m going to split the articles up over three different areas, specifically:
- First, we’ll look at the pros and cons of using each language’s built-in logging options for logging data.
- Next, we’ll consider whether third-party logging libraries can provide more flexibility and functionality than what the languages offer natively.
- Finally, we’ll look at how different process models, such as using queuing systems, can make logging more effective and scalable as applications grow in size and complexity.
Logging is incredibly important, especially given the complexity of modern applications; but where is the line for you, between too little and too much? And how do you balance the need for information with its security? By considering these four questions you and your team will lay the foundations of successful logging. See you in Part 2.