Observability vs. Monitoring – What’s the Difference?
Some of the earliest memories of my career involve getting spammed by hundreds of emails from monitoring systems. After managing to find an issue, I’d scroll through endless logs trying to find obscure bugs that are impossible to recreate. I’m sure you’ve been there. I’ve spent many hours, days, and even weeks pressing “Page Down” on my keyboard trying to find a clue pointing me toward the root cause. I’m here to tell you: there’s a better way.
Monitoring is great at letting you know something is wrong. However, investigation is often impossible without querying the system directly. Observability is similar in that one of its main purposes is to let you know something is wrong, so it’s easy to confuse the two. Highly observable systems differ by also giving you enough information to fix the issue. Most importantly, observability helps you understand the internal state of the system from the outside. Each bug you fix then deepens your understanding of your application’s internals.
This post covers the main differences between monitoring and observability. It includes a comparison between the two and explains when it’s best to choose one approach over the other.
How Well Do You Understand Your System?
Observability isn’t a new concept. It comes from control theory and has been around for decades. In your application architecture, however, it’s the trendy new way of understanding your system. Wikipedia describes the mathematical concept of observability as “a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” This description is also true when it comes to building applications. State refers to data in your application at different stages, and external outputs refer to traces or logs.
How well can you understand the state of a system without querying the internals? Those at the top end of the observability scale can describe, in detail, a specific user’s journey through their application using traces and logs. Those at the bottom of the scale likely need to query the database or application directly to understand the state of a user at different stages. But like it or not, you’re on the observability scale right now—you just have to decide where you want to be.
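To make the top end of that scale concrete, here’s a minimal sketch of reconstructing one user’s journey purely from log output. The event shape (a JSON line with user_id and step fields) and the helper names make_event and user_journey are my own assumptions for illustration, not the format of any particular tool.

```python
import json
from typing import Any, List


def make_event(user_id: str, step: str, **fields: Any) -> str:
    """Serialize one log event as a JSON line, always tagged with the user."""
    return json.dumps({"user_id": user_id, "step": step, **fields})


def user_journey(log_lines: List[str], user_id: str) -> List[str]:
    """Rebuild the ordered steps one user took, using only the log output."""
    events = (json.loads(line) for line in log_lines)
    return [event["step"] for event in events if event["user_id"] == user_id]


# Hypothetical log output from an application with two active users.
logs = [
    make_event("u1", "login"),
    make_event("u2", "login"),
    make_event("u1", "add_to_cart", item="book"),
    make_event("u1", "checkout_failed", reason="card_declined"),
]
```

Calling user_journey(logs, "u1") returns the three steps that user took, in order, without touching the database or the running application. The point is less the code than the habit: every event carries enough context to be tied back to a user.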
Monitoring Is About Surveillance and Detecting Changes
Monitoring is a broad subject. Wikipedia defines it as “the carrying out of surveillance on, or continuous or regular observation of, an environment or people in order to detect signals, movements or changes of state or quality.” You’ll note the definition actually includes observation. In practice, however, monitoring usually means dashboards with stats like CPU, memory, and uptime. Additionally, this setup might include an alarm when a metric reaches a threshold. You could, for example, send yourself an email when you run out of disk space.
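A threshold alarm like the disk space example can be sketched in a few lines. This is only an illustration: the 10% threshold and the function names are assumptions, and actually sending the email is left out.

```python
import shutil
from typing import Optional


def check_threshold(free: int, total: int, min_free_fraction: float) -> Optional[str]:
    """Return an alert message when the free fraction is below the threshold."""
    free_fraction = free / total
    if free_fraction < min_free_fraction:
        return f"ALERT: only {free_fraction:.1%} of disk space is free"
    return None


def disk_alert(path: str, min_free_fraction: float = 0.10) -> Optional[str]:
    """Check the real disk that holds `path` against the threshold."""
    usage = shutil.disk_usage(path)
    return check_threshold(usage.free, usage.total, min_free_fraction)
```

Notice what the alert tells you: that the disk is nearly full. It says nothing about which part of the application filled it, and that is the gap observability tries to close.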
I always have a warm glow inside when I create complex dashboards. I show them off to others, proud of the complexity, but how useful are they? You might observe the memory on your server keeps maxing out, but you’d need to interact with the application to figure out what’s going wrong. This way of working is tedious. For a monolithic service, it might not be too onerous, but if you have multiple services, it becomes time-consuming.
Maintaining Modern Architecture
If you’re maintaining a service using a popular modern architecture such as serverless or Kubernetes, understanding your system through monitoring is difficult. Harder monitoring is a significant cost you accept in exchange for the benefits of these architectures. The trade-off is often worth it, and the benefits are widely written about. Capital One has some great write-ups of its experiences. Monitoring for these services is still in its infancy. There are many new services trying to fill the gap, but we aren’t there yet. I’ve spent a huge amount of time debugging these kinds of systems. Most of that time goes to querying databases and adding debug logging, both of which are hard to do on production systems, where access is normally restricted.
My Time Is Valuable, and Yours Is Too
I forget this regularly (probably because I’m having so much fun building things), but my time is valuable, and yours is too. Implementing a monitoring system rather than a highly observable system will take less time and cost less money up front. But will the extra up-front investment in observability pay you back in the future? This is a hard question, especially if you’re new to this space. My advice is to try it out on something small. You might not get incredible time savings from this initial attempt, but the lessons you learn will help you on the larger projects.
Let Your Tools Do Most of the Work
To design a system on the upper end of the observability scale, you need to decide what information is useful before you log it. I realize this sounds like lots of work. However, many tools out there—like SolarWinds® Loggly®—are designed to do much of the work for you. Most of these services use an application performance management (APM) client to store key information, such as a failed API call, where it matters. They also give you the ability to store extra information.
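If you want a feel for what “deciding up front” looks like without any vendor client, here’s a rough sketch using only Python’s standard logging module. The logger name and the fields (endpoint, status, user_id, release) are hypothetical choices of mine; a real APM client will have its own way of attaching this kind of context.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("checkout")  # hypothetical logger name


def failure_context(endpoint: str, status: int, user_id: str, release: str) -> Dict[str, Any]:
    """Collect the fields we decided, up front, are worth recording for a failed call."""
    return {"endpoint": endpoint, "status": status, "user_id": user_id, "release": release}


def record_failed_call(endpoint: str, status: int, user_id: str, release: str) -> None:
    """Log the failure with its context attached as attributes on the log record."""
    context = failure_context(endpoint, status, user_id, release)
    logger.error("API call failed", extra=context)
```

The extra argument attaches each field to the log record, so a handler or formatter can ship them as structured data rather than burying them in a message string.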
Most tools will take the data from your system and present it in different ways to help you understand what’s happening inside your applications. You can see how stable each release is by viewing the errors per release or drill down to a specific user to understand their experience. This is only possible because you send all this information up front. To get a taste of what’s possible, here’s a list of the best observability tools.
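The “errors per release” view is a simple aggregation once each event carries its release. Here’s a minimal sketch, assuming hypothetical events shaped as dictionaries with level and release fields; real tools do this for you at a much larger scale.

```python
from collections import Counter
from typing import Dict, Iterable


def errors_per_release(events: Iterable[dict]) -> Dict[str, int]:
    """Count error-level events for each release tag."""
    return dict(Counter(e["release"] for e in events if e["level"] == "error"))


# Hypothetical events, as they might arrive from structured logs.
sample_events = [
    {"level": "error", "release": "v1.0"},
    {"level": "info", "release": "v1.0"},
    {"level": "error", "release": "v1.1"},
    {"level": "error", "release": "v1.1"},
]
```

Running errors_per_release(sample_events) gives one error for v1.0 and two for v1.1. That comparison is only possible because the release was recorded with every event.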
Having a Highly Observable System Isn’t Always Required
After reading about all the benefits of having an observable system, you might wonder why anyone would choose monitoring. As with all choices when it comes to designing a system, there are no easy decisions. For example, if you have a small monolithic application running directly on a VM or server, it’s not worth adding the complexity of observability. You probably use loads of tools to get your job done, like GitHub, AWS, and an IDE, and learning how to use a new one distracts you from giving your customers value. Sometimes your cloud provider’s default tools are good enough.
Observability Helps You Understand the Internal State of Your System
Observability can be thought of as a scale measuring how well you can understand the internal state of your system using only its external outputs. The closer you are to the top of the scale, the easier maintaining your system becomes. If you’re using a popular system architecture such as serverless or Kubernetes, increasing your observability will save you a massive amount of time.
Monitoring, on the other hand, focuses more on surveillance and notifications of state change, and it uses things like complex dashboards of server metrics such as CPU and memory usage. This is a great first step and may be all you need. Don’t unnecessarily overcomplicate things.
If setting up any system at all sounds daunting, you aren’t alone. However, there are many tools out there designed to do most of this work for you. I’d suggest finding one suited to your architecture and trying it out. Loggly is a great platform to start with. It includes the monitoring tools you’re familiar with, and once you’re comfortable with the service, you can integrate it with SolarWinds AppOptics™ for a full-featured APM solution. It’s worth a try.