Tools and techniques for logging microservices
The microservice architecture is taking the tech world by storm. A growing number of businesses are turning towards microservices as a way of handling large workloads in a distributed and scalable way. In this post, we’ll look at methods for logging microservices and the tools that make them possible.
Microservices in a Nutshell
A microservice architecture structures an application as a set of small, independent, intercommunicating components. Each component (known as a service) is modular, reusable, and self-contained, and communicates with other services using language-agnostic protocols. Picture a typical e-commerce application: over the course of a single transaction, the application needs to receive the user’s order, verify inventory, authorize payment, and coordinate delivery. Each task becomes a small, self-contained unit that communicates with other units through APIs.
Microservices differ from traditional monolithic applications in three ways that matter for logging:
- Microservices are stateless. That means that an instance of a service can be created, stopped, restarted, and destroyed at any time without impacting other services. Any logging functionality we implement can’t rely on the service persisting for any period of time.
- Microservices are distributed. You’ll likely find yourself logging related data from two completely independent platforms. To log effectively, we need a way to correlate events across the infrastructure.
- Microservices are independent. A stack trace in a monolithic Java application, for instance, will bring you straight to the source of a problem. A stack trace in a service will only bring you to the entry point of that service, not to the point where the request entered the overall application.
Techniques for Logging Microservices
When logging microservices, you have to consider that logs are coming from several different services. We’ll look at two ways to approach logging: logging from within each service, and forwarding logs to a central service (or gathering them from the infrastructure itself).
1. Logging from Individual Services
Services work together to perform a specific function, but each service can be thought of as its own separate system. This means you can add your logging framework into the service itself, just as you would a regular application.
Let’s use the e-commerce platform from earlier as an example. We have an NGINX web server handling requests from the public Internet, a MySQL server storing customer and order data, and multiple PHP modules for processing orders, validating payment with a third-party processor, and generating the HTML that gets returned to the customer.
In our PHP modules, we could choose from a variety of logging frameworks such as log4php, Monolog, or the standard error_log function. If we break up these components into services, we can define a logging strategy inside each service that’s completely independent of every other service and of our overall logging strategy.
Within each service, we can append information to each log event to help identify where the event took place. For instance, we could append a field to our logs that records the name of the service that generated the event. This lets us easily group events from the same service.
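As a rough illustration of appending a service field, here is a minimal sketch in Python rather than PHP, using only the standard logging module. The service name order_processing, the field name service, and the helper build_logger are assumptions made for the example, not part of any particular framework.

```python
import io
import logging

SERVICE_NAME = "order_processing"  # hypothetical service name


class ServiceNameFilter(logging.Filter):
    """Attach the name of the emitting service to every log record."""

    def __init__(self, service_name):
        super().__init__()
        self.service_name = service_name

    def filter(self, record):
        record.service = self.service_name
        return True  # never drop the record, only annotate it


def build_logger(service_name, stream):
    """Create a logger whose output carries a service=<name> field."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s service=%(service)s %(message)s")
    )
    handler.addFilter(ServiceNameFilter(service_name))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stands in for a log file or socket
log = build_logger(SERVICE_NAME, buffer)
log.info("order received")
print(buffer.getvalue(), end="")
```

Because the field is added by a filter on the handler, the code that emits events does not need to mention the service name at all, which makes grouping by service a matter of configuration.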
The downside to this approach is that it requires each service to implement its own logging methods. Not only is this redundant, but it also adds complexity and increases the difficulty of changing logging behavior across multiple services.
2. Logging from a Central Service
In this model, services forward their logs to a dedicated logging service. Services still need a way to generate logs, but the logging service is responsible for processing and storing them, or for sending them on to a centralized log management service such as Loggly.
An example using Docker is the Loggly Docker image provided by SendGrid Labs. This image creates a container running an rsyslog daemon that listens for incoming syslog messages. Incoming events are immediately forwarded to Loggly, where they’re parsed and stored. You can even have multiple logging containers running simultaneously for load balancing and redundancy.
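From the service's side, forwarding to a syslog listener can look like the sketch below. It uses Python's standard SysLogHandler over UDP. To keep the example self-contained, a local UDP socket stands in for the logging container; in a real deployment the address would be wherever your logging service listens, which is an assumption we leave open here.

```python
import logging
import logging.handlers
import socket

# Stand-in for the logging container: a UDP socket on a free local port.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
listener.settimeout(2.0)
host, port = listener.getsockname()

# The service side: a logger that sends every event to the syslog listener.
logger = logging.getLogger("php_business_processing")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.handlers.SysLogHandler(address=(host, port))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.error("payment authorization failed")

# The listener receives the raw syslog datagram sent by the service.
datagram, _ = listener.recvfrom(4096)
received = datagram.decode("utf-8")
print(received)

handler.close()
listener.close()
```

The service only knows the address of the logging service; where events go after that (storage, parsing, a hosted provider) can change without touching the service.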
An alternative to implementing a logging solution for each service is to gather logs from the infrastructure itself. For instance, active Docker containers print log events to stdout and stderr. Docker captures the output on these streams and passes it to the logging driver configured for the container. You can then configure the syslog driver to forward log events from your containers directly to Loggly.
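With this approach, all a service has to do is write its events to the standard streams. A minimal sketch, assuming we want routine events on stdout and warnings or errors on stderr (the split and the helper name configure_stream_logging are our own choices, not something Docker requires):

```python
import io
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Pass only records below a given level (used for the stdout handler)."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


def configure_stream_logging(name, out=sys.stdout, err=sys.stderr):
    """Send events below WARNING to `out` and WARNING or above to `err`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    formatter = logging.Formatter("%(levelname)s %(name)s %(message)s")

    out_handler = logging.StreamHandler(out)
    out_handler.addFilter(MaxLevelFilter(logging.WARNING))
    out_handler.setFormatter(formatter)

    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.WARNING)
    err_handler.setFormatter(formatter)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


log = configure_stream_logging("php_html_generation")
log.info("rendered order confirmation page")  # goes to stdout
log.error("template not found")               # goes to stderr
```

The service contains no forwarding logic at all; whatever collects the container's streams decides where the events end up.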
Tools like logspout also provide a means of collecting and forwarding distributed log events. Logspout is another Docker-based solution that runs as its own service. It attaches to other containers on the same host and routes events to a destination such as Loggly. The benefit of tools like logspout is their ease of use: Simply deploy them alongside your other services and they immediately go to work.
Tips for Logging Microservices
If you start logging your services right away, you’ll begin to notice a problem: Your logs are completely unaware of your microservice architecture. You’ll have data about the error itself but no information about the service that generated the log or even where in your architecture the log originated. This makes it extremely difficult to trace an error back to its source.
1. Store Enough Data
The very nature of microservices makes it difficult to pinpoint the source of log events. For instance, you might come across a critical event in your log history, but how will you know which microservice generated the event? Without context, tracing logs quickly becomes a game of whodunit.
An application using a logging framework stores attributes such as severity, the class and method that generated the event, and relevant content such as stack traces and exception messages. This is great for a monolithic program, but microservices add another dimension. We need a way of identifying the service that generated the event, not just the part of the service.
Let’s go back to our web service example from earlier. The PHP modules have been split into three services: one for processing user input, another for processing business logic, and a third for building the resulting HTML. We’ve generated a stack trace in one of them, but we’re not sure which one. A solution is to assign a unique identifier to each of the three services (php_input_processing, php_business_processing, and php_html_generation) and append it to new log events. This way, we know where in our architecture the stack trace originated and we can simplify the troubleshooting process.
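To make this concrete, here is a sketch of a log event that carries the usual framework attributes plus a service identifier. It is written in Python for illustration; the JSON field names and the unit_price function are hypothetical, and only the service identifier comes from the example above.

```python
import io
import json
import logging


class JsonServiceFormatter(logging.Formatter):
    """Render each record as JSON, tagged with the emitting service's ID."""

    def __init__(self, service_id):
        super().__init__()
        self.service_id = service_id

    def format(self, record):
        event = {
            "service_id": self.service_id,
            "severity": record.levelname,
            "module": record.module,
            "function": record.funcName,
            "message": record.getMessage(),
        }
        if record.exc_info:
            event["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(event)


def make_service_logger(service_id, stream):
    logger = logging.getLogger(service_id)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonServiceFormatter(service_id))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_service_logger("php_business_processing", buffer)


def unit_price(total, quantity):
    """Hypothetical business logic that fails when quantity is zero."""
    try:
        return total / quantity
    except ZeroDivisionError:
        log.exception("could not compute unit price")
        return None


unit_price(100, 0)
event = json.loads(buffer.getvalue())
print(event["service_id"], event["severity"], event["function"])
```

The stack trace still tells us where inside the service the error occurred; the service_id field tells us which service to open in the first place.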
2. Correlate Events
We now know which service generated the log event and where in the service the event was generated. The problem now is finding out which actions led up to the event. We need a way of tracing a series of events back to the source, even if it means traversing multiple services.
A common solution to this problem involves attaching a unique identifier to requests as they enter the architecture. This request identifier, which only persists for the lifetime of the request, follows the request across each service and is appended to new log entries. By filtering on the identifier, you can easily follow a trail of events from the entry point all the way to the error.
Going back to our web service example, imagine a request comes into the NGINX service. The service generates a random request ID (such as 1234), which it appends to its log events. The NGINX service passes the request – along with “1234” – to the php_input_processing service, which then passes the request to php_business_processing. There’s an error in the code in the php_business_processing service, which is causing the web service to crash. When the service crashes, it creates a new log event and appends the request ID before forwarding the log to the logging service. Once the log is sent, the process stops, and the next incoming request receives a new request ID.
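The flow above can be simulated in a few lines. In this sketch each service is a plain Python function, the request ID travels in a dictionary of headers, and the header name X-Request-ID is an assumption for illustration. A real system would pass the ID over HTTP or a message queue instead.

```python
import io
import logging
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name
events = io.StringIO()              # stands in for the central logging service


def service_logger(service_id):
    """Return a logger that prefixes every event with its request ID."""
    logger = logging.getLogger(service_id)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler(events)
        handler.setFormatter(logging.Formatter(
            "request_id=%(request_id)s service=%(name)s "
            "%(levelname)s %(message)s"
        ))
        logger.addHandler(handler)
    return logger


def nginx(request):
    # Entry point: generate the ID once, then pass it along with the request.
    headers = {REQUEST_ID_HEADER: uuid.uuid4().hex}
    rid = headers[REQUEST_ID_HEADER]
    service_logger("nginx").info(
        "received %s", request, extra={"request_id": rid}
    )
    return php_input_processing(request, headers)


def php_input_processing(request, headers):
    rid = headers[REQUEST_ID_HEADER]
    service_logger("php_input_processing").info(
        "validated input", extra={"request_id": rid}
    )
    return php_business_processing(request, headers)


def php_business_processing(request, headers):
    rid = headers[REQUEST_ID_HEADER]
    log = service_logger("php_business_processing")
    try:
        raise RuntimeError("inventory lookup failed")  # simulated bug
    except RuntimeError as exc:
        log.error("request failed: %s", exc, extra={"request_id": rid})
    return rid


failed_request_id = nginx("GET /checkout")
print(events.getvalue(), end="")
```

Filtering the collected events on failed_request_id returns the three lines for this request, in order, ending with the error.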
3. Log Your Deployment
Failures can – and often do – occur during deployment. In a microservice architecture, new code can be deployed in a matter of seconds. If the deployment fails, being able to identify that failure is crucial to the success of your microservice.
One way to validate a deployment is to test the status of the service shortly after deployment. If the test fails, it creates a log event with detailed information about the behavior of the service and the state of the architecture. This can provide insight into the cause of the failure, as well as possible solutions.
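One possible shape for such a check is sketched below. The function verify_deployment, the event fields, the version string, and the failing check are all hypothetical; in practice the health check would call an endpoint or run a smoke test against the newly deployed service.

```python
import io
import json
import logging


def verify_deployment(service_id, version, health_check, logger):
    """Run a post-deployment check and log a detailed event with the result."""
    try:
        healthy, details = health_check()
    except Exception as exc:  # a crashing check counts as a failed deployment
        healthy, details = False, {"error": repr(exc)}

    event = {
        "event": "deployment_check",
        "service_id": service_id,
        "version": version,
        "healthy": healthy,
        "details": details,
    }
    level = logging.INFO if healthy else logging.ERROR
    logger.log(level, json.dumps(event))
    return healthy


def failing_check():
    # Hypothetical result: the service responds, but a dependency is down.
    return False, {"status_code": 503, "unreachable_dependency": "mysql"}


buffer = io.StringIO()
deploy_log = logging.getLogger("deployment")
deploy_log.setLevel(logging.INFO)
deploy_log.propagate = False
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
deploy_log.addHandler(handler)

ok = verify_deployment(
    "php_business_processing", "2.4.1", failing_check, deploy_log
)
print(ok, buffer.getvalue(), end="")
```

Recording the version alongside the service ID makes it easier to tie a spike in errors back to the deployment that introduced it.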
4. Set Alerts
Logs are often used as a source of historical data, but they can also be used to identify ongoing problems and predict potential issues. If a service fails or becomes unresponsive, you’ll want to be notified immediately. This can extend to more specific cases, such as receiving a notification if a web service is generating a large number of 404 errors.
With Loggly, you can generate alerts by filtering log events based on a pattern. Loggly will run the alert on an interval that you specify. You define a condition that triggers the alert, as well as a means of notifying the appropriate contacts. For example, if Loggly detects your web service logging an excessive number of 404 response codes in a short amount of time, you could email a summary of the problem to yourself or forward the alert to a third-party service such as PagerDuty.
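The condition behind an alert like this is essentially a count over a time window. The sketch below illustrates that logic in plain Python; it is not Loggly's API, and the class name, threshold, and window size are assumptions chosen for the example.

```python
from collections import deque


class ThresholdAlert:
    """Fire when a status code appears `threshold` times within a window.

    Timestamps are seconds and are assumed to arrive in non-decreasing order.
    """

    def __init__(self, status_code, threshold, window_seconds):
        self.status_code = status_code
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._hits = deque()

    def observe(self, timestamp, status_code):
        """Record one response; return True if the alert condition holds."""
        if status_code == self.status_code:
            self._hits.append(timestamp)
        # Drop matches that have fallen out of the time window.
        while self._hits and self._hits[0] <= timestamp - self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.threshold


alert = ThresholdAlert(status_code=404, threshold=3, window_seconds=60)
responses = [(0, 404), (10, 200), (20, 404), (30, 404), (100, 404)]
results = [alert.observe(ts, code) for ts, code in responses]
print(results)  # → [False, False, False, True, False]
```

The alert fires on the third 404 within a minute and clears once the earlier matches age out of the window; the notification step (email, PagerDuty, and so on) would hang off the True result.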
Putting It All Together
Ultimately, this process lets us generate microservice logs that are unique, traceable, and detailed enough to act on quickly and effectively. To create these logs:
- Append a unique identifier to requests as they enter the architecture.
- Identify the service that generates events.
- Generate events specific to each microservice as the request enters and leaves the service.
- Correlate events from each microservice to create an overall view of the request.
The result is a series of log events that contain (at a minimum):
- A request ID, which lets us trace a single request across our architecture
- A service ID, which uniquely identifies the service that generated the event
- Unique content associated with the event, such as messages, severity levels, class and method names, and stack traces
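With those three fields in place, building an overall view of a request is a matter of grouping and sorting. A minimal sketch, assuming each event has already been parsed into a dictionary and carries a timestamp field (the field names and sample events are hypothetical):

```python
from collections import defaultdict


def correlate(events):
    """Group events by request ID and order each group by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event["request_id"]].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e["timestamp"])
    return dict(traces)


# Events as they might arrive at the logging service: interleaved and
# out of order, from different services and different requests.
collected = [
    {"timestamp": 3, "request_id": "1234",
     "service_id": "php_business_processing",
     "severity": "ERROR", "message": "request failed"},
    {"timestamp": 1, "request_id": "1234", "service_id": "nginx",
     "severity": "INFO", "message": "received GET /checkout"},
    {"timestamp": 2, "request_id": "5678", "service_id": "nginx",
     "severity": "INFO", "message": "received GET /cart"},
    {"timestamp": 2, "request_id": "1234",
     "service_id": "php_input_processing",
     "severity": "INFO", "message": "validated input"},
]

traces = correlate(collected)
path = [event["service_id"] for event in traces["1234"]]
print(path)
# → ['nginx', 'php_input_processing', 'php_business_processing']
```

Reading the trace for request 1234 from top to bottom shows the path the request took and the service where it failed.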
Andre Newman is a software developer and writer on all things tech. With over eight years of experience in Java, .NET, C++, and Python, Andre has developed software for remote systems management, online retail, and real-time web applications. He currently manages 8-bit Buddhism, a blog that blends technology with spirituality.