How to Monitor Third-Party Services Using Log Data
Web services are a fundamental part of many modern applications, from large-scale service-oriented architectures that develop and maintain many internal and external services to small-scale systems that use third-party services to add functionality quickly and affordably. Web services let developers add features quickly; however, they can be difficult to monitor and debug because of authentication restrictions, missing test environments, or intermittent issues. Furthermore, the messages returned by third-party services are often not ones you would want to expose through your application’s user interface. In such scenarios, accurate, searchable logging can be an invaluable tool for solving problems.
What to Log
Logged exceptions give critical insight into unscheduled service outages and bugs. They are generally the first and easiest type of logging to implement, since an exception is already an event waiting to be logged. They are also a quick and accurate way to detect outages or bugs, whether in the third-party service itself or in your code that calls it. Gather as many relevant details from the exception as you can, including:
- Name of the service that generated the exception
- Exception message (including inner exceptions)
- Request and response bodies and headers
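As a minimal sketch in Python, assuming you already capture each request/response pair in a hypothetical `HttpExchange` object, the details listed above could be gathered into a single log entry like this. Python has no "inner exception" property; the closest equivalent is the chain exposed through `__cause__` and `__context__`, which the sketch walks. The set of headers to redact is also an assumption; adjust it for the credentials your services actually use.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("third_party")

# Assumption: header names that carry credentials; extend for your services.
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


@dataclass
class HttpExchange:
    """Hypothetical record of one request/response pair with a third-party service."""
    service: str
    request_headers: dict = field(default_factory=dict)
    request_body: str = ""
    response_headers: dict = field(default_factory=dict)
    response_body: str = ""


def redact(headers: dict) -> dict:
    """Mask credential headers so secrets never reach the log store."""
    return {k: ("[REDACTED]" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in headers.items()}


def exception_chain(exc: BaseException) -> list:
    """Return 'Type: message' for exc and each exception chained beneath it."""
    messages, seen = [], set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        messages.append(f"{type(current).__name__}: {current}")
        # __cause__ comes from 'raise ... from ...'; __context__ from implicit chaining.
        current = current.__cause__ or current.__context__
    return messages


def log_service_exception(exc: BaseException, exchange: HttpExchange) -> None:
    """Log the service name, full exception chain, and request/response details."""
    logger.error(
        "service=%s error=%s request_headers=%s request_body=%r "
        "response_headers=%s response_body=%r",
        exchange.service,
        " <- ".join(exception_chain(exc)),
        redact(exchange.request_headers), exchange.request_body,
        redact(exchange.response_headers), exchange.response_body,
    )
```

You would call `log_service_exception` from the `except` block around each third-party call, so the entry carries both your exception and whatever the service sent back.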
There are many great logging services out there to help aggregate your logged exceptions, or any other logged message, and automatically alert you to problems with your service integrations. The more data you provide in your logs, the better chance you have of quickly identifying and resolving the issue. Nothing makes a phone call to a service provider easier than being able to provide a specific time frame and exception message.
It’s good practice to configure a reasonable timeout for requests to any third-party service. Without a timeout, third-party performance issues become your performance issues. However, timeouts are not a perfect solution, because there is generally a large gap between a normally acceptable response time and the longest time you’d want your app to wait before throwing a timeout exception. If the service often goes through periods of decreased performance but still responds before the timeout, these periods of degraded performance may go unnoticed. In situations like these, it’s important to log the service name, endpoint, and elapsed time of each request. Historical logs of service calls can be used to establish baseline response times, and these baselines can then reveal increases in response time after a new service version is released or your integration code changes. Performance degradation can also be a warning sign of an impending service outage.
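A minimal sketch of both ideas using only Python's standard library: `fetch` shows where the timeout is set (here the `timeout` argument of `urllib.request.urlopen`), `call_with_timing` logs the service, endpoint, outcome, and elapsed time of any call, and `is_degraded` compares a new timing against a baseline. Using the 95th percentile of past response times as the baseline, and treating anything 1.5 times slower as degraded, are illustrative assumptions rather than recommended values.

```python
import logging
import statistics
import time
import urllib.request

logger = logging.getLogger("third_party.timing")


def fetch(url: str, timeout_seconds: float = 5.0) -> bytes:
    # The timeout bounds how long a slow third-party service can block this call.
    with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
        return resp.read()


def call_with_timing(service, endpoint, func, *args, **kwargs):
    """Run func and log service, endpoint, outcome, and elapsed time, even on failure."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        return func(*args, **kwargs)
    except Exception:
        outcome = "error"
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("service=%s endpoint=%s outcome=%s elapsed_ms=%.1f",
                    service, endpoint, outcome, elapsed_ms)


def baseline_p95(history_ms: list) -> float:
    """95th percentile of historical response times (the last of 19 cut points)."""
    return statistics.quantiles(history_ms, n=20)[-1]


def is_degraded(elapsed_ms: float, history_ms: list, factor: float = 1.5) -> bool:
    """Flag a call that is noticeably slower than the historical baseline."""
    return elapsed_ms > factor * baseline_p95(history_ms)
```

For example, `call_with_timing("ShippingAPI", "/rates", fetch, rates_url)` (with a hypothetical service name and URL) produces one searchable timing entry per request, which is the raw material for the baseline.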
Lack of Requests from the Service
Not all third-party integrations consist of your application making requests to the service. The third party may instead send its own requests to a service endpoint or FTP location you provide. In these scenarios, it’s important to monitor the requests coming from the service: if you depend on them, you need to know if or when they stop arriving. Many logging services allow you to trigger an alert when fewer than a certain number of events occur in a given time period. This heartbeat mechanism is an easy way to be alerted when the number of requests falls below its normal threshold. It is particularly important when the data you provide consists only of updates to a larger data set the service operates on, because an outage can then go unnoticed for an extended period. Few things are as embarrassing as having an external user ask whether something is running and having to admit that you didn’t know your own program had stopped.
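Most logging services implement this alert for you, but the underlying check is simple. A minimal sketch, assuming events are recorded in chronological order; `HeartbeatMonitor`, the one-hour window, and the threshold of three events are hypothetical values for illustration:

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Flags a lull in inbound requests from a third-party service.

    Hypothetical stand-in for the "fewer than N events in a time window"
    alert many logging services offer. Assumes record() is called in
    chronological order.
    """

    def __init__(self, window: timedelta, min_events: int) -> None:
        self.window = window
        self.min_events = min_events
        self._events = deque()

    def record(self, at: datetime) -> None:
        """Call once per inbound request from the service."""
        self._events.append(at)

    def is_healthy(self, now: datetime) -> bool:
        """True if at least min_events arrived within the window ending at now."""
        cutoff = now - self.window
        # Drop events that have aged out of the window.
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events) >= self.min_events


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor(window=timedelta(hours=1), min_events=3)
for minutes in (0, 10, 20):
    monitor.record(start + timedelta(minutes=minutes))
```

In practice `is_healthy` would run on a schedule, raising an alert whenever it returns `False`; here it is healthy at 12:30 but not at 13:05, once the 12:00 request has left the one-hour window.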
How to Log
Most modern programming languages have several logging libraries to choose from. For example, the Java logging library Log4j and its ports to other languages are great extensible libraries for logging messages at different severity levels to many possible storage locations.
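Python's standard `logging` module follows a similar model of loggers, severity levels, and handlers (Log4j calls its destinations appenders). As a minimal sketch, the configuration below sends every message to a file but only warnings and above to the console; the logger name and file path are placeholders.

```python
import logging
import sys


def configure_logging(log_file: str, name: str = "integrations") -> logging.Logger:
    """Route one logger to two destinations with different severity thresholds."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # the logger passes everything; handlers filter
    if logger.handlers:             # already configured; avoid duplicate output
        return logger

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.WARNING)     # console: warnings and errors only
    console.setFormatter(fmt)

    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)  # file: full detail for later searching
    file_handler.setFormatter(fmt)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

Keeping the threshold on each handler rather than on the logger lets you change how much detail a destination receives without touching the code that writes the messages.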
If you are using a logging service, it generally provides an SDK. For simplicity, start with a class that wraps the SDK and exposes two public methods: one for plain string messages and one for exceptions. The main difference between them is that the exception method takes an exception object as a parameter, while the plain message method takes a simple string. At Speedway Motors, we’ve added a few more methods for logging cases that are common in our specific application, but we’ve found that these first two methods handle most situations you’ll run across. Both methods also take context information about the web request that generated the log event, so that the details outlined above can be logged with the message.
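A minimal sketch of such a wrapper in Python follows. The real SDK calls depend on your logging service, so `LogSink` is a hypothetical stand-in for the SDK client, `StdlibSink` is one possible implementation on top of the standard `logging` module, and the `RequestContext` fields are assumptions based on the details listed under What to Log. This is an illustration, not Speedway Motors' actual class.

```python
import logging
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass
class RequestContext:
    """Hypothetical details about the web request that produced the log event."""
    service: str
    endpoint: str
    request_body: str = ""
    response_body: str = ""


class LogSink(Protocol):
    """Hypothetical stand-in for a logging service's SDK client."""
    def send(self, level: str, message: str, fields: dict) -> None: ...


class StdlibSink:
    """One possible sink: forward to Python's standard logging module."""
    def __init__(self, name: str = "integrations") -> None:
        self._logger = logging.getLogger(name)

    def send(self, level: str, message: str, fields: dict) -> None:
        # level is a standard level name such as "INFO" or "ERROR".
        self._logger.log(getattr(logging, level), "%s %s", message, fields)


class ServiceLogger:
    """Wrapper exposing one method for plain messages and one for exceptions."""
    def __init__(self, sink: LogSink) -> None:
        self._sink = sink

    def log_message(self, message: str, context: RequestContext,
                    level: str = "INFO") -> None:
        self._sink.send(level, message, asdict(context))

    def log_exception(self, exc: BaseException, context: RequestContext) -> None:
        fields = asdict(context)
        fields["exception_type"] = type(exc).__name__
        self._sink.send("ERROR", str(exc), fields)
```

Because application code only calls `log_message` and `log_exception`, switching logging services means writing a new sink rather than changing every call site.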
Logging is a fundamental part of any application and allows developers to “see” what is going on inside it. From exceptions thrown by the system to input and output data, logging is paramount to understanding production environments and their health. This is particularly true when monitoring third-party services, since they are by nature out of your control. Analyzing the logs of third-party service requests is a quick and easy way to extend your current logging solution to monitor the health and uptime of your third-party integrations.
Ryan Ebke is the web development manager and lead architect for all of the web-based applications at Speedway Motors. His team is responsible for the development and maintenance of a custom e-commerce platform that handles all of Speedway Motors’ digital channels. Ryan is an alumnus of the Computer Science department at the University of Nebraska-Lincoln. He draws on a unique and diverse base of real-world business and development experience.