Five Causes of Log Event Bursts and How to Prevent Them from Ruining Your Day

 


Occasionally, Loggly customers experience unexpected bursts of log volume that use up their entire daily log volume allowance in just a few minutes. These bursts can delay indexing or cause events to be dropped. In almost every case, event bursts are due to a configuration problem. Because event bursts can leave you blindsided by operational problems, you should understand why they happen and be on the lookout for trouble.


Cause #1: Deploying a new production environment

You may have started a Loggly trial or test deployment on a single small environment. If you then push Loggly to tens or hundreds of servers without increasing your account volume limit first, you can easily use up all of your initially allotted data within a few minutes.

How to Fix It

Before doing a big deployment for the first time, check your log volume on your Account Overview page. You’ll see how much you’re currently using so that you can extrapolate to your intended deployment.
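The extrapolation can be as simple as scaling your current per-server volume to the new server count. Here is a minimal sketch, assuming volume grows roughly linearly with the number of servers. The function name, the megabyte units, and the 1.5x headroom factor are illustrative assumptions, not Loggly recommendations.

```python
def projected_daily_volume(current_mb_per_day: float,
                           current_servers: int,
                           target_servers: int,
                           headroom: float = 1.5) -> float:
    """Linearly extrapolate daily log volume to a larger deployment.

    `headroom` is an assumed safety factor to absorb uneven load.
    """
    if current_servers <= 0 or target_servers <= 0:
        raise ValueError("server counts must be positive")
    per_server = current_mb_per_day / current_servers
    return per_server * target_servers * headroom


if __name__ == "__main__":
    # Hypothetical trial: 2 servers producing 200 MB/day, rolling out to 100.
    estimate = projected_daily_volume(200, 2, 100)
    print(f"Plan for roughly {estimate:.0f} MB/day")
```

Real servers rarely log at identical rates, so treat the result as a starting point and compare it against your actual usage after the rollout.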

With Loggly, you can easily increase or decrease the volume on your plan directly through your subscription page, without having to talk to one of our friendly account managers. So feel free to be conservative: cover yourself with adequate volume, then decrease it once you really understand how much you need.

Cause #2: “Spinning” daemons or services

Missing resources or code errors can cause a daemon or service to spin, that is, to fail and restart over and over, particularly after a failed deployment or one with critical errors in it. To make matters worse, a supervisor service will automatically restart your service each time it stops, so the errors keep coming. When this happens, you may see gigabytes of errors in minutes.

How to Fix It

Consider killing your supervisor process so that it stops generating new logs. Then, roll your service back to the state when it was last working well.
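If you write your own supervisor loop, one way to keep it from restarting a broken service forever is to cap the number of restarts in a time window. This is a different, preventive technique from the manual fix above, and the sketch below is only an illustration: the `RestartBudget` class and its parameters are hypothetical and not part of any particular supervisor tool.

```python
import time
from collections import deque
from typing import Callable, Deque


class RestartBudget:
    """Allow at most `max_restarts` restarts within `window_seconds`."""

    def __init__(self, max_restarts: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_restarts
        self._window = window_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._stamps: Deque[float] = deque()

    def allow_restart(self) -> bool:
        now = self._clock()
        # Forget restarts that have fallen out of the sliding window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False  # The service is spinning: stop restarting it.
        self._stamps.append(now)
        return True


if __name__ == "__main__":
    budget = RestartBudget(max_restarts=3, window_seconds=60)
    for attempt in range(5):
        if budget.allow_restart():
            print(f"attempt {attempt}: restarting service")
        else:
            print(f"attempt {attempt}: giving up, needs a human")
```

A supervisor loop would call `allow_restart()` before each restart and alert someone instead of restarting once the budget is spent.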

Cause #3: Debug-level logging on in production code

In general, development organizations log at the info or warn level in production and use debug-level logging only in development and QA. If debug logging gets turned on in a production environment, it will generate many times the normal (and expected) amount of log data.

How to Fix It

Unless you have a reason for using debug-level logs (and the daily volume limit to support them), you should reset the log level in your application configuration to info or warn. Alternatively, some customers use a syslog daemon to filter out debug logs before sending them to Loggly. (Although if you’re not centralizing your log data, you’re not really benefiting from it.)
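How you reset the level depends on your logging framework. As one illustration, here is a minimal sketch using Python's standard `logging` module. The logger name `hypothetical_app` and the `configure_logger` helper are assumptions made for this example. In practice the level name would come from your application configuration.

```python
import logging

# Level names this sketch accepts from configuration.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logger(name: str, level_name: str = "INFO") -> logging.Logger:
    """Return the named logger with its threshold set to `level_name`."""
    try:
        level = _LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(message)s")
    log = configure_logger("hypothetical_app", "info")
    log.debug("cache miss for key=%s", "abc")  # below threshold, dropped
    log.info("service started")                # at threshold, emitted
```

Defaulting to info means a missing or forgotten setting cannot silently turn on debug output in production.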

Cause #4: Sending your old log files

Servers that have been running for a while can accumulate old log files that are very, very large. When you then deploy syslog with file monitoring for the first time, all of these old files come in one big burst. Multiply that by tens or hundreds of servers, and there goes your data volume.

How to Fix It

Run logrotate or delete the old files before having your syslog daemon send them to Loggly. You might want to configure logrotate to clean up your old logs on a regular basis.
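Before you turn on file monitoring, it helps to know which old files are sitting in the monitored directory. The sketch below lists files that have not been modified for a given number of days so you can rotate or delete them first. The `find_stale_logs` helper and the seven-day threshold are assumptions for illustration. The script only reports files and never deletes anything.

```python
import sys
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def find_stale_logs(log_dir: str, max_age_days: float,
                    now: Optional[float] = None) -> List[Path]:
    """Return files in `log_dir` last modified more than `max_age_days` ago."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * SECONDS_PER_DAY
    return sorted(
        path for path in Path(log_dir).iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Pass the directory your syslog daemon will monitor as the first argument.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_stale_logs(directory, max_age_days=7):
        size_mb = path.stat().st_size / 1_000_000
        print(f"{path} ({size_mb:.1f} MB)")
```

Running this on one representative server and multiplying the total by your server count gives a rough idea of the burst you would otherwise send.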

Cause #5: Your application went viral

When word about your application spreads and usage suddenly spikes, the people in DevOps are the first to know. Loggly knows this is a happy problem, and we’re there to help. With the Peak Overage Protection feature available with our Pro plans, we will index your log spikes even if your total daily volume exceeds the volume available through your subscription. If you exceed your limit over a sustained period, we’ll reach out to adjust your account, but we know that changes often happen with no warning.

How to Fix It

Visit the Subscription page in your Loggly application to upgrade to a Pro plan.

Help! I’m Bursting!

You can see a full count of events on your Dashboard Summary widget. From there, you can drill into a spike to see what events make it up. You will probably see repeating messages or errors that will lead you to the source.
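If you also have the raw events on hand, for example in a local log file, you can look for the same repeating messages offline. The sketch below is not a Loggly feature. It is a small illustration that replaces digits with a placeholder so that messages differing only in IDs or counters are counted together. The function name and sample lines are made up for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

_NUMBER = re.compile(r"\d+")


def top_repeated_messages(lines: Iterable[str],
                          n: int = 5) -> List[Tuple[str, int]]:
    """Count log lines after masking digits, most frequent first."""
    normalized = (
        _NUMBER.sub("<n>", line.strip())
        for line in lines
        if line.strip()  # skip blank lines
    )
    return Counter(normalized).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample of events taken from a spike.
    sample = [
        "ERROR connection refused, attempt 1",
        "ERROR connection refused, attempt 2",
        "INFO service started",
        "ERROR connection refused, attempt 3",
    ]
    for message, count in top_repeated_messages(sample):
        print(f"{count:>6}  {message}")
```

In a real burst, the top one or two normalized messages usually account for most of the volume and point at the misbehaving component.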

At Loggly, we want you to have full visibility into your logs whenever you need it to solve operational problems, monitor the health of your application, track trends, or gain insight. The steps I have outlined here should prevent event bursting that has the potential to reduce this visibility and ruin your day.

