Troubleshooting Event Bursts
Occasionally a customer sees an unexpected burst of log volume that uses up their entire day’s volume in just minutes. If the account is over its limit, the burst can also delay indexing or cause events to be dropped. Most of the time the cause is a configuration problem. We want to help you understand why this happens and how to prevent it.
How To See What Is Bursting
Look at your Dashboard Summary widget to see a count of events during the time the burst happened. You can also do a “*” search for all events during the time period to see a count of events. If you see a spike in events, click on it to see the events in the search events view. If there are many repeating messages or errors, see what component they are coming from and what caused the problem.
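If you have copied the events from the burst window out of the search view, a short script can show which component is repeating itself. The following is only a sketch: it assumes each event is available as a dictionary with hypothetical "component" and "message" keys, which is not a format defined by Loggly, so adapt the key names to whatever your events actually contain.

```python
from collections import Counter


def top_repeaters(events, limit=5):
    """Return the most frequent (component, message) pairs in a burst.

    `events` is an iterable of dicts. The "component" and "message" keys
    are hypothetical names used for illustration only.
    """
    counts = Counter(
        (event.get("component", "unknown"), event.get("message", ""))
        for event in events
    )
    # most_common sorts by count, highest first
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        {"component": "api", "message": "db connection refused"},
        {"component": "api", "message": "db connection refused"},
        {"component": "web", "message": "request served"},
    ]
    for (component, message), count in top_repeaters(sample):
        print(count, component, message)
```

A pair that accounts for most of the events in the window is usually the best place to start looking for the cause.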
Reasons Why Bursting Happens and How To Fix It
Deploying a new production environment
If you previously installed Loggly only in a smaller environment or test system, and then pushed it to tens or hundreds of servers without first increasing your account volume limit, you can easily exceed your limit in a few minutes.
Before doing a big deployment for the first time, check your log volume on your Account Overview page. This tells you the volume from one server and helps you calculate how much volume you need for all your servers. It’s OK to pick a higher volume to get started and then come down once you have a better understanding of your volume needs. If you are still in our free trial, there is no charge to you. However, free trials have a limit of 5GB/day until you talk to the sales team and request a higher limit.
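The calculation itself is simple multiplication, and a rough sketch like the one below can make it explicit. The function names are made up for this example, and the 25% headroom is an assumption chosen for illustration, not a Loggly recommendation. Only the 5GB/day free trial limit comes from this article.

```python
FREE_TRIAL_LIMIT_GB = 5.0  # free trial limit stated in this article


def estimate_daily_volume_gb(per_server_gb, server_count, headroom=0.25):
    """Estimate total daily volume from one server's measured volume.

    `headroom` is an assumed safety margin (25% by default) to cover
    servers that log more than the one you measured.
    """
    if per_server_gb < 0 or server_count < 0 or headroom < 0:
        raise ValueError("inputs must not be negative")
    return per_server_gb * server_count * (1 + headroom)


def exceeds_limit(estimated_gb, limit_gb=FREE_TRIAL_LIMIT_GB):
    """Return True when the estimate is above the account limit."""
    return estimated_gb > limit_gb


if __name__ == "__main__":
    estimate = estimate_daily_volume_gb(per_server_gb=0.2, server_count=100)
    print(f"Estimated volume: {estimate:.1f} GB/day")
    print("Raise the limit first" if exceeds_limit(estimate) else "Within limit")
```

With 0.2GB/day from one server and 100 servers, the estimate comes to roughly 25GB/day, well above the free trial limit, so the limit should be raised before the rollout.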
“Spinning” daemons or services
Your service can spin when resources are missing or the code has errors, and a supervisor service automatically restarts it every time it stops. When this happens, it can easily use all the CPU resources on your server and produce GB of error logs in minutes. It’s most likely to happen just after a deployment that failed or contains errors.
To fix the problem, consider stopping your supervisor process so that no new logs are generated, and then roll your service back to the last state in which it worked well.
Leaving debug level logging on in production code
Generally, people log at info or warn level in their production environments and use debug level logging in their development or QA environments. Sometimes a misconfiguration causes debug logs to be turned on in a production environment unexpectedly. This can generate many times the normal amount of logs and use up the entire daily limit in minutes.
To fix the problem, reset the log level in your application configuration to info or warn. You can also have your syslog daemon, such as rsyslog, filter out debug logs before sending them to Loggly. It’s usually OK to turn debug level logging on in production for short periods, as long as it fits within your account plan terms.
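One way to guard against this kind of misconfiguration is to have the application refuse debug level when it knows it is running in production. The sketch below uses Python's standard logging module. The APP_ENV and LOG_LEVEL environment variable names are assumptions made for this example, not settings defined by Loggly, and your application may configure its log level differently.

```python
import logging
import os

# Names accepted in configuration, mapped to standard logging levels.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def resolve_log_level(environment, requested):
    """Pick a log level, never going below INFO in production.

    Unknown level names fall back to INFO.
    """
    level = LEVELS.get(requested.upper(), logging.INFO)
    if environment == "production" and level < logging.INFO:
        return logging.INFO
    return level


if __name__ == "__main__":
    # APP_ENV and LOG_LEVEL are hypothetical variable names.
    environment = os.environ.get("APP_ENV", "production")
    requested = os.environ.get("LOG_LEVEL", "INFO")
    logging.basicConfig(level=resolve_log_level(environment, requested))
    logging.getLogger(__name__).debug("hidden in production")
    logging.getLogger(__name__).info("logging configured")
```

If you do need debug logs in production for a short period, you would have to relax this check deliberately, which makes the change harder to leave on by accident.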
Sending large old log files
If you set up file monitoring, such as for Apache access logs, you may not be aware that you have very large old log files stored on your servers. It’s possible to have many GB of logs spread across tens or hundreds of servers, and when syslog with file monitoring is deployed for the first time, all of them are sent at once in a big burst. Many people configure logrotate to clean up old logs regularly, but sometimes it hasn’t run in a while or has never run.
To fix the problem, run logrotate or delete the old files before having your syslog daemon send them to Loggly.
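Before turning on file monitoring, it can help to check how much backlog is sitting in the log directory. The following is a minimal sketch that lists files that are either large or old. The 100MB and 7 day thresholds and the /var/log/apache2 path are assumptions for illustration, so adjust them for your own servers.

```python
import time
from pathlib import Path


def find_backlog(log_dir, min_size_bytes=100 * 1024 * 1024, min_age_days=7, now=None):
    """Return (path, size_in_bytes) for files that are large or old.

    A file is reported if it is at least `min_size_bytes` in size or was
    last modified more than `min_age_days` ago. Largest files come first.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    hits = []
    for path in Path(log_dir).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_size >= min_size_bytes or info.st_mtime < cutoff:
            hits.append((path, info.st_size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Example path only; point this at the directory you plan to monitor.
    for path, size in find_backlog("/var/log/apache2"):
        print(f"{size / 1024 / 1024:10.1f} MB  {path}")
```

The script only reports files and does not delete anything, so you can review the list before rotating or removing them.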
What To Do If You Are Blocked Or Capped
- I Meant to Send That Much Data!
Use the Account Subscription tool to raise your account limit. You can also contact the sales team or submit a support request to have your limit raised.
- I Fixed the Problem
We automatically test your volume every few minutes to see if it’s back to normal levels, and we lift your cap automatically when it is. If the cap has not been lifted, contact the sales team or submit a support request to have it removed.
- I’m Not Sure What’s Wrong
Please submit a support request to have one of our support engineers check your account and offer advice on how to fix the problem.