What Happens in Vegas … Comes Crashing Home
In December 2014, I was so excited! We had just launched the private beta of Karma, a consumer app that is reducing the stress and friction of using services in the sharing economy (Airbnb, DogVacay, Craigslist, and others). We decided to go to Vegas to celebrate our CEO’s birthday. After a fantastic two days there, I was waiting to board my return flight when I saw a ton of New Relic alerts saying that our hard drives were about to crash because they were full. It wasn’t possible that the private beta could have generated that much volume. What was happening?
I soon realized that the problem was the log files on my servers. A single error in the code had generated a ton of log volume. As disk usage reached 96%, the application logged a pending crash over and over again, generating even more volume.
So, 200 users and one bug in the code showed me how critical it is to have cloud-based log management in place from the start.
Loggly Came to the Rescue
Luckily, before we launched, I saw an ad for Loggly in one of my social feeds and decided to check it out. Aside from the free T-shirt (which is super comfy, BTW), I liked Loggly’s ease of setup, useful documentation, and support of the open source community. When I choose software solutions (whether commercial or open source), I look for attention to detail and great customer service. It’s the little things that matter, and with Loggly I feel like I have an expert friend when I need help.
I set up a pretty decent logger framework using Winston. Winston-loggly has worked great for us: It’s easy to set up, well documented, and widely adopted. As my logger funnels the logs to Loggly, it:
- Tags the logs by environment (staging or production) and by component (API, web, or mobile).
- Strips out some formatting that’s not relevant once the logs are in Loggly. For example, we have formatted them to color code certain text when they print to console. (My pull request with this ability was merged in June of last year.)
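As a rough illustration of the two steps above, here is a minimal TypeScript sketch. It is not Karma's actual logger and it deliberately avoids the Winston and winston-loggly APIs. The names `stripColors`, `prepareForShipping`, and `TaggedLog` are hypothetical, and the sketch assumes the console coloring is done with standard ANSI escape sequences.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the author's code.
type Environment = "staging" | "production";
type Component = "api" | "web" | "mobile";

interface TaggedLog {
  message: string;
  tags: string[];
}

// Matches ANSI color sequences such as "\u001b[31m" (red) and "\u001b[0m" (reset).
const ANSI_COLOR = /\u001b\[[0-9;]*m/g;

// Remove console-only color codes, which are just noise once the log is shipped.
function stripColors(text: string): string {
  return text.replace(ANSI_COLOR, "");
}

// Clean the message and attach the environment and component as tags.
function prepareForShipping(
  raw: string,
  env: Environment,
  component: Component
): TaggedLog {
  return { message: stripColors(raw), tags: [env, component] };
}

const entry = prepareForShipping(
  "\u001b[31mERROR\u001b[0m disk usage at 96%",
  "production",
  "api"
);
console.log(JSON.stringify(entry));
// prints {"message":"ERROR disk usage at 96%","tags":["production","api"]}
```

In a real setup, a step like this would sit just before the transport that ships logs to Loggly, so the console output keeps its colors while the shipped copy stays clean and searchable by tag.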
Loggly Keeps Our Users’ Karma with Faster Troubleshooting
Today, a lot of my log management usage is for troubleshooting. For example, I recently saw that our servers were restarting unexpectedly. It didn’t take any time at all to isolate the logs with a restart string in them, look at surrounding events, and identify the event that led to the problem.
We also use Loggly to trace the sequence of events that validate a particular user’s Karma Score. We simply search on that user’s unique user ID and follow her karma through our application. This exercise helps us diagnose elusive problems and gain peace of mind that our app is working as expected.
Finally, I use a Loggly dashboard to keep an eye on overall log volume. If logs are spiking, something is often going on that demands my immediate attention!
Error Reporting Keeps Us Moving
I use daily alerts summarizing the errors that occurred over the last 24 hours to help prioritize fixes. With sample log events displaying right in the alert, it takes only a minute to get a sense of the problems and dig deeper. With grepping, this kind of analysis would take forever! Error reporting has been a lifesaver for me and my team.
Logging Is Always a Work in Progress
Just as with our live service, we’re always looking for ways to improve our logging and build on the insights that our logs deliver to us. While our initial focus has been on application logging, we’re looking forward to bringing server and even client-side logs into Loggly. We’re also looking to track more log metrics, both operations- and business-focused.
So Here’s My Parting Advice
- Never, never store logs on production servers! Services like Loggly make life much easier and protect you from unnecessary risk.
- Tag your logs on Day 1. You’ll have a much easier time homing in on relevant logs.
- Take full advantage of Loggly alerts. They’re worth more than a good weekend in Vegas.
- And finally, make sure that you have good karma in the sharing economy. Visit https://havekarma.com and build up your own Karma Score today!
Justin Reynard is the chief technology officer of Karma, a consumer application that collects, compiles, and analyzes your feedback and reviews from multiple supported sites and uses it to create a reputation score—your Karma Score. Previously he worked on the Emmy-winning Rides.TV platform and video games such as Fallout: New Vegas and Dungeon Siege III. You can follow Justin on Twitter @framerate or check out his contributions on GitHub.