Datami runs cloud-based applications that serve major cellular carriers throughout the world. The microservices run in diverse Amazon Web Services (AWS) regions or in carriers’ private data centers. With millions of mobile subscribers as the end users, the Datami products can experience very high loads—and generate huge log volumes.
When Datami initially launched, developers and DevOps teams were logging directly into servers for troubleshooting. Realizing that this approach would not scale, the company began aggregating its logs with Fluentd and deployed the Elasticsearch-Logstash-Kibana (ELK) stack for search and visualization. However, the Datami team soon discovered that it was very difficult to keep ELK running correctly without devoting extra staff to log management. “We really wanted to focus the company’s limited resources on its core business,” explains Mohit Khanna, senior cloud architect at Datami.
Datami now sends application logs as well as logs from its Apache Kafka infrastructure and MongoDB database to Loggly. Most of these logs are formatted in JSON, so they are easier to search and filter with Loggly. The Datami team kept its existing Fluentd log collector in place and simply pointed it to Loggly. “Fluentd worked right out of the box for us, and it made things easy because our team already knew it,” Khanna reports.
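Pointing an existing Fluentd collector at Loggly typically amounts to swapping the output section of the config. The sketch below is illustrative only, not Datami's actual configuration: it assumes the community fluent-plugin-loggly output plugin, and the customer token in the URL is a placeholder.

```
# Illustrative Fluentd output section forwarding all tagged events to Loggly.
# Assumes fluent-plugin-loggly is installed; YOUR-CUSTOMER-TOKEN is a placeholder.
<match **>
  @type loggly
  loggly_url https://logs-01.loggly.com/inputs/YOUR-CUSTOMER-TOKEN/tag/fluentd
</match>
```

Because the upstream sources and parsers stay untouched, a change like this leaves the rest of the Fluentd pipeline as-is, which matches Khanna's point that the collector "worked right out of the box."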
Loggly has increased our productivity across the product lifecycle and allowed our developers to concentrate on coding.
— Mohit Khanna, Senior Cloud Architect
Datami’s team of 20 developers and two DevOps professionals uses Loggly to troubleshoot problems and has seen productivity improvements across the team. “Before Loggly, reaction times were much longer,” Khanna says. “Now, we’re able to tell what went wrong in five minutes.”
The development team has handed over a lot of this day-to-day log analysis to the QA department and in some cases to customer support. The Datami team might search to identify which microservice is causing a particular issue, look into the history using log data, or investigate the latency of a particular REST API. The team uses Slack extensively and will often share Loggly search results when collaborating on a problem. When a team member needs to share log data with a customer, it’s easy to put a search result in grid view and export the data to a CSV file. “This approach is helpful because not all of our customer contacts are technical,” Khanna adds.
“Personally, I own almost 50 Amazon EC2 instances. Now, I hardly ever log into them.”
When a product launches in a new market, Datami usually sees sudden spikes in subscriber traffic. The team has created Loggly alerts that send notifications directly to a Slack channel, so that the DevOps team knows to increase its monitoring using a toolset that includes Datadog, Kibana, and its internal monitoring portal in addition to Loggly. Datami also uses Loggly alerts to detect exceptions thrown in its microservices.
The QA group at Datami also makes use of log data:
The QA team takes advantage of Loggly Derived Fields to parse additional fields out of log data, then sets alerts on those fields to immediately detect specific correlated data values.
In a microservices architecture, log management is a critical tool for troubleshooting issues. “Anyone with more than six or seven microservices in production on more than 20 servers simply must centralize the log data,” Khanna concludes. “Loggly is a no-brainer.”