Segment offers cloud-based companies a single platform that collects, stores, and sends data to hundreds of business tools with the flip of a switch. This transparent integration takes a lot of headaches away from its customers, serving as a routing hub for all of their analytics and marketing tools. But when running your business involves integrating with more than 100 web-based services and billions of API calls every month, there are many places where things can go awry. Segment depends on its log data to troubleshoot operational issues and to keep its service in top shape.
“We have hundreds of thousands of requests to partner services at any given time, which makes identifying and resolving issues quickly a little complicated,” says Calvin French-Owen, one of the founders at Segment. “Any given problem can be the result of an error in our service, bad responses or slow responses from partners, or configuration issues on the part of our customers.” Therefore, Segment logs responses from its application and all of its external touchpoints.
Segment’s log management strategy evolved several times as its customer base grew. The company’s original approach of aggregating its logs with syslog made it difficult to correlate related issues across machines. After experimenting with and outgrowing a couple of other cloud-based log management services, Segment built an internal logging service on the ELK Stack (Elasticsearch, Logstash, and Kibana). However, the service lacked depth in reporting and could not surface insights from the data quickly without consuming internal engineering time.
Additionally, because engineering resources were focused on advancing Segment’s core product, Segment had limited resources to build log indices; the end result was that most logs were only stored as text. Without indexed log data, it took developers and support personnel much longer to isolate the specific logs they needed to troubleshoot customers’ integration problems. Finally, Segment engineers TJ Holowaychuk and Garrett Johnson decided to really invest in Segment’s logging systems and migrate to Loggly.
“Logging is not the meat of our business – analytics is,” French-Owen remarks. “We want to be able to move really quickly on analytics, and we know that Loggly can do a much better job at log management than we can.”
French-Owen cited several key deciding factors in choosing Loggly:
For all of its machines, Segment logs errors related to requests to and responses from all of its integrations. It also logs errors that happen with events, for example a disconnection from one of Segment’s queues, Redis clients, or Mongo. Before deploying Loggly, Segment had already set up a separate log server to which all of its machines sent logs. As a result, sending data to Loggly was as easy as adding a single plugin and turning it on, and self-scaling meant that Segment could tailor its investment to its changing and growing business needs.
When Segment first started using Loggly, it sent the same types of text-based logs that it had indexed with its internal service. The company now makes extensive use of JSON and has been refining its logs to gain the maximum advantage from Loggly’s indexing.
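To illustrate what moving from plain-text logs to JSON can look like, here is a minimal sketch using Python's standard `logging` module. It is not Segment's actual code: the field names `customer_id`, `integration`, and `status_code`, the logger name, and the sample values are all assumptions made for this example. The point is only that each event becomes one JSON object whose fields a log service can index individually.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    # Hypothetical structured fields; not taken from Segment's real logs.
    EXTRA_FIELDS = ("customer_id", "integration", "status_code")

    def format(self, record):
        event = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Copy over any structured fields supplied through `extra=`.
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                event[key] = getattr(record, key)
        return json.dumps(event, sort_keys=True)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("integration-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer so the example has no external dependencies.
buffer = io.StringIO()
log = build_logger(buffer)
log.error(
    "partner request failed",
    extra={
        "customer_id": "cust_123",
        "integration": "example-partner",
        "status_code": 502,
    },
)
line = buffer.getvalue().strip()
print(line)
```

In a real deployment the handler would ship lines to a log server or forwarding agent rather than to an in-memory buffer; that part is left out here because the source does not describe it.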
With billions of API calls every month, we generate a huge amount of log data, and it’s difficult and expensive to manage ourselves. It really made the most sense to bring in an expert like Loggly to take it off our plates.
— Calvin French-Owen, Founder
Segment’s technical support team is the primary group of Loggly users. “Since there are so many companies involved when we route data (our customers, our integration partners, and ourselves), Loggly is the first place we go to see what’s working and what’s not,” French-Owen comments. “We need both a broad view of where something went wrong along with the ability to narrow down onto a specific request.”
Before Loggly, troubleshooting a customer’s problem usually involved manual integration testing and a long series of back-and-forth discussions between the customer, Segment support, and the Segment developer team. Now, when a customer contacts the Segment support team, the Segment team member first uses a debugger in the Segment application to verify that data requests are being sent out. Next, the team member goes into Loggly and uses a unique customer identifier to determine whether requests are reaching the relevant partner service and whether any error messages are being generated in the process.
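The lookup described above, narrowing logs to one customer and checking for errors per partner, can be sketched locally in Python. This is not Loggly's search interface or Segment's tooling. It assumes JSON log lines with hypothetical `customer_id`, `integration`, and `level` fields, and simply shows why a consistent customer identifier in every event makes this kind of filtering straightforward.

```python
import json
from collections import Counter


def errors_for_customer(lines, customer_id):
    """Count ERROR events per integration for a single customer.

    `lines` is any iterable of JSON-encoded log lines. Lines that are not
    valid JSON objects are skipped rather than treated as failures.
    """
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if event.get("customer_id") != customer_id:
            continue
        if event.get("level") == "ERROR":
            counts[event.get("integration", "unknown")] += 1
    return counts


# Made-up sample events, for illustration only.
sample_lines = [
    '{"customer_id": "cust_123", "integration": "partner-a", "level": "INFO"}',
    '{"customer_id": "cust_123", "integration": "partner-a", "level": "ERROR"}',
    '{"customer_id": "cust_123", "integration": "partner-b", "level": "ERROR"}',
    '{"customer_id": "cust_999", "integration": "partner-a", "level": "ERROR"}',
    'plain text line that is not JSON',
]

result = errors_for_customer(sample_lines, "cust_123")
print(dict(result))
```

An empty result for a customer would suggest that requests are going out without logged errors, which points the investigation toward the partner service or the customer's configuration.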
Segment support can quickly identify:
“It’s been great that we can solve customers’ problems much faster than we could before,” French-Owen notes. “And in most cases, the DevOps team no longer has to get involved at all. Loggly is a way to keep our sanity around having all of these external touchpoints.”
Moving forward, French-Owen sees opportunities to use log data for proactive monitoring. Today, if his team sees an uptick in errors relating to a specific partner, they may reach out to that partner, buffer messages, or both. The team is also looking to Loggly for: