PyLadies at Loggly: Why Java Engineers Also Love Python
For those who don’t know, PyLadies is an international organization focused on increasing women’s participation in the Python community through mentorship, outreach, education, conferences, and social gatherings.
When I was approached with the opportunity to speak at a PyLadies meetup, I initially thought, “Hey, that sounds great, but I’m not a Python engineer.” I got to thinking, though. Even as a Java engineer, I still occasionally write things in Python. There are many critical components of Loggly’s tech stack that are written in Python. Although Python isn’t my primary coding language, I would not be able to do my job well without it, nor would Loggly run without it. This fact is what inspired me to give the talk and to write this blog post.
In the talk, I focused on what Loggly does, why we do it, and how we do it, with an emphasis on the role that Python plays in our tech stack. Since logs are by nature huge, unwieldy, and unpredictable, Loggly has learned a ton about scalability. Our tech stack is wildly more complicated than the news feed at LinkedIn, which is what I worked on prior to Loggly. The only way to manage such a beast is through automation and excellent tooling. And what language out there is better for automating things than Python? What language is as effortless as Python in whipping up a quick script?
To be more specific: the main places where we use Python in Loggly’s infrastructure are analytics and business intelligence applications, monitors and governors, and test automation. Oh, and our whole web app is written in Django, but I want to focus on the uses that the infrastructure team, all of us Java developers, have for Python.
Our internal analytics are powered by a Python app that consumes application metrics from a Kafka queue. This app aggregates and analyzes the information it consumes, and stores computed metrics in our analytics database. The metrics data we collect help us make product and business decisions, which in turn help us provide valuable functionality to our customers. Python’s strength at text processing and its excellent support in third-party services made it a natural choice for the BI piece of our infrastructure.
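To give a feel for the consume, aggregate, and store loop described above, here’s a minimal sketch. It is not Loggly’s actual app, and it rests on a few assumptions: each Kafka message is taken to be a JSON object with hypothetical `metric` and `value` fields, a plain list of byte strings stands in for the Kafka consumer (in practice you would iterate over a client such as kafka-python’s `KafkaConsumer` and read each record’s `.value`), and an in-memory SQLite database stands in for the analytics database.

```python
from __future__ import annotations

import json
import sqlite3
from collections import defaultdict
from typing import Iterable


def aggregate(messages: Iterable[bytes]) -> dict[str, dict[str, float]]:
    """Compute count, min, max and mean per metric name.

    Assumes (hypothetically) each message looks like
    {"metric": "events.parsed", "value": 12.5}.
    """
    stats: dict[str, dict[str, float]] = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for raw in messages:
        try:
            event = json.loads(raw)
            name, value = event["metric"], float(event["value"])
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed messages instead of crashing the consumer
        s = stats[name]
        s["count"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    for s in stats.values():
        s["mean"] = s["sum"] / s["count"]
    return dict(stats)


def store(conn: sqlite3.Connection, stats: dict[str, dict[str, float]]) -> None:
    """Write computed metrics into a stand-in analytics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(name TEXT PRIMARY KEY, count INTEGER, mean REAL, min REAL, max REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO metrics VALUES (?, ?, ?, ?, ?)",
        [(n, int(s["count"]), s["mean"], s["min"], s["max"]) for n, s in stats.items()],
    )
    conn.commit()


if __name__ == "__main__":
    # A list of raw payloads stands in for records pulled from a Kafka topic.
    batch = [
        b'{"metric": "events.parsed", "value": 10}',
        b'{"metric": "events.parsed", "value": 30}',
        b'{"metric": "bytes.in", "value": 2048}',
        b"not json",
    ]
    conn = sqlite3.connect(":memory:")
    store(conn, aggregate(batch))
    for row in conn.execute("SELECT * FROM metrics ORDER BY name"):
        print(row)
```

Keeping `aggregate` free of any Kafka or database code makes it easy to test on its own and to swap the stand-ins for real clients later.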
Our monitors and governors are also an invaluable piece of the puzzle. Without our monitors, for example, we would have no way of knowing where things stand with our processing pipeline. Thanks to some Python scripts that we whipped up in times of need, we on the Infrastructure and DevOps teams can diagnose production issues in a matter of minutes, because finicky and tedious curl commands are distilled into a few easy keystrokes. Those Python scripts can then, if necessary, be fully automated with the help of cron. It’s true that this sort of thing could be accomplished with any scripting language, but I find myself looking up the specifics of Python syntax far less often than I do for any other scripting language I have dabbled in. This is a huge plus when you are in a time crunch.
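Here’s a minimal sketch of the kind of script that can replace a string of curl commands. The stage names, URLs, threshold, and the `lag` field are all hypothetical: it assumes each pipeline stage exposes an HTTP endpoint returning JSON that reports how many messages are waiting. Fetching uses only the standard library’s `urllib.request`, and the checking logic is kept in a separate function so it can be exercised without a network.

```python
from __future__ import annotations

import json
import sys
import urllib.request

# Hypothetical status endpoints; not real Loggly hosts.
STAGES = {
    "collector": "http://collector.internal:8080/status",
    "parser": "http://parser.internal:8080/status",
    "indexer": "http://indexer.internal:8080/status",
}
MAX_LAG = 10_000  # hypothetical alert threshold, in queued messages


def fetch_status(url: str, timeout: float = 5.0) -> dict:
    """GET a JSON status document (the part a curl command used to do)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_problems(statuses: dict[str, dict | None], max_lag: int) -> list[str]:
    """Return human-readable problems; an empty list means healthy."""
    problems = []
    for stage, status in statuses.items():
        if status is None:
            problems.append(f"{stage}: unreachable")
        elif status.get("lag", 0) > max_lag:
            problems.append(f"{stage}: lag {status['lag']} exceeds {max_lag}")
    return problems


def main() -> int:
    statuses: dict[str, dict | None] = {}
    for stage, url in STAGES.items():
        try:
            statuses[stage] = fetch_status(url)
        except (OSError, ValueError):
            statuses[stage] = None  # connection error, timeout, or bad JSON
    problems = find_problems(statuses, MAX_LAG)
    for line in problems:
        print(line)
    # A non-zero exit status lets cron or an alerting wrapper notice failures.
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main())
```

To automate it, a crontab entry such as `*/5 * * * * /usr/bin/python3 /opt/monitors/pipeline_check.py` (path hypothetical) would run the check every five minutes.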
One of my favorite Python tools that Loggly has developed is called Cerberus. Cerberus is one of our service governors. It keeps an eye on customer data rates on our infrastructure. If the data rate exceeds an allowed limit, it takes action based on the defined policies. Why is Cerberus a Python app? Because we needed something that was going to make a lot of REST calls, and we needed it pronto. Thanks to Python’s superb requests library and, again, its ease of use, building this tool was easy, and the tool saved us from having to perform this time-sensitive process manually. For more information on the topic of service governors and scalability in general, see our CTO’s excellent blog post, Building a SaaS Service for an Unknown Scale: Part 2.
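Cerberus’s internals aren’t described here, so the following is only a hypothetical sketch of the decision a rate governor makes: compare each customer’s observed data rate with its allowed limit and pick an action from a policy. The `Policy` fields, the `Action` names, and the 1.5x throttle threshold are invented for illustration. In a real governor, the observed rates would be fetched from, and the chosen actions sent to, REST endpoints (for example with `requests.get` and `requests.post`); that I/O is left out so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    NOTIFY = "notify"      # hypothetical: warn the customer or on-call engineer
    THROTTLE = "throttle"  # hypothetical: slow ingestion for this customer


@dataclass(frozen=True)
class Policy:
    limit_bytes_per_sec: float
    throttle_factor: float = 1.5  # throttle once rate exceeds limit * factor


def decide(rate: float, policy: Policy) -> Action:
    """Map one customer's observed data rate to an action under its policy."""
    if rate <= policy.limit_bytes_per_sec:
        return Action.NONE
    if rate <= policy.limit_bytes_per_sec * policy.throttle_factor:
        return Action.NOTIFY
    return Action.THROTTLE


def govern(rates: dict[str, float], policies: dict[str, Policy]) -> dict[str, Action]:
    """Decide an action per customer; customers without a policy are skipped here."""
    return {
        customer: decide(rate, policies[customer])
        for customer, rate in rates.items()
        if customer in policies
    }


if __name__ == "__main__":
    policies = {"acme": Policy(1_000_000), "globex": Policy(500_000)}
    observed = {"acme": 1_200_000, "globex": 900_000, "initech": 10}
    for customer, action in govern(observed, policies).items():
        print(customer, action.value)
```

Keeping `decide` a pure function makes the policy logic easy to unit test, while the REST calls can wrap around `govern`.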
Lastly, we use Python for our test automation. We have a number of integration tests, some of which must bring up as many as a dozen different applications. Most of these tests generate and send a batch of test data, check each application to verify the data flowed through it, and then inspect the final result in Elasticsearch. The test drivers were once written in bash but have since been migrated to Python, because Python is more readable, easier to maintain, and offers more sophisticated data structures and built-in methods.
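Part of why Python beats bash for these drivers is its native sets and dictionaries. The sketch below is hypothetical, not one of our actual drivers: it tags each generated event with a unique ID, then uses set difference to report which events each stage never saw. The stage names and the `seen_ids` lookups are stand-ins; a real driver would query each application, and Elasticsearch for the final check, instead of returning in-memory sets.

```python
from __future__ import annotations

import json
import uuid
from typing import Callable


def make_events(n: int, run_id: str) -> list[str]:
    """Generate n JSON test events that share a run ID and have unique event IDs."""
    return [
        json.dumps({"run_id": run_id, "event_id": f"{run_id}-{i}", "msg": f"test {i}"})
        for i in range(n)
    ]


def verify_flow(
    sent: list[str], stages: dict[str, Callable[[], set[str]]]
) -> dict[str, set[str]]:
    """Return, per stage, the event IDs that were sent but never seen there."""
    expected = {json.loads(e)["event_id"] for e in sent}
    missing = {}
    for name, seen_ids in stages.items():
        gap = expected - seen_ids()
        if gap:
            missing[name] = gap
    return missing


if __name__ == "__main__":
    run_id = uuid.uuid4().hex[:8]
    events = make_events(5, run_id)
    ids = [json.loads(e)["event_id"] for e in events]
    # Stand-ins for querying each application (and finally Elasticsearch).
    stages = {
        "collector": lambda: set(ids),
        "parser": lambda: set(ids),
        "elasticsearch": lambda: set(ids[:-1]),  # pretend one event was dropped
    }
    for stage, gap in verify_flow(events, stages).items():
        print(f"{stage} is missing {sorted(gap)}")
```

A per-run ID also keeps concurrent or repeated test runs from being confused with each other’s data.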
I work on the parsing framework at Loggly, and I find Python to be an excellent tool for whipping up performance and correctness tests. At one point, I decided to write a tool that would guarantee that changes to the parsing framework would not cause any parsing to break. This was a harder problem than it might appear at first glance, because a handful of unit tests can’t guarantee that your regexes will do what you expect when applied to the full breadth of production data. So I wrote a Python script that gathers gigabytes of data, compiles two versions of the parser, runs the same data through the old and new versions, and analyzes the output to make sure everything is as expected. I had budgeted a day or two for this task, as I was rusty with Python and it seemed complex. But, to my surprise and delight, it took all of about three hours to finish. Thanks, Python! I couldn’t have done it without you!
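The heart of that old-versus-new comparison is differential testing: feed identical input to both versions and report every line where they disagree. The sketch below is a stand-in under stated assumptions, not the original tool. The two regex-based key=value parsers are invented for illustration (the real parsers were compiled builds, which a Python harness might invoke with `subprocess.run`), and the `regressed`, `newly_parsed`, and `changed` categories are hypothetical labels.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Callable, Iterable, Optional

# A parser maps one raw log line to extracted fields, or None if nothing parsed.
Parser = Callable[[str], Optional[dict]]

# Hypothetical "old" and "new" parser versions: the new one also accepts
# dotted keys such as "http.status".
OLD_KV = re.compile(r"(\w+)=(\S+)")
NEW_KV = re.compile(r"([\w.]+)=(\S+)")


def old_parser(line: str) -> dict | None:
    return dict(OLD_KV.findall(line)) or None


def new_parser(line: str) -> dict | None:
    return dict(NEW_KV.findall(line)) or None


def diff_parsers(lines: Iterable[str], old: Parser, new: Parser, max_examples: int = 5):
    """Run both parsers over the same lines and tally how their outputs differ."""
    tally: Counter = Counter()
    examples = []
    for line in lines:
        before, after = old(line), new(line)
        if before == after:
            tally["same"] += 1
            continue
        if after is None:
            kind = "regressed"     # the old version parsed it, the new one doesn't
        elif before is None:
            kind = "newly_parsed"  # only the new version parses it
        else:
            kind = "changed"       # both parse it, but the fields differ
        tally[kind] += 1
        if len(examples) < max_examples:
            examples.append((kind, line, before, after))
    return tally, examples


if __name__ == "__main__":
    sample = ["user=alice status=200", "http.status=500", "no pairs here"]
    tally, examples = diff_parsers(sample, old_parser, new_parser)
    print(dict(tally))
    for kind, line, before, after in examples:
        print(f"{kind}: {line!r}\n  old={before}\n  new={after}")
```

Sorting differences into categories helps at production scale: some `changed` lines may be the intended effect of the change, while a nonzero `regressed` count is likely worth investigating before shipping.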
So, in summary, any engineer, regardless of what language he or she writes in, should be versed in some sort of scripting language. I recommend Python because of its ease of use, its flexibility, and its handy string processing methods, not to mention its wide adoption and excellent community support. You might be able to get by without it, but the time will inevitably come when it could have saved you tons of time and effort. If you don’t know Python already, there are plenty of resources to learn from and meetups to join, such as PyLadies!