The Ultimate Operations Book List for the Infrastructure Engineer

By Karen Sowa 02 Jun 2015

The DevOps culture is pretty new for most of us, and there’s a lot to learn if you are shifting into an operations role or evangelizing DevOps in your organization. Several members of the Loggly team reference Paul Stack’s book list that covers DevOps, continuous delivery, operations and systems thinking, tooling, and a bunch of other topics that are important for DevOps. They wanted us to expand upon his list, so we put together the compilation below.

We think our list is quite thorough, but like software development, it’s never done. If you have read any books that you think belong, be sure to make a comment below and we’ll check them all out.


Continuous Delivery/RelEng

Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation
Jez Humble and David Farley present the principles and technical practices for rapid, incremental delivery of new, high-quality functionality to users. Continuous Delivery offers a rapid, reliable, low-risk delivery process, as well as an automated process for managing all changes. Humble and Farley also discuss the ecosystem required for continuous delivery, including infrastructure, data and configuration management, and governance.

Release It!
Michael T. Nygard shows readers how to design and build applications to withstand sudden influxes of traffic, customers from hundreds of countries, as well as flaky networks and tangled databases. Nygard explains that many system problems begin with the design, and he shares how to design applications for maximum uptime, performance, and return on investment.

Build Quality In
Editors Steve Smith and Matthew Skelton present Continuous Delivery and DevOps experience reports from the wild. Twenty Continuous Delivery and DevOps practitioners share their own first-hand experiences—what worked, what didn’t, and the highs and lows of trying to build quality into an organization—to help readers on their own Continuous Delivery and DevOps journeys.

DevOps for Developers
Michael Hüttermann delivers a practical, thorough introduction to approaches, processes, and tools for fostering collaboration between software development and operations, helping readers streamline the software delivery process and improve the time from inception to delivery.


System Architecture

Software Architecture for Developers
In this practical and pragmatic guide to lightweight software architecture for developers, Simon Brown covers the essence of software architecture; why the software architecture role should include coding, coaching, and collaboration; how to visualize software architecture using simple sketches; and the things readers need to think about before coding. Brown also provides a lightweight approach to documenting software, along with much more.

Building Microservices
Sam Newman provides examples and practical advice for a holistic view of the topics that matter most to system architects and administrators who are building, managing, and evolving microservice architectures. By following a fictional example company throughout the book, readers learn how building a microservice architecture affects a single domain. Newman provides a firm grounding in the concepts while diving into current solutions for modeling, integrating, testing, deploying, and monitoring autonomous services.

Test-Driven Infrastructure with Chef
Stephen Nelson-Smith demonstrates a radical approach to developing web infrastructure, combining the powerful Chef configuration management framework with the leading behavior-driven development (BDD) tool Cucumber. Using the open source infrastructure testing platform Cucumber-Chef, readers learn how to deliver real business value by developing infrastructure code test-first, allowing them to make significant changes without fear of unexpected side effects.

DevOps Culture/Leadership

The Goal: A Process of Ongoing Improvement
Eliyahu M. Goldratt and Jeff Cox’s The Goal is a gripping novel featuring Alex Rogo, a harried plant manager struggling to improve performance as his factory rapidly heads for disaster. Unless he can turn performance around, his plant will be closed by corporate HQ in 90 days, resulting in hundreds of job losses. A chance meeting with an old professor helps him break out of his conventional thinking patterns and see what needs to be done. The Goal contains a serious message for all managers in industry and explains the ideas underlying Eli Goldratt’s Theory of Constraints (TOC).

The Phoenix Project
Gene Kim, Kevin Behr, and George Spafford present a fictionalized tale modeled closely after The Goal in which an IT manager named Bill, who works at Parts Unlimited, gets a call from the company’s CEO on his drive to the office one morning. Because the company’s new IT initiative, the Phoenix Project, is over budget and very late, the CEO wants Bill to report directly to him and fix the mess in 90 days—or else the entire department will be outsourced. Through Bill’s work at Parts Unlimited, this entertaining and enlightening novel ultimately shows readers how to improve their own IT organizations.

The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps
Gene Kim, George Spafford, and Kevin Behr have met with hundreds of IT organizations, eight of which they have identified as having the highest service levels, best security, and best efficiencies. After studying these high-performing organizations to figure out the secrets to their success, they’ve codified the information in The Visible Ops Handbook, allowing readers to replicate their key processes in just four steps.

Fearless Change
Drs. Linda Rising and Mary Lynn Manns share 48 proven techniques, or patterns, for implementing change in organizations and show readers how to use the patterns successfully. Drawing on the experiences of hundreds of leaders, Rising and Manns’s patterns apply to every stage of the change process (knowledge, persuasion, decision, implementation, and confirmation) and offer powerful insight into change-agent behavior, organizational culture, and the roles of every participant.

More Fearless Change
In this newly released book, Drs. Rising and Manns reflect on lessons learned about their original patterns over the past decade while introducing 15 new techniques. With strategies that appeal to each individual’s logic (head), feelings (heart), and desire to contribute (hands), More Fearless Change offers a way to motivate real change and sustain it for the long haul.

Crucial Conversations
Kerry Patterson, Joseph Grenny, Ron McMillan, and Al Switzler’s New York Times and Washington Post best-seller helps readers get past the hard parts of dialogue to take the lead in tough conversations to achieve real, productive relationships that will enrich their careers and lives.

Team Geek: A Software Developer’s Guide to Working Well with Others
Software engineers Brian W. Fitzpatrick and Ben Collins-Sussman, whose popular series of talks has attracted a massive following, cover basic patterns and anti-patterns for working with other people, teams, and users while trying to develop software. They argue that writing software is a team sport in which human factors have as much influence on the outcome as technical factors. Readers learn about the often-overlooked human components of collaboration and other “soft skills” of software engineering, resulting in a much greater impact for the same amount of effort.

The Wisdom of Teams: Creating the High-Performance Organization
Jon R. Katzenbach and Douglas K. Smith talk to hundreds of people at more than 30 companies to uncover where and how teams work best along with how to enhance their efficacy. Readers learn the most important element to team success, who makes the best team leaders, and why companywide change depends on teams.

Out of the Crisis
In his 1982 classic Out of the Crisis, W. Edwards Deming presents a theory of management, based on his famous 14 Points for Management, explaining the principles of management transformation and how to apply them.

Remote: Office Not Required
37signals founders Jason Fried and David Heinemeier Hansson explore the “work from home” phenomenon. Remote shows why, with some exceptions, many businesses should want to promote this model for getting things done and explains how a remote work setup can be accomplished.

Start with Why
Simon Sinek asks why some people and organizations are more innovative, more influential, and more profitable than others, and why some command greater loyalty. After studying the world’s greatest leaders, Sinek argues that they think, act, and communicate in the exact same way, and that it’s the complete opposite of everyone else. Sinek uses real-life stories to show what it takes to lead and inspire.

Leaders Eat Last
After publishing Start With Why, Simon Sinek noticed in working with teams around the world that some trusted each other so deeply that they would literally put their lives on the line for one another, while other teams, no matter what incentives were offered, were doomed to infighting, fragmentation, and failure. The reason for the difference became clear during a conversation in which a Marine Corps general explained, “Officers eat last.” From the Marine Corps to the earliest hunter-gatherer tribes and everything in between, the best organizations foster trust and cooperation because their leaders build what Sinek terms a “circle of safety” separating the security inside the team from the challenges outside.

Turn the Ship Around!: A True Story of Turning Followers into Leaders
Captain David Marquet tells the true story of how the Santa Fe skyrocketed from worst to first in the fleet by challenging the U.S. Navy’s traditional leader-follower approach. Struggling against his own instincts to take control, Marquet instead achieved the vastly more powerful model of giving control and, before long, each member of his crew became a leader and assumed responsibility for everything he did, from clerical tasks to crucial combat decisions. The crew became fully engaged, contributing their full intellectual capacity every day, and the Santa Fe started winning awards and promoting a highly disproportionate number of officers to submarine command.


Web Operations

Web Operations: Keeping the Data on Time
John Allspaw and Jesse Robbins compile a collection of essays and interviews in which web veterans like Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer insights into the evolving field of web operations and stories from the trenches to teach readers what’s necessary to help a site thrive.

The Art of Capacity Planning: Scaling Web Resources
John Allspaw offers a hands-on and practical guide to planning for scalable growth, with many techniques and considerations to help readers plan, deploy, and manage web application infrastructure. Allspaw provides personal insight and anecdotes from his work as manager of data operations at Flickr along with insights from colleagues in other industries, giving readers solid guidelines for measuring growth, predicting trends, and making cost-effective preparations.

Systems Thinking

Thinking in Systems: A Primer
Donella H. Meadows offers concise and crucial insight for problem solving on scales ranging from the personal to the global, bringing systems thinking out of the realm of computers and equations and into the tangible world and showing readers how to develop the systems-thinking skills that thought leaders across the globe consider critical for complicated, crowded, and interdependent 21st-century life.

Failure Is Not an Option: Mission Control from Mercury to Apollo 13 and Beyond
In this memoir, veteran NASA flight director Gene Kranz tells riveting stories from the early days of the Mercury program through the Apollo 11 moon landing and Apollo 13. Kranz recounts these historic events while offering new insight into technological failures and near misses. Kranz’s behind-the-scenes details demonstrate the leadership, discipline, trust, and teamwork that made the space program a success.

The Field Guide to Understanding Human Error
Sidney Dekker guides readers through the traps and misconceptions of the “bad apple” view of human error, presenting the view that human error is an organizational problem. Dekker offers readers advice for applying new theories to their organizations and handles questions about accountability and constructing meaningful countermeasures.

Drift into Failure
Sidney Dekker asks what the collapse of sub-prime lending has in common with a broken jackscrew in an airliner’s tailplane, the oil spill disaster in the Gulf of Mexico, or the burn-up of space shuttle Columbia. Dekker looks at these systems that drifted into failure, arguing that the growth of complexity in society has outpaced our understanding of how complex systems work and fail. Dekker argues that failure emerges opportunistically and non-randomly from the very webs of relationships that breed success and that are supposed to protect organizations from disaster.

Just Culture: Balancing Safety and Accountability
Sidney Dekker claims that a just culture protects people’s honest mistakes from being seen as culpable, then goes on to examine how justice is created inside of organizations—how do we know when someone has “crossed the line,” and who gets to draw the line in the first place? Dekker argues that a just culture is critical for the creation of a safety culture, and that without openness, information sharing, and reporting of failures, learning and accountability cannot be fairly and constructively balanced.

The ETTO Principle: Efficiency-Thoroughness Trade-Off
Going against the standard emphasis on human error in most accident investigation and risk assessment, Erik Hollnagel argues that it is impossible to achieve safety by eliminating risks and failures, asserting instead that it is better to study why things go right and find ways to support and amplify those successful outcomes. The ETTO principle proposes that it is normal for people in work situations to adjust their performance by means of an efficiency-thoroughness trade-off (ETTO), usually by sacrificing thoroughness for efficiency due to a lack of time or resources, work and company pressures, and so on. This simple but powerful principle for human performance can be used to understand both positive and negative outcomes.

The Fifth Discipline: The Art and Practice of the Learning Organization
Based on 15 years of putting the book’s ideas into practice, Peter M. Senge’s The Fifth Discipline argues that, in the long run, the only sustainable advantage is an organization’s ability to learn faster than the competition. Senge offers leadership stories which demonstrate the core ideas of The Fifth Discipline, as well as advice for companies looking to rid themselves of the learning “disabilities” that threaten their productivity and success.



Pro Nagios 2.0
James Turnbull simplifies deployment and installation by providing examples of real-world monitoring situations and explaining how to configure, architect, and deploy enterprise management (EM) solutions to address them. Turnbull also offers step-by-step guidelines for creating Nagios plug-ins to monitor devices for which Nagios doesn’t provide plug-ins.

The Logstash Book
James Turnbull walks readers through installing, deploying, managing, and extending the open source tool Logstash. Readers start their new “job” as sys admins at Example.com and complete projects that add up to a functional and effective log management solution that they can deploy into their own environments.

The Docker Book: Containerization Is the New Virtualization
James Turnbull walks readers through installing, deploying, managing, and extending the open source container service Docker. Readers are introduced to the basics of Docker and its components, then shown how to use Docker to build containers and services to perform tasks like building test environments for new projects, integrating Docker with continuous integration workflow, building applications services platforms, using Docker’s API, and more.


High Performance MySQL
Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko share advanced techniques for everything from designing schemas, indexes, and queries to tuning the MySQL server, operating system, and hardware to their fullest potential. With illustrative stories and case studies as well as an explanation of how and why MySQL works, readers learn safe and practical ways to scale applications through replication, load balancing, high availability, and failover, helping them unlock MySQL’s full power.


Lean / Agile

Implementing Lean Software Development
In this sequel and companion guide to their groundbreaking 2003 Lean Software Development, Mary and Tom Poppendieck show readers exactly how to implement lean software development. Drawing on their experience helping development organizations optimize the entire software value stream, the Poppendiecks show readers the questions to ask, the issues to focus on, and the techniques proven to work, with case studies from leading-edge software organizations and practical exercises for jump-starting Lean initiatives.

The Lean Mindset
Through research and case studies from leading organizations like Spotify, Ericsson, Intuit, GE Healthcare, Pixar, CareerBuilder, and Intel, Mary and Tom Poppendieck show how lean companies really work—and how a lean mindset is the key to creating great products and services. Readers discover proven patterns for developing the lean mindset, as well as hands-on advice for cultivating product teams that act like successful startups, creating the kind of efficiency that attracts customers, and leveraging the talents of bright, creative people.

Kanban: Successful Evolutionary Change for Your Technology Business
Kanban is an increasingly popular way to visualize and limit work-in-progress in software development and information technology work. David J. Anderson explains what exactly Kanban is, why it’s so useful, and how readers can implement it, with practical tips for recognizing and acting on improvement opportunities.

Running Lean
Ash Maurya outlines a systematic process for quickly vetting product ideas and raising the odds of success as a means of preventing the wasted time, money, and effort spent building the wrong product. Based on his experience in building a wide array of products, Maurya takes readers through an exacting strategy for achieving a “product/market fit” for fledgling ventures, building on the ideas and concepts of several innovative methodologies, including the lean startup, customer development, and bootstrapping.

Lean from the Trenches
Henrik Kniberg focuses on lean methodology in practice, illustrating key points with photos, diagrams, and anecdotes to bring readers inside the project. Lean from the Trenches begins with an organization in need of a new way of doing things and ends with a group of 60, all working in sync to develop a scalable, complex system. Kniberg walks readers through the project step by step, from customer engagement, to the daily “cocktail party,” version control, bug tracking, and release.

The Lean Enterprise: How Corporations Can Innovate Like Startups
Trevor Owens reveals the methodologies, tools, and incentive structures guiding the world’s largest organizations to reclaim their innovation prowess in the first and most comprehensive book on bringing the startup mindset into large organizations.

Visual Studio Team Foundation Server 2012: Adopting Agile Software Practices: From Backlog to Continuous Feedback
Sam Guckenheimer and Neno Loje present the definitive guide to applying agile development and modern software engineering practices with Visual Studio Team Foundation Server 2012—Microsoft’s complementary Application Lifecycle Management (ALM) platform. Guckenheimer and Loje focus on solving real development challenges, systematically eliminating waste, improving transparency, and delivering better software more quickly and painlessly.
