The release of Site Reliability Engineering (2016) ignited an industry-wide conversation about the development and management of online systems. Today, an array of conferences, books, and tools have joined that original conversation to support what you could think of as a movement: the community of people interested in reliability as a way of life.
As that community has grown and diversified, we’ve learned a lot about what fosters reliability in services, teams, and organizations. Perhaps surprisingly, we see reliability being delivered via a mix of arrangements: dedicated SRE teams, you-build-it-you-run-it, and even embedded reliability expertise. In fact, reliability can transcend tech stacks, industries, and organizational histories, and we believe that anyone can achieve good reliability if two key factors are in place:
* A positive, collaborative, always-learning culture
* Tooling that transcends silos and team boundaries, enabling anyone to do the best thing for everyone
These factors are highly intertwined. The DevOps community understands this intuitively: tooling, culture, and results are related. When tooling helps break down organizational silos, a collaborative culture emerges. But when tooling is team-specific and painful to use across silos, an unhappy culture and poor availability follow.
This is why we’re building Stanza - a platform for making reliable services, integrating diverse infrastructure, understanding events, and taking action across teams, tools, and technologies.
Stanza levels up your production engineering teams by giving them instant, queryable access to a real-time model of production - converging data from popular tools like AWS, Datadog, Sentry, and GitHub into a cohesive story about what is (and was) happening across services and dependencies. This provides teams with context not just on their own work, but on the work of others in the org - enabling faster understanding, more effective onboarding, and collaboration that scales. Teams can also build more reliable services from a set of optimized components and APIs that make automation easier.
When you find the problem, Stanza’s extension model lets engineers act quickly - no digging around for docs or old scripts. The plug-in interface helps teams act directly from the platform. Sharing automation and building on others’ work both become easier, helping foster a community around production.
Because Stanza integrates what’s happening across several systems with how teams can take action, it opens up new possibilities for human-in-the-loop service management. Stanza recommends actions to take, informed by a mix of data analysis and decades of production wisdom, and its production analytics improve the more data sources they have to work with.
With quick-to-set-up integrations for your most frequently used tools, Stanza helps you move faster and more safely, together.
We look forward to building the next-generation reliability community with you!