Make Sense of Production

Stanza helps engineers understand what’s happening, collaborate efficiently, and act intelligently by fusing SRE wisdom with a real-time model of your production environment.


Work in real time

Animation of the product that shows a user (Maggie) reading a production story.
Maggie views comments on the story and reads that Laura is working on a rollback.

She then scrolls down to a timeline of events ingested from various providers, including PagerDuty, GitHub and Prometheus, and picks out an event labeled 'custom action - traffic shift on ingress-lb'.

Maggie expands the event details, which reveal that she took the action at 17:53:45 based on a suggestion from the StanzaBot. She marks the action as a 'mitigation action' so that it appears as such in the story timeline.

She then clicks on a 'reassign alert' auto action, which suggests routing the alert to Laura's team first next time, because the story was re-assigned to Laura. She accepts the change.

Then she clicks on a share button and selects 'Joseph' from a dropdown.
In a comment box she types "So glad your team wrote that traffic shifter! Would have been lost without it. Re-assigned the alert to you folks! - Maggie"

Collaborate to create your production story

Ingest data about your production environment from popular providers like AWS, Datadog, GitHub, PagerDuty and Sentry
Create your production story, assisted by Stanza’s recommendations
Annotate events, request input from colleagues, and share production state information quickly

Query your production environment

If I change this config, what services might be impacted?


// Find everything that depends on my postgres database
stanza.Now().node('Postgres').incomingNodes().list()

// Find the things my signup API depends on
stanza.Now().node('SignupApi').outgoingNodes().list()
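
To make the two queries above concrete, here is a minimal sketch of what incoming and outgoing lookups could mean on a dependency graph. It is not Stanza's implementation: the edge list, the service names other than those in the queries, and the helper functions `incomingNodes` and `outgoingNodes` are all hypothetical, and it assumes an edge points from a service to the thing it depends on.

```javascript
// Hypothetical dependency graph: an edge [from, to] means "from depends on to".
// The data is illustrative and not taken from a real Stanza model.
const edges = [
  ["SignupApi", "Postgres"],
  ["SignupApi", "AuthService"],
  ["CustomerAPI", "Postgres"],
];

// Nodes with an edge pointing at `name`, i.e. the things that depend on it.
function incomingNodes(name) {
  return edges.filter(([, to]) => to === name).map(([from]) => from);
}

// Nodes that `name` points at, i.e. the things it depends on.
function outgoingNodes(name) {
  return edges.filter(([from]) => from === name).map(([, to]) => to);
}

console.log(incomingNodes("Postgres")); // [ 'SignupApi', 'CustomerAPI' ]
console.log(outgoingNodes("SignupApi")); // [ 'Postgres', 'AuthService' ]
```

This sketch only follows direct edges. Answering "what might be impacted" in full would likely also need the transitive dependents, which a repeated or recursive lookup would cover.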

A hundred alerts just went off - are they all related to one thing?


// Find the number of distinct subgraphs that have alerts on them right now
stanza.Now().alerts().nodes().subgraphs().count()
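
One way to read "distinct subgraphs that have alerts" is as connected components of the dependency graph that contain at least one alerting node. The sketch below works under that assumption; the data and the functions `buildAdjacency` and `countAlertingSubgraphs` are hypothetical and do not describe how Stanza computes the result.

```javascript
// Hypothetical data: dependency edges and the nodes currently alerting.
const edges = [
  ["SignupApi", "Postgres"],
  ["CustomerAPI", "Postgres"],
  ["BillingWorker", "Queue"],
];
const alerting = ["SignupApi", "CustomerAPI", "Postgres"];

// Build an undirected adjacency map so connectivity ignores edge direction.
function buildAdjacency(edgeList) {
  const adj = new Map();
  for (const [a, b] of edgeList) {
    if (!adj.has(a)) adj.set(a, new Set());
    if (!adj.has(b)) adj.set(b, new Set());
    adj.get(a).add(b);
    adj.get(b).add(a);
  }
  return adj;
}

// Count connected components that contain at least one alerting node.
function countAlertingSubgraphs(edgeList, alertingNodes) {
  const adj = buildAdjacency(edgeList);
  const visited = new Set();
  let count = 0;
  for (const start of alertingNodes) {
    if (visited.has(start)) continue; // already part of a counted component
    count += 1;
    // Depth-first walk marks every node reachable from this alerting node.
    const stack = [start];
    visited.add(start);
    while (stack.length > 0) {
      const node = stack.pop();
      for (const next of adj.get(node) ?? []) {
        if (!visited.has(next)) {
          visited.add(next);
          stack.push(next);
        }
      }
    }
  }
  return count;
}

console.log(countAlertingSubgraphs(edges, alerting)); // 1
```

A count of 1 here suggests the alerts share a single connected part of the graph, which is a hint (not proof) that they have a common cause.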

Was there a new build during last week’s incident?


// Mark the current state with tags
stanza.Now().nodes('Postgres', 'CustomerAPI').tag('postmortem-incident-name', 'failedServices')
// Used for tagging failure modes for later investigation

// Find old tags to re-analyze state later
stanza.AtTags('postmortem-incident-name', 'failedServices').nodes().builds().list()

// Or search for old state ad hoc
// Find builds that occurred during last week's incident
stanza.AtTime('2022-08-28T13:22:00Z').nodes('CustomerAPI', 'SignupProcessor').builds().list()
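
As a rough illustration of the question behind these queries, the sketch below filters build records down to those that landed inside an incident window. Everything in it is an assumption made for illustration: the record shape, the timestamps, and the `buildsDuring` helper are hypothetical, and the Stanza queries above address a point in time or a tag, not an explicit window.

```javascript
// Hypothetical build records; services and timestamps are made up.
const builds = [
  { service: "CustomerAPI", builtAt: "2022-08-28T12:05:00Z" },
  { service: "SignupProcessor", builtAt: "2022-08-28T13:10:00Z" },
  { service: "CustomerAPI", builtAt: "2022-08-28T15:40:00Z" },
];

// Return builds for the given services whose timestamp falls in [start, end].
function buildsDuring(records, services, start, end) {
  const from = Date.parse(start);
  const to = Date.parse(end);
  return records.filter((b) => {
    const t = Date.parse(b.builtAt);
    return services.includes(b.service) && t >= from && t <= to;
  });
}

const hits = buildsDuring(
  builds,
  ["CustomerAPI", "SignupProcessor"],
  "2022-08-28T13:00:00Z",
  "2022-08-28T14:00:00Z"
);
console.log(hits.map((b) => b.service)); // [ 'SignupProcessor' ]
```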

Leverage SRE Wisdom

Use the Stanza platform to build robust, safe and observable automation

Build, share and use new automation from the community library of best practices

Get real-time suggestions about how to take action from the Stanza AI


Decades of real-world expertise, codified

The Stanza team has built and operated some of the most critical and successful systems on the Internet. Our engineering has been at the heart of Google, YouTube, Widevine, Amazon, Dropbox, Azure, Stripe, Slack, and more, at every layer of the stack and across an array of industry and government sectors.

In addition, the Stanza team has written and contributed to the most influential books on reliability engineering in the industry.

Stanza brings you decades of software, systems, and security engineering expertise, crystallized into code.

Image of books:
Site Reliability Engineering
The Site Reliability Workbook
Implementing Service Level Objectives
Reliable Machine Learning
Seeking SRE
97 Things Every SRE Should Know
Building Secure and Reliable Systems