Ever wondered why some tech companies seem to effortlessly outpace their competitors? Forget the 10x developer — it’s because they’ve built the 10x team. How do you do that?
The secret might just lie in their approach to applying DORA metrics.
DORA metrics are powerful indicators of an organization's ability to deliver value quickly, reliably, and consistently (when applied in a trust-based environment). Research has shown that organizations with high DORA maturity are 2x more likely to exceed profitability targets, with some key drivers being:
We’ve seen it firsthand: Octopus Deploy shipped 47% more PRs by leveraging insights from the Multitudes platform into their DORA metrics performance.
Sections:
DORA groups its metrics into two key dimensions that describe software delivery performance:
DORA metrics matter because they serve as powerful indicators of an environment where everyone collaborates to deliver better software, faster.
To fully appreciate DORA metrics, it’s worth remembering that engineering and operations teams used to work in separate silos. That wasn’t effective: engineers would ship code over the wall to operations, who'd then struggle to deploy and maintain it without understanding the full context. That friction gave rise to DevOps (and, later, platform engineering), a more harmonious way of working.
The DORA metrics help teams understand where they stand on this DevOps journey. These metrics place teams into performance levels – from Low to Elite – based on how well they're doing. For instance, the best performing teams (Elite) keep their change failure rate under 5% and can bounce back from deployment failures in under an hour. Pretty impressive, right?
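As a quick illustration, here's a minimal Python sketch of checking a team against the two Elite thresholds just quoted (change failure rate under 5%, recovery in under an hour). The function name and inputs are our own illustrative assumptions, not part of DORA's definitions; the full Low-to-Elite thresholds are in the benchmark tables below.

```python
from datetime import timedelta

# Illustrative check against the two Elite stability thresholds quoted above.
def meets_elite_stability(change_failure_rate_pct: float,
                          recovery_time: timedelta) -> bool:
    return change_failure_rate_pct < 5.0 and recovery_time < timedelta(hours=1)

print(meets_elite_stability(3.5, timedelta(minutes=45)))  # True
```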
DORA metrics focus on performance, and they correlate with customer value creation and the financial performance of companies. By tracking these four key metrics, teams can pinpoint areas for improvement and benchmark themselves against the industry standards below.
DORA performance levels are accepted as a benchmark for what elite engineering performance looks like. This is because DORA's 10+ years of research and related book Accelerate consistently show a direct link between high-performing tech teams, psychological safety, and financial performance.
Here's a summary of the latest 2024 DORA metrics benchmarks:
Deployment Frequency tracks how often an organization deploys code to production or releases it to end users. This metric is a key indicator of your team's ability to deliver value continuously, and more importantly, it shows how often your customers get new value from your development work.
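In practice, this is usually just a count of production deployments over a time window. Here's a minimal sketch; the data shape and four-week window are illustrative assumptions, and the timestamps would come from your CI/CD or deployment tooling:

```python
from datetime import datetime, timedelta, timezone

# Average production deployments per week over a recent window.
# Assumes timezone-aware deployment timestamps from your pipeline history.
def deployments_per_week(deploy_times: list[datetime], weeks: int = 4) -> float:
    cutoff = datetime.now(timezone.utc) - timedelta(weeks=weeks)
    recent = [t for t in deploy_times if t >= cutoff]
    return len(recent) / weeks
```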
Top teams respond quickly to customer needs and rapidly iterate on their products, as shown by the Deployment Frequency benchmarks:
Change Lead Time measures the time it takes from first commit to code successfully running in production, representing one of the stages most within the engineering team's control. It also shows how quickly you can get features into the hands of customers, which is when value is truly delivered.
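As a rough sketch (with illustrative, assumed inputs), lead time for a single change is just the delta between those two timestamps, and it's common to report a median across changes so one unusually slow change doesn't dominate:

```python
from datetime import datetime, timedelta
from statistics import median

# Lead time for one change: first commit -> running in production.
def change_lead_time(first_commit_at: datetime, deployed_at: datetime) -> timedelta:
    return deployed_at - first_commit_at

# Median across many changes resists skew from the occasional slow one.
def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    return median(change_lead_time(commit, deploy) for commit, deploy in changes)
```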
Benchmarks show that the Change Lead Time for high-performing companies is:
Change Failure Rate represents the percentage of changes that result in degraded service or require remediation (e.g., lead to service impairment or outage, and require a hotfix, rollback, fix forward, or patch). This metric reveals how often teams can’t deliver new value for customers due to a failure, and indicates the quality of your software delivery process.
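Computationally it's a simple ratio. Here's a minimal sketch, where the Deployment record and its needed_remediation flag are illustrative assumptions about how you'd label each change:

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    id: str
    needed_remediation: bool  # hotfix, rollback, fix-forward, or patch

# Percentage of production changes that degraded service.
def change_failure_rate(deployments: list[Deployment]) -> float:
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.needed_remediation)
    return 100 * failed / len(deployments)
```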
Change Failure Rates for top-performing teams, based on DORA benchmarks, are:
Failed Deployment Recovery Time measures the average time it takes to restore service when a software change causes an outage or service failure in production. It’s important because it shows how long your customers are unable to experience the full value of the app because of incidents. A low Failed Deployment Recovery Time indicates high efficiency in problem-solving and the ability to take risks with new features.
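A minimal sketch of the calculation, assuming you can pair each deployment-caused incident with a failure timestamp and a restored timestamp:

```python
from datetime import datetime, timedelta

# Each incident is a (failed_at, restored_at) pair for a deployment-caused failure.
def mean_recovery_time(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    durations = [restored - failed for failed, restored in incidents]
    return sum(durations, timedelta()) / len(durations)
```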
Based on DORA benchmarks, the time it takes top-performing teams to recover from failed deployments is:
However, research on the Verica Open Incident Database (VOID) highlights that there may be issues with taking averages of incident data, such as the high variability and positively skewed distributions often seen, which can make it an unreliable metric at times. As a result, supplementary measures for incident response data are becoming more popular. Mean Time to Acknowledge (MTTA) is one example: it measures the average time it takes someone to acknowledge a new incident in production, and it's a metric we include in Multitudes.
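The VOID concern is easy to see with made-up numbers: a single long incident drags the mean well above the typical case, while the median stays representative.

```python
from statistics import mean, median

# Illustrative recovery times in minutes; the 480-minute outage skews the mean.
recovery_minutes = [12, 15, 18, 20, 25, 30, 480]

print(f"mean:   {mean(recovery_minutes):.0f} min")    # ~86 min, dominated by the outlier
print(f"median: {median(recovery_minutes):.0f} min")  # 20 min, closer to a typical incident
```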
At Multitudes, we like Courtney Nash's metaphor of “putting some spinach in your fruit smoothies” — we can keep using the metrics that people are familiar with while starting to incorporate better metrics (the "spinach") to illustrate the full picture.
A key nuance between FDRT and MTTR is that:
Thus, FDRT is a cleaner measure of issues the engineering team should have caught during QA or testing, whereas MTTR may include failures from any cause, such as an earthquake interrupting service at a data center. This is why DORA drew the distinction between the two, preferring FDRT, in their 2023 DORA report.
To do this, they changed the way they asked about failures in the DORA survey:
Ready to try DORA with your team?
Before jumping in, remember that a high-trust environment needs to be created first. Without trust, these metrics can be gamed and misused, leading to fear and uncertainty among team members.
Picture this: developers splitting their work into smaller, more frequent deployments just to improve their DORA scores. It’s Goodhart’s Law in action: “When a measure becomes a target, it ceases to be a good measure.”
So how can we implement DORA in a thoughtful way? Here's our approach:
From there, it's rinse and repeat for steps 4-5. Choose one area to improve, then run an experiment, track progress and iterate.
This approach will help build a culture of continuous improvement and blameless experimentation. Getting into a habit of experimentation lowers the barriers for teams to change their ways of working, and builds the muscle of tackling new challenges and shifting priorities.
DORA metrics provide a standard framework for measuring software delivery performance. But because software development is a type of knowledge work, it’s not as simple to measure as inputs in, widgets out.
On top of that, the rise in remote and hybrid work has further complicated this scenario. We must be cautious not to fall into the trap of oversimplification or creating perverse incentives.
Fostering a high-trust environment with strong psychological safety is crucial for the effective use of productivity metrics. Google's Project Aristotle research into team effectiveness revealed that psychological safety, more than anything else, was critical to making a team work. The researchers found that individuals on teams with higher psychological safety were less likely to leave Google, they were more likely to harness the power of diverse ideas from their teammates, they brought in more revenue, and they were rated as effective twice as often by executives.
In such environments, teams have the space to experiment, learn, and refine their metrics over time, knowing that they will support each other in the process of continuous improvement. This is in contrast to environments where a lack of trust causes fear that metrics will be used to blame individuals, leading to burnout and behaviors that do not contribute genuine business value.
Reductive metrics can lead to reductive outcomes: focusing on the number of commits or lines of code won’t give the full picture of how a team is performing and what they’ve contributed. Goodhart’s Law warns us that once a metric becomes a goal, its effectiveness as an indicator declines. Concentrating only on these superficial figures might inadvertently encourage behaviors that don’t contribute genuine business value, for instance writing lots of code without spending enough time on quality, or writing code simply to game the metrics.
That’s why we recommend putting in place guardrails for what you won’t measure as much as what you will. Just because something can be measured doesn’t mean it should be, or that it will help anyone to measure it. We’ve written a separate article about our data ethics guardrails here.
DORA metrics help teams take a data-driven approach to software development and delivery. But, depending too much on metrics can give a skewed perspective of developer productivity. Numerical data usually doesn’t account for the entirety of a developer’s work quality. For instance, while high volumes of code commits might appear impressive based on quantitative analysis, without getting the team’s context, one cannot discern if these changes are substantive or merely cosmetic.
To gain a full grasp of productivity levels, it is crucial to pair metrics with the human context. As we always like to say at Multitudes, even if we had all the integrations, surveys, and data sources in the world, it would never be the sum of a human – because humans do things offline and they change. The best way to get the complete picture is to pair the data with conversations with people. That’s the best way for organizations to make informed choices and enhance overall team performance significantly.
To effectively track and analyze DORA metrics, teams can use Multitudes, which is an engineering insights platform for sustainable delivery. Multitudes integrates with your existing development tools, such as GitHub and Jira, to provide insights into your team's productivity and collaboration patterns.
With Multitudes, you can:
By leveraging Multitudes, you can improve your DORA metrics while giving your teams more time to act on insights, enhancing their productivity and satisfaction.
Our clients ship 25% faster without sacrificing code quality.
Ready to unlock happier, higher-performing teams?