Ever wondered why some tech companies seem to effortlessly outpace their competitors? Forget the 10x developer — it’s because they’ve built the 10x team. How do you do that?
The secret might just lie in their approach to applying DORA metrics.
DORA metrics are powerful indicators of an organization's ability to deliver value quickly, reliably, and consistently (when applied in a trust-based environment). Research has shown that organizations with high DORA maturity are 2x more likely to exceed profitability targets, with some key drivers being:
We’ve seen it firsthand: Octopus Deploy shipped 47% more PRs by leveraging insights from the Multitudes platform into their DORA metrics performance.
Sections:
DORA groups its metrics into two key dimensions that describe software delivery performance:
DORA metrics matter because they serve as powerful indicators of an environment where everyone collaborates to deliver better software, faster.
To fully appreciate DORA metrics, it’s worth remembering that engineering and operations teams used to work in separate silos. That wasn’t effective: engineers would ship code over the wall to operations, who'd then struggle to deploy and maintain it without understanding the full context. That friction gave rise to DevOps (and, later, platform engineering), a more harmonious way of working.
The DORA metrics help teams understand where they stand on this DevOps journey. These metrics place teams into performance levels – from Low to Elite – based on how well they're doing. For instance, the best performing teams (Elite) keep their change failure rate under 5% and can bounce back from deployment failures in under an hour. Pretty impressive, right?
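As a quick illustration, here's a minimal Python sketch of checking a team against the two Elite thresholds just quoted (change failure rate under 5%, recovery in under an hour). The function name and inputs are our own illustrative assumptions, not part of DORA's definitions; the full Low-to-Elite thresholds are in the benchmark tables below.

```python
from datetime import timedelta

# Illustrative check against the two Elite stability thresholds quoted above.
def meets_elite_stability(change_failure_rate_pct: float,
                          recovery_time: timedelta) -> bool:
    return change_failure_rate_pct < 5.0 and recovery_time < timedelta(hours=1)

print(meets_elite_stability(3.5, timedelta(minutes=45)))  # True
```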
DORA metrics focus on performance, and they correlate with customer value creation and the financial performance of companies. By tracking these four key metrics, teams can pinpoint areas for improvement and benchmark themselves against the industry standards below.
DORA performance levels are accepted as a benchmark for what elite engineering performance looks like. This is because DORA's 10+ years of research and related book Accelerate consistently show a direct link between high-performing tech teams, psychological safety, and financial performance.
Here's a summary of the latest 2024 DORA metrics benchmarks:
Deployment Frequency tracks how often an organization deploys code to production or releases it to end users. This metric is a key indicator of your team's ability to deliver value continuously, and more importantly, it shows how often your customers get new value from your development work.
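In practice, this is usually just a count of production deployments over a time window. Here's a minimal sketch; the data shape and four-week window are illustrative assumptions, and the timestamps would come from your CI/CD or deployment tooling:

```python
from datetime import datetime, timedelta, timezone

# Average production deployments per week over a recent window.
# Assumes timezone-aware deployment timestamps from your pipeline history.
def deployments_per_week(deploy_times: list[datetime], weeks: int = 4) -> float:
    cutoff = datetime.now(timezone.utc) - timedelta(weeks=weeks)
    recent = [t for t in deploy_times if t >= cutoff]
    return len(recent) / weeks
```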
Top teams respond quickly to customer needs and rapidly iterate on their products, as shown by the Deployment Frequency benchmarks:
Change Lead Time measures the time it takes from first commit to code successfully running in production, representing one of the stages most within the engineering team's control. It also shows how quickly you can get features into the hands of customers, which is when value is truly delivered.
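As a rough sketch (with illustrative, assumed inputs), lead time for a single change is just the delta between those two timestamps, and it's common to report a median across changes so one unusually slow change doesn't dominate:

```python
from datetime import datetime, timedelta
from statistics import median

# Lead time for one change: first commit -> running in production.
def change_lead_time(first_commit_at: datetime, deployed_at: datetime) -> timedelta:
    return deployed_at - first_commit_at

# Median across many changes resists skew from the occasional slow one.
def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    return median(change_lead_time(commit, deploy) for commit, deploy in changes)
```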
Benchmarks show that the Change Lead Time for high-performing companies is:
Change Failure Rate represents the percentage of changes that result in degraded service or require remediation (e.g., lead to service impairment or outage, and require a hotfix, rollback, fix forward, or patch). This metric reveals how often teams can’t deliver new value for customers due to a failure, and indicates the quality of your software delivery process.
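Computationally it's a simple ratio. Here's a minimal sketch, where the Deployment record and its needed_remediation flag are illustrative assumptions about how you'd label each change:

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    id: str
    needed_remediation: bool  # hotfix, rollback, fix-forward, or patch

# Percentage of production changes that degraded service.
def change_failure_rate(deployments: list[Deployment]) -> float:
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.needed_remediation)
    return 100 * failed / len(deployments)
```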
Change Failure Rates for top-performing teams, based on DORA benchmarks, are:
Failed Deployment Recovery Time measures the average time it takes to restore service when a software change causes an outage or service failure in production. It’s important because it shows how long your customers are unable to experience the full value of the app because of incidents. A low Failed Deployment Recovery Time indicates high efficiency in problem-solving and the ability to take risks with new features.
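A minimal sketch of the calculation, assuming you can pair each deployment-caused incident with a failure timestamp and a restored timestamp:

```python
from datetime import datetime, timedelta

# Each incident is a (failed_at, restored_at) pair for a deployment-caused failure.
def mean_recovery_time(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    durations = [restored - failed for failed, restored in incidents]
    return sum(durations, timedelta()) / len(durations)
```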
Based on DORA benchmarks, the time it takes top-performing teams to recover from failed deployments is:
However, research on the Verica Open Incident Database (VOID) highlights that there may be issues with taking averages of incident data, such as the high variability and positively skewed distributions often seen, which can make it an unreliable metric at times. As a result, supplementary measures for incident response data are becoming more popular. Mean Time to Acknowledge (MTTA) is one example: it measures the average time it takes someone to acknowledge a new incident in production, and it's a metric we include in Multitudes.
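The VOID concern is easy to see with made-up numbers: a single long incident drags the mean well above the typical case, while the median stays representative.

```python
from statistics import mean, median

# Illustrative recovery times in minutes; the 480-minute outage skews the mean.
recovery_minutes = [12, 15, 18, 20, 25, 30, 480]

print(f"mean:   {mean(recovery_minutes):.0f} min")    # ~86 min, dominated by the outlier
print(f"median: {median(recovery_minutes):.0f} min")  # 20 min, closer to a typical incident
```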
At Multitudes, we like Courtney Nash's metaphor of “putting some spinach in your fruit smoothies” — we can keep using the metrics that people are familiar with while starting to incorporate better metrics (the "spinach") to illustrate the full picture.
A key nuance between FDRT and MTTR is that:
Thus, FDRT is a cleaner measure of issues the engineering team should have caught during QA or testing, whereas MTTR may include failures from any cause, such as an earthquake interrupting service at a data center. This is why DORA drew the distinction between the two, preferring FDRT, in their 2023 DORA report.
To do this, they changed the way they asked about failures in the DORA survey:
Ready to try DORA with your team?
Before jumping in, remember that a high-trust environment needs to be created first. Without trust, these metrics can be gamed and misused, leading to fear and uncertainty among team members.
Picture this: developers splitting their work into smaller, more frequent deployments just to improve their DORA scores. It’s Goodhart’s Law in action: “When a measure becomes a target, it ceases to be a good measure.”
So how can we implement DORA in a thoughtful way? Here's our approach:
From there, it's rinse and repeat for steps 4-5. Choose one area to improve, then run an experiment, track progress and iterate.
This approach will help build a culture of continuous improvement and blameless experimentation. Getting into a habit of experimentation lowers the barriers for teams to change their ways of working, and builds the muscle of tackling new challenges and shifting priorities.
DORA metrics provide a standard framework for measuring software delivery performance. But because software development is a type of knowledge work, it’s not as simple to measure as inputs in, widgets out.
On top of that, the rise in remote and hybrid work has further complicated this scenario. We must be cautious not to fall into the trap of oversimplification or creating perverse incentives.
Fostering a high-trust environment with strong psychological safety is crucial for the effective use of productivity metrics. Google's Project Aristotle research into team effectiveness revealed that psychological safety, more than anything else, was critical to making a team work. The researchers found that individuals on teams with higher psychological safety were less likely to leave Google, they were more likely to harness the power of diverse ideas from their teammates, they brought in more revenue, and they were rated as effective twice as often by executives.
In such environments, teams have the space to experiment, learn, and refine their metrics over time, knowing that they will support each other in the process of continuous improvement. This is in contrast to environments where a lack of trust causes fear that metrics will be used to blame individuals, leading to burnout and behaviors that do not contribute genuine business value.
Reductive metrics can lead to reductive outcomes: focusing on the number of commits or lines of code won’t give the full picture of how a team is performing and what they’ve contributed. Goodhart’s Law warns us that once a metric becomes a goal, its effectiveness as an indicator declines. Concentrating only on these superficial figures might inadvertently encourage behaviors that don’t contribute genuine business value, for instance writing lots of code without spending enough time on quality, or writing code simply to game the metrics.
That’s why we recommend putting in place guardrails for what you won’t measure as much as what you will. Just because something can be measured doesn’t mean it should be, or that it will help anyone to measure it. We’ve written a separate article about our data ethics guardrails here.
DORA metrics help teams take a data-driven approach to software development and delivery. But, depending too much on metrics can give a skewed perspective of developer productivity. Numerical data usually doesn’t account for the entirety of a developer’s work quality. For instance, while high volumes of code commits might appear impressive based on quantitative analysis, without getting the team’s context, one cannot discern if these changes are substantive or merely cosmetic.
To gain a full grasp of productivity levels, it is crucial to pair metrics with the human context. As we always like to say at Multitudes, even if we had all the integrations, surveys, and data sources in the world, it would never be the sum of a human – because humans do things offline and they change. The best way to get the complete picture is to pair the data with conversations with people. That’s the best way for organizations to make informed choices and enhance overall team performance significantly.
To effectively track and analyze DORA metrics, teams can use Multitudes, which is an engineering insights platform for sustainable delivery. Multitudes integrates with your existing development tools, such as GitHub and Jira, to provide insights into your team's productivity and collaboration patterns.
With Multitudes, you can:
By leveraging Multitudes, you can improve your DORA metrics while giving your teams more time to act on insights, enhancing their productivity and satisfaction.
Our clients ship 25% faster without sacrificing code quality.
Ready to unlock happier, higher-performing teams?