DORA Metrics: Using Deployment Frequency to Measure Engineering Performance
The Four Metrics
Deployment frequency: how often code is deployed to production.
Lead time for changes: how long from commit to production.
Change failure rate: percentage of deployments that cause an incident or require rollback.
Time to restore service: how long from incident start to resolution.
Together, they capture both speed (frequency, lead time) and stability (failure rate, restore time). High-performing organizations score well on all four rather than trading one side for the other.
Why These Four
The metrics emerged from years of DORA’s State of DevOps research. They correlate with organizational outcomes — revenue growth, customer satisfaction, employee retention — better than most other engineering metrics studied.
They’re outcome metrics, not activity metrics. Lines of code, PR count, and ticket throughput measure activity. Deployment frequency and lead time measure whether that activity is actually shipping value.
Performance Levels
DORA classifies teams as Low, Medium, High, or Elite based on these metrics. In the commonly quoted benchmarks, Elite teams deploy multiple times per day, have a lead time under an hour, a change failure rate of 0-15%, and a restore time under an hour. The exact thresholds shift between annual reports; the full classification benchmarks are in the DORA quick check.
The classification is useful as a benchmark. The real value is movement over time, not the label.
How to Measure Them
Deployment frequency: instrument your deployment system. Increment a counter on every production deployment and chart the daily, weekly, and monthly rates on a dashboard.
Lead time: tag commits and measure the time from commit to production deployment. The time from PR merge to production deployment is a decent proxy when full instrumentation isn’t available.
Change failure rate: divide the number of deployments that caused an incident or required a rollback by the total number of deployments. This requires consistent incident classification.
Time to restore: measure from incident start to resolution. This requires incident tracking with clear start and end times.
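The four calculations above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Deployment` and `Incident` record shapes, the `caused_incident` flag, and the choice of the median as the summary statistic are all assumptions made for the example, and your own data will likely come from deploy and incident tooling rather than hand-built objects.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record shape; adapt to whatever your deploy system emits.
    commit_time: datetime     # when the change was committed
    deployed_at: datetime     # when it reached production
    caused_incident: bool     # set by your own incident classification


@dataclass
class Incident:
    # Hypothetical record shape from an incident tracker.
    started_at: datetime
    resolved_at: datetime


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora_summary(deploys, incidents, period_days):
    """Compute the four metrics for one reporting period."""
    if not deploys:
        raise ValueError("no deployments in this period")
    lead_times = [hours(d.commit_time, d.deployed_at) for d in deploys]
    failures = sum(1 for d in deploys if d.caused_incident)
    restores = [hours(i.started_at, i.resolved_at) for i in incidents]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": failures / len(deploys),
        # None when the period had no incidents at all.
        "median_restore_hours": median(restores) if restores else None,
    }
```

Running the same function over consecutive periods gives the trend line, which is usually more useful than any single period’s numbers.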
Where DORA Metrics Go Wrong
Optimizing for the metric, not the underlying behavior. Teams that game deployment frequency by splitting one logical change across multiple deploys aren’t actually shipping faster.
Comparing across teams with different domains. A platform team and a feature team have legitimately different deployment patterns. Compare each team to itself over time.
Treating metrics as performance reviews. Punishing low scores destroys the honesty needed to use the metrics for improvement.
Related Reading
- For SLOs, SLAs, and SLIs, see our deeper guide at /devops/slo-sla-sli-explained/.
Tooling for DORA Measurement
Several tools collect DORA metrics automatically: Faros, LinearB, Jellyfish, and Sleuth pull from GitHub, GitLab, deployment systems, and incident tools to compute metrics with minimal manual effort. Google’s own Four Keys project is an open-source reference implementation.
Building your own is also reasonable. The instrumentation isn’t complex: deploy webhooks to count deployments, PR data for lead time, incident management data for failure rate and restore time. Many teams build a lightweight internal dashboard rather than buying a commercial tool.
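As a rough sketch of the deploy-webhook piece, the function below tallies production deployments per ISO week. The payload fields `environment` and `timestamp` are hypothetical; real webhook payloads differ by deployment system, so treat the field names as placeholders.

```python
from collections import Counter
from datetime import datetime


def record_deploy(counts: Counter, payload: dict) -> None:
    """Count one webhook payload if it describes a production deploy."""
    # "environment" and "timestamp" are assumed field names for illustration.
    if payload.get("environment") != "production":
        return
    deployed_at = datetime.fromisoformat(payload["timestamp"])
    year, week, _weekday = deployed_at.isocalendar()
    counts[f"{year}-W{week:02d}"] += 1


weekly = Counter()
for event in [
    {"environment": "production", "timestamp": "2024-03-04T10:00:00+00:00"},
    {"environment": "staging", "timestamp": "2024-03-05T10:00:00+00:00"},
    {"environment": "production", "timestamp": "2024-03-06T16:30:00+00:00"},
    {"environment": "production", "timestamp": "2024-03-11T09:15:00+00:00"},
]:
    record_deploy(weekly, event)
```

In a real setup the counter would live in a database or metrics store rather than in memory, but the bucketing logic stays the same.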
DORA and Other Frameworks
DORA isn’t the only framework for measuring engineering work. SPACE (Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow) is broader and focuses on developer productivity as a whole. The Engineering Excellence framework from McKinsey covers similar ground.
DORA is the most widely adopted and the easiest to start with. SPACE and similar frameworks become useful once DORA metrics are well-established and you’re looking for additional dimensions to optimize.
Cycle Time and Lead Time
Lead time for changes (DORA) measures the commit-to-production interval. Cycle time, a broader engineering metric, covers other intervals as well: ticket-to-PR, PR-to-merge, idea-to-deploy.
The DORA-defined lead time is a subset of overall cycle time. Optimizing one doesn’t necessarily optimize the other. Teams with fast DORA lead time but slow planning cycles still ship slowly from a business perspective.
Track multiple cycle time variants. Identify the longest interval in your delivery process. Optimize that. Repeat. Most teams find that one or two specific intervals dominate and that targeted improvement work pays back quickly.
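One way to find the dominant interval is to record a timestamp at each stage and compare the median duration between consecutive stages. The sketch below assumes four stage names (`ticket_created`, `pr_opened`, `pr_merged`, `deployed`); these are illustrative, and your own process may have more or different stages.

```python
from datetime import datetime
from statistics import median

# Hypothetical stage names, in the order a change passes through them.
STAGES = ["ticket_created", "pr_opened", "pr_merged", "deployed"]


def interval_medians(changes):
    """Median hours spent between each pair of consecutive stages.

    `changes` is a list of dicts mapping stage name to a datetime.
    """
    result = {}
    for start, end in zip(STAGES, STAGES[1:]):
        durations = [
            (change[end] - change[start]).total_seconds() / 3600
            for change in changes
        ]
        result[f"{start}->{end}"] = median(durations)
    return result


def longest_interval(changes):
    """Name of the interval with the largest median duration."""
    medians = interval_medians(changes)
    return max(medians, key=medians.get)
```

In this framing, the last interval or two (from merge or commit to deploy) approximate DORA lead time, while the earlier ones are the planning and review time it does not see.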
DORA in Different Contexts
DORA metrics were derived from software delivery research, but apply with adjustments to other contexts. Embedded software shipping monthly will have different absolute targets than SaaS shipping continuously, but the directional movement still matters.
Compare each team to itself over time. Cross-team comparisons miss context (team size, system complexity, change rate). Trends are meaningful; point-in-time comparisons are usually misleading.
Don’t ignore reliability when chasing speed. Teams that deploy more often but break things more often haven’t improved — they’ve shifted the metric. All four DORA metrics together capture the balance.
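A simple check can make the speed-versus-reliability point concrete when comparing two periods for the same team. The function below is a deliberately small sketch: it looks only at deployment frequency and change failure rate, the dictionary keys are assumed names, and the three labels are illustrative rather than any standard classification.

```python
def assess_period(before: dict, after: dict) -> str:
    """Compare two periods for one team.

    Each dict is assumed to hold "deploys_per_day" and "change_failure_rate".
    """
    faster = after["deploys_per_day"] > before["deploys_per_day"]
    less_stable = after["change_failure_rate"] > before["change_failure_rate"]
    if faster and less_stable:
        # More deploys, but a larger share of them break something.
        return "shifted"
    if faster:
        return "improved"
    return "no speed gain"
```

A fuller version would also compare lead time and time to restore, so that all four metrics inform the verdict.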
Release Notes and Changelog Generation
Automated release notes from commit history close the loop between code changes and user-facing communication. Tools like release-drafter, semantic-release, and changesets generate changelogs from conventional commits or PR labels.
The discipline of writing PR titles and commit messages for downstream consumption pays back here. PR titles become changelog entries; clear titles make for clear changelogs.
For libraries with external users, automated semver bumping based on commit type (feat: minor, fix: patch, breaking change: major) reduces manual version management. The same tooling can publish to npm, PyPI, or other package registries on merge.
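The commit-type-to-bump mapping can be sketched as follows. This is an illustration of the idea, not how release-drafter, semantic-release, or changesets implement it; it assumes commit headers of the form `type(scope)!: subject` and treats either a `!` marker or a `BREAKING CHANGE:` footer as a major bump.

```python
import re

# Assumed header shape: type, optional (scope), optional "!", then ": ".
HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(?:\([^)]*\))?(?P<bang>!)?: ")
RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(messages):
    """Return the largest bump implied by a list of commit messages."""
    level = "none"
    for message in messages:
        match = HEADER.match(message)
        if match is None:
            continue  # non-conforming commits do not affect the version
        if match["bang"] or "BREAKING CHANGE:" in message:
            candidate = "major"
        elif match["type"] == "feat":
            candidate = "minor"
        elif match["type"] == "fix":
            candidate = "patch"
        else:
            candidate = "none"  # docs, chore, refactor, and similar
        if RANK[candidate] > RANK[level]:
            level = candidate
    return level


def next_version(current, messages):
    """Apply the implied bump to a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, `next_version("1.4.2", ["fix: handle empty input", "feat(api): add export"])` yields `"1.5.0"`. Pre-release tags and the special handling some projects apply to 0.x versions are left out of this sketch.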
Security in CI/CD
CI/CD systems hold significant power: they can build code, sign images, push to registries, and deploy to production. Securing them matters.
Standard hardening: least-privilege credentials for each step, signed artifacts at each stage, audit logs of all pipeline executions, separation between build environments and production credentials.
Supply chain attacks via compromised CI are a real and growing threat. SLSA (Supply chain Levels for Software Artifacts) provides a framework for thinking about CI/CD security maturity. Most organizations land at SLSA level 1-2; reaching level 3 requires real investment but provides meaningful guarantees.
Pipeline Templating and Reuse
At scale, copy-pasted CI configuration becomes a maintenance burden. Every change to the standard pipeline requires touching dozens of repos.
Templating mechanisms vary by platform: GitHub Actions composite actions and reusable workflows, GitLab CI includes and templates, Jenkins shared libraries. Each provides a path to defining pipeline logic once and consuming it from many repositories.
The pattern that works: a small platform team maintains pipeline templates; service teams consume them by reference. Service-specific customization happens via variables and minimal local overrides. Template changes can be reviewed and tested centrally before propagating.
Build Cache and Performance
Build performance compounds at scale. A 30-second improvement on every pipeline run translates to hours per day across an organization.
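The arithmetic is easy to check for your own numbers. The run count below is an assumed figure for illustration, not a benchmark.

```python
def hours_saved_per_day(seconds_saved_per_run: float, runs_per_day: int) -> float:
    """Total pipeline time saved per day, in hours."""
    return seconds_saved_per_run * runs_per_day / 3600


# Assumption: an organization running 600 pipelines a day.
saved = hours_saved_per_day(30, 600)  # 30 s * 600 runs = 18,000 s = 5.0 hours
```

Note that this counts machine time; the developer time saved depends on how often people are actually waiting on those runs.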
Caching strategies matter most: dependency caches (npm, Maven, pip), Docker layer caches, intermediate build artifacts. Each cache type has different invalidation rules and storage requirements.
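For dependency caches, a common invalidation rule is to derive the cache key from the lockfile contents, so the cache is reused until dependencies actually change. The sketch below shows that idea in isolation; the function name, the `extra` parameter, and the 16-character truncation are choices made for this example rather than any CI platform’s API.

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, lockfile: Path, extra=()) -> str:
    """Build a cache key that changes whenever the lockfile changes.

    `extra` holds other inputs that should invalidate the cache,
    such as the OS or the runtime version.
    """
    digest = hashlib.sha256()
    digest.update(lockfile.read_bytes())
    for part in extra:
        # Separator byte keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(b"\0" + part.encode("utf-8"))
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

A call such as `cache_key("npm", Path("package-lock.json"), extra=["linux", "node20"])` would produce the same key on every run until the lockfile or one of the extra inputs changes. Docker layer caches and intermediate build artifacts follow different rules and are not covered by this sketch.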
Remote caches shared across runners deliver the biggest improvement for monorepos and matrix builds. Bazel remote cache, Turborepo Remote Cache, and Nx Cloud all provide this for their respective ecosystems. Build times dropping from 10 minutes to 1 minute aren’t unusual.
Putting It Into Practice
The patterns described throughout this article aren’t all equally important for every team. The right starting point depends on current state.
For teams without consistent CI/CD: focus first on basic pipeline reliability and speed. Inconsistent or slow pipelines undermine every other improvement you might try later.
For teams with working pipelines but high change failure rate: invest in better testing, smaller deployments, and explicit rollback procedures. The shift from ‘shipping is scary’ to ‘shipping is routine’ transforms how teams operate.
For teams with reliable CI/CD looking to advance: progressive delivery, deployment frequency improvement, and DORA metric tracking are the natural next steps. Each builds on the foundation rather than replacing it.
The advancement isn’t linear, and not every team needs every capability. Match the practices to the team’s actual constraints and let the rest wait.
Key Takeaways
The most important point throughout this guide: practical engineering decisions depend on specific context. Best-practice recommendations are starting points, not destinations. The right answer for your team depends on your scale, your existing tooling investment, your team’s experience, and the specific constraints you face.
Three principles are worth carrying forward regardless of specific tool choices. First, measure what you change. Engineering improvements without measurement become folklore: claims without evidence. Track the metrics that show whether interventions are working.
Second, default to simpler architectures and tools. Complexity has cost. Each additional moving part is something to monitor, debug, upgrade, and eventually replace. Choose the simplest thing that meets your actual requirements, not the most sophisticated thing you could build.
Third, invest continuously in the boring foundations. Reliable CI, good observability, sensible access controls, and clear documentation pay back across every project. Skipping these for short-term feature velocity accumulates debt that eventually consumes the velocity it was supposed to enable.
The teams that operate well over the long term are usually not the teams with the most exotic tooling. They’re the teams with disciplined fundamentals, deliberate decision-making, and continuous incremental improvement.
Frequently Asked Questions
How do I start measuring DORA metrics?
Deployment frequency is the easiest — just count deploys. Lead time and change failure rate require more instrumentation. Build up gradually.
What’s a realistic target?
Move from Low to Medium first. With leadership support for the changes needed, many teams can get there within about a year.
Should I compare teams to each other?
No. Compare each team to itself over time. Cross-team comparisons miss too much context.
Are there other metrics worth tracking?
Reliability metrics (SLOs), developer experience surveys, and incident-related metrics complement DORA. They add to it rather than replace it.