# Reduce MTTR

Mean Time to Resolution (MTTR) measures how quickly systems are restored after an incident begins.
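As a rough illustration of the metric itself, the sketch below averages the time between start and resolution over a few incidents. The incident timestamps and the `mean_time_to_resolution` helper are hypothetical and not part of Cortex; in practice these values would come from your on-call or incident tooling.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Return the average (resolved - started) duration across incidents."""
    durations = [resolved - started for started, resolved in incidents]
    if not durations:
        raise ValueError("at least one resolved incident is required")
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents as (started, resolved) pairs.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 20)),    # 20 minutes
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 15, 0)),   # 60 minutes
    (datetime(2024, 1, 9, 22, 0), datetime(2024, 1, 9, 22, 40)),  # 40 minutes
]

mttr = mean_time_to_resolution(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")  # MTTR: 40 minutes
```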

Reducing MTTR depends on:

* Operational readiness: Clear ownership, actionable runbooks, and reliable on-call escalations ensure the right people are reachable at all times.
* Repair effectiveness: Track and improve how fast incidents are resolved, and make sure Cortex Workflows are configured to roll back, restart, scale up pods, and run the other processes you use during incidents.
  * See an example Workflow: [Rollback a service during an incident](/guides/incident-mgmt/workflow-rollback.md)

A well-designed Scorecard shouldn't just track MTTR as a number; **it should validate the conditions that enable low MTTR**, such as up-to-date runbooks, ownership, and on-call coverage with escalation depth.

### Best practices

When creating a Scorecard aimed at reducing MTTR, follow these best practices:

* Group rules by functional area (e.g., Incident Response, Observability, Reliability) to make gaps obvious.
* Keep evaluation windows aligned so that related signals trend together.
* Enable Cortex notifications for drops in overall Scorecard scores, so teams are prompted to review and act.
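To show why grouping by functional area makes gaps obvious, here is a minimal sketch that collects failing rules per area. The rule results, the assignment of rules to areas, and the `failing_rules_by_area` helper are all hypothetical; this is not how Cortex stores or evaluates Scorecard data.

```python
from collections import defaultdict

# Hypothetical rule results for one service: (functional area, rule name, passed).
rule_results = [
    ("Incident Response", "On-call configuration", True),
    ("Incident Response", "Runbooks configured", False),
    ("Observability", "Monitoring coverage", True),
    ("Observability", "SLO tracking", True),
    ("Reliability", "Pipeline success rate", False),
]


def failing_rules_by_area(results):
    """Map each functional area to the names of its failing rules."""
    gaps = defaultdict(list)
    for area, rule, passed in results:
        if not passed:
            gaps[area].append(rule)
    return dict(gaps)


gaps = failing_rules_by_area(rule_results)
for area, rules in gaps.items():
    print(f"{area}: {', '.join(rules)}")
```

Areas with no failing rules are left out of the result, so the output lists only where attention is needed.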

## Rules that focus on reducing MTTR

| Category                    | Purpose                                                                                                                                     | Example CQL expression                                                                        |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| On-call configuration       | Ensure service has an active PagerDuty schedule                                                                                             | `oncall != null`                                                                              |
| Contact reliability         | Verify responders can be reached through multiple channels                                                                                  | `actMethods(allowed=["EMAIL","PHONE","PUSH_NOTIFICATION","SMS"]) == 0`                        |
| Ownership coverage          | Require that every service has a defined owning team                                                                                        | `ownership != null`                                                                           |
| Monitoring coverage         | Ensure critical services have active alerting rules                                                                                         | `datadog.monitors().length > 0`                                                               |
| SLO tracking                | Ensure a latency SLO exists and that its SLI value meets the threshold set in the expression                                                | `slos().filter((slo) => slo.name.matchesIn("latency") and slo.sliValue >= 0.9999).length > 0` |
| Alerting channel configured | Require that a Slack or Microsoft Teams channel is configured for each service                                                              | `slack != null`                                                                               |
| Runbooks configured         | Require that a runbook is linked so responders have clear steps to follow                                                                   | `links("runbook").length > 0`                                                                 |
| MTTR benchmarks             | Target benchmarks for MTTR: In lower levels of a Scorecard, you might target <90 minutes, and in higher levels you might target <30 minutes | `oncall.analysis(lookback = duration("P30D")).meanSecondsToResolve < 1800`                    |
| CI/CD pipeline configured   | Require a CI/CD pipeline to exist so deployments can be automated and repeatable.                                                           | `git.fileExists(".gitlab-ci.yml")`                                                            |
| Pipeline success rate       | Ensure builds are passing successfully, showing stable automation and tests. You might target 85-95%.                                       | `git.percentBuildSuccess() >= 0.95`                                                           |
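The MTTR benchmark expression in the table compares against seconds, while the targets are stated in minutes. The sketch below converts a minute target into the matching threshold and fills it into the same expression. The `mttr_rule` helper and the level names are hypothetical; only the expression shape and the 90- and 30-minute targets come from the table above.

```python
def mttr_rule(target_minutes, lookback="P30D"):
    """Build the MTTR benchmark expression for a target given in minutes."""
    threshold_seconds = target_minutes * 60
    return (
        f'oncall.analysis(lookback = duration("{lookback}"))'
        f".meanSecondsToResolve < {threshold_seconds}"
    )


# Hypothetical level names paired with the minute targets from the table.
levels = {"Bronze": 90, "Gold": 30}
for level, minutes in levels.items():
    print(f"{level}: {mttr_rule(minutes)}")
```

A 90-minute target corresponds to a threshold of 5400 seconds, and a 30-minute target to 1800 seconds, which is the value used in the table's example.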

### Launch an Incident Preparedness Scorecard

Cortex offers a pre-built Scorecard for Incident Preparedness. You can launch this Scorecard to improve your incident processes, which in turn helps reduce MTTR.

Learn more about the template, and how to handle a broader Incident Management & Response use case, in [the Solutions docs](/solutions/incident-mgmt.md).

## Example: LetsGetChecked

Cortex customer LetsGetChecked used Cortex to automatically sync service and resource catalogs, enabling their team to quickly find accurate service information. They used Scorecards to drive operational excellence for onboarding, service maturity, and deployment frequency.

### Impact

They reduced MTTR by 67% and doubled their deployment frequency.

### Learn more

Learn more in the case study: [How LetsGetChecked doubled deployment frequency and slashed MTTR by 67% with Cortex](https://www.cortex.io/case-studies/letsgetchecked).

## Example: H\&R Block

H\&R Block used Cortex to automate manual, repetitive tasks that were draining velocity and morale.

### Impact

They reduced MTTR from up to 24 hours to less than one hour.

### Learn more

Learn more in the Cortex blog: [How H\&R Block automated the toil out of its developer experience](https://www.cortex.io/post/how-hr-block-automated-the-toil-out-of-developer-experience).

