Incident Management Scorecards: Reduce Mean Time to Acknowledge (MTTA)

Mean Time to Acknowledge (MTTA) measures how quickly teams acknowledge an incident after it is triggered.
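
Because it is a mean, MTTA for a given evaluation window is the total acknowledgment delay divided by the number of incidents:

MTTA = (sum of time from trigger to acknowledgment) / (number of incidents)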

Reducing MTTA depends on:

  • Operational readiness: Ensuring the right people are reachable at all times.

  • Response behavior: Tracking and improving how fast incidents are acknowledged once triggered.

A well-designed Scorecard shouldn't just track MTTA as a number; it should validate the conditions that enable low MTTA, such as on-call setup, contact methods, and escalation policy depth.
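
These enabling conditions can also be checked together. The following is a minimal sketch that combines expressions from the rules table below into one composite rule; adjust the allowed contact methods and the required escalation depth to match your own policy:

oncall != null
and oncall.usersWithoutContactMethods(allowed=["EMAIL","PHONE","PUSH_NOTIFICATION","SMS"]) == 0
and oncall.numOfEscalations() >= 2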

Best practices

When creating a Scorecard aimed at reducing MTTA, follow these best practices:

  • Group rules by functional area (e.g., Incident Response, Monitoring, Reliability) to simplify assessment.

  • Keep evaluation windows aligned so that related signals trend together.

  • Enable Cortex notifications so that teams are alerted when overall Scorecard scores drop, prompting them to review and act.

Rules that focus on reducing MTTA

Category | Purpose | Example CQL expression
--- | --- | ---
On-call configuration | Ensure the service has an active PagerDuty schedule | oncall != null
Contact reliability | Verify responders can be reached through multiple channels | oncall.usersWithoutContactMethods(allowed=["EMAIL","PHONE","PUSH_NOTIFICATION","SMS"]) == 0
Escalation depth | Require at least two escalation tiers | oncall.numOfEscalations() >= 2
Acknowledgment time | Track MTTA against defined thresholds (see the sketch after this table) | jq(oncall.analysis(...), '.meanSecondsToFirstAck <= 300')
Monitoring coverage | Ensure critical services have active alerting rules | datadog.monitors().length > 0
SLO tracking | Validate that a service has defined SLOs and error budgets | slos().any((slo) => slo.name.matchesIn(".*Uptime.*"))
Ownership coverage | Require that every service has a defined owning team | ownership != null
Communication channel is set | Require that every service has a defined communication channel | slack != null and slack.numOfMembers() > 0
Entity was verified in the last 90 days | Require entity information, including ownership, on-call, and Slack, to be verified in the last 90 days | verifications().lastVerifiedAt() != null and verifications().lastVerifiedAt().fromNow() > duration("P-90D")
Entity does not have pending verifications | Ensure the entity does not have any pending verifications | verifications().verifications().any(verification => verification.status == "PENDING") == false
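
If your Scorecard uses levels, the acknowledgment-time rule can be repeated with progressively stricter thresholds. A minimal sketch, assuming a 5-minute target at a lower level and a 2-minute target at a higher one (the oncall.analysis(...) arguments are elided, as in the table above):

jq(oncall.analysis(...), '.meanSecondsToFirstAck <= 300')

jq(oncall.analysis(...), '.meanSecondsToFirstAck <= 120')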

Examples from real Cortex users

The following anonymized examples come from real use cases that our customers are solving with Cortex.

Event Readiness Scorecard

Companies with a busy season (e.g., companies that see a surge around Black Friday) might create a seasonal readiness Scorecard in Cortex. The following strategy ties performance metrics directly to readiness controls:

  • Track MTTA < 120 seconds for P1 and P2 incidents (see the sketch after this list)

  • Require two-tier escalation policies in PagerDuty: oncall.numOfEscalations() >= 2

  • Validate that on-call users have valid contact methods configured: oncall.usersWithoutContactMethods(...) == 0

  • Combine outcome metrics (MTTA) with configuration checks to ensure teams can meet targets consistently.
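
The MTTA target in the first bullet can be expressed with the same jq pattern used in the rules table. This is a minimal sketch; scoping the rule to P1 and P2 incidents depends on which fields your on-call integration exposes through oncall.analysis(...), so no priority filter is shown here:

jq(oncall.analysis(...), '.meanSecondsToFirstAck <= 120')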

On-call Configuration Scorecard

The following strategy ensures every service can be reached before an incident occurs, eliminating the common MTTA outliers caused by misconfigured alerts:

  • Verify that on-call rotations exist: oncall != null

  • Validate that on-call users have valid contact methods configured: oncall.usersWithoutContactMethods(...) == 0

  • Flag services without assigned responders

This CQL applies to all on-call integrations: Opsgenie, PagerDuty, Splunk On-Call (formerly VictorOps), and xMatters.

Operational Maturity Scorecard

The previous examples apply to services and other entities. Some organizations prefer to track team-level operational maturity, including incident management as one area of assessment.

The following strategy treats low MTTA as part of broader operational maturity rather than as an isolated performance goal:

  • Measure operational behaviors like post-incident reviews, ownership clarity, and alert hygiene (see the sketch after this list).

  • Focus on consistent response patterns rather than single-point metrics.
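
As a starting point, such a team-level Scorecard can reuse expressions already shown above. This is a minimal sketch; whether each rule applies depends on which entity types the Scorecard evaluates:

  • Ownership clarity: ownership != null

  • Alert hygiene: datadog.monitors().length > 0

  • Verification freshness: verifications().lastVerifiedAt() != null and verifications().lastVerifiedAt().fromNow() > duration("P-90D")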
