Incident Management Scorecards: Reduce Mean Time to Acknowledge (MTTA)
Mean Time to Acknowledge (MTTA) measures how quickly teams acknowledge an incident after it is triggered.
Reducing MTTA depends on:
- Operational readiness: Ensuring the right people are reachable at all times. 
- Response behavior: Tracking and improving how fast incidents are acknowledged once triggered. 
A well-designed Scorecard shouldn't just track MTTA as a number; it should validate the conditions that enable low MTTA, such as on-call setup, contact methods, and escalation policy depth.
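To make the metric itself concrete, here is a minimal Python sketch that computes MTTA from trigger and acknowledgment timestamps. The incident records and function names are hypothetical and are not part of Cortex or any on-call integration. The 300-second threshold mirrors the acknowledgment-time rule shown later on this page.

```python
from datetime import datetime
from statistics import mean

MTTA_THRESHOLD_SECONDS = 300  # same limit as the acknowledgment-time rule

# Hypothetical incident records: (triggered_at, acknowledged_at) in ISO 8601.
INCIDENTS = [
    ("2024-05-01T10:00:00", "2024-05-01T10:02:00"),  # acknowledged after 120 s
    ("2024-05-01T14:30:00", "2024-05-01T14:34:00"),  # acknowledged after 240 s
    ("2024-05-02T03:15:00", "2024-05-02T03:24:00"),  # acknowledged after 540 s
]


def seconds_to_ack(triggered_at: str, acknowledged_at: str) -> float:
    """Seconds between an incident being triggered and first acknowledged."""
    triggered = datetime.fromisoformat(triggered_at)
    acknowledged = datetime.fromisoformat(acknowledged_at)
    return (acknowledged - triggered).total_seconds()


def mtta_seconds(incidents) -> float:
    """Mean time to acknowledge, in seconds, across the given incidents."""
    return mean(seconds_to_ack(t, a) for t, a in incidents)


mtta = mtta_seconds(INCIDENTS)
print(f"MTTA: {mtta:.0f} s, within threshold: {mtta <= MTTA_THRESHOLD_SECONDS}")
```

In this sample the slowest acknowledgment (540 s) is offset by two faster ones, which is why the mean alone can hide the readiness gaps that the rules below check for.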
Best practices
When creating a Scorecard aimed at reducing MTTA, follow these best practices:
- Group rules by functional area (e.g., Incident Response, Monitoring, Reliability) to simplify assessment. 
- Keep evaluation windows aligned so that related signals trend together. 
- Enable Cortex notifications for drops in overall Scorecard scores, so that teams are prompted to review and act.
Rules that focus on reducing MTTA
On-call configuration
Ensure service has an active PagerDuty schedule
oncall != null
Contact reliability
Verify responders can be reached through multiple channels
oncall.usersWithoutContactMethods(allowed=["EMAIL","PHONE","PUSH_NOTIFICATION","SMS"]) == 0
Escalation depth
Require at least two escalation tiers
oncall.numOfEscalations() >= 2
Acknowledgment time
Track MTTA against defined thresholds
jq(oncall.analysis(...), '.meanSecondsToFirstAck <= 300')
Monitoring coverage
Ensure critical services have active alerting rules
datadog.monitors().length > 0
SLO tracking
Validate that a service has defined SLOs and error budgets
slos().any((slo) => slo.name.matchesIn(".*Uptime.*"))
Ownership coverage
Require that every service has a defined owning team
ownership != null
Communication channel is set
Require that every service has a defined communication channel
slack != null and slack.numOfMembers() > 0
Entity was verified in the last 90 days
Require entity information, including ownership, on-call, and Slack, to be verified in the last 90 days
verifications().lastVerifiedAt() != null and verifications().lastVerifiedAt().fromNow() > duration("P-90D")
Entity does not have pending verifications
Ensure entity does not have any pending verifications
verifications().verifications().any(verification => verification.status == "PENDING") == false
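The 90-day verification rule above compares against a negative duration, which is easy to misread. The Python sketch below illustrates the comparison under the assumption that `fromNow()` yields a signed offset that is negative for timestamps in the past. The function names are hypothetical and only mimic the rule's logic; they are not Cortex APIs.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=-90)  # mirrors duration("P-90D") in the rule


def from_now(timestamp: datetime, now: datetime) -> timedelta:
    """Signed offset of timestamp relative to now; negative for the past."""
    return timestamp - now


def verified_recently(last_verified_at, now: datetime) -> bool:
    """True when a verification exists and is less than 90 days old."""
    if last_verified_at is None:
        return False
    # An offset of -30 days is greater than -90 days, so it passes.
    return from_now(last_verified_at, now) > WINDOW


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(verified_recently(now - timedelta(days=30), now))   # True
print(verified_recently(now - timedelta(days=120), now))  # False
print(verified_recently(None, now))                       # False
```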
Examples from real Cortex users
The following anonymized examples come from real use cases our customers are solving with Cortex.
Event Readiness Scorecard
Companies with a busy season (e.g., Black Friday) might create a seasonal readiness Scorecard in Cortex. The following strategy ties performance metrics directly to readiness controls:
- Track MTTA < 120 seconds for P1 and P2 incidents 
- Require at least two escalation tiers in PagerDuty: oncall.numOfEscalations() >= 2
- Validate that on-call users have valid contact methods configured: oncall.usersWithoutContactMethods(...) == 0
- Combine outcome metrics (MTTA) with configuration checks to ensure teams can meet targets consistently. 
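As an illustration of combining an outcome metric with configuration checks, the following Python sketch evaluates the three conditions from this strategy against hypothetical per-service data. The `ServiceSnapshot` type, its field names, and the sample services are assumptions made for the example. In Cortex, the equivalent checks would be expressed as Scorecard rules.

```python
from dataclasses import dataclass

MTTA_TARGET_SECONDS = 120   # outcome target from the strategy above
MIN_ESCALATION_TIERS = 2    # mirrors oncall.numOfEscalations() >= 2


@dataclass
class ServiceSnapshot:
    """Hypothetical per-service readiness data."""
    name: str
    mtta_seconds: float
    escalation_tiers: int
    users_without_contact_methods: int


def readiness_failures(service: ServiceSnapshot) -> list:
    """Return the readiness checks this service currently fails."""
    failures = []
    if service.mtta_seconds >= MTTA_TARGET_SECONDS:
        failures.append("MTTA is not below 120 s")
    if service.escalation_tiers < MIN_ESCALATION_TIERS:
        failures.append("fewer than two escalation tiers")
    if service.users_without_contact_methods != 0:
        failures.append("on-call users without contact methods")
    return failures


checkout = ServiceSnapshot("checkout", 95, 2, 0)
search = ServiceSnapshot("search", 95, 1, 3)

for service in (checkout, search):
    print(service.name, readiness_failures(service) or "ready")
```

Both sample services meet the MTTA target today, but only the configuration checks reveal that `search` has two gaps that could prevent it from meeting the target consistently.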
On-call Configuration Scorecard
The following strategy ensures every service can be reached before an incident occurs, eliminating the common MTTA outliers caused by misconfigured alerts:
- Verify that on-call rotations exist: oncall != null
- Validate that on-call users have valid contact methods configured: oncall.usersWithoutContactMethods(...) == 0
- Flag services without assigned responders 
This CQL applies to all on-call integrations: Opsgenie, PagerDuty, Splunk On-Call (formerly VictorOps), and xMatters.
Operational Maturity Scorecard
The previous examples apply to services and other entities. Some organizations prefer to track team-level operational maturity, including incident management as one area of assessment.
The following strategy treats low MTTA as part of broader operational maturity, not as an isolated performance goal:
- Measure operational behaviors like post-incident reviews, ownership clarity, and alert hygiene. 
- Focus on consistent response patterns rather than single-point metrics. 
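To show why a single-point metric can hide the response pattern, this Python sketch summarizes acknowledgment times for two hypothetical teams. The sample data and the 300-second target are assumptions chosen for illustration and are not drawn from any customer.

```python
from statistics import mean, median

TARGET_SECONDS = 300  # hypothetical acknowledgment target

# Hypothetical seconds-to-acknowledge samples for two teams.
TEAM_A = [280, 290, 300, 310, 320]
TEAM_B = [60, 70, 80, 90, 100, 110, 120, 130, 140, 2100]


def summarize(ack_seconds):
    """Mean, median, and share of incidents acknowledged within target."""
    within = sum(1 for seconds in ack_seconds if seconds <= TARGET_SECONDS)
    return {
        "mean": mean(ack_seconds),
        "median": median(ack_seconds),
        "share_within_target": within / len(ack_seconds),
    }


for name, samples in (("team-a", TEAM_A), ("team-b", TEAM_B)):
    print(name, summarize(samples))
```

Both sample teams have a mean of 300 seconds, yet the median and the share within target differ: team-a is consistently near the target, while team-b is usually fast with one long outlier.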