Incident Management Scorecards: Reduce Mean Time to Acknowledge (MTTA)
Mean Time to Acknowledge (MTTA) measures how quickly teams acknowledge an incident after it is triggered.
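As a minimal sketch of the arithmetic, assume each incident record carries a trigger timestamp and an optional acknowledgment timestamp. The `Incident` type and its field names are hypothetical, chosen for illustration, and are not part of Cortex or any on-call integration. Under that assumption, MTTA is the mean acknowledgment delay across acknowledged incidents:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record shape, for illustration only.
    triggered_at: datetime
    acknowledged_at: Optional[datetime] = None


def mean_time_to_acknowledge(incidents: list[Incident]) -> Optional[float]:
    """Return MTTA in seconds, or None if no incident was acknowledged."""
    delays = [
        (i.acknowledged_at - i.triggered_at).total_seconds()
        for i in incidents
        if i.acknowledged_at is not None
    ]
    if not delays:
        return None
    return sum(delays) / len(delays)


start = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    Incident(start, start + timedelta(seconds=60)),
    Incident(start, start + timedelta(seconds=180)),
    Incident(start),  # never acknowledged; excluded from the mean
]
print(mean_time_to_acknowledge(sample))  # 120.0
```

Excluding unacknowledged incidents is a simplification made here; a real on-call tool may treat them differently, so check how your integration defines the metric.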
Reducing MTTA depends on:
Operational readiness: Ensuring the right people are reachable at all times.
Response behavior: Tracking and improving how fast incidents are acknowledged once triggered.
A well-designed Scorecard shouldn't just track MTTA as a number; it should validate the conditions that enable low MTTA, such as on-call setup, contact methods, and escalation policy depth.
Best practices
When creating a Scorecard aimed at reducing MTTA, follow these best practices:
Group rules by functional area (e.g., Incident Response, Monitoring, Reliability) to simplify assessment.
Keep evaluation windows aligned so that related signals trend together.
Enable Cortex notifications for drops in overall Scorecard scores, so teams are prompted to review and act.
Rules that focus on reducing MTTA
On-call configuration
Ensure service has an active PagerDuty schedule
oncall != null
Contact reliability
Verify responders can be reached through multiple channels
oncall.usersWithoutContactMethods(allowed=["EMAIL","PHONE","PUSH_NOTIFICATION","SMS"]) == 0
Escalation depth
Require at least two escalation tiers
oncall.numOfEscalations() >= 2
Acknowledgment time
Track MTTA against defined thresholds
jq(oncall.analysis(...), '.meanSecondsToFirstAck <= 300')
Monitoring coverage
Ensure critical services have active alerting rules
datadog.monitors().length > 0
SLO tracking
Validate that a service has defined SLOs and error budgets
slos().any((slo) => slo.name.matchesIn(".Uptime."))
Ownership coverage
Require that every service has a defined owning team
ownership != null
Communication channel is set
Require that every service has a defined communication channel
slack != null and slack.numOfMembers() > 0
Entity was verified in the last 90 days
Require entity information, including ownership, on-call, and Slack, to be verified in the last 90 days
verifications().lastVerifiedAt() != null and verifications().lastVerifiedAt().fromNow() > duration("P-90D")
Entity does not have pending verifications
Ensure entity does not have any pending verifications
verifications().verifications().any(verification => verification.status == "PENDING") == 0
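The 90-day verification rule above comes down to a date comparison. The following Python sketch shows that logic outside of CQL. The function name, its parameters, and the treatment of the exact 90-day boundary are assumptions made for illustration, not Cortex behavior:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=90)


def is_verification_fresh(
    last_verified_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Return True if the entity was verified within max_age of now."""
    if last_verified_at is None:
        # Never verified: treated as stale, mirroring the != null check.
        return False
    return now - last_verified_at <= max_age


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_verification_fresh(now - timedelta(days=30), now))   # True
print(is_verification_fresh(now - timedelta(days=120), now))  # False
print(is_verification_fresh(None, now))                       # False
```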
Examples from real Cortex users
The following anonymized examples come from real use cases our customers are solving with Cortex.
Event Readiness Scorecard
Companies with a busy season (e.g., Black Friday) might create a seasonal readiness Scorecard in Cortex. The following strategy ties performance metrics directly to readiness controls:
Track MTTA < 120 seconds for P1 and P2 incidents
Require two-tier escalation policies in PagerDuty:
oncall.numOfEscalations() >= 2
Validate that on-call users have valid contact methods configured:
oncall.usersWithoutContactMethods(...) == 0
Combine outcome metrics (MTTA) with configuration checks to ensure teams can meet targets consistently.
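The readiness strategy above pairs an outcome target with configuration checks. A minimal sketch of that combination follows. The `ServiceReadiness` type, its field names, and the check labels are hypothetical, and the values are made-up sample data rather than output from Cortex:

```python
from dataclasses import dataclass


@dataclass
class ServiceReadiness:
    # Hypothetical snapshot of one service; field names are illustrative.
    name: str
    mtta_seconds: float  # MTTA over P1 and P2 incidents
    escalation_tiers: int
    users_without_contact_methods: int


def failed_checks(s: ServiceReadiness, mtta_target: float = 120.0) -> list[str]:
    """Return the labels of the readiness checks this service fails."""
    failures = []
    if not s.mtta_seconds < mtta_target:
        failures.append("mtta")
    if s.escalation_tiers < 2:
        failures.append("escalation_depth")
    if s.users_without_contact_methods != 0:
        failures.append("contact_methods")
    return failures


checkout = ServiceReadiness("checkout", 95.0, 2, 0)
search = ServiceReadiness("search", 95.0, 1, 3)
print(failed_checks(checkout))  # []
print(failed_checks(search))    # ['escalation_depth', 'contact_methods']
```

In this sample, `search` meets the MTTA target today but fails both configuration checks, which is the kind of gap an outcome metric alone would not reveal.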
On-call Configuration Scorecard
The following strategy ensures every service can be reached before an incident occurs, eliminating the common MTTA outliers caused by misconfigured alerts:
Verify that on-call rotations exist
oncall != null
Validate that on-call users have valid contact methods configured:
oncall.usersWithoutContactMethods(...) == 0
Flag services without assigned responders
This CQL applies to all on-call integrations: Opsgenie, PagerDuty, Splunk On-Call (formerly VictorOps), and xMatters.
Operational Maturity Scorecard
The previous examples apply to services and other entities. Some organizations prefer to track team-level operational maturity, including incident management as one area of assessment.
The following strategy aims for low MTTA as part of broader operational maturity, not as an isolated performance goal:
Measure operational behaviors like post-incident reviews, ownership clarity, and alert hygiene.
Focus on consistent response patterns rather than single-point metrics.