Incident Management & Response

Incident Management is the process of detecting, responding to, and resolving unplanned service disruptions. When done well, Incident Management minimizes downtime, reduces customer impact, and builds organizational trust.

Cortex supports Incident Management use cases both before an incident, when you prepare, and during one, when you respond.

How Cortex helps with Incident Management & Response

Use Cortex to help you reduce incident response time by:

  • Triggering incidents directly from an entity page in Cortex, enabling engineers to quickly declare incidents and link them to third-party incident management tools

  • Notifying users through the On-Call Assistant when a PagerDuty incident is triggered, providing immediate access to entity health, recent deployments, runbooks, and dependencies to accelerate response and resolution

  • Using Cortex MCP to quickly gain visibility into impacted services and understand next steps

Configure Cortex to help you prepare for incidents by:

  • Connecting your data in Cortex: Ensure accurate owners and runbooks are listed for your entities, and integrate with the tools you use to handle incidents. This gives you a centralized view of your ecosystem and a fully prepared workspace before incidents happen

  • Tracking incident-related metrics in Scorecards to drive alignment and continuous improvement in incident response practices

  • Configuring Workflows that streamline common practices during an incident, such as rolling back deployments
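As an illustration of the kind of incident-related metric a team might track, the sketch below computes mean time to resolve (MTTR) from a list of incident records. It is a minimal, standalone example and does not use any Cortex API: the record fields (`service`, `opened`, `resolved`) and the function name `mean_time_to_resolve` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def mean_time_to_resolve(incidents):
    """Average time between 'opened' and 'resolved', ignoring unresolved incidents.

    The record shape here is hypothetical, not a Cortex data format.
    """
    durations = [
        i["resolved"] - i["opened"]
        for i in incidents
        if i.get("resolved") is not None
    ]
    if not durations:
        return None  # no resolved incidents, so there is nothing to average
    return sum(durations, timedelta()) / len(durations)

# Hypothetical sample data: two resolved incidents and one still open.
incidents = [
    {"service": "checkout", "opened": datetime(2024, 1, 1, 10, 0), "resolved": datetime(2024, 1, 1, 10, 30)},
    {"service": "checkout", "opened": datetime(2024, 1, 2, 9, 0), "resolved": datetime(2024, 1, 2, 10, 30)},
    {"service": "search", "opened": datetime(2024, 1, 3, 9, 0), "resolved": None},
]

mttr = mean_time_to_resolve(incidents)
print(mttr)
```

A value like this could be compared against a target threshold per service or team. How the value is actually supplied to a Scorecard depends on your integrations and is not shown here.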

Prepare for and handle incidents in Cortex

Learn how to use Cortex features for incident management: Prepare for and prevent incidents

Learn how to handle active incidents with Cortex: Incident Response in action
