Incident Management & Response
Incident Management is the process of detecting, responding to, and resolving unplanned service disruptions. When done well, Incident Management minimizes downtime, reduces customer impact, and builds organizational trust.

How Cortex helps with Incident Management & Response
Use Cortex to help you reduce incident response time by:
Triggering incidents directly from an entity page in Cortex, enabling engineers to quickly declare incidents and link them to third-party incident management tools
Notifying users via the On-Call Assistant when a PagerDuty incident is triggered, providing immediate access to entity health, recent deployments, runbooks, and dependencies to accelerate response and resolution
Using Cortex MCP to quickly gain visibility into impacted services and understand next steps
Configure Cortex to help you prepare for incidents by:
Connecting your data in Cortex: ensuring accurate owners and runbooks are listed for your entities and integrating with the tools you use to handle incidents, which gives you a centralized view of your ecosystem and a fully prepared workspace before incidents happen
Tracking incident-related metrics in Scorecards to drive alignment and continuous improvement in incident response practices
Configuring Workflows that streamline common practices during an incident, such as rolling back deployments
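As a rough illustration of what an incident-related metric might look like, the sketch below computes mean time to resolve (MTTR) from a handful of incident records. The record fields, the sample data, and the `mean_time_to_resolve` helper are all hypothetical and chosen for illustration only; they do not reflect a Cortex or PagerDuty schema or API.

```python
from datetime import datetime, timedelta

# Hypothetical incident records. Field names are illustrative assumptions,
# not a Cortex or PagerDuty data model.
incidents = [
    {"id": "INC-1", "opened": datetime(2024, 5, 1, 9, 0), "resolved": datetime(2024, 5, 1, 9, 45)},
    {"id": "INC-2", "opened": datetime(2024, 5, 3, 14, 0), "resolved": datetime(2024, 5, 3, 15, 15)},
    {"id": "INC-3", "opened": datetime(2024, 5, 7, 22, 30), "resolved": None},  # still open
]


def mean_time_to_resolve(records):
    """Average open-to-resolved duration over resolved incidents, or None if there are none."""
    durations = [r["resolved"] - r["opened"] for r in records if r["resolved"] is not None]
    if not durations:
        return None
    # sum() needs a timedelta start value because the default start is the int 0
    return sum(durations, timedelta()) / len(durations)


mttr = mean_time_to_resolve(incidents)
print(f"MTTR over resolved incidents: {mttr}")
```

Open incidents are excluded so that an unresolved incident does not skew the average; a team could just as reasonably track them separately as a count of open incidents.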
Prepare for and handle incidents in Cortex
Learn how to use Cortex features for incident management: Prepare for and prevent incidents
Learn how to handle active incidents with Cortex: Incident Response in action
Looking for additional resources on enforcing Incident Management best practices in Cortex? Check out the Cortex Academy "Incident Management & Response" course, available to all Cortex customers and POVs.