PagerDuty

PagerDuty is an incident response platform that allows developers to manage alerts, schedule rotations, define escalation policies, and more. By integrating PagerDuty with Cortex, you can track dozens of key on-call metrics and help teams enforce adoption of on-call best practices.

In this guide, you'll learn how to set up and use the PagerDuty integration in Cortex, enhancing incident response and reporting. The PagerDuty integration unlocks several powerful features:

  • View on-call information directly in catalogs
  • Trigger incidents directly
  • Enforce adoption of on-call best practices for entities and teams
  • Link to escalation policies

The PagerDuty integration also allows you to set up On-Call Assistant. You can read more about On-Call Assistant in this guide.

Setup and configuration

Getting started

In order to connect Cortex to your PagerDuty instance, you’ll need to create a PagerDuty API key.

When adding the API key, you have the option to set read or write permissions.

  • Read-only key: Enables Cortex to read any and all data from PagerDuty
  • Write key: Allows users to trigger incidents from an entity page in Cortex, and enables On-Call Assistant

tip

You can use a read-only key if you do not wish to trigger incidents directly from the catalog.

Configuration

Once you've created an API key in PagerDuty, you'll add it on the PagerDuty Settings page.

caution

If you do not see the Settings page you're looking for, you likely don't have the proper permissions and need to contact your admin.

You can specify a read-only API key by toggling on read-only API key. If this option is toggled off, Cortex will assume the provided API key has write permissions.

At this stage, you can also enable or disable On-Call Assistant, which notifies users in Slack when an incident is triggered in PagerDuty. On-Call Assistant requires that you set up a webhook subscription in your PagerDuty account. Note that On-Call Assistant will only work for service-level PagerDuty registrations since these notifications are related to affected services. You can read more about On-Call Assistant in this walkthrough.

caution

The read-only API key option must be toggled off in order for On-Call Assistant to be enabled.
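
The dependency between the two toggles can be sketched as a small check. This is only an illustration: `PagerDutySettings` and `settings_problems` are hypothetical names that mirror the settings page, not part of any Cortex API.

```python
from dataclasses import dataclass


@dataclass
class PagerDutySettings:
    """Hypothetical mirror of the two toggles on the settings page."""
    read_only_api_key: bool
    on_call_assistant: bool


def settings_problems(settings: PagerDutySettings) -> list[str]:
    """Return human-readable problems with a combination of toggles."""
    problems = []
    # On-Call Assistant needs a write key, so the read-only toggle must be off.
    if settings.on_call_assistant and settings.read_only_api_key:
        problems.append(
            "On-Call Assistant requires the read-only API key option to be off"
        )
    return problems


if __name__ == "__main__":
    print(settings_problems(PagerDutySettings(read_only_api_key=True,
                                              on_call_assistant=True)))
```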

Once you save your configuration, you'll see the last four characters of the token you entered. If you’ve set everything up correctly, you’ll see the option to Remove Integration in settings.

You can also use the Test configuration button to confirm that the configuration was successful. If your configuration is valid, you’ll see a banner that says “Configuration is valid. If you see issues, please see documentation or reach out to Cortex support.”

Service-level vs. team-level configuration

We recommend setting up PagerDuty at the service level by registering service entities with PagerDuty services, rather than configuring team entities with a PagerDuty schedule.

If PagerDuty is set up at the service level, you'll be able to see current on-call information on a given service's page. If PagerDuty is set up at the team level, you will only be able to view on-call rotation info from a team page.

There are several long-term benefits to setting up PagerDuty on a service level:

  • Structuring PagerDuty one-to-one with services enables better alert routing and analytics, which organizations tend to struggle with when PagerDuty is set up at the team level.
  • With a service-level setup, it’s also easier to enforce that all services have a compliant on-call policy enacted in PagerDuty, especially when making use of Scorecards.
  • The service-level setup is less reliant on team members tagging incidents with service info because services and incidents are already linked.
  • By setting up PagerDuty on a service level, you also gain the ability to get data from your Cortex catalog into PagerDuty, such as tier/criticality. By tying the service entities in the catalog with those in PagerDuty, you can automate processes and streamline severity protocols.

Registration

Discovery

By default, Cortex will use the entity name or the entity tag as the “best guess” for the entity in PagerDuty. For example, if your entity name and tag are “My Project” and “my-project,” Cortex will seek the corresponding name or tag in PagerDuty.
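
One way to picture the "best guess" is a lookup of the entity name and tag among PagerDuty service names. The exact matching rules are not spelled out here, so the sketch below assumes plain string equality; `best_guess_service` and its inputs are hypothetical.

```python
def best_guess_service(entity_name, entity_tag, pagerduty_service_names):
    """Return the first PagerDuty service name equal to the entity name or tag.

    Assumption: exact string comparison; the real matching may differ.
    """
    candidates = {entity_name, entity_tag}
    for name in pagerduty_service_names:
        if name in candidates:
            return name
    return None


if __name__ == "__main__":
    services = ["billing", "my-project", "search"]
    print(best_guess_service("My Project", "my-project", services))
```

If no service matches either value, the function returns `None`, which corresponds to the case where you define the mapping explicitly in the entity descriptor instead.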

Entity descriptor

For a given entity, you can define the PagerDuty service, schedules, or escalation policy within the entity’s YAML. You can only set up one of these three options per entity.

PagerDuty service

You can find the service ID value by visiting PagerDuty → Configuration → Services. The URL for the service will contain the ID, for example: https://cortexapp.pagerduty.com/services/<ID>

x-cortex-oncall:
  pagerduty:
    id: ASDF1234 # Service ID
    type: SERVICE

Schedules

You can find the Schedule ID by visiting PagerDuty → People → On-call schedules and clicking on the desired schedule. The ID is found in the URL, for example https://cortexapp.pagerduty.com/schedules#<ID>.

x-cortex-oncall:
  pagerduty:
    id: ASDF1234 # Schedule ID
    type: SCHEDULE

Escalation policy

You can find the Escalation Policy ID by visiting PagerDuty → People → Escalation Policies and clicking on the desired policy. The ID is found in the URL, for example https://cortexapp.pagerduty.com/escalation_policies#<ID>.

x-cortex-oncall:
  pagerduty:
    id: ASDF1234 # Escalation Policy ID
    type: ESCALATION_POLICY

caution

You can only set up one of the three options above per entity.
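
As a rough sketch of how the three URL shapes above relate to the descriptor, the helper below reads the ID and the matching `type` from a PagerDuty URL and returns a single `x-cortex-oncall` mapping. `oncall_block_from_url` is a hypothetical helper written for illustration, and it assumes the URLs follow exactly the patterns shown in this section.

```python
from urllib.parse import urlparse

# First path segment of the PagerDuty URL -> `type` value in the descriptor.
TYPE_BY_PATH = {
    "services": "SERVICE",
    "schedules": "SCHEDULE",
    "escalation_policies": "ESCALATION_POLICY",
}


def oncall_block_from_url(url):
    """Build an x-cortex-oncall mapping from one PagerDuty URL.

    Services carry the ID as the second path segment; schedules and
    escalation policies carry it in the fragment after '#'.
    """
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if not segments or segments[0] not in TYPE_BY_PATH:
        raise ValueError(f"unrecognized PagerDuty URL: {url}")
    pd_id = parsed.fragment or (segments[1] if len(segments) > 1 else "")
    if not pd_id:
        raise ValueError(f"no ID found in URL: {url}")
    # Exactly one id/type pair per entity.
    return {
        "x-cortex-oncall": {
            "pagerduty": {"id": pd_id, "type": TYPE_BY_PATH[segments[0]]}
        }
    }


if __name__ == "__main__":
    print(oncall_block_from_url("https://cortexapp.pagerduty.com/schedules#ASDF1234"))
```

Because the function returns one mapping with one `id` and one `type`, it cannot produce a descriptor that mixes a service, a schedule, and an escalation policy on the same entity.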

Identity mapping

Cortex maps email addresses in your PagerDuty instance to email addresses that belong to team members in Cortex. When identity mapping is set up, users will be able to see their personal on-call status from the developer homepage.
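
Conceptually, this mapping is a join on email address. The sketch below is a minimal illustration with hypothetical record shapes (`id`, `email`, `name`), and it assumes emails are compared after trimming whitespace and lower-casing, which may not match how Cortex normalizes them.

```python
def map_identities(pagerduty_users, cortex_members):
    """Pair PagerDuty user IDs with Cortex members that share an email.

    Assumption: emails are compared after trimming and lower-casing.
    """
    by_email = {m["email"].strip().lower(): m["name"] for m in cortex_members}
    mapping = {}
    for user in pagerduty_users:
        key = user["email"].strip().lower()
        if key in by_email:
            mapping[user["id"]] = by_email[key]
    return mapping


if __name__ == "__main__":
    pd_users = [{"id": "PABC123", "email": "Dana@example.com"},
                {"id": "PXYZ789", "email": "sam@example.com"}]
    members = [{"name": "Dana", "email": "dana@example.com"}]
    print(map_identities(pd_users, members))
```

Users whose PagerDuty email has no counterpart in Cortex are simply left out of the mapping, so they would not see a personal on-call status.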

Expected results

Once the PagerDuty integration is set up, you’ll be able to view on-call information on entity pages:

  • Current on-call for an entity
  • Escalation policy
  • Service

tip

The escalation policy and PagerDuty service details are hyperlinked to the corresponding pages in your PagerDuty instance.

With CQL, you can check various metrics for services and define corresponding Scorecard rules:

  • Check if entity has on-call attached
  • Number of incidents opened per week
  • Number of on-call escalation levels in rotation
  • Number of users without the allowed contact methods
  • On-call metrics:
    • Mean seconds to first ack
    • Mean seconds to resolve
    • Total business hour interruptions
    • Total number of escalations
    • Total number of incidents
    • Total number of off-hour interruptions
    • Total seconds snoozed
    • Up time percent

Cortex also pulls in metrics from PagerDuty for Eng Intelligence. This tool will display MTTR, incidents opened, and incidents opened per week.

Notifications

If you have a Slack integration set up, you can also use the /cortex oncall <tag> command to retrieve current on-call information. This feature works for both services and teams with registered PagerDuty schedules or escalation policies.

Triggering incidents

If you used a write token to set up the integration, you’ll also see the option to trigger an incident from the PagerDuty tab on an entity’s home page. This will open a modal where you can enter information about the incident: title, details, urgency, and associated email address. The incident will then be triggered directly in PagerDuty.

caution

The Trigger Incident feature in the catalog only works with PagerDuty services.
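
For reference, the modal fields line up with the fields of a "create incident" request in the PagerDuty REST API. How Cortex itself calls PagerDuty is not described here, so treat the following as a sketch of an equivalent request rather than Cortex's implementation; `build_incident_request` is a hypothetical helper that only assembles the request and sends nothing.

```python
import json


def build_incident_request(api_key, service_id, title, details, urgency, from_email):
    """Assemble URL, headers, and JSON body for PagerDuty's POST /incidents."""
    if urgency not in ("high", "low"):
        raise ValueError("urgency must be 'high' or 'low'")
    headers = {
        "Authorization": f"Token token={api_key}",  # must be a write-capable key
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
        "From": from_email,  # email of the PagerDuty user creating the incident
    }
    body = {
        "incident": {
            "type": "incident",
            "title": title,
            # Incidents attach to a service, which is why a service ID is needed.
            "service": {"id": service_id, "type": "service_reference"},
            "urgency": urgency,
            "body": {"type": "incident_body", "details": details},
        }
    }
    return "https://api.pagerduty.com/incidents", headers, json.dumps(body)


if __name__ == "__main__":
    url, headers, payload = build_incident_request(
        "example-key", "ASDF1234", "Checkout errors",
        "Error rate above threshold", "high", "dana@example.com")
    print(url)
    print(payload)
```

The required service reference in the request body is consistent with the caution above: an entity registered only with a schedule or escalation policy has no service to attach the incident to.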

Background sync

Cortex runs a number of background jobs for the PagerDuty integration:

  • On-call: On-call information displayed on the developer homepage is refreshed every hour
  • Services and incidents: Services used for automapping and active incidents viewable in the catalog are fetched approximately every 5 minutes, or less often if a refresh takes longer than that to complete
  • Users: User data for identity mapping is synced daily at 10 a.m. UTC
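
If you want to estimate when a newly added PagerDuty user will be picked up for identity mapping, the daily schedule can be worked out as below. This is a small sketch based only on the 10 a.m. UTC time stated above; `next_user_sync` is a hypothetical helper, and the actual job may start slightly later.

```python
from datetime import datetime, time, timedelta, timezone


def next_user_sync(now):
    """Return the next 10:00 UTC at or after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(10, 0), tzinfo=timezone.utc)
    if candidate < now_utc:
        # Today's run has already passed, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_user_sync(datetime.now(timezone.utc)))
```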

Still need help?

The following are all the ways to get assistance from our customer engineering team. Please use the option that is best for your users:

  • Email: help@cortex.io, or open a support ticket in the in-app Resource Center
  • Chat: Available in the Resource Center
  • Slack: Users with a connected Slack channel will have a workflow added to their account. From here, you can either @CortexTechnicalSupport or add a :ticket: reaction to a question in Slack, and the team will respond directly.

Don’t have a Slack channel? Talk with your customer success manager.