# PagerDuty

{% hint style="info" %}
Cortex connects to many third-party vendors whose system interfaces frequently change. As a result, integration behavior or configuration steps may shift without notice. If you encounter unexpected issues, check with your system administrator or refer to the vendor's documentation for the most current information. Additionally, integration sync times vary and are subject to scheduling overrides and timing variance.
{% endhint %}

[PagerDuty](https://www.pagerduty.com/) is an incident response platform that allows developers to manage alerts, schedule rotations, define escalation policies, and more.

Integrating PagerDuty with Cortex allows you to:

* Pull in PagerDuty services, on-call schedules, and escalation policies
  * The on-call user or team will appear in the **Current On-call** block on an entity's details page.
  * You can also view on-call information on an entity page in its side panel under **Integrations > On-call**.
* [Trigger incidents in PagerDuty](#trigger-an-incident) directly from Cortex
* Automatically surface the most vital information about entity health and metadata when an incident is triggered by using [Cortex's On-Call Assistant tool](#enabling-the-on-call-assistant)
  * The On-Call Assistant automatically notifies users via Slack when an incident is triggered. The notifications include runbooks, links, dependencies, and key information about the affected entity.
* View incidents from PagerDuty in an entity's event timeline
* View on-call information from PagerDuty in the [engineering homepage](#engineering-homepage)
* Use PagerDuty metrics in [Eng Intelligence](#eng-intelligence) to gain insight into services, incident response, and more
* Create [Scorecards](#scorecards-and-cql) that track progress and drive alignment on projects involving your on-call schedules and alerts

## How to configure PagerDuty with Cortex

### Prerequisites

Before getting started:

* Create a [PagerDuty API key](https://support.pagerduty.com/docs/generating-api-keys).
  * When adding the API key, you have the option to set `read` or `write` permissions.
    * **Read**: Enables Cortex to read any and all data from PagerDuty
    * **Write**: Allows users to [trigger incidents](#trigger-an-incident) from an entity page in Cortex, and enables On-Call Assistant
  * To use PagerDuty [blocks in a Workflow](/streamline/workflows/blocks.md#configure-an-integration-block) in Cortex, enable the following permissions:
    * `incidents.write` to create incidents
    * `escalation_policies.write` to create escalation policies
    * `schedules.write` to create schedules
    * `services.write` to create services
    * `teams.write` to create teams
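
Before saving the key in Cortex, it can help to confirm that PagerDuty accepts it. The following is a minimal sketch, assuming a PagerDuty REST API key and using PagerDuty's `/abilities` endpoint as a lightweight check. It only builds the request and does not send it. The function name and the placeholder key are hypothetical and are not part of Cortex.

```python
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com"


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that can be used to check a key."""
    return urllib.request.Request(
        f"{PAGERDUTY_API}/abilities",
        headers={
            # PagerDuty REST API keys use the "Token token=..." scheme.
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        method="GET",
    )


request = build_key_check_request("YOUR_API_KEY")  # placeholder, not a real key
# To actually send it: urllib.request.urlopen(request)
# An invalid key should make PagerDuty respond with an HTTP 401 error.
```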

### Configure the integration in Cortex

1. In Cortex, navigate to the [PagerDuty settings page](https://app.getcortexapp.com/admin/integrations/pagerduty):
   * Click **Integrations** from the main nav. Search for and select **PagerDuty**.
2. Click **Add configuration**.
3. Configure the integration:
   * **API key**: Enter the API key you created in PagerDuty.
     * If the **Read-only API key** option is toggled off, Cortex will assume the provided API key has `write` permissions.
4. Click **Save**.

If you’ve set everything up correctly, you’ll see the option to **Remove Integration** in settings.

You can also use the **Test configuration** button to confirm that the configuration was successful. If your configuration is valid, you’ll see a banner that says “Configuration is valid. If you see issues, please see documentation or reach out to Cortex support.”

To modify the integration configuration, see [Modifying an existing integration configuration](/ingesting-data-into-cortex/integrations.md#modifying-an-existing-integration-configuration).

## Enabling the On-Call Assistant

At this stage, you can enable the Cortex On-Call Assistant, which notifies users via Slack when an incident is triggered in PagerDuty. See the documentation for instructions: [On-Call Assistant](/ingesting-data-into-cortex/entities-overview/entities/oncall-assistant.md).

Note that On-Call Assistant will only work for [service-level PagerDuty registrations](#considerations-for-registering-pagerduty-entities) since these notifications are related to affected services.

## How to connect Cortex entities to PagerDuty

### Discovery

By default, Cortex uses an entity's [Cortex tag](/ingesting-data-into-cortex/entities-overview/entities.md#cortex-tag) (e.g., `my-entity`) or its name as the "best guess" for the matching PagerDuty service. For example, if your Cortex tag is `my-entity`, the corresponding service in PagerDuty should also be named `my-entity`.

If your PagerDuty services don’t cleanly match the Cortex tag or name, you can override this in the Cortex entity descriptor.
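
The best-guess behavior described above can be pictured with a short sketch. This is an illustration only, not Cortex's implementation: the function name is hypothetical, and it assumes an exact, case-sensitive comparison that tries the tag first and then the name.

```python
from typing import Optional


def guess_pagerduty_service(tag: str, name: str, service_names: list[str]) -> Optional[str]:
    """Return the PagerDuty service name matching the entity's tag,
    falling back to the entity's name; None if neither matches.

    Assumption: exact, case-sensitive comparison. Cortex's real matching
    rules may differ.
    """
    for candidate in (tag, name):
        if candidate in service_names:
            return candidate
    return None


services = ["my-entity", "Payments API"]  # hypothetical PagerDuty service names
print(guess_pagerduty_service("my-entity", "My Entity", services))    # my-entity
print(guess_pagerduty_service("payments", "Payments API", services))  # Payments API
print(guess_pagerduty_service("checkout", "Checkout", services))      # None
```

In the last case nothing matches, which is when an explicit registration in the entity descriptor is needed.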

### Considerations for registering PagerDuty entities

Cortex recommends setting up PagerDuty at the service level by registering service entities with PagerDuty services, rather than configuring team entities with a PagerDuty schedule.

If PagerDuty is set up on a service level, you can see current on-call information listed within a given service's page. If PagerDuty is set up on the team level, you will only be able to view on-call rotation information from a team page.

Other benefits to setting up PagerDuty on a service level include:

* Mapping PagerDuty services 1:1 to Cortex services enables better alert routing and analytics, which organizations tend to struggle with when PagerDuty is set up at the team level.
* With a service-level setup, it’s easier to ensure that every service has a compliant on-call policy in PagerDuty, especially when you use Scorecards.
* The service-level setup is less reliant on team members tagging incidents with service information because services and incidents are already linked.
* You will gain the ability to get data from your Cortex catalog into PagerDuty, such as tier/criticality. By tying the service entities in the catalog with those in PagerDuty, you can automate processes and streamline severity protocols.

#### View on-call data only

If you want to only view on-call data for entities, and you do not want incidents displayed in Cortex, you can [register the escalation policy ID](#define-an-escalation-policy) for an entity.

### Editing the entity descriptor

For a given entity, you can define the PagerDuty service, schedule, or escalation policy within the entity’s YAML. **You can only set up one of these three options per entity.**

Each of these has the same field definitions.

| Field  | Description                                              | Required |
| ------ | -------------------------------------------------------- | :------: |
| `id`   | PagerDuty ID for service, schedule, or escalation policy |   **✓**  |
| `type` | `SERVICE`, `SCHEDULE` or `ESCALATION_POLICY`             |   **✓**  |

#### **Define a PagerDuty service**

Find the service ID value in PagerDuty under **Configuration > Services**. The URL for the service will contain the ID, for example: `https://cortexapp.pagerduty.com/services/`. You can only configure one service ID per entity.

```yaml
x-cortex-oncall:
  pagerduty:
    id: ASDF1234
    type: SERVICE
```

#### **Define a schedule**

Find a schedule ID in PagerDuty under **People > On-call schedules**. Click the desired schedule to view its ID in the URL, for example: `https://cortexapp.pagerduty.com/schedules#`. You can only configure one schedule per entity.

```yaml
x-cortex-oncall:
  pagerduty:
    id: ASDF1234
    type: SCHEDULE
```

#### **Define an escalation policy**

Find the escalation policy ID in PagerDuty under **People > Escalation Policies**. Click the desired policy to view its ID in the URL, for example: `https://cortexapp.pagerduty.com/escalation_policies#`. You can only configure one escalation policy per entity.

When you link a Cortex entity to a PagerDuty escalation policy, only on-call information is surfaced in Cortex; incidents are not shown. This is a useful alternative for teams that want to hide incidents while still displaying on-call schedules.

```yaml
x-cortex-oncall:
  pagerduty:
    id: ASDF1234
    type: ESCALATION_POLICY
```

{% hint style="warning" %}
You can only set up one of the three options above per entity.
{% endhint %}
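
If you generate or review descriptors in bulk, a small check along these lines can catch a missing `id` or an unsupported `type` before the file reaches Cortex. This is a sketch under stated assumptions: it takes an already-parsed descriptor as a Python dictionary, the function name is hypothetical, and it does not reflect how Cortex validates descriptors internally.

```python
VALID_TYPES = {"SERVICE", "SCHEDULE", "ESCALATION_POLICY"}


def validate_oncall_block(descriptor: dict) -> list[str]:
    """Return a list of problems found in the x-cortex-oncall.pagerduty block
    of a parsed entity descriptor (an empty list means it looks fine)."""
    pagerduty = (descriptor.get("x-cortex-oncall") or {}).get("pagerduty")
    if pagerduty is None:
        return ["x-cortex-oncall.pagerduty is not defined"]
    if not isinstance(pagerduty, dict):
        # A list here would mean more than one registration on the entity.
        return ["pagerduty must be a single mapping with `id` and `type`"]
    problems = []
    if not pagerduty.get("id"):
        problems.append("`id` is required")
    if pagerduty.get("type") not in VALID_TYPES:
        problems.append("`type` must be SERVICE, SCHEDULE, or ESCALATION_POLICY")
    return problems


good = {"x-cortex-oncall": {"pagerduty": {"id": "ASDF1234", "type": "SERVICE"}}}
bad = {"x-cortex-oncall": {"pagerduty": {"type": "TEAM"}}}
print(validate_oncall_block(good))  # []
print(validate_oncall_block(bad))   # two problems: missing id, unsupported type
```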

### Identity mappings

Cortex maps email addresses in your PagerDuty instance to email addresses that belong to team members in Cortex. When [identity mapping](/configure/settings/managing-users/identity-mapping.md) is set up, users will be able to see their personal on-call status from the developer homepage.
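
Conceptually, this mapping is a join on email address. The sketch below illustrates the idea only. The function name is hypothetical, and the case-insensitive comparison is an assumption rather than documented Cortex behavior.

```python
def map_identities(pagerduty_emails: list[str], cortex_emails: list[str]) -> dict[str, str]:
    """Pair each PagerDuty email with the Cortex email it matches.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    cortex_by_key = {email.strip().lower(): email for email in cortex_emails}
    return {
        pd_email: cortex_by_key[pd_email.strip().lower()]
        for pd_email in pagerduty_emails
        if pd_email.strip().lower() in cortex_by_key
    }


mapping = map_identities(
    ["Ana@Example.com", "bo@example.com"],   # hypothetical PagerDuty users
    ["ana@example.com", "cy@example.com"],   # hypothetical Cortex users
)
print(mapping)  # {'Ana@Example.com': 'ana@example.com'}
```

Users with no match (here, `bo@example.com`) would not see a personal on-call status until their identity is mapped.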

## Using the PagerDuty integration

#### Entity pages

Once the PagerDuty integration is set up, you’ll be able to view current on-call information in the "on-call" block on an [entity details page](/ingesting-data-into-cortex/entities-overview/entities/details.md). In the left sidebar of an entity, click **On-call & incidents** to view on-call information, escalation policy, service, and incidents.

The escalation policy and PagerDuty service details are hyperlinked to the corresponding pages in your PagerDuty instance.

Click **Events** in an entity's sidebar to view recent events pulled in from PagerDuty.

#### Engineering homepage

The PagerDuty integration enables Cortex to pull on-call information into the on-call block on the [Engineering homepage](/streamline/homepage.md). On-call data from PagerDuty is refreshed every 60 minutes.

#### Eng Intelligence

Cortex also pulls in metrics from PagerDuty for [Eng Intelligence](/improve/eng-intelligence.md). This tool will display MTTR, incidents opened, and incidents opened per week.

#### Retrieve on-call information in Slack

If you have a Slack integration set up, you can also use the `/cortex oncall` [Slack Bot command](/ingesting-data-into-cortex/integrations/slack.md#cortex-bot) to retrieve current on-call information. This feature works for both services and teams with registered PagerDuty schedules or escalation policies.

#### Scorecards and CQL

With the PagerDuty integration, you can create Scorecard rules and write CQL queries based on incidents, escalations, and on-call metadata.

See more examples in the [CQL Explorer](https://app.getcortexapp.com/admin/cql-explorer) in Cortex.

<details>

<summary>Check if on-call is set</summary>

Checks whether an entity has a registered service, schedule, or escalation policy. If the entity does not have any registrations in its entity descriptor, Cortex searches for PagerDuty services matching the tag defined in the entity's `x-cortex-tag` field.

**Definition:** `oncall (==/!=) null`

**Example**

For a Scorecard focused on production readiness, you can use this expression to make sure on-call is defined for entities:

```
oncall != null
```

This rule will pass if an entity has a service, schedule, or escalation policy set.

</details>

<details>

<summary>Forbidden contact methods</summary>

Number of users in each entity's escalation policy with missing or forbidden contact methods.

Allowed contact methods:

* "SMS"
* "PHONE"
* "EMAIL"
* "PUSH\_NOTIFICATION"
* "SLACK"

**Definition:** `oncall.usersWithoutContactMethods(allowed=<allowed>, onlyCurrentOncall=<boolean>).length`

**Example**

For a Scorecard focused on ownership, you can use this expression to make sure users have required contact methods enabled:

```
oncall.usersWithoutContactMethods(allowed=["SMS", "PHONE"]).length == 0
```

This rule will pass if every user in an associated escalation policy has either SMS or phone calls enabled as their contact method.

You can also use this expression in the Query builder to find users that lack the required contact method:

```
oncall.usersWithoutContactMethods(allowed=["EMAIL"]).length > 0
```

This query will surface entities whose escalation policy includes users without email as a contact method.

If you want to check only current on-call users, you can use the `onlyCurrentOncall` parameter:

```
oncall.usersWithoutContactMethods(allowed=["EMAIL"], onlyCurrentOncall=true).length > 0
```

When this parameter is set to `false` or omitted, the expression will check all users in the associated escalation policy for the next 3 months.
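
To make the pass/fail logic concrete, here is a minimal Python sketch of the check. It is an illustration, not Cortex's implementation: the function name and user data are hypothetical, and it assumes a user is flagged when none of their contact methods is in the allowed list.

```python
def users_without_contact_methods(users: dict[str, set[str]], allowed: list[str]) -> list[str]:
    """Return users who have none of the allowed contact methods enabled.

    `users` maps a user name to the set of contact methods on their profile.
    """
    allowed_set = set(allowed)
    return [user for user, methods in users.items() if not (methods & allowed_set)]


policy_users = {
    "alice": {"SMS", "EMAIL"},
    "bob": {"EMAIL"},
    "carol": set(),  # no contact methods configured
}
flagged = users_without_contact_methods(policy_users, allowed=["SMS", "PHONE"])
print(flagged)            # ['bob', 'carol']
print(len(flagged) == 0)  # False, so the Scorecard rule would fail
```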

</details>

<details>

<summary>Incident response analysis</summary>

Get detailed [on-call analysis stats](https://developer.pagerduty.com/api-reference/694e92fe4f943-get-aggregated-service-data) for each entity:

* Mean assignment count
* Mean engaged seconds
* Mean engaged user count
* Mean seconds to engage
* Mean seconds to first ack
* Mean seconds to mobilize
* Mean seconds to resolve
* Total business-hour interruptions
* Total engaged seconds
* Total escalation count
* Total off-hour interruptions
* Total sleep-hour interruptions
* Total snoozed seconds
* Total incident count
* Up time percent

PagerDuty updates its analytics data once per day, and it can take up to 24 hours before new incidents appear in the analytics API.

**Only works if entity has a registered PagerDuty service ID or if the PagerDuty service name matches the entity tag.**

**Definition:** `oncall.analysis(lookback = <duration>, priority = <List<String>>)`

**Examples**

PagerDuty analytics can easily be used to craft rules for a DORA metrics Scorecard.

For mean time to acknowledge, you can use the `meanSecondsToFirstAck` schema definition:

```
oncall.analysis(lookback = duration("P7D"), priority = ["P1", "P2"]).meanSecondsToFirstAck <= 300
```

Entities will pass this rule if the mean time to first acknowledgment for P1 and P2 incidents over the last week was 5 minutes (300 seconds) or less.

For mean time to resolve, you can use `meanSecondsToResolve` to make sure that incidents were handled within an hour:

```
oncall.analysis(lookback = duration("P7D"), priority = ["P1"]).meanSecondsToResolve < 3600
```

You can also use this expression to write a rule checking entities' change failure rate:

```
oncall.analysis(lookback = duration("P7D")).totalIncidentCount == 0
```

This rule will pass if there weren't any incidents in the last week.

</details>

<details>

<summary>Incidents</summary>

Get incident data for each entity:

* Assignee ID
* Created at
* Incident ID
* Last updated
* Resolved at
* Service ID
* Status

**Only works if entity has a registered PagerDuty service ID or if the PagerDuty service name matches the entity tag.**

**Definition:** `oncall.incidents(lookback = <duration>)`

**Examples**

For a Scorecard focused on service maturity or quality, you can use this expression to check the number of incidents opened in the last month:

```
oncall.incidents(lookback = duration("P1M")).length < 15
```

Entities will pass this rule if they have fewer than 15 incidents opened in the last month.

You can also use this expression to make sure that no incidents opened in the last month are still open:

```
oncall.incidents(lookback=duration("P1M")).filter((incident) => incident.status.matches("TRIGGERED|ACKNOWLEDGED")).length < 1
```

Or you can check for incidents that took a certain amount of time to resolve:

```
oncall.incidents(lookback=duration("P1M")).filter((incident) => incident.createdAt.until(incident.resolvedAt) > duration("P2D")).length < 2
```

Entities will pass this rule if there were 0 or 1 incidents in the last month that took more than 2 days to resolve.
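
The resolution-time rule can be read as "count the incidents whose created-to-resolved time exceeds two days, then require fewer than two of them." A minimal Python sketch of that logic follows. The incident records are hypothetical, and skipping unresolved incidents is an assumption, not documented CQL behavior.

```python
from datetime import datetime, timedelta


def count_slow_incidents(incidents: list[dict], threshold: timedelta) -> int:
    """Count resolved incidents whose created-to-resolved time exceeds threshold."""
    return sum(
        1
        for incident in incidents
        if incident["resolvedAt"] is not None
        and incident["resolvedAt"] - incident["createdAt"] > threshold
    )


incidents = [
    {"createdAt": datetime(2024, 5, 1, 9), "resolvedAt": datetime(2024, 5, 1, 10)},  # 1 hour
    {"createdAt": datetime(2024, 5, 3, 9), "resolvedAt": datetime(2024, 5, 6, 9)},   # 3 days
    {"createdAt": datetime(2024, 5, 20, 9), "resolvedAt": None},                     # still open
]
slow = count_slow_incidents(incidents, timedelta(days=2))
print(slow)      # 1
print(slow < 2)  # True, so the entity would pass
```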

</details>

<details>

<summary>Number of escalations</summary>

Number of escalation tiers in escalation policy.

**Definition:** `oncall.numOfEscalations()`

**Example**

This expression could be used in a Scorecard focused on production readiness or service maturity:

```
oncall.numOfEscalations() >= 2
```

This rule checks that there are at least two tiers in an escalation policy for a given entity, so that if the first on-call does not ack, there is a backup.
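
As an illustration of what is being counted, the sketch below counts the `escalation_rules` entries on a PagerDuty escalation policy object. The policy data is hypothetical, and treating each rule as one tier is an assumption about how the value is derived.

```python
def num_of_escalations(policy: dict) -> int:
    """Count tiers, treating each entry in `escalation_rules` as one tier."""
    return len(policy.get("escalation_rules", []))


policy = {
    "name": "Payments on-call",  # hypothetical policy
    "escalation_rules": [
        {"escalation_delay_in_minutes": 15,
         "targets": [{"type": "schedule_reference", "id": "PSCHED1"}]},
        {"escalation_delay_in_minutes": 30,
         "targets": [{"type": "user_reference", "id": "PUSER01"}]},
    ],
}
print(num_of_escalations(policy))       # 2
print(num_of_escalations(policy) >= 2)  # True, so the rule would pass
```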

While making sure an on-call policy is set is a rule that would be defined in a Scorecard's first level, a rule focused on escalation tiers would make more sense in a higher level.

</details>

<details>

<summary>On-call metadata</summary>

On-call metadata, including type, id, and name.

**Definition:** `oncall.details()`

**Examples**

To find all entities with a schedule-type on-call registration, you can use this expression in the Query builder:

```
oncall.details().type == "schedule"
```

If you're migrating on-call policies, you could use this rule to check for outdated policies. Let's say, for example, all outdated PagerDuty policies start with "Legacy" in their titles.

```
oncall.details().name.matches("Legacy.*") == false
```

Entities with on-call policies that start with "Legacy" will fail, while those with other policy names will pass.

</details>

### Trigger an incident

As described above under [Editing the entity descriptor](#editing-the-entity-descriptor), a given entity can have a PagerDuty service, schedule, or escalation policy defined. Only entities with a PagerDuty service defined will include the option to trigger an incident directly from Cortex.

Your PagerDuty API key must include the `write` permission in order to trigger incidents from an entity.

While viewing an entity in Cortex, follow these steps to trigger an incident in PagerDuty:

1. In Cortex, navigate to an entity. On the left side of an entity details page, click **On-call & incidents**.
2. In the upper right side of the entity's "On-call" page, click **Trigger incident**.
3. Configure the incident modal:
   * **Summary**: Enter a title for the incident.
   * **Description**: Enter a description of the incident.
   * **Severity**: Select a severity level.
4. At the bottom of the modal, click **Trigger incident**.
   * A confirmation screen will appear. In the confirmation, click the link to view the incident in PagerDuty.
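
For context, the steps above are roughly what a direct call to PagerDuty's "create an incident" endpoint (`POST https://api.pagerduty.com/incidents`) would do. The sketch below only builds the request body and does not send anything. It is not Cortex's implementation: the function name and sample values are hypothetical, and because this page does not describe how the **Severity** field maps to PagerDuty, it is left out.

```python
import json


def build_incident_payload(service_id: str, summary: str, description: str) -> dict:
    """Build the JSON body for PagerDuty's "create an incident" endpoint."""
    return {
        "incident": {
            "type": "incident",
            "title": summary,
            "service": {"id": service_id, "type": "service_reference"},
            "body": {"type": "incident_body", "details": description},
        }
    }


payload = build_incident_payload(
    service_id="ASDF1234",  # same placeholder ID as the descriptor examples
    summary="Checkout latency above threshold",
    description="p99 latency has exceeded 2s for 10 minutes.",
)
print(json.dumps(payload, indent=2))
```

A real request would also need a write-capable API key and a `From` header containing the email address of a valid PagerDuty user.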

### View integration logs

## Background sync

Cortex performs the following background jobs for the PagerDuty integration:

* **On-call:** On-call information displayed on the developer homepage is refreshed **every 60 minutes**.
* **Services and incidents:** Services used for automapping and active incidents viewable in the catalog are fetched approximately **every 5 minutes**, or however long the refresh takes.
* **Users**: User data for identity mapping is synced daily at 10 a.m. UTC.
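
If you need to reason about when a newly added PagerDuty user will be picked up, the daily schedule can be worked out as follows. This is a small sketch based only on the 10 a.m. UTC time stated above. The function name is hypothetical, and it expects a timezone-aware `datetime`.

```python
from datetime import datetime, time, timedelta, timezone


def next_user_sync(now: datetime) -> datetime:
    """Return the next 10:00 UTC strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(10, 0), tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


before = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
after = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
print(next_user_sync(before))  # 2024-05-01 10:00:00+00:00
print(next_user_sync(after))   # 2024-05-02 10:00:00+00:00
```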

## Still need help? <a href="#still-need-help" id="still-need-help"></a>

The following options are available to get assistance from the Cortex Customer Engineering team:

* **Email**: <help@cortex.io>, or open a support ticket in the in-app Resource Center
* **Slack**: Users with a connected Slack channel will have a workflow added to their account. From here, you can either @CortexTechnicalSupport or add a `:ticket:` reaction to a question in Slack, and the team will respond directly.

Don’t have a Slack channel? Talk with your Customer Success Manager.


