Standardize and automate readiness

To configure your Cortex workspace for Production Readiness, we recommend the following actions:

Use Cortex features to meet Production Readiness standards

The steps below describe how to configure Cortex features to meet Production Readiness standards.

Step 1: Ingest data and solve ownership 🔌

Before getting started on any use case, it is crucial to import your services, resources, infrastructure, and other entities, and to have clear visibility into the ownership of your entities.

Connecting your entities to Cortex establishes a single source of truth across your engineering organization. It lets you track progress via Scorecards, automate Workflows, and gain insights from Eng Intelligence.

Setting ownership of entities ensures that every service and system is clearly linked to accountable teams or individuals, enabling faster incident response, reducing handoff friction, and making it possible to enforce standards consistently.

The more data you have available, the more actionable and insightful your Scorecards can be.

Relevant integrations

To focus on Production Readiness, Cortex recommends integrating with tools that provide visibility and control over code, deployments, monitoring, on-call, and documentation. Make sure you have configured an integration for each of these categories.

Cortex also recommends linking to runbooks and documentation for your entities, ensuring your users have access to critical information.

With your data in Cortex, you have a jumping-off point to start driving production readiness.

Step 2: Configure a Scorecard for Production Readiness 📋

Scorecards automate the process of checking whether services meet criteria such as ownership, on-call coverage, runbooks, monitoring, and security requirements.

Cortex's Production Readiness template includes a set of predefined rules that can be customized based on your organization's requirements, infrastructure, and goals. It is structured into three levels (Bronze, Silver, and Gold), each representing a higher degree of production readiness.

Step 2.1: Create the Scorecard and configure the basics

  1. On the Scorecards page in your workspace, click Create Scorecard.

  2. On the Production Readiness template, click Use.

  3. Configure basic settings, including the Scorecard's name, unique identifier, description, and more.

    1. Learn about configuring the basic settings in the Creating a Scorecard documentation.

Step 2.2: Review and modify the rules

While Cortex's template is based on common industry standards, you may need to adjust the rules based on which tools you use and how your organization prioritizes standards and requirements. You can reorder, delete, and edit rules, add more rules to a level, and assign more points to a rule to signify its importance.

The Scorecard template contains rules that enforce industry best practices, such as:

  • Enforce ownership, linked docs, and linked Slack channels to enable quick action during incidents.

  • Enforce documented monitors to provide visibility into service health, performance, and reliability.

  • Enforce a configured CI/CD pipeline, required merge approval, high code coverage, and a last commit within the past week, reducing the likelihood of shipping bugs.

  • Enforce that SLOs are met and that the on-call escalation policy has 2 tiers, supporting reliability, stability, and reduced downtime during an incident.

When adding or changing the template rules, you can select from a list of available pre-built rules. Behind each rule is a Cortex Query Language (CQL) query; you can also write your own queries to further refine your rules.
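To illustrate how a leveled Scorecard might evaluate an entity, here is a minimal Python sketch. It is not Cortex's implementation and it is not CQL. The rule definitions, the entity dictionary fields, and the assumption that an entity earns a level only after passing every rule in that level and all lower levels are hypothetical, chosen only for illustration.

```python
from typing import Callable, Optional

Entity = dict
Rule = Callable[[Entity], bool]

# Hypothetical rules, loosely modeled on the template rules described above.
# The field names ("owners", "docs", ...) are assumptions, not Cortex fields.
LEVELS: list[tuple[str, list[Rule]]] = [
    ("Bronze", [
        lambda e: bool(e.get("owners")),
        lambda e: bool(e.get("docs")),
    ]),
    ("Silver", [
        lambda e: bool(e.get("monitors")),
        lambda e: e.get("merge_approval_required", False),
    ]),
    ("Gold", [
        lambda e: e.get("slos_met", False),
        # Assumption: "2 tiers" is treated here as "two or more".
        lambda e: e.get("escalation_tiers", 0) >= 2,
    ]),
]


def highest_level(entity: Entity) -> Optional[str]:
    """Return the highest level whose rules (and all lower levels' rules) pass."""
    achieved = None
    for name, rules in LEVELS:
        if not all(rule(entity) for rule in rules):
            break  # A failed level blocks every level above it.
        achieved = name
    return achieved


service = {
    "owners": ["team-payments"],
    "docs": ["runbook"],
    "monitors": ["latency"],
    "merge_approval_required": True,
    "slos_met": False,
    "escalation_tiers": 2,
}
print(highest_level(service))  # Silver
```

In this sketch the service stops at Silver because it fails one Gold rule, which mirrors the idea that each level represents a stricter standard than the one before it.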

Step 3: Configure data verification ✔️

Data verification is critical for ensuring the accuracy, consistency, and completeness of data before it is used in a production system. Accurate data enables faster incident response, operational excellence, and the success of data-driven decision-making.

  • Follow the Data Verification documentation to define verification periods for your entities.

    • You can select entity types that this verification process will apply to, or you can define a CQL expression to specify which entities the process will apply to.

    • You can choose who is responsible for verifying the entities. If you do not specify, then any owner of an entity can complete the verification.

Step 4: Automate processes via Workflows ⚙️

You can use Workflows to streamline and standardize Production Readiness processes by turning best practices and readiness checks into repeatable, self-service automations.

Workflows to establish adherence to best practices

  • You can add manual approval steps in a Workflow to require sign-off from specific team members before a service is considered production-ready, ensuring accountability and providing an audit trail.

  • When Scaffolding new services, you can use templates to ensure that every new service starts with baseline standards (e.g., on-call information, runbooks, SLOs configured, and more).

Workflows based on Production Readiness Scorecards

In a Workflow, you can use an HTTP request to get an individual entity's score or the latest scores for all entities on your Production Readiness Scorecard, then configure additional blocks to take actions based on the score.

For example, you could create a Workflow that blocks deployment based on Scorecard scores, ensuring that a deployment is blocked if the entity has not met your standards for Production Readiness.

  • See an example of this Workflow in the "Deploy to prod based on Scorecard score" template in your Cortex workspace.

    • When the Workflow runs, it checks whether the entity has achieved the "Gold" level standard in the Scorecard. If it has, the deployment continues. If it has not, the Workflow automatically sends a Slack message to notify the entity owner.
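The decision this Workflow makes can be sketched in Python as follows. This is only an illustration of the gating logic, not the template's actual block configuration. The function name, the level ordering list, and the returned action strings are hypothetical.

```python
from typing import Optional

# Assumed ordering of Scorecard levels, lowest to highest.
LEVEL_ORDER = ["Bronze", "Silver", "Gold"]


def deployment_action(entity_level: Optional[str], required_level: str = "Gold") -> str:
    """Return "deploy" if the entity meets the required level, else "notify_owner"."""
    if entity_level in LEVEL_ORDER:
        if LEVEL_ORDER.index(entity_level) >= LEVEL_ORDER.index(required_level):
            return "deploy"
    # Unknown, missing, or lower levels block the deployment; the owner is notified.
    return "notify_owner"


print(deployment_action("Gold"))    # deploy
print(deployment_action("Silver"))  # notify_owner
```

Treating a missing or unrecognized level as a failure is a deliberate choice in this sketch: the gate fails closed rather than letting an unscored entity deploy.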

Obtain Scorecard scores within a Workflow to use in subsequent actions

The following steps show how to obtain an entity's Scorecard score in an HTTP block within a Workflow:

  1. Add an HTTP request block to your Workflow.

  2. Enter a name and unique slug for the block, then configure the remaining fields:

    1. HTTP method: GET

    2. URL: Enter the Cortex API URL for obtaining the scores, replacing <unique-scorecard-tag> with the tag of your Scorecard. For example: https://api.getcortexapp.com/api/v1/scorecards/<unique-scorecard-tag>/scores?entityTag={{context.entity.tag}}

    3. Headers: Add the following headers:

      • Content-Type: application/json

      • Authorization: Bearer {{context.secrets.cortex_api_key}}

  3. Save the block.

You can reference the output of this block in subsequent blocks, allowing you to streamline the follow-up actions you take based on an entity's level of Production Readiness.
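For testing the same call outside of a Workflow, the request configured above can be approximated in Python. This is a minimal sketch using only the standard library. The function name and the example scorecard tag, entity tag, and API key are placeholders, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

CORTEX_API_BASE = "https://api.getcortexapp.com"


def build_scores_request(scorecard_tag: str, entity_tag: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request described in the steps above."""
    path = f"/api/v1/scorecards/{urllib.parse.quote(scorecard_tag, safe='')}/scores"
    query = urllib.parse.urlencode({"entityTag": entity_tag})
    return urllib.request.Request(
        f"{CORTEX_API_BASE}{path}?{query}",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="GET",
    )


# Placeholder values; substitute your own Scorecard tag, entity tag, and key.
request = build_scores_request("production-readiness", "payments-service", "EXAMPLE_KEY")
print(request.full_url)
# https://api.getcortexapp.com/api/v1/scorecards/production-readiness/scores?entityTag=payments-service
```

To send the request, pass it to urllib.request.urlopen. The shape of the response body is not shown in this document, so the sketch does not attempt to parse it.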

Step 5: Review and act on Eng Intelligence 📈

Use Eng Intelligence features (the DORA dashboard, Velocity Dashboard, and Metrics Explorer) to surface and track key engineering metrics related to Production Readiness.

Review trends in areas such as deployment frequency, incident response, and other indicators that are important to your organization. This helps you identify areas where teams or services are not meeting readiness standards.

Production Readiness in action

To learn what ongoing Production Readiness looks like, see the Production Readiness in action documentation.
