Upgrade or roll back Self-Managed Cortex

This guide explains how to upgrade your self-managed instance of Cortex and how to roll back in case of issues.

Before getting started:

- Understand the rollback procedure: Read the rollback procedure in this document so that you understand the steps to roll back a failed deployment.
- Learn about backup and recovery best practices in Self-Managed Cortex: Backup and recovery.
- Review release notes: Check the Cortex release notes for breaking changes or special upgrade instructions.
- Schedule a maintenance window: Plan for potential downtime during the upgrade.

Upgrading a Self-Managed Cortex instance

Step 1: Verify your current deployment state

Identify the Cortex version you are running. The Cortex version appears at the bottom of the left-hand nav in the settings page in the Cortex UI.
- Cortex version numbers look like 0.0.411.
Ensure your current deployment matches your override configuration. Run the following command, replacing CURRENT_VERSION with the version of Cortex you are currently running:

# This should return empty (no differences)
helm diff upgrade cortex cortex/cortex \
  --namespace cortex \
  --version CURRENT_VERSION \
  -f overrides.yaml

If no differences are found, this command will return no output, and you can proceed with the upgrade. If differences are found, investigate and update your overrides.yaml accordingly before proceeding.
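The "no output means no differences" check can be scripted so that it fails loudly instead of relying on a visual inspection. The following is a minimal sketch, not part of the Cortex tooling; the function name diff_is_empty is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `helm diff` output on stdin and succeeds only
# if that output contains nothing but whitespace (that is, no differences).
diff_is_empty() {
  ! grep -q '[^[:space:]]'
}

# Example use (requires helm with the diff plugin and access to the cluster).
# The output is captured first so that a failing helm command is not
# mistaken for "no differences":
#   out=$(helm diff upgrade cortex cortex/cortex --namespace cortex \
#           --version CURRENT_VERSION -f overrides.yaml) || exit 1
#   printf '%s' "$out" | diff_is_empty \
#     || echo "Differences found: update overrides.yaml before upgrading" >&2
```

If the check reports differences, rerun the helm diff command directly to see what they are.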

Step 2: Back up your database

Cloud vendor solutions like AWS RDS or Azure DB often offer point-in-time snapshot and restore capabilities. If that is available to you, create a snapshot before upgrading, and be prepared to roll back to that snapshot in case you need to roll back the deployment.

Otherwise, the example below uses PostgreSQL's basic client tools to create a database backup and restore it to a new DB. This can serve as a quick start, but isn't recommended for production use. Consider consulting with your database management team to identify a backup and restore strategy that's suitable for your environment.

Backup using pg_dump:

pg_dump -h <db-host> -U <db-user> -d <db-name> > cortex_backup_$(date +%Y%m%d_%H%M%S).sql

Restore to a new database on the same server:

# Create new database
createdb -h <db-host> -U <db-user> cortex_backup_<date>

# Restore backup (a plain-text pg_dump file is restored with psql, not pg_restore)
psql -h <db-host> -U <db-user> -d cortex_backup_<date> -f cortex_backup_<timestamp>.sql

Learn about backup and recovery best practices in Self-Managed Cortex: Backup and recovery.
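Before relying on a dump file for rollback, it can help to confirm that the file was written completely. The sketch below assumes a plain-format dump, as produced by the pg_dump command above, and relies on the trailing comment that pg_dump normally writes at the end of such files. The function name backup_looks_complete is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: basic sanity check for a plain-format pg_dump file.
# This is not a substitute for a real restore test.
backup_looks_complete() {
  file=$1
  # The file must exist and be non-empty.
  [ -s "$file" ] || return 1
  # Plain-format dumps normally end with this marker comment.
  tail -n 20 "$file" | grep -q 'PostgreSQL database dump complete'
}

# Example use:
#   backup_looks_complete cortex_backup_<timestamp>.sql \
#     || echo "Backup looks truncated or empty" >&2
```

A passing check only shows that pg_dump reached the end of its output; restoring into a scratch database remains the more reliable test.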

Step 3: Preview upgrade changes

Identify the version of Cortex you want to install. Consider consulting with your Cortex account team to identify the appropriate version. Alternatively, you can find the latest version by running helm search repo cortex/cortex. Cortex version numbers look like 0.0.411.
Run the following command, replacing NEW_VERSION with the version number you identified, to see what will change:

helm diff upgrade cortex cortex/cortex \
  --namespace cortex \
  --version NEW_VERSION \
  -f overrides.yaml

Carefully review the output for:
- Image version changes
- New or modified environment variables
- Resource limit changes
- Any unexpected modifications
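For a large diff, it may be easier to pull out one category of change at a time. The sketch below filters for image changes only, assuming the diff output is uncolored and uses the usual "+" and "-" line prefixes. The function name image_changes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads diff-style text on stdin and prints only the
# added or removed lines that set a container image.
image_changes() {
  grep -E '^[+-][[:space:]]+(-[[:space:]]+)?image:' || true
}

# Example use (requires helm with the diff plugin and access to the cluster):
#   helm diff upgrade cortex cortex/cortex --namespace cortex \
#     --version NEW_VERSION -f overrides.yaml | image_changes
```

This narrows the view for one item on the checklist; the full diff still needs to be reviewed for the other items.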

Step 4: Check release history

Before upgrading, note your current release revision:

helm history cortex -n cortex

Step 5: Perform the upgrade

Run the following command, replacing NEW_VERSION with the version you identified in Step 3, to upgrade Cortex:

helm upgrade cortex cortex/cortex \
  --namespace cortex \
  --version NEW_VERSION \
  --description "Upgrade to NEW_VERSION" \
  -f overrides.yaml

Monitor upgrade progress

Run the following commands to monitor the progress of the upgrade:

# Watch pod status
kubectl get pods -n cortex

# When backend pods enter Running state, tail logs
kubectl logs -f deployment/cortex-deployment-backend -n cortex

Look for:

- Database migration messages (lines containing "migrating")
- Stack traces during startup (some are normal)
- Readiness probe success: "GET /actuator/health/readiness HTTP/1.1" 200 25
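The log markers listed above can also be checked with a small script instead of by eye. This is a rough sketch that searches for the same strings quoted above; the function names backend_ready and count_migration_lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the log text on stdin contains a
# successful (HTTP 200) readiness probe request.
backend_ready() {
  grep -q 'GET /actuator/health/readiness HTTP/1.1" 200'
}

# Hypothetical helper: prints the number of log lines mentioning "migrating".
count_migration_lines() {
  grep -c 'migrating' || true
}

# Example use (requires kubectl and access to the cluster):
#   kubectl logs deployment/cortex-deployment-backend -n cortex | backend_ready \
#     && echo "Backend reports ready"
```

Because kubectl logs without -f returns a snapshot, the check may need to be repeated until the readiness line appears.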

Verify the upgrade

Once pods are ready:

Access the Cortex UI at your frontend URL and log in.
Verify core functionality works as expected.
Verify that your integrations are functioning as expected.

Rolling back Self-Managed Cortex

If issues occur during or after upgrade, you may need to roll back the deployment.

Collecting logs and Helm data before rolling back can aid in investigating a failed upgrade. The following section describes how to collect logs for troubleshooting:

Gather diagnostic logs

When working with Cortex Customer Support, you will be asked to collect diagnostic data using a Cortex tool called brain-freeze.

Install brain-freeze:
1. Download the .tar.gz file that matches the operating system and architecture where you are running kubectl.
2. Extract the file on the machine where you are running kubectl, and put the enclosed brain-freeze binary somewhere in your path.
Run the following commands:

brain-freeze k8s logs --namespace cortex --timeInMinutes 1440
brain-freeze k8s dump --namespace cortex --helm-deployment cortex

This will create a data subdirectory in your current directory, containing the Kubernetes logs from your deployment from the last 24 hours (1440 minutes) and the details of your Kubernetes deployment configuration.

Step 1: Identify your last working release

Run the following command to identify your last working release:

helm history -n cortex cortex

The second-to-last row in the output is the Helm revision you were running before the upgrade. Make a note of its revision number, shown in the first column of the table, for use with the helm rollback command later.
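Picking the second-to-last row can be automated when the history table is long. The sketch below assumes the default table output of helm history, with the revision number in the first column. The function name previous_revision is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the `helm history` table on stdin and prints
# the revision number from the second-to-last data row. It exits non-zero
# if there are fewer than two revisions.
previous_revision() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { prev = last; last = $1 }
       END { if (prev != "") print prev; else exit 1 }'
}

# Example use (requires helm and access to the cluster):
#   REVISION_NUMBER=$(helm history -n cortex cortex | previous_revision)
```

This only mirrors the rule of thumb above. If the upgrade was attempted more than once, the second-to-last row may itself be a failed revision, so confirm the STATUS and DESCRIPTION columns before rolling back.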

Step 2: Pause services

Next, pause the services that are talking to the database by setting their replica counts to 0:

helm upgrade \
  -n cortex \
  cortex \
  cortex/cortex \
  -f overrides.yaml \
  --set app.backend.replicaCount=0 \
  --set app.worker.replicaCount=0 \
  --description "Pause to fix bad deployment"

Step 3: Restore the database to its prior state

Once the backend and worker pods have terminated, you can restore the database to the state it had before the upgrade.

If you are using point-in-time snapshots, roll the existing database back to the snapshot that you took before performing the upgrade.
If you have created a live copy of the pre-upgrade database, switch to it by modifying the Kubernetes secret to point to the live copy, for example:

# Update the secret to point to backup database
kubectl edit secret cortex-secret -n cortex
# Change the base64-encoded DB_NAME value to the base64-encoded name of your backup database; save and quit
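A common mistake when editing a secret by hand is encoding the value with a trailing newline. The sketch below produces the base64 value without one and, as an alternative to kubectl edit, builds a patch document for kubectl patch. The function names are hypothetical; the secret name and DB_NAME key are taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: base64-encode a value with no trailing newline and
# no line wrapping in the output.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical helper: build a JSON patch body that sets DB_NAME to the
# encoded database name.
db_name_patch() {
  printf '{"data":{"DB_NAME":"%s"}}' "$(encode_secret_value "$1")"
}

# Example use (requires kubectl and access to the cluster):
#   kubectl patch secret cortex-secret -n cortex \
#     -p "$(db_name_patch 'cortex_backup_<date>')"
```

Either approach ends with the same secret contents; the patch form is easier to repeat and review.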

Then, roll back the deployment to the last good revision number you noted in Step 1:

# Perform helm rollback
helm rollback cortex -n cortex REVISION_NUMBER

Monitor the deployment using kubectl get and kubectl logs as described above.

Next steps

After rolling back the upgrade, we recommend that you:

Investigate what caused the failure
Plan for data migration if you need to attempt the upgrade again
Contact Cortex Customer Support for guidance on upgrade issues

Last updated 2 months ago
