The Scorecard use cases and examples on this page are based on engineering teams across a wide spectrum of sizes and maturity levels.
Starting with aspirational goals
Scorecards are often aspirational. For example, an SRE team may define a Production Readiness Scorecard with 20+ criteria that they think their services should meet to be considered "ready" for SRE support.
The engineering team may not be resourced to actually meet those goals, but setting objective targets helps drive organization-wide cultural shifts and sets a baseline for conversations around tech debt, infrastructure investment, and service quality.
DORA metrics example
See a DORA Metrics Scorecard as an example:
Assume the following for an entity:
Its last commit was within 24 hours
There were zero rollbacks in the last 7 days
The ratios of incidents and rollbacks to deploys in the last 7 days are both zero
It hasn't averaged at least one deploy per day in the last week
In this case, the entity would have No Level.
It's passing all of the rules in the Bronze level, but failing a rule in the Steel level. Because of this, it has not achieved the Steel level, the first one that an entity can pass.
Once the entity averages at least one deploy per day over the last week, it will pass the Steel level and, because it is already passing the four rules in the Bronze level, automatically achieve Bronze as well.
This kind of gamification motivates developers to not only progress through the levels, but to maintain the quality of their entities over time.
We recommend making each level of the Scorecard achievable, even if challenging, to keep developers motivated.
You can add as many levels as you want to a Scorecard. You can also add as many rules to each level as makes sense for the Scorecard, but keep in mind that an entity must pass all rules in a given level in order to progress to the next one.
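The level mechanics above can be sketched in a few lines of Python; this is a minimal model assuming the Steel/Bronze ordering from the DORA example, and the evaluation code is illustrative, not Cortex's implementation:

```python
# A sketch of how Scorecard levels compose: an entity's level is the
# highest level such that it passes every rule in that level and in
# every level below it. Level and rule names come from the DORA example.

def achieved_level(levels, results):
    """levels: ordered list of (name, rule_names); results: rule name -> passed."""
    achieved = None
    for name, rules in levels:
        if all(results.get(rule, False) for rule in rules):
            achieved = name  # every rule in this level passes; keep climbing
        else:
            break  # one failing rule blocks this level and all levels above it
    return achieved

dora_levels = [
    ("Steel", ["averages one deploy per day over the last week"]),
    ("Bronze", ["last commit within 24 hours",
                "zero rollbacks in the last 7 days",
                "incidents-to-deploys ratio is zero",
                "rollbacks-to-deploys ratio is zero"]),
]

# Passing everything except the Steel deploy-frequency rule:
results = {rule: True for _, rules in dora_levels for rule in rules}
results["averages one deploy per day over the last week"] = False
print(achieved_level(dora_levels, results))  # None -> "No Level"
```

Flipping the deploy-frequency result to `True` makes the same entity jump straight to Bronze, which is the behavior described above.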
Common Scorecard use cases and example rules
Cortex users commonly define Scorecards across several categories:
Development Maturity: Ensure services and resources conform to basic development best practices, such as established code coverage, checking in lockfiles, READMEs, package versions, and ownership.
Development maturity rules
git.fileExists("package-lock.json")
Developers should be checking in lockfiles to ensure repeatable builds.
sonarqube.metric("coverage") > 80.0
Set a threshold that’s achievable, so there’s an incentive to actually try. This also serves as a secondary check that the service is hooked up to SonarQube and reporting frequently.
git.numOfRequiredApprovals() >= 1
Ensure that a rigorous PR process is in place for the repo, and PRs must be approved by at least one user before merging.
git.fileContents(".circleci/config.yml").matches(".*npm test.*")
Enforce that a CI pipeline exists, and that there is a testing step defined in the pipeline.
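The CI rule above is just a regular-expression check over the checked-in config file. A local Python approximation (the config snippet is illustrative):

```python
import re

def has_test_step(config_text: str) -> bool:
    # Mirrors git.fileContents(".circleci/config.yml").matches(".*npm test.*"):
    # the rule passes if the CI config mentions an npm test step anywhere.
    return re.search(r"npm test", config_text) is not None

config = """
jobs:
  build:
    steps:
      - run: npm install
      - run: npm test
"""
print(has_test_step(config))  # True
```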
Operational Readiness: Determine whether services and resources are ready to be deployed to production, checking for runbooks, dashboards, logs, on-call escalation policies, monitoring/alerting, and accountable owners.
Operational readiness rules
ownership.allOwners().length > 2
Incident response requires crystal-clear accountability, so make sure there are owners defined for each service or resource.
oncall.numOfEscalations() > 1
Check that there are at least 2 levels in the escalation policy, so that if the first on-call does not acknowledge, there is an established backup.
links("runbooks").length >= 1
Create a culture of preparation by requiring runbooks to be established for the services or resources.
links("logs").length > 1
When there is an incident, responders should be able to find the right logs easily. Usually, this means load balancer logs and application logs.
embeds().length >= 1
Responders should have standard dashboards readily accessible for every service or resource in order to speed up triage.
custom("pre-prod-enabled") == true
Use an asynchronous process to check whether there is a live pre-production environment for the service or resource, and send a true/false flag to Cortex using the custom metadata API.
sonarqube.metric("vulnerabilities") < 3
Ensure that production services are not deployed with a high number of security vulnerabilities.
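The custom("pre-prod-enabled") rule above depends on an external process reporting a flag to Cortex. The sketch below only builds the HTTP request for such a process; the endpoint path and payload shape are assumptions based on Cortex's catalog API, so check your workspace's API reference before using them:

```python
import json
import urllib.request

def build_custom_data_request(base_url, entity_tag, token, key, value):
    """Build (but don't send) a request setting a custom metadata flag.
    The /api/v1/catalog/{tag}/custom-data path is an assumption, not a
    verified Cortex endpoint."""
    url = f"{base_url}/api/v1/catalog/{entity_tag}/custom-data"
    body = json.dumps({"key": key, "value": value}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})

# e.g. after probing the pre-production environment's health endpoint:
req = build_custom_data_request(
    "https://api.example-cortex.test", "my-service", "TOKEN",
    "pre-prod-enabled", True)
print(req.full_url)
```

A scheduled job (cron, CI step) would run the probe, build this request, and send it so the Scorecard rule always reflects the current state.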
Operational Maturity: Monitor whether services are meeting SLOs, on-call metrics look healthy, and post-mortem tickets are closed promptly, gauging whether there are too many customer-facing incidents.
Operational maturity rules
oncall.analysis().meanSecondsToResolve < 3600
Make sure that issues are resolved in a reasonable amount of time. If they’re not, you can dig into the root cause.
oncall.analysis().offHourInterruptions < 2
A reliable service or resource should not be a frequent source of off-hours pages or customer-facing incidents.
jira.numOfIssues("labels=compliance") < 3
Make sure there are no outstanding compliance or legal issues affecting the service or resource.
Security rules
snyk != null
The first step in monitoring security is making sure each service has an associated Snyk project.
ownership.allOwners().length > 0
Making sure each entity has at least one owner helps ensure updates don't fall through the cracks.
git.numOfRequiredApprovals() > 0
Changes should not be merged unless there is at least one approval.
sonarqube.metric("coverage") > 70
By monitoring code coverage, you can get a sense of how much of your code has been tested — entities with low scores are more likely to be vulnerable to attack.
git.branchProtection() != null
Make sure that your default branch is protected, as vulnerabilities here are critical.
sonarqube.freshness() < duration("P7D")
Check that a SonarQube analysis has been uploaded within the last seven days, so teams are monitoring compliance with coding rules.
snyk.issues() < 5
sonarqube.metric("security_hotspots") < 5
sonarqube.metric("vulnerabilities") < 5
Keep open Snyk issues, security hotspots, and vulnerabilities below an agreed threshold.
custom("version") >= semver("1.1.3")
Having every CI pipeline send a current version to Cortex on each master build lets you catch services or resources that rely on outdated versions of tooling, like CI or deploy scripts.
package("apache.commons.lang") > semver("1.2")
Cortex automatically parses dependency management files, so you can easily enforce library versions for platform migrations, security audits, and more.
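The semver comparisons these rules rely on reduce to ordered numeric tuples. A minimal Python sketch that ignores pre-release and build metadata (which full semver handling must account for):

```python
def parse_semver(v: str) -> tuple:
    # "1.10.0" -> (1, 10, 0); numeric comparison avoids the classic
    # string-comparison bug where "1.10" sorts before "1.9".
    return tuple(int(part) for part in v.split("."))

def version_gt(installed: str, floor: str) -> bool:
    # Mirrors a rule like package("apache.commons.lang") > semver("1.2")
    return parse_semver(installed) > parse_semver(floor)

print(version_gt("1.10.0", "1.2"))  # True: (1, 10, 0) > (1, 2)
```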
Best Practices: Define organization-wide best practices, such as infrastructure + platform, SRE, and security. For example, the Scorecard might help you ensure the correct platform library version is being used.
Best practices are unique to every organization and every application, so make sure to work across teams to develop a Scorecard measuring your organization's standards.
The following example uses JavaScript best practices:
JavaScript best practices
git.fileExists("yarn.lock") or git.fileExists("package-lock.json")
Make sure a Lockfile is checked in to provide consistency in package installs.
git.fileExists(".prettierrc.json") or git.fileExists(".eslintrc.js")
Projects should have a standard linter.
jq(git.fileContents("package.json"), ".engines.node") != null
Node engine version should be specified in the package.json file.
jq(git.fileContents("package.json"), ".devDependencies | with_entries(select(.key == \"typescript\")) | length") == 0 or git.fileExists("tsconfig.json")
Typescript projects should have a tsconfig checked in.
jq(git.fileContents("package.json"), ".engines.yarn") == null or jq(git.fileContents("package.json"), ".engines.npm") == "please-use-yarn"
If a project is using yarn, it should not allow NPM.
jq(git.fileContents("package.json"), ".engines.yarn") == null or !(semver("1.2.0") ~= semverRange(jq(git.fileContents("package.json"), ".engines.yarn")))
Finally, ensure that the yarn version being used is not deprecated.
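The jq-based rules above can be prototyped locally before encoding them in a Scorecard. A rough Python equivalent of the TypeScript rule (the package.json content is illustrative):

```python
import json

def typescript_rule_passes(package_json: str, has_tsconfig: bool) -> bool:
    # Mirrors the rule: pass if typescript is absent from devDependencies,
    # or if a tsconfig.json is checked in to the repo.
    pkg = json.loads(package_json)
    uses_ts = "typescript" in pkg.get("devDependencies", {})
    return (not uses_ts) or has_tsconfig

pkg = json.dumps({"devDependencies": {"typescript": "^5.0.0"}})
print(typescript_rule_passes(pkg, has_tsconfig=True))   # True
print(typescript_rule_passes(pkg, has_tsconfig=False))  # False
```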