Scorecards: a way to set goals and track progress when all else has failed.
TL;DR:
Scorecards are a way of setting goals and tracking progress…
before you’ve launched (and no one is using your product),
before you’ve got regular product metrics stood up, or,
when you need to aggregate progress across a broad portfolio where there’s no unifying metric that everything ladders up to.
Approach:
Create a table that captures the key dimensions of what you need to deliver, and create a scoring system (e.g. 0, 1, 2, 3) for each cell.
Compute your “score” by summing the value of the cells — and track that score over time.
Set goals based on increasing the value of the scorecard over time.
The best way to measure the impact and performance of products is through metrics. Ideally, we define the goal (the real-world outcome we want to create) and then define a metric that is the closest operable proxy for tracking progress toward that goal.
This often means metrics like monthly active users, messages sent, impressions, retention rates, engagement rates, transaction rates, or revenue run rate.
But to have metrics like this you need two things which are not always possible: 1) a product that people are using and 2) an implementation of the metric itself!
What if your product is pre-launch; there’s no one using it yet? How might you track your progress in getting it ready to launch?
What if measurement takes a long time to set up and validate? How do you track your progress toward having logging set up, pipelines running, curves calibrated, movements understood, and dashboards rendered?
What if your product space is diverse, and you have many metrics which aren’t directly comparable, or don’t ladder up to a single top-line value (e.g. revenue)?
How might we track progress in these cases?
One tool I’ve found very effective in these cases is the scorecard.
Scorecards are a way of tracking progress over time that’s granular enough to goal against when:
you don’t (yet) have traditional product metrics (e.g. user activity like visitation, clicks, impressions, etc.) set up, perhaps because you’re pre-launch or still need to build out those metrics, or;
you need a way to aggregate progress across a broad portfolio where there’s no single unifying metric that everything ladders up to.
I’ve used this framework again and again across multiple projects, and it’s worked for me from the feature level (one PM, ~6 engineers) all the way up to the org level (~20 PMs, ~180 engineers).
Let’s get into some examples!
Example: Workplace Invites
I worked for a time on something called Workplace, an enterprise communication and collaboration tool. We were working on a version that could be organically adopted by users and teams self-serve, rather than being set up and deployed by a company’s IT department. This meant we needed to make it easy for people to invite their co-workers by email address.
Workplace was based on Facebook (Post in Groups in Feed, plus Chat), and a Workplace “instance” was essentially an empty, private version of Facebook for you and your company.
But Facebook has never really been empty. All the inherited user-invite typeaheads (e.g. adding a user to a group) were built to let you search for one of the 2+ billion existing Facebook users. There were some big product gaps:
We didn’t have a way for people to invite coworkers via email,
we didn’t have a way to send those invites with secure links,
we didn’t have a way for the recipient to follow the invite link, sign up to the same instance as their inviter, and
we didn’t have a way to notify the inviter that their invitee had accepted the invite and joined the instance - critical to enabling them to then communicate with each other and to building trust in the reliability of the product, especially as invites were generally an async process.
To make matters worse, there were multiple invite surfaces (invite to the whole Workplace instance, or invite to a specific group) and multiple platforms (web, iOS, Android, mWeb, etc.). Multi-platform support was critical: the invitee could be on any platform, and we wanted to make it easy for them to join AND easy for them to invite the next wave of coworkers.
Not having a bulletproof invites mechanism is clearly a showstopper for a product that relies on organic growth, so we considered these gaps launch blocking.
Our goal was to get XXX,000 users using the product by the end of the half, but we couldn’t even get our FIRST user until we’d made it possible for users to invite their co-workers in the right places.
So how can a team measure its progress toward this primary goal when there’s all this work to do before it can even start counting users?
Quick reminder: why goals matter.
They unify a team (or org) around a common objective. Everyone can (or should!) understand what the goal is, why it matters, and how their work (as a team, and at an individual level) ladders up to achieving that objective.
Tracking progress toward the goal is important because it gives everyone the same understanding of whether they’re on track to meet it. And everyone having that same understanding increases your chance of success.
Read more of my thoughts on the importance of clearly tracking progress toward a goal in The Perfect Chart.
Back to the example: Enter the scorecard
We had multiple invite sending and redemption experiences to build, on multiple platforms, so we needed a way to track granular progress toward shipping all those features before we could welcome our first user.
We created a table with rows for invite type, and columns for platform or notification type. Then we started tracking the state of each cell in our daily standup. In the first, simple version, we’d turn a cell green when the code was in production and had been tested and passed by QA. Done done.
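If you like to think in code, here’s a minimal sketch of that structure in Python. The row and column names are illustrative stand-ins, not the exact features we tracked:

```python
# A scorecard is just a grid: rows are invite experiences, columns are platforms.
# These names are illustrative stand-ins, not the real feature list.
PLATFORMS = ["Web", "iOS", "Android", "mWeb"]
INVITE_TYPES = [
    "Invite to instance",
    "Invite to group",
    "Redeem invite link",
    "Notify inviter on accept",
]

# Each cell is False until the feature is in production and has passed QA.
scorecard = {(row, col): False for row in INVITE_TYPES for col in PLATFORMS}

def mark_done(invite_type: str, platform: str) -> None:
    """Flip a cell green: 'done done' only."""
    scorecard[(invite_type, platform)] = True

def render() -> str:
    """Render the grid so standup can eyeball the state at a glance."""
    lines = [" " * 28 + "".join(f"{p:>9}" for p in PLATFORMS)]
    for row in INVITE_TYPES:
        cells = "".join(f"{'GREEN' if scorecard[(row, p)] else '.':>9}" for p in PLATFORMS)
        lines.append(f"{row:<28}{cells}")
    return "\n".join(lines)

mark_done("Invite to instance", "iOS")
print(render())
```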
Here is a screenshot of the scorecard as it was on 28th Jan, 2017…
And here’s the scorecard just three weeks later…
Look at all that sweet, sweet progress! (Note the extra mWeb column: in testing, lots of users were opening invite links on their phones before they’d installed the app, so we realised we needed to add invites on mobile web too.)
The team’s goal was simple: turn every cell green by the end of Q1 2018.
I, as the PM, kept track of this in that well-known project management tool: Keynote! (It was literally a Keynote slide.)
We had a printout of this on our team’s whiteboard (remember, this was pre-COVID, when we were all in the office).
In the daily standup, we’d use a sharpie to update the cell states. Afterwards, I’d go back to my desk, update the Keynote deck, and print out a new version ready for the next day.
We’d attach a screenshot of the scorecard to our team’s weekly status reports so our leadership team could see EXACTLY where we were at.
Modifier #1: Tracking progress to goal
This was great for capturing state, but what about progress? How do we know if we’re ahead or behind where we need to be?
Solution: track the SUM of the scorecard over time!
There are 28 cells, and the goal was to go from 0 green to 28 green by the end of the quarter. By simply counting the number of green cells, we could plot a chart over time in Excel…
Following The Perfect Chart’s rules, the x-axis was pushed out to Q1-end from the start, so the whole team could see how much time we had left.
Yes, this is basically a burndown chart in reverse, but instead of tracking “tasks done”, we are tracking functional milestones reached. Progress, not motion.
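Here’s a sketch of that chart in Python (using matplotlib rather than Excel; the dates and counts below are invented for illustration):

```python
# Plot the running count of green cells against the 28-cell goal.
# Dates and scores are invented for illustration.
from datetime import date
import matplotlib.pyplot as plt

history = {
    date(2017, 1, 28): 4,
    date(2017, 2, 6): 10,
    date(2017, 2, 13): 16,
    date(2017, 2, 20): 22,
}
TOTAL_CELLS = 28
QUARTER_END = date(2017, 3, 31)

days, scores = zip(*sorted(history.items()))
plt.plot(days, scores, marker="o", label="Green cells")
plt.axhline(TOTAL_CELLS, linestyle="--", label=f"Goal ({TOTAL_CELLS})")
# Per The Perfect Chart: fix the x-axis to quarter-end from day one,
# so the remaining time is always visible.
plt.xlim(min(days), QUARTER_END)
plt.ylim(0, TOTAL_CELLS + 2)
plt.legend()
plt.title("Invites scorecard: cells gone green")
plt.show()
```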
Modifier #2: Getting more granular
One of the big drawbacks of this basic approach was that a cell only went green at the very end of the design→build→ship→validate pipeline. There was no way to visualise or reward progress through that pipeline.
A modification (used on later projects) is to track the granular state of each cell. Instead of a binary red/green, we could have used a framework like:
0: Cannot invite a user from this platform.
1: We have designs, we’re ready to code.
2: Code has been committed and pushed.
3: Code is validated as working and user-ready (done!).
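A sketch of how those four states might be encoded (the enum names are my own labels; the numeric levels match the list above):

```python
# Each cell now holds a 0-3 state instead of a boolean; the overall
# score becomes the sum of states rather than a count of greens.
from enum import IntEnum

class CellState(IntEnum):
    NOT_POSSIBLE = 0  # cannot invite a user from this platform
    DESIGNED = 1      # designs done, ready to code
    COMMITTED = 2     # code committed and pushed
    VALIDATED = 3     # validated as working and user-ready (done!)

# Illustrative cells only, not the real scorecard.
cells = {
    ("Invite to instance", "iOS"): CellState.VALIDATED,
    ("Invite to instance", "mWeb"): CellState.COMMITTED,
    ("Invite to group", "Android"): CellState.DESIGNED,
}

score = sum(int(state) for state in cells.values())
max_score = len(cells) * int(CellState.VALIDATED)
print(f"Score: {score}/{max_score}")  # Score: 6/9
```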
Then the scorecard would have looked something like this:
Modifier #3: Weighting
Another limitation of the framework for this project was that each cell had the same significance: shipping email invites on iOS was given the same value in the system as email invites on mWeb, despite the former being nearly an order of magnitude more important than the latter.
A solve here is to add weighting. Let’s say iOS and Android each have 5x the users of the other platforms: simply multiply the scores in those columns by 5. This should incentivise the team to focus on the most impactful work first.
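As a sketch (the 5x figure comes from the example above; the cell values are made up):

```python
# Weight each platform column by rough audience size, so finishing a cell
# on iOS moves the score 5x more than finishing the same cell on mWeb.
WEIGHTS = {"Web": 1, "iOS": 5, "Android": 5, "mWeb": 1}

cells = {
    ("Invite to instance", "iOS"): 3,   # done on iOS: contributes 3 * 5 = 15
    ("Invite to instance", "mWeb"): 3,  # done on mWeb: contributes 3 * 1 = 3
    ("Invite to group", "Android"): 1,  # designed on Android: 1 * 5 = 5
}

weighted_score = sum(value * WEIGHTS[platform] for (_, platform), value in cells.items())
print(weighted_score)  # 23
```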
Hopefully you can see how a simple scorecard, and tracking the sum of its cells over time, turned something hard to measure (“are we on track to launch?”) into a clear goal and a clear way to track progress.
Scorecards for Managing a Portfolio
In the example above, we were using a scorecard to track the progress of a pre-launch project — our journey to build the features necessary to launch.
But scorecards can also help when you’re managing a portfolio of efforts, where there’s no single unifying metric.
Trust and Safety is a good example. These kinds of problems are notoriously hard to measure.
Variable measurement maturity: Prevalence measurement (a metric that tracks “how much bad stuff is out there” - the gold standard of trust and safety measurement, as used by Meta and YouTube) is hard, expensive, and time-consuming to stand up. It requires intelligent, representative sampling; labelling (for which you need many, many humans trained in the details of your policies); and constant iteration in response to shifts in adversarial behaviour. Some areas might be mature in their measurement; others may be nascent or non-existent.
Lack of a common currency: The measurements of different integrity problems are often not comparable. Viewership harms (e.g. hate speech, nudity, some forms of bullying) are denominated in impressions/views. Other harms, like fake accounts, are often counted as a % of active users. Interaction harms like scams and grooming are often counted in terms of unique victims. Representation harms might be counted through discoverability measures, e.g. the % of violating search results. You can’t just add these numbers together like you can when multiple teams are driving a revenue ($$$) number. There’s no universal currency.
Multiple apps/platforms/services: If you run a suite of products, you might need to understand how you’re doing on each of them — another dimension of complexity.
Relative weighting: Some harms are more severe than others, which often needs to be factored into goaling and prioritisation.
Again, the ideal state is that all harms of all forms and severities would be combinable into a single metric that represents “how much bad is there” and “how much less bad is there now vs last quarter” — but this is just super, super hard to do.
The scorecard could help here too.
Imagine giving each of your areas a score based on their measurement maturity - say “0” for unmeasured (the default), “1” when basic tracking is in place, “2” when a data-science-approved prevalence metric exists, and “3” when the team has proven they can repeatably move that metric as they make changes to detection, enforcement, or content distribution systems.
You could then make statements of the form:
Across the portfolio,
3 areas are still at 0 (unmeasured), -20% y/y.
5 areas are at 1 (basic), +40% y/y.
6 areas are at 2 (measured), +50% y/y.
15 areas are at 3 (mature), +0% y/y.
You could then set goals of the form:
“In H1, we’ll move 3 areas from 0 (unmeasured) to 1 (basic), and 5 areas from 1 (basic) to 2 (measured)”.
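A sketch of the roll-up logic behind statements like these (the area names and maturity levels below are invented for illustration):

```python
# Tally a portfolio of harm areas into per-maturity-level counts, and compare
# against last year to generate "+40% y/y"-style statements.
from collections import Counter

LEVEL_NAMES = {0: "unmeasured", 1: "basic", 2: "measured", 3: "mature"}

# Hypothetical snapshots: area -> measurement maturity level.
this_year = {"hate speech": 3, "nudity": 3, "scams": 2,
             "fake accounts": 2, "bullying": 1, "grooming": 0}
last_year = {"hate speech": 3, "nudity": 2, "scams": 2,
             "fake accounts": 1, "bullying": 0, "grooming": 0}

now, prev = Counter(this_year.values()), Counter(last_year.values())
for level, name in sorted(LEVEL_NAMES.items()):
    n, p = now[level], prev[level]
    change = f"{(n - p) / p:+.0%} y/y" if p else "new"
    print(f"{n} areas at {level} ({name}), {change}")
```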
This turns something extremely complicated into a form that more senior folks (who can’t be involved in the details) can easily understand and rationalise.
I’ve seen this technique used in multiple product domains - and it’s highly effective for understanding state and tracking progress across a diverse portfolio of related but distinct efforts.
Again, the ideal state is where each team has a metric that’s a close, operable proxy to their goal, and those granular metrics ladder up to a single, universal metric that represents the ultimate goal for your org or company.
But when that’s not realistic or possible (yet!), the scorecard is here to help.