Building Great Metrics is Hard
Read Time: 7 Minutes
I spent 2 years building a custom analytics platform for a client. We were given the general framework and the freedom to build mostly whatever our client needed. With a blank slate, we ended up building a few HUNDRED metrics.
A big reason for this large number of metrics is that the platform contained a wide variety of data: sales data, customer loyalty data, multiple regions, and tens of thousands of products. It adds up quickly.
I learned from that time that metrics need to be carefully considered and defined, and the assumptions need to be out in the open. For our platform, we had a master document that outlined all of this, which became increasingly important as time went on. We referred to it constantly.
The assumptions ended up being more important than I expected. Clearly defining them was necessary to keep metrics measuring what we intended. Thankfully, during my undergrad in economics, I learned the term "ceteris paribus", Latin for "all else being equal". This matters in analytics because we're constantly making assumptions and need to be aware of how those assumptions affect our metrics. A changing world affects us a lot (ever tried to use data from COVID for decision-making? You know what I'm talking about).
If you’re measuring ice cream sales, holding the weather constant would be important. We all know the lineup for an ice cream shop grows long as the heat beams down on us. That’s why ceteris paribus is important!
But how does that apply to building metrics?
We need to think about what we’re measuring and what else can affect our metrics to be confident that we’re measuring what we intend!
One of the things that is never constant is time. Time crawls forward, and the further we get from the moment we made our assumptions, the more aware of them we need to be.
Let's dive into an example to make this clearer. Fanny’s Fancy Photos and Editing is a Software-as-a-Service (SaaS) company that lets you upload a photo, edit it, and download the edited version. You also create an account to keep track of all your previous photos and edits.
If we were in the first few months of the company and wanted to measure growth, we would likely pick a simple metric like Unique Logins per month or week. In the beginning, this is great. As Unique Logins ticks upwards, we know that usage is growing!
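To make that concrete, here's a minimal sketch of how that metric might be computed, assuming a simple logins table with one row per login event (the table and column names are made up for illustration):

```python
import pandas as pd

# Hypothetical logins table: one row per login event.
logins = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 2, 1],
    "login_at": pd.to_datetime([
        "2024-01-03", "2024-01-20", "2024-01-21",
        "2024-02-02", "2024-02-10", "2024-02-28",
    ]),
})

# Unique Logins per month: count distinct users who logged in at least once.
unique_logins_per_month = (
    logins.groupby(logins["login_at"].dt.to_period("M"))["user_id"].nunique()
)
print(unique_logins_per_month)
# 2024-01    2
# 2024-02    3
```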
But as the company matures, our metrics need to mature too.
Are all Logins considered equal? If we dig into the data, we’ll find that some people log in and only have a 5-minute session while others have a 4-hour session. Should we consider these as equal?
Early on we did, and that made sense. We simply wanted to get our product in front of more people. But now that the business is maturing, it’s worth developing a metric to identify power users. Those who love our product are most likely to want more in-depth features (that we can charge more for ;) ).
So we start to develop a power users metric. Being data analysts, we want this decision to be informed by data. The simple place to start is looking at some metadata:
How many logins do we have in total?
What’s the average session length, and how does it compare to the median? Really, we should look at the distribution of session times and see if we can identify a session length that we’d consider an outlier but that a decent chunk of users sit above.
If our average session time is 5 minutes and 5% of our users average 2 hours, we can start with that as our cutoff. This gives us a group of customers to segment our data on and look at differently.
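Here's a rough sketch of that exploration, again with invented data, using the 95th percentile of per-user average session time as a candidate cutoff (the 5% figure is an assumption, not a rule):

```python
import pandas as pd

# Hypothetical sessions table: one row per session, lengths in minutes.
sessions = pd.DataFrame({
    "user_id":         [1, 1, 2, 3, 3, 4, 5, 6, 7, 8],
    "session_minutes": [4, 6, 3, 110, 130, 5, 95, 4, 6, 3],
})

# Mean vs. median hints at how skewed the distribution is.
print(sessions["session_minutes"].mean())    # pulled up by the long sessions
print(sessions["session_minutes"].median())  # closer to the typical user

# Average session time per user, then a percentile-based cutoff.
per_user = sessions.groupby("user_id")["session_minutes"].mean()
cutoff = per_user.quantile(0.95)
power_users = per_user[per_user >= cutoff]
print(power_users)  # our candidate "power user" segment
```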
We could also create an average session time per login metric to give us more information on how the product is being used. But with that metric, we need to think back to ceteris paribus.
Are we changing anything about logins?
Many products allow a user to stay logged in for 2 weeks at a time. If some of our users do that and others log in each time, we’ll have metrics that don’t measure what we expect: for some users it’ll be session time per login, but for others it may be session time per 2-week period!
So we need to find a way to normalize this across our users. An easy thing to do would be to simply count the number of sessions we have and ignore logins. Assuming each session is independent, this would be correct. But then we hit the issue that some folks may work for 2 hours, take a 15-minute break, and work another 2 hours. Should that be a single 4-hour session or two separate 2-hour sessions? It may seem like a trivial decision, but in the latter option, our average session time is HALF of the former.
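To see how much that one decision moves the number, here's a sketch of gap-based sessionization, where any pause longer than a chosen threshold starts a new session. The timestamps and thresholds are invented for illustration:

```python
import pandas as pd

# Hypothetical activity events: 2 hours of work, a 15-minute break, 2 more hours.
block1 = pd.date_range("2024-03-01 09:00", "2024-03-01 11:00", freq="5min")
block2 = pd.date_range("2024-03-01 11:15", "2024-03-01 13:15", freq="5min")
events = pd.Series(block1.append(block2))

def avg_session_minutes(events: pd.Series, gap_minutes: int) -> float:
    """Start a new session whenever the gap between events exceeds the
    threshold, then return the average session length in minutes."""
    gaps = events.diff()
    session_id = (gaps > pd.Timedelta(minutes=gap_minutes)).cumsum()
    grouped = events.groupby(session_id)
    lengths = grouped.max() - grouped.min()
    return lengths.dt.total_seconds().mean() / 60

# Treat the 15-minute break as part of one long session...
print(avg_session_minutes(events, gap_minutes=20))  # one ~4.25-hour session
# ...or as a boundary between two sessions. Average session time is halved.
print(avg_session_minutes(events, gap_minutes=10))  # two 2-hour sessions
```

The only thing that changed between those two numbers is our definition, not user behaviour.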
Generally speaking, this sort of thinking continues, and this is when I bring in experts from the business. I also go digging in the data to see if there’s some sort of indicator of a session ending, such as downloading an edited photo. The definition of a session is something to carefully consider, too!
How we end up defining the metric is important, but almost as important is stating the assumptions we made and how they influence the metric. Doing this allows other people to understand how they may be influencing a metric without realizing it, such as by adding a feature that recommends downloading 10 images at a time instead of each one as the editing is finished. Our user behaviour would change drastically as a result of that feature, causing longer session times than before. Our job as data analysts isn’t to stop the business from making these changes but to ensure they’re aware of how those changes affect the metrics they’re tracking.
Eventually, we come to some sort of agreement on the definition. There are always going to be people who are unhappy with it, and that’s to be expected, especially if you ask more than a few people. For that reason, I gather multiple opinions but keep the decision-making to a small group, usually 1-3 stakeholders and myself, to ensure we’re measuring what we think we are. I also take the time to ask other data folks their thoughts, especially those who are close to the data.
Another aspect of building good metrics is ensuring they measure what we think we’re measuring. With our above example, we have a wealth of data. But what if we don’t? At times, we need to use a proxy because we can’t measure the exact thing itself. Like seeing how fast ice melts to check the temperature outside instead of directly measuring with a thermometer. The thermometer is better if it’s available, but the ice at least gives us an idea of how things are.
I’ll use a selfish question as an in-depth example here - is my LinkedIn content valuable? In an ideal world, I would have a metric that tells me this directly. Preferably on some sort of scale, like the thermometer. But that doesn’t exist, so how can I decide if it’s valuable?
I can look at a proxy metric or, more often, a few proxy metrics. For me, it’s a mix: New Followers within 24 hours of a post (because I rarely post twice in 24 hours), and reach, meaning how many likes came from outside my 1st-degree connections.
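As a sketch, counting followers gained inside that 24-hour window might look something like this (all data invented; in practice it depends on what timestamps the platform exposes):

```python
import pandas as pd

# Hypothetical data: when I posted, and when each new follower arrived.
posts = pd.Series(pd.to_datetime(["2024-04-01 09:00", "2024-04-05 09:00"]))
follows = pd.Series(pd.to_datetime([
    "2024-04-01 10:30", "2024-04-01 20:00", "2024-04-02 07:00",
    "2024-04-05 09:45", "2024-04-06 12:00",
]))

# Attribute a new follower to a post if they arrived within 24 hours of it.
# This only works because of the "I rarely post twice in 24 hours" assumption.
for post_time in posts:
    in_window = (follows >= post_time) & (follows < post_time + pd.Timedelta(hours=24))
    print(post_time, "->", int(in_window.sum()), "new followers")
```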
These don’t give me a clean number, and that’s often the hard part of data analysis. We need to make assumptions and can sometimes only measure success against past success, meaning we look at the same metric to see if it has improved. It would be hard to say that gaining 10 followers means a piece of content was twice as valuable as one that gained 5, but if it was 100 vs 5, that’s likely a sign that one was of higher value.
Of course, we need to be critical thinkers here. I know that if I post an entry-level role that pays $1,000,000 and is fully remote, then I’ll probably get A LOT of followers, and that’s where we need to stay vigilant. A scenario where this WOULD mean more value is if I only post new roles. Then I can compare one post to another (ceteris paribus). If I went from 10 to 5,000 followers because of my first post, and the second one was shown to those 5,000 instead of 10, I’m not holding things constant. Comparisons would be incredibly hard, and we’d need to take that into account.
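One crude way to patch that up is to normalize the proxy by audience size at posting time. The numbers below are made up, but they show how the raw counts can flip once you account for a growing follower base:

```python
# Invented numbers: raw follower gains vs. the audience each post started with.
posts = [
    {"name": "post 1", "new_followers": 10,  "followers_at_post": 10},
    {"name": "post 2", "new_followers": 100, "followers_at_post": 5_000},
]

for p in posts:
    # New followers per existing follower: a rough ceteris paribus adjustment.
    rate = p["new_followers"] / p["followers_at_post"]
    print(f"{p['name']}: {rate:.3f} new followers per existing follower")
# post 1: 1.000 -- every existing follower "brought in" one more
# post 2: 0.020 -- a much lower rate despite the bigger raw count
```

Even this is rough, since what a post is actually shown to is impressions rather than follower count, so I'd treat it as a directional check rather than a clean comparison.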
Proxies get complicated quickly and need to be looked at closely because of that. While it could be argued that all metrics are proxies (because “growth” can mean more than just new users), it’s clear that some metrics measure what they’re intended to more closely than others.
Metrics have been a thoroughly enjoyable part of my career. Creating one that closely measures an expected outcome is incredibly satisfying and powerful. It’s something that I hope to continue doing for a very long time. And you never know, YOU could be the next inventor of the thermometer in your company, giving them the ability to stop watching an ice cube melt. More ice cubes for after-work socials!
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Whenever you’re ready, here are 2 ways I can help you:
1. Data Analyst Launchpad - My course on how to build a resume and cover letter that gets results. I share 7 years of data analyst experience, including interviewing and hiring for most of 2023.
2. A Coaching Call - If you’re struggling with applications or want to level up your skills, I’ll give you a plan and point you to the resources I’ve used myself. If you’re unsure whether I can help, send me a message and I’ll provide any guidance I can right then and there.