Analyzing Metabase Usage

I am the head of data at Reforge (we’re hiring!) and data is critical to our success. We use it to track our business, individual teams leverage it in their work, and we believe in empowering people in the company with data to help them do great work. We use all sorts of great tools, but as the head of data I felt it was important to understand whether people were actually using our BI (business intelligence) tool.

I built a weekly report that lays out how my company is using our BI tool, Metabase. It has the following reports:

  • Dashboards created in the past 7 days
  • Questions created in the past 7 days
  • Individual Dashboard views by our exec team in the past 7 days
  • Dashboard view counts by user in the past 7 days
  • Activity by my boss (our CTO)
  • Metabase MAUs
  • Metabase WAUs

It helps me to understand at a high level what is happening inside the company from a data perspective:

  • Who is building reports (questions / dashboards)
  • What questions people are asking of our data
  • What is most important to our exec team
  • Who is looking at information on a regular basis
  • What is top of mind for my boss
  • How engaged our company is with our data & analytics

It was surprisingly easy to set this up. All we had to do was add the Metabase internal database as a source to Metabase (sounds confusing, but actually quite simple). It wouldn’t shock me if this became a paid feature long-term for Metabase on their enterprise plan, but it took less than 30 minutes to figure out their data model and build some quick and dirty SQL questions that produced answers to our most important questions. Metabase is already storing all of the data we needed to visualize to get answers to our questions.

I set this up as a dashboard subscription so it is delivered to my email every week and I can quickly scan it to understand what’s top of mind for the org.

Reforge is Data Driven

Since the beginning of the company we’ve looked at data to understand how well we’re tracking against key goals for the company (revenue, applications, content completion, weekly activity, etc). These stats span across the entire company in terms of usefulness (support, sales, marketing, product, engineering, design, finance, etc) and we strive to make as much information accessible to everyone so that they can be data informed.

If you’re considering joining a company – do you want to be at one that uses data to help inform decisions? You can see below that Reforge employees are using data on a regular basis and you can see how it scales as we’ve grown the team and hit key milestones. We’ve typically seen 75-100% of employees leveraging Metabase on a monthly basis.

Our Metabase monthly active users along with company milestones

The queries that we use to power the charts:

Dashboards created in the past week

select created_at, name,, left(description, 40) as description, 'https://BI_HOSTNAME/dashboard/' || as url
from report_dashboard d
    inner join core_user u 
        on d.creator_id =
where created_at > current_date - interval '7 days'
order by 1 desc

Questions created in the past week

select created_at, name,, 'https://BI_HOSTNAME/question/' || as url
from report_card d
    inner join core_user u 
        on d.creator_id =
where created_at > current_date - interval '7 days'
order by 1 desc

Exec team dashboard views in the past week

For this report, I wanted to know what the exec team was looking at in the past week. I want to know how much they’re looking at the metrics and whether they’re looking at the dashboards I think they should be looking at.

select,, count(
from view_log v
    inner join core_user u
        on v.user_id =
    inner join report_dashboard d
        on v.model_id =
where model = 'dashboard'
    and in 
    and v.timestamp > (current_date - interval '7 days')
group by 1, 2
order by 3 desc
limit 200

Dashboard views leaderboard (views in the past week)

This shows the number of views by user in the past 7 days. It’s always a good sign when your head of data isn’t the most prolific viewer of dashboards.

List of users and their views
select, count(
from view_log v
    inner join core_user u
        on v.user_id =
    inner join report_dashboard d
        on v.model_id =
where model = 'dashboard'
    and v.timestamp > (current_date - interval '7 days')
group by 1
order by 2 desc
limit 200

Most popular dashboards by views

select, count(
from view_log v
    inner join core_user u
        on v.user_id =
    inner join report_dashboard d
        on v.model_id =
where model = 'dashboard'
    and v.timestamp > (current_date - interval '7 days')
group by 1
order by 2 desc
limit 200

Questions created by a single person

select created_at, name,, 'https://BI_URL/question/' || as url
from report_card d
    inner join core_user u 
        on d.creator_id =
where created_at > current_date - interval '28 days'
order by 1 desc

Metabase MAUs

select date_trunc('month', timestamp), count(distinct user_id) 
from view_log 
group by 1
order by 1

Metabase WAUs

select date_trunc('week', timestamp), count(distinct user_id) 
from view_log 
group by 1
order by 1

Email Cohort Retention

I have used and love behavioral analytics tools like Amplitude, Mixpanel, Heap, and Pendo. They’re life-savers if you’re a product manager, marketer, designer, analyst, or engineer focused on improving the product experience. If I was dropped into any company’s product management team, it would be one of my initial asks: point me to your data system and let me understand your metrics. Last year as I was helping to launch an email newsletter, I wanted to leverage the same type of analyses I did for products, but for email. I spoke with a couple of experts in the email industry to get ready to understand what to measure, and they told me to:

  • Monitor my engagement metrics by email provider
  • Remove non-engaging contacts from our email distribution list
  • Monitor my long term retention of cohorts of contacts

These felt like classic behavioral analytic problems in the product space, but email focused. I assumed that somebody was enabling this kind of analysis for the email, space, right?

Nope. I worked at HubSpot for five years, and I have so much respect for that product team. They’re badass, plain and simple (crazy smart, humble, and get stuff done). They built some simple features to answer some of these questions, but don’t provide retention across all of your email campaigns. Does mailchimp offer anything like this? Nope. What about AutoPilot, the company we were using when I joined Reforge? Nope. I did a quick search and I didn’t find any company that provides this type of feature.

One of the core things we teach at Reforge is that retention is king – it makes or breaks your company (acquisition, monetization, payback period, competitive advantage). So I set out to measure it.

It was pretty simple, once I got the pieces working together:

  • I turned on the AutoPilot source in Segment, and piped the data to our data warehouse. Luckily we’re not Amazon, so a simple postgres database easily housed the data for this new product.
  • I turned on the Sendgrid source in Sendgrid, then spent weeks going back with Segment’s support department figuring out how to properly configure webhooks so email activity data flowed into our data warehouse.
  • I wrote a Jupyter notebook that bucketed contacts into their weekly subscriber cohorts and then built retention heatmaps based on the email activity data from both our email marketing system and our transactional emails.
  • I ran a script that queried DNS for a domain’s email provider so I could segment the retention curves by email provider (g-suite, microsoft, aol, yahoo, etc).

The outputs looked like this (non-segmented charts):

This helped us to answer key questions like:

  • What percentage of our subscriber cohorts were active N weeks after they subscribed?
  • Did we have a sticky email newsletter? Did people still around long term?
  • Would we be able to sustainably grow our subscriber base over time, if we were able to keep acquisition constant / grow it over time?
  • How did our retention curves look by email provider?
  • Who were our most prolific consumers (forwarding emails to others, consuming regularly, etc)?
  • Who should we be removing from our distribution lists (so that the email providers weren’t hurting our sender and reputation scores)?

It made me ask myself, why don’t email companies provide this kind of functionality? Some thoughts:

  • Mailchimp, HubSpot, and companies like it are focused on all of the other aspects of email: helping people design emails, setup automation, and measure individual campaign performance. The bigger problem is not having enough contacts to email in the first place, not having a well designed email, or wanting to analyze a single campaign rather than look at the health of an entire contact database.
  • Cohort analysis is not something many people find intuitive, and is a relatively advanced topic. There are still many product teams that don’t measure it, and I expect it’ll come to marketing tools eventually.
  • This is a big company problem, and they’ll end up writing custom software to solve it for themselves. For everyone else, this isn’t a must have.

Is there some easier way to do this? Is there a company that enables this? Let me know, I’d love to use their less-buggy code. I am trying to clean up the code so it’s half respectable and will try to post when I can.

Accounting for Growth

You’d be crazy not to measure your active users, at the appropriate frequency for your app. That may be daily, weekly, monthly, or yearly. (DAU / WAU / MAU).

As you monitor these key metrics, you’ll want to understand how and why it’s changing. I am a big fan of how Jonathan Hsu breaks it down in accounting for growth, breaking out the components that add or subtract from your top-level numbers. Understanding the dynamics that contribute to your growth (or lack thereof) are critical for members or your team to understand what drives success. Not just the PM that owns the feature, but many of the team members.

I’ve had conversations with PMs at many companies who may know their top-level numbers, but don’t have a good handle on why they’re going up or down. As Jonathan outlines in his post, two very different businesses can have the same MAU numbers, but one is much better than another.

There are a couple of scenarios I think are important to understand:

  • If your retention rates are poor and new signups are helping you grow:
    • When will your growth rate flat line?
    • Could signups go down? What would happen if your growth flat lines or if your user base shrinks?
  • If your active user numbers go up:
    • Is that because of a one-time increase in new signups? Do you expect that to continue?
    • Is it because of resurrecting users (making dormant users come back)? Is this a sign of a pattern you could invest in making stronger? Why do people stop using it in the first place?
  • If your active user numbers go down:
    • What component decreased? Was it isolated to resurrected users / signups / retaining existing users? If it was isolated, was there a product issue that contributed to it?
  • Are you soliciting feedback from each of these buckets of users? There is gold to be mined by segmenting and surveying the users with the most valuable feedback.

In addition to knowing your high-level active user numbers, you should know why you’re growing. It’s helpful to understand what the future looks like if things stay the same or worsen. Even if you know, does the rest of your team know? When everyone understands how your product and business will be successful, they’ll have the context to more effectively make the critical decisions in building tech products.

If you’re looking to get started with this type of analysis, Jonathan Hsu’s 8-ball analysis or the lifecycle feature in Amplitude is extremely helpful.

Flying blind: not setting or measuring product metric goals

I love building new products. Ever since I was building junky web apps as a geeky high schooler, I always get excited the first time something actually works. It has always felt like magic. Now that I’m older, I increasingly feel the pressure of showing my impact. After the initial euphoria passes, I now immediately measure the metrics that represent success. Something that has been bothering me lately is that regardless of your methodology (waterfall, agile, scrum, burndown, trello anarchy, etc), I never hear others talk enough about product success metrics.

When I joined HubSpot I learned from many others about behavioral analytics. Sadly, I find myself constantly fighting responses when I speak with friends in the industry such as:

  • “We forgot to add tracking”
  • “We want to ship it and see how it does”
  • “We don’t have any specific goals for this release other than to improve the design”
  • “What should we measure?”
  • “We can’t afford to use behavioral analytics, it’s too expensive”

This is how I want to react every time I hear one of those answers:

A guy in a panda suit breaks a computer on someone's desk

Just kidding. I am always asking questions to understand the rationale so I can try to help add perspective.

These are the tough questions I want to ask in response:

  • What’s more expensive? A behavioral analytics system or shipping the wrong features / wasting the time of your product and engineering team?
  • If you hear feedback from a couple of customers, is that representative of all users?
  • How do you know that the users are actually doing what they say they’re doing?
  • Do you think you’ll get a team’s best work if the only goal is to release their work?
  • What do you think will garner more resources in the future? “We improved the experience, just look at it!” vs. “I increased conversion rates of signup to value by 10%, with an expected lift in revenue of Y”.

I don’t think you need to spend weeks off in a corner crunching numbers to come up with the answers to these questions. My suggestion is to spend 30 minutes thinking about a goal, why you’re working on something, and then a simple mechanism to measure success.

I push teams to answer these questions:

  1. What represents success for this release/feature?
  2. What is the current baseline?
  3. What is the hypothetical ceiling of improvement?
  4. Given the baseline and ceiling, how much do you think you can improve the metric?
  5. What will be the mechanism to track success/failure?
  6. When should you evaluate progress?

You don’t have to be super fancy and build Excel models, but at least spend 15-30 minutes thinking through the basics for a new feature. Regardless if you’re building something brand new or iterating on an old feature, I always think it’s worth considering the above questions.

As the saying goes, “if you can’t measure it, you can’t improve it”.

Segment Your User Base: Depth of Engagement

If you haven’t read Jonathan Hsu’s 8 part Medium series on Social Capital’s diligence process, add it to your reading list. I didn’t immediately grok all of the concepts in the post, but it has had an incredible impact on how I look at product metrics.

It appears it’s a big part of their recent announcement of how they are able to fund early-stage companies focused exclusively on their metrics.

One of the concepts that struck me was the depth of engagement. It shows you how engaged different portions of your user base are. You don’t need a ton of fancy data science techniques to get a glimpse into what your user base is doing. All you need a fairly straightforward SQL query to get you started.

It starts with a fairly simple concept: how many users are active for 1 day in the past month? How many are active for 2 days in the month? It’s really simple to generate a histogram (this is fake data) that looks like this:

Count of Users by Days Active

In this fake example there are 100k monthly active users (MAUs) in this hypothetical product. I think this is very telling and interesting from a strategy and operational perspective, but there’s a different view that I now prefer. I prefer to look at this chart on a percentage basis (the % of MAUs), and look at it cumulatively. This is what it looks like:

CDF of Monthly Active Users by Days Used

How to read this chart: 33% of the MAUs are active for a single day of the month. It may be the first day of the month or the last, but the people that fall into this bucket were only active for a single day in the month. 53% of the MAUs were active for 2 or fewer days – you add up the 33k and the 20k from the histogram to get the 53%. In Jonathan’s example there’s a little bit of a spike of users that are active every day of the month – in a bunch of the examples I’ve seen in the B2B space there’s a nice healthy bump around 20 days, which makes sense when you consider that B2B apps are most likely used every business day, rather than every day.

This is a powerful way to slice up your install base very quickly. I push for taking the MAU install base and slicing it up into types of users. Here’s a hypothetical set of groupings:

  • Low engaged users (66%): 3 days of activity or less
  • Medium engaged users (14%): 4-10 days of activity
  • Highly engaged users (10%): 11+ days of activity

There are a bunch of plays that I could see happening for each of these buckets:

  • Sales: I could see sales following up with customers that fall into the highly engaged bucket. If they’re free, I could see them seeing value in paid tiers of your product. If they’re already paid customers, they are probably the most likely bucket to see value in additional paid options.
  • Services: I could see customer success reaching out to the low engaged bucket to understand why they aren’t using the tool more frequently. In a B2B company where customer success is focused on retention, this is an area of high potential churn.
  • Product: I could see the product team looking to build features that address the missing functionality users need to use it more. They could also work on retention hooks that pull users back into the product / get them to see more value in the tool.
  • Marketing: I could see the marketing group targeting users based on the bucket they fall into and how they might see value from additional features.

If you’re interested in doing this yourself, check out this Jupyter notebook for sample code.

© 2024 Dan Wolchonok

Theme by Anders NorénUp ↑