I’ve written about how we use Metabase at Reforge, and how I’m a big fan. It has allowed us to make data accessible to anyone in the organization, whether it’s for a deep analysis or for a quick status update on an important initiative.
We use Segment as one of the key pieces of our data infrastructure, and I recently turned on the Stripe integration. I was pleasantly surprised by how well it works. I authenticated into our Stripe account to configure it as a source within Segment. Segment then updates tables in our data warehouse with the latest and greatest data from Stripe. This is a screenshot of our database and the tables under the Stripe schema:
On one of our key dashboards showing our progress in generating revenue, I wanted a cumulative revenue chart. This is helpful to see how quickly we’re generating revenue, what our total revenue in a period of time is, and how our revenue growth compares to previous periods.
This is a sample chart that’s easy to set up in Metabase:
This is the SQL I wrote to generate it:
The way it works:
The first statement generates a table (that’s the generate_series function) with one row for each day between a start date and an end date.
I join from that table with a left join to the stripe payments table. This ensures that if we don’t have a day of revenue, the day still shows up in our table. This table gives me a total amount of revenue per day.
In the final part of the statement, I use a window function to do a cumulative sum of all of the revenue per day, so that day 2 has all of day 1’s revenue along with day 2’s revenue added in.
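The three steps above can also be sketched in plain Python. This is a minimal illustration rather than the original SQL: the (day, amount) payment shape, the function name, and the sample values are assumptions, not the real Stripe schema.

```python
from datetime import date, timedelta
from itertools import accumulate


def daily_and_cumulative_revenue(payments, start, end):
    """Return (day, revenue, cumulative_revenue) rows for every day in [start, end].

    payments is an iterable of (day, amount) pairs. That shape is an
    assumption for this sketch, not the real Stripe table layout.
    """
    # Step 1: build the full calendar, like generate_series in Postgres.
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

    # Step 2: the left join. Every day starts at zero, so days without
    # payments still show up in the output.
    revenue_by_day = {day: 0 for day in days}
    for day, amount in payments:
        if day in revenue_by_day:
            revenue_by_day[day] += amount

    # Step 3: the window function, a running sum ordered by day.
    daily = [revenue_by_day[day] for day in days]
    return list(zip(days, daily, accumulate(daily)))


# Made-up sample payments, in cents.
sample_payments = [
    (date(2019, 1, 1), 5000),
    (date(2019, 1, 1), 2500),
    (date(2019, 1, 3), 1000),
]
rows = daily_and_cumulative_revenue(
    sample_payments, date(2019, 1, 1), date(2019, 1, 4)
)
for day, revenue, running_total in rows:
    print(day.isoformat(), revenue, running_total)
```

January 2 has no payments in the sample, yet it still appears with zero revenue and carries the running total forward, which is the point of joining from the calendar table.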
I hope this helps you create similar charts of revenue per day or cumulative revenue reports. I’ve worked with teams where we’ve manually hit the Stripe API to pull all of this information, which is always a pain in the butt. The nice part of this solution is that Segment keeps the table up to date automatically, and then anyone can run these reports on the latest values in the data warehouse.
While we’re still an early stage company, this type of integration is super powerful. There are a lot of interesting applications of being able to access failed charges, successful charges, refunds, and the various types of paying customers when joined to other data sources.
When I joined Reforge a year ago, I found that we were querying our databases manually to do routine analysis. If we wanted to update the team on the number of people who had applied or paid for our programs, we’d run a query against the database and then put the results in a spreadsheet. If we wanted a list of users from our programs by company, we’d run a query and put it in a spreadsheet. While this answered our questions at the time, I felt like we could do a lot better. After having used Looker in my time at HubSpot, I wanted a lightweight solution to help us enable the entire company to have access to critical data about the business and make data-informed decisions. We started using Metabase.
Metabase has been a huge help for me in democratizing access to our data. Metabase connects directly to any databases you want, and it allows anyone in the company (I have chosen not to set up advanced permissions yet) to manually inspect data, do advanced reporting, or view dashboards.
This is an example of what it looks like when someone looks at our program applications table (simple database table that keeps track of applications to our growth programs):
Team members can look at the table as if it were a spreadsheet
They can apply filters as if it were a spreadsheet
They can visualize the results more easily than in a spreadsheet
I routinely build reports in Metabase that filter to people with a certain condition and send them to teammates. It’s so easy to report on people who work at company X and are participating in one of our programs. It’s much better to generate a simple report and share it with a colleague, knowing that it will stay up to date even as our underlying data changes.
You can also easily switch between a table view and many other ways to visualize the data (line chart, area chart, bar chart, row chart, scatter chart, pie chart, and a map):
Once you’ve filtered your data set and chosen how to visualize it, it’s easy to then add it to a dashboard of other reports. It’s really nice that you can combine data from multiple databases into the same dashboard, and drag and drop the charts in any configuration you want. This is a dashboard that I set up to monitor the performance of applications submitted to our recent cohort of programs, as well as how people were paying for their spot. It has a segmentation of which programs they’re applying to, how much revenue we’ve generated, how we’re comparing to previous periods, and where people are submitting applications from:
There are a ton of other features that I am a huge fan of. Some of them:
Posting questions to Slack at a regular interval:
Or via email:
It has been a huge help for me personally, and this doesn’t even cover all of the ways in which we use it. Best of all, it’s free and open source. We pay to host it ourselves via Amazon Elastic Beanstalk.
This kind of solution comes in incredibly handy in our overall data pipeline, especially when we can point it to a copy of our production database and our analytics data warehouse that is populated by Segment.
I have used and love behavioral analytics tools like Amplitude, Mixpanel, Heap, and Pendo. They’re life-savers if you’re a product manager, marketer, designer, analyst, or engineer focused on improving the product experience. If I was dropped into any company’s product management team, it would be one of my initial asks: point me to your data system and let me understand your metrics. Last year as I was helping to launch an email newsletter, I wanted to leverage the same type of analyses I did for products, but for email. I spoke with a couple of experts in the email industry to get ready to understand what to measure, and they told me to:
Monitor my engagement metrics by email provider
Remove non-engaging contacts from our email distribution list
Monitor my long term retention of cohorts of contacts
These felt like classic behavioral analytics problems in the product space, but email focused. I assumed that somebody was enabling this kind of analysis for the email space, right?
Nope. I worked at HubSpot for five years, and I have so much respect for that product team. They’re badass, plain and simple (crazy smart, humble, and get stuff done). They built some simple features to answer some of these questions, but they don’t provide retention across all of your email campaigns. Does Mailchimp offer anything like this? Nope. What about AutoPilot, the company we were using when I joined Reforge? Nope. I did a quick search and I didn’t find any company that provides this type of feature.
One of the core things we teach at Reforge is that retention is king – it makes or breaks your company (acquisition, monetization, payback period, competitive advantage). So I set out to measure it.
It was pretty simple, once I got the pieces working together:
I turned on the AutoPilot source in Segment, and piped the data to our data warehouse. Luckily we’re not Amazon, so a simple Postgres database easily housed the data for this new product.
I turned on the SendGrid source in Segment, then spent weeks going back and forth with Segment’s support department figuring out how to properly configure webhooks so email activity data flowed into our data warehouse.
I wrote a Jupyter notebook that bucketed contacts into their weekly subscriber cohorts and then built retention heatmaps based on the email activity data from both our email marketing system and our transactional emails.
I ran a script that queried DNS for each domain’s email provider so I could segment the retention curves by email provider (G Suite, Microsoft, AOL, Yahoo, etc).
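The cohort bucketing from the notebook step can be sketched roughly like this. It is not the notebook itself: the input shapes (a mapping of contact to subscription date and a list of activity events), the Monday-based weeks, and the sample addresses are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def week_start_ordinal(day):
    """Ordinal of the Monday that starts the week containing `day`."""
    return day.toordinal() - day.weekday()


def weekly_retention(subscribed_on, activity):
    """Return {cohort_week: {weeks_since_signup: share_of_cohort_active}}.

    subscribed_on maps contact -> subscription date. activity is an iterable
    of (contact, activity_date) pairs such as opens or clicks. Both shapes
    are assumptions for this sketch.
    """
    # Bucket contacts into weekly subscriber cohorts.
    cohorts = defaultdict(set)
    for contact, day in subscribed_on.items():
        cohorts[week_start_ordinal(day)].add(contact)

    # For each cohort and week offset, collect the distinct active contacts.
    active = defaultdict(lambda: defaultdict(set))
    for contact, day in activity:
        if contact not in subscribed_on:
            continue
        cohort = week_start_ordinal(subscribed_on[contact])
        offset = (week_start_ordinal(day) - cohort) // 7
        if offset >= 0:
            active[cohort][offset].add(contact)

    # Convert counts to the share of each cohort active in each week.
    return {
        date.fromordinal(cohort): {
            offset: len(contacts) / len(cohorts[cohort])
            for offset, contacts in sorted(active[cohort].items())
        }
        for cohort in sorted(cohorts)
    }


# Made-up sample data.
subscribed = {
    "a@example.com": date(2019, 1, 7),
    "b@example.com": date(2019, 1, 8),
    "c@example.com": date(2019, 1, 15),
}
events = [
    ("a@example.com", date(2019, 1, 9)),
    ("b@example.com", date(2019, 1, 10)),
    ("a@example.com", date(2019, 1, 16)),
    ("c@example.com", date(2019, 1, 15)),
]
retention = weekly_retention(subscribed, events)
```

Each inner dictionary is one row of a retention heatmap: week 0 is the signup week, and later offsets show what share of that cohort was still active.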
The outputs looked like this (non-segmented charts):
This helped us to answer key questions like:
What percentage of our subscriber cohorts were active N weeks after they subscribed?
Did we have a sticky email newsletter? Did people stick around long term?
Would we be able to sustainably grow our subscriber base over time, if we were able to keep acquisition constant / grow it over time?
How did our retention curves look by email provider?
Who were our most prolific consumers (forwarding emails to others, consuming regularly, etc)?
Who should we be removing from our distribution lists (so that the email providers weren’t penalizing our sender reputation scores)?
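Answering the email provider question above requires turning the MX hosts returned by a DNS lookup into a provider label. The sketch below covers only that classification step and leaves the lookup itself out. The suffix table is an assumption based on commonly seen MX hosts, so check it against real lookups before trusting it.

```python
# Suffix -> provider label. These suffixes are assumptions, not an
# authoritative list; Yahoo and AOL appear to share MX infrastructure,
# so they get a combined label here.
MX_SUFFIX_TO_PROVIDER = {
    "google.com": "g-suite",
    "googlemail.com": "g-suite",
    "outlook.com": "microsoft",
    "yahoodns.net": "yahoo-aol",
}


def provider_from_mx(mx_hosts):
    """Map a list of MX hostnames to a provider label, or 'other'."""
    for host in mx_hosts:
        # DNS answers often end with a trailing dot and vary in case.
        normalized = host.rstrip(".").lower()
        for suffix, provider in MX_SUFFIX_TO_PROVIDER.items():
            if normalized == suffix or normalized.endswith("." + suffix):
                return provider
    return "other"


print(provider_from_mx(["aspmx.l.google.com."]))
print(provider_from_mx(["mail.example.org"]))
```

Matching on a dot-prefixed suffix avoids mislabeling look-alike hosts such as a domain that merely ends in the same letters.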
It made me ask myself, why don’t email companies provide this kind of functionality? Some thoughts:
Mailchimp, HubSpot, and companies like them are focused on all of the other aspects of email: helping people design emails, set up automation, and measure individual campaign performance. The bigger problems for their customers are not having enough contacts to email in the first place, not having a well designed email, or wanting to analyze a single campaign rather than look at the health of an entire contact database.
Cohort analysis is not something many people find intuitive, and is a relatively advanced topic. There are still many product teams that don’t measure it, and I expect it’ll come to marketing tools eventually.
This is a big company problem, and they’ll end up writing custom software to solve it for themselves. For everyone else, this isn’t a must have.
Is there some easier way to do this? Is there a company that enables this? Let me know, I’d love to use their less-buggy code. I am trying to clean up the code so it’s half respectable and will try to post when I can.
If you read about the strategy of successful tech companies today, it’s all about having “obsessive customer focus” (Jeff Bezos’ 2016 Amazon annual shareholder letter). You’ll hear that “whoever gets closest to the customer wins” (Drift), and that companies want to “solve for the customer” (HubSpot culture code). Ultimately, the question isn’t about whether you’re focused on your customers, but how you go about evaluating who they are, what they’re trying to accomplish, and how they’re interacting with your product. I recently started using a workflow that gives me a continual stream of feedback, allows me to go back and forth to dig deeper and clarify important questions, and then also easily share the results with my team. It also required no effort from our engineering team to set up and didn’t require additional budget.
I work at Reforge, and we’re an education company. We offer programs for those located in SF, but we also have an equivalent online-only experience. For a bunch of our key initiatives this year, I felt that I didn’t fully understand what brought people back to our web app after some time away, and I wanted to dive deeper to improve it. Rather than do a one-off survey, I set up a campaign that runs continuously to deliver this feedback on a daily basis.
Using Segment to create our list of “alumni” to survey
When I arrived at Reforge we were already using Segment, and we ended up buying their Personas add-on. I’m a relatively happy customer (and happy they just raised $175 million), but I used it because we were already paying for it. I’m a big fan of using the tools available to you.
Segment has a feature called Audiences that lets you create lists of people. Since most of our important data attributes and events were already flowing through Segment, it was very easy to define small segments or large swathes of our user base through a simple editor. While I live and breathe SQL, sometimes it’s really nice to build it out in a GUI. Here’s what I built:
The nice thing about this is that when someone enters this audience, it means that they’re an alum, they’re not in our most recent cohort, and they have viewed our online material. The cool thing is that you could specify anything about the users that you have available (role, country, seniority, depth of engagement, type of user, organization, etc).
A feature of Audiences is that you know when someone enters the audience. The way I’ve structured the audience, this will mean that they came back to our site and it has been more than 90 days since their last visit – otherwise they’d already be a member. So it’s a cool way to know when someone has come back.
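To make the entry rule concrete, here is a small Python sketch of the logic described above. It is not how Segment implements Audiences. It only illustrates the idea that a visit counts as an entry when it is the first one, or when more than 90 days have passed since the previous visit. The function name and sample dates are made up.

```python
from datetime import date, timedelta


def audience_entries(visit_dates, window_days=90):
    """Return the visits on which a user would enter (or re-enter) the audience."""
    entries = []
    previous = None
    for visit in sorted(visit_dates):
        # First visit ever, or a return after more than `window_days` away.
        if previous is None or visit - previous > timedelta(days=window_days):
            entries.append(visit)
        previous = visit
    return entries


# Made-up visit history for one user.
visits = [date(2019, 1, 1), date(2019, 1, 20), date(2019, 6, 1), date(2019, 6, 2)]
print(audience_entries(visits))
```

In this sample the user enters on January 1 and again on June 1, because the gap before June 1 is longer than 90 days; the other two visits happen while they are already a member.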
Send the list of people to Zapier
Segment then allows you to send this information to any of their supported destinations. I sent this information to Zapier:
Segment tells Zapier that the user has come back to our site, and Zapier then writes it to a Google spreadsheet. This is what ours looks like:
What you can see here is that a user from HubSpot named Kieran has revisited how to build a qualitative growth model. I am also using a Segment API to pull in their first name and the title of the last page they visited, in case I want to include that in my outreach asking for feedback. Segment is continuously sending data about people coming back to Reforge, and each time it happens Zapier is writing it to a Google Spreadsheet for me.
Email the people revisiting the material
I then have another Zap that takes the rows from this spreadsheet and emails the person from my personal G Suite account.
You can see what it looks like if I look in my sent folder within Gmail (and you can see that people have replied to the email from Gmail’s threads):
Via this workflow, I’m automatically emailing people that are coming back to our site asking them what brought them back. I love getting a continual stream of this feedback. Because it’s in Gmail, I can go back and forth with them to clarify what they mean and to dive deeper to understand. If you have a huge user base, you can easily filter down the number of people you email with another step in Zapier (mod their user id by a number to make sure you don’t email too many people at once).
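The modulo trick mentioned above is easy to reason about in code. This is a rough Python sketch of the idea, not the Zapier step itself; the function names are hypothetical, and hashing string ids with CRC32 is my own assumption for ids that are not numeric.

```python
import zlib


def sample_bucket(user_id, buckets):
    """Assign a user to a stable bucket in range(buckets)."""
    if isinstance(user_id, int):
        return user_id % buckets
    # Non-numeric ids: hash them first so the bucket stays stable across runs.
    return zlib.crc32(str(user_id).encode("utf-8")) % buckets


def should_email(user_id, sample_every=10):
    """True for roughly one in `sample_every` users."""
    return sample_bucket(user_id, sample_every) == 0


selected = [uid for uid in range(1, 101) if should_email(uid)]
print(len(selected))  # 10 of the 100 sample ids
```

Because the bucket depends only on the id, the same person is always either in or out of the sample, which keeps repeat visitors from being emailed inconsistently.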
Collect all of the feedback in a Google Doc
At this point I’m automatically emailing everyone from the segment I care about, and I’m able to go back and forth to clarify any questions that I have or dig deeper. Rather than copy and paste their responses into a Google Doc to share snippets of feedback / soundbites, I hooked up another Zap to automatically pull in their feedback and put it in a Google Doc.
When the feedback emails come in, a Gmail filter that I set up automatically applies a label to the email. I use Zapier to look for new emails under that label, and I exclude any emails from me (my replies to them). Zapier then adds each email as a row in a Google Sheet:
Then I used a couple of simple formulas to combine all of the emails from a single user into a single row in another spreadsheet:
I separate the replies with a line of hyphens (--------), so the above row represents multiple emails back and forth with this person. You can see that my first question was about use cases, and then the third one was about the ones that come up frequently.
Column A is set to be “=UNIQUE(Emails)”. That means that there will only be one row for each email address I have feedback from. The formula for column B is =ARRAYFORMULA(TEXTJOIN(CHAR(10) & "--------" & CHAR(10), TRUE, IF(Emails=A2,Response,"")))
Array formulas are really cool. I’ve only needed them a couple of times, but I am always so impressed with their functionality. Basically, this formula tells Google Sheets to combine all of the emails from a single user (remember, each row in this sheet is a single user) together with the -------- separator.
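If the formulas feel opaque, the same grouping can be written out in Python. This is only an illustration of what UNIQUE plus TEXTJOIN are doing, not part of the actual workflow; the row shape, the sample addresses, and the eight-hyphen separator are assumptions.

```python
SEPARATOR = "\n--------\n"


def combine_responses(rows):
    """Group (email, response) rows into one joined conversation per email.

    Emails keep their order of first appearance, like UNIQUE. Empty
    responses are skipped, like TEXTJOIN with ignore_empty set to TRUE.
    """
    combined = {}
    for email, response in rows:
        parts = combined.setdefault(email, [])
        if response:
            parts.append(response)
    return {email: SEPARATOR.join(parts) for email, parts in combined.items()}


# Made-up sample rows, one per incoming email.
sample_rows = [
    ("k@example.com", "First reply"),
    ("j@example.com", "Hi"),
    ("k@example.com", ""),
    ("k@example.com", "Second reply"),
]
conversations = combine_responses(sample_rows)
print(conversations["k@example.com"])
```

The result has one entry per email address, with every reply from that person stacked in a single cell-sized string.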
This is pretty powerful. Now I have a spreadsheet that has the entire conversation with someone in a single spreadsheet row, and I can share that spreadsheet with my entire team. We can then add columns to categorize feedback into buckets for easy filtering / reading.
Why I love this approach:
I get to read feedback from a critically important segment of users every day. I can define multiple segments to run simultaneously. The only limits are the number of emails I can send from my Google account (2,000 messages per day) and the number of emails I have time to respond to.
I get to follow up with them in my main email tool
Their feedback then gets pulled into a spreadsheet automatically that I can share with my team members, categorize, and filter.
While this isn’t the easiest thing to put together, it is so much easier than it used to be (writing complex SQL by hand, setting up a cron job to run it, writing a custom Gmail script, and then storing the information in some database or Google Sheet). If you have read this far and are thinking about building this, let me know. I’d love to try it out first.
Is there an easier way to accomplish this? Let me know, I’d love to switch to it.
You’d be crazy not to measure your active users at the appropriate frequency for your app. That may be daily, weekly, monthly, or yearly (DAU / WAU / MAU).
As you monitor these key metrics, you’ll want to understand how and why they’re changing. I am a big fan of how Jonathan Hsu breaks it down in accounting for growth, breaking out the components that add or subtract from your top-level numbers. Understanding the dynamics that contribute to your growth (or lack thereof) is critical for members of your team to understand what drives success. Not just the PM that owns the feature, but many of the team members.
I’ve had conversations with PMs at many companies who may know their top-level numbers, but don’t have a good handle on why they’re going up or down. As Jonathan outlines in his post, two very different businesses can have the same MAU numbers, but one is much better than the other.
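A small Python sketch makes the breakdown concrete. It follows my reading of the growth accounting idea: each month’s active users are split into new, retained, and resurrected users, and the users lost since the previous month are counted as churned. The function name and sample user ids are made up.

```python
def growth_accounting(active_by_month):
    """Break each month's active users into new, retained, resurrected, churned.

    active_by_month is a list of sets of user ids, oldest month first.
    """
    rows = []
    seen_before = set()  # everyone active in any earlier month
    previous = set()     # everyone active in the month just before
    for active in active_by_month:
        new = active - seen_before
        retained = active & previous
        resurrected = active - previous - new
        churned = previous - active
        rows.append({
            "mau": len(active),
            "new": len(new),
            "retained": len(retained),
            "resurrected": len(resurrected),
            "churned": len(churned),
        })
        seen_before = seen_before | active
        previous = active
    return rows


# Made-up activity: user 1 leaves in month two and returns in month three.
months = [{1, 2, 3}, {2, 3, 4}, {1, 3, 4, 5}]
for row in growth_accounting(months):
    print(row)
```

In every row, MAU equals new plus retained plus resurrected, and it also equals the previous month’s MAU plus new plus resurrected minus churned. Two products with the same MAU can have very different mixes of these components, which is why the breakdown matters more than the top-level number.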
There are a couple of scenarios I think are important to understand:
If your retention rates are poor and new signups are helping you grow:
In addition to knowing your high-level active user numbers, you should know why you’re growing. It’s helpful to understand what the future looks like if things stay the same or worsen. Even if you know, does the rest of your team know? When everyone understands how your product and business will be successful, they’ll have the context to more effectively make the critical decisions in building tech products.