Email Cohort Retention

I have used and love behavioral analytics tools like Amplitude, Mixpanel, Heap, and Pendo. They’re life-savers if you’re a product manager, marketer, designer, analyst, or engineer focused on improving the product experience. If I were dropped into any company’s product management team, one of my first asks would be: point me to your data system and let me understand your metrics. Last year, as I was helping to launch an email newsletter, I wanted to apply the same type of analyses I did for products, but for email. I spoke with a couple of experts in the email industry to understand what to measure, and they told me to:

  • Monitor my engagement metrics by email provider
  • Remove non-engaging contacts from our email distribution list
  • Monitor my long term retention of cohorts of contacts

These felt like classic behavioral analytics problems in the product space, but email focused. I assumed that somebody was enabling this kind of analysis for the email space, right?

Nope. I worked at HubSpot for five years, and I have so much respect for that product team. They’re badass, plain and simple (crazy smart, humble, and get stuff done). They built some simple features to answer some of these questions, but they don’t provide retention across all of your email campaigns. Does Mailchimp offer anything like this? Nope. What about AutoPilot, the company we were using when I joined Reforge? Nope. I did a quick search and I didn’t find any company that provides this type of feature.

One of the core things we teach at Reforge is that retention is king – it makes or breaks your company (acquisition, monetization, payback period, competitive advantage). So I set out to measure it.

It was pretty simple, once I got the pieces working together:

  • I turned on the AutoPilot source in Segment, and piped the data to our data warehouse. Luckily we’re not Amazon, so a simple Postgres database easily housed the data for this new product.
  • I turned on the Sendgrid source in Segment, then spent weeks going back and forth with Segment’s support department figuring out how to properly configure webhooks so email activity data flowed into our data warehouse.
  • I wrote a Jupyter notebook that bucketed contacts into their weekly subscriber cohorts and then built retention heatmaps based on the email activity data from both our email marketing system and our transactional emails.
  • I ran a script that queried DNS for each domain’s email provider so I could segment the retention curves by email provider (G Suite, Microsoft, AOL, Yahoo, etc).
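To make the cohort step concrete, here is a minimal sketch of how weekly subscriber cohorts and their retention could be computed. It is not the notebook described above: the function names, the input shapes (a dict of signup dates and a list of open/click events), and the choice of Monday as the start of the week are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def weekly_retention(subscribed, events):
    """Build a cohort retention table.

    subscribed: dict of contact id -> date the contact subscribed
    events: iterable of (contact id, date of an open or click)
    Returns a dict of cohort week start -> list, where index n is the share
    of that cohort with at least one event n weeks after subscribing.
    """
    # Bucket contacts by the week in which they subscribed.
    cohort_members = defaultdict(set)
    for contact, sub_date in subscribed.items():
        cohort_members[week_start(sub_date)].add(contact)

    # active[cohort][n] = contacts with activity n weeks after the cohort week.
    active = defaultdict(lambda: defaultdict(set))
    for contact, event_date in events:
        if contact not in subscribed:
            continue  # activity from a contact with no known signup date
        cohort = week_start(subscribed[contact])
        offset = (week_start(event_date) - cohort).days // 7
        if offset >= 0:
            active[cohort][offset].add(contact)

    # Convert counts to fractions; each row ends at its last active week.
    table = {}
    for cohort, members in sorted(cohort_members.items()):
        last = max(active[cohort], default=-1)
        table[cohort] = [
            len(active[cohort][n]) / len(members) for n in range(last + 1)
        ]
    return table
```

Each row of the returned table is one cohort and each column is a week offset, which is the shape a retention heatmap needs. Rows stop at the last week with any activity, so a plotting step would have to pad shorter rows.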

The outputs looked like this (non-segmented charts):

This helped us to answer key questions like:

  • What percentage of our subscriber cohorts were active N weeks after they subscribed?
  • Did we have a sticky email newsletter? Did people stick around long term?
  • Would we be able to sustainably grow our subscriber base over time, if we were able to keep acquisition constant / grow it over time?
  • How did our retention curves look by email provider?
  • Who were our most prolific consumers (forwarding emails to others, consuming regularly, etc)?
  • Who should we be removing from our distribution lists (so that the email providers weren’t hurting our sender and reputation scores)?
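For the email-provider question, one way to segment contacts is to map each domain’s MX hostnames to a provider label. The sketch below assumes the MX hostnames have already been looked up (a DNS library such as dnspython could supply them; that step is not shown). The suffix table, the labels, and both function names are hypothetical, and the hostname patterns should be checked against real lookups before being trusted.

```python
from collections import defaultdict

# Hypothetical suffix-to-provider map. These MX hostname patterns are
# assumptions for illustration; extend or correct them from real lookups.
PROVIDER_SUFFIXES = {
    "google.com": "google",
    "googlemail.com": "google",
    "outlook.com": "microsoft",
    "yahoodns.net": "yahoo",
}


def provider_from_mx(mx_hosts):
    """Guess the email provider from a domain's MX hostnames."""
    for host in mx_hosts:
        name = host.lower().rstrip(".")  # DNS answers may end with a dot
        for suffix, provider in PROVIDER_SUFFIXES.items():
            if name == suffix or name.endswith("." + suffix):
                return provider
    return "other"


def group_by_provider(emails, mx_by_domain):
    """Group contact ids by provider.

    emails: dict of contact id -> email address
    mx_by_domain: dict of domain -> list of MX hostnames for that domain
    """
    groups = defaultdict(set)
    for contact, email in emails.items():
        domain = email.rsplit("@", 1)[-1].lower()
        provider = provider_from_mx(mx_by_domain.get(domain, []))
        groups[provider].add(contact)
    return dict(groups)
```

Each group of contact ids can then be run through the same retention calculation as the full list, giving one curve per provider. Domains with no MX data or an unrecognized host fall into "other" rather than being dropped.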

It made me ask myself, why don’t email companies provide this kind of functionality? Some thoughts:

  • Mailchimp, HubSpot, and companies like them are focused on all of the other aspects of email: helping people design emails, set up automation, and measure individual campaign performance. For most of their customers, the bigger problem is not having enough contacts to email in the first place, not having a well-designed email, or wanting to analyze a single campaign rather than look at the health of an entire contact database.
  • Cohort analysis is not something many people find intuitive, and is a relatively advanced topic. There are still many product teams that don’t measure it, and I expect it’ll come to marketing tools eventually.
  • This is a big company problem, and they’ll end up writing custom software to solve it for themselves. For everyone else, this isn’t a must have.

Is there some easier way to do this? Is there a company that enables this? Let me know, I’d love to use their less-buggy code. I am trying to clean up the code so it’s half respectable and will try to post when I can.


  1. Interesting post.

    Email analytics is kind of still stone-age in a lot of respects compared to Web/Product Analytics, with very few out of the box tools that help you measure what matters.

    I make my living from email marketing (and make my partners/clients a lot of money from it). Looking at engagement is good, but how does it tie back to revenue? (Assuming this email newsletter is going out to prospects)

    I’d be more interested in retaining/increasing engagement of subscribers who make me money (or might become customers in the future). Now THAT’S something worth optimizing for!

    – What’s common among subscribers who turn into customers? How can we identify similar subscribers on our list so we can focus our efforts on them?
    – How can we optimize our email flows to retain & convert more of these folks?
    – How can we attract MORE of them and add them on to our email list?

    Improving email engagement in general, without tying it directly to revenue, is a dangerous pursuit in most cases.

    It’s like when people say “OMG, people are unsubscribing when we’re sending sales emails!” … who cares if they’re not buying!

    Curious to hear any thoughts you have on this and if you’re doing any work around revenue from email?

    Keep up the great work!

    • danwolch

      April 25, 2019 at 2:16 pm

      We ended this project earlier this year, as it didn’t result in increased revenue for our business. People loved the content, but it wasn’t growing at the rate we wanted and didn’t result in net new applications to our educational programs. We wanted to invest that time, energy, and money into other marketing efforts that would likely have a bigger impact on top-of-the-funnel growth and long-term revenue. To your other points, I think increasing engagement is a good goal and the main area you’d spend your time, but it can be helpful to do this kind of analysis first to understand if you have a healthy long-term email database/strategy.

  2. Totally agree that cohort analysis is not intuitive. And even with that heat map, you probably don’t know what questions you can possibly ask. The questions in the post are good reference! Thanks for sharing.

  3. Awesome post! Given the continued rise of email newsletter monetization & automation tools (as well as the Drift/SurveyMonkey reports that a third of people used email more often in the past year), this seems quite relevant nowadays.

    You mention that the Reforge Brief project was ended this year, but do you still apply these analyses to other marketing campaigns/newsletters? I’d love to hear if you’ve progressed on this approach or are still looking for alternatives to enable this. I think the last question (who should we remove from the list to maintain our sender and reputation scores?) is especially important and has been something I’ve looked into.

    Also curious to hear if you found differences in retention curves by email providers.

    • danwolch

      September 27, 2019 at 3:14 am

      We haven’t done much on the marketing front this year. I expect that we will want to do something like this in the future.

      I haven’t found any tools that do this (I don’t think anyone will do it anytime soon), and I didn’t build in the ability to do it by email provider. I have the data, I just haven’t updated my notebook to take advantage of it.



© 2024 Dan Wolchonok
