Using analytics before any code is written

I was very lucky to have worked at HubSpot during a pivotal transition in its evolution. I managed a team of Product Analysts and data scientists, and we were charged with improving our product and the overall customer experience. We did this by analyzing what our customers did in our product.

This isn’t helpful when you are a brand new startup without a product. At that stage, you should be sitting with your customers or talking to them all the time. You should be taking advantage of your ability to do things that don’t scale.

Once you have more users than you can speak with, behavioral analytics are crucial to being successful. I often talk with people who aren’t sure how to apply data-enhanced methodologies when launching new features. They bring up a good point:

How are you supposed to leverage behavioral analytics when the feature doesn’t exist?

It’s a good question. At this stage of your lifecycle, your most important task is to speak with customers that may be a good fit for what you want to build. While you may not have people using this new feature yet, there are usually groups of people who could use it within your existing user base. You just need to find the right people to speak with.

These are the types of questions I ask:

  • Which of your existing users are most likely to use the new feature?
  • What characteristics do those users have?
  • What actions are they taking?
  • Do you know why they’re taking the actions they are and whether this new feature would be useful to them?

Take advantage of your existing users and their usage patterns. If you’re building a new feature or iterating on an existing one, it helps to understand both your high-intensity users and your infrequent users. They are a goldmine of information, and you’d be crazy not to ask them for feedback.

I typically push PMs, researchers, and marketers to spend 30 minutes in their analytics system picking out a group of people they want to speak with. Anyone involved in building new products is already overloaded, and this can easily feel like unnecessary work or an endless process. The key is to timebox the analysis and let it point you in the right direction.
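As a rough illustration of that timeboxed pass, here is a minimal Python sketch that pulls the two groups mentioned above out of a raw event export. The `(user_id, event_name)` shape, the function name, and the thresholds are all assumptions made for the example, not the format of any particular analytics tool.

```python
from collections import Counter

def pick_interview_candidates(events, heavy_n=2, light_max=1):
    """Split users into high-intensity and infrequent groups.

    events: iterable of (user_id, event_name) pairs (hypothetical export format).
    heavy_n: how many of the most active users to return.
    light_max: users with at most this many events count as infrequent.
    """
    counts = Counter(user_id for user_id, _ in events)
    # most_common() returns users sorted by descending event count.
    high_intensity = [user for user, _ in counts.most_common(heavy_n)]
    infrequent = sorted(
        user for user, n in counts.items()
        if n <= light_max and user not in high_intensity
    )
    return high_intensity, infrequent

# Made-up sample data for illustration only.
sample_events = [
    ("ana", "send"), ("ana", "send"), ("ana", "open"),
    ("bo", "send"), ("bo", "open"),
    ("cy", "open"),
]
heavy, light = pick_interview_candidates(sample_events, heavy_n=1, light_max=1)
```

In practice you would filter the events to the actions closest to the feature you are planning before counting, then reach out to a handful of people from each group.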

I agree with this Intercom post that more than half of a PM’s time should be spent understanding customers’ problems, doing research, and thinking about how design can be applied to those problems. Behavioral analytics systems are crucial to spending that time effectively, because they show you what your existing users are actually doing.

Snapchat’s Secrecy and DAU Metrics

I was pretty interested to read about Snapchat’s DAU numbers and their culture of secrecy. At first, I was shocked by the stories employees told about the lengths the company goes to in order to keep its information private.

It’s consistent with anecdotal stories I’ve heard about Snap (they’re serious about privacy and keeping their own information a secret), but I always try to take these stories with a grain of salt.

I immediately thought about how we try to do things differently at HubSpot. Two elements of the HubSpot culture code are using metrics and being transparent within the organization. I thought “we’re totally different from that here at HubSpot”, and I bet many of you thought the same thing when reading the Snapchat article. While I think we strive to be different, we’re far from perfect and are constantly trying to improve. Here are some of the questions I asked myself (and ways I want to hold myself and our teams accountable):

  • Does everyone in the organization have access to data (behavioral analytics, data warehouse) that helps them make better decisions?
  • For those that aren’t technical, is it accessible with non-technical tools?
  • If they have access to the data, do they actually use it in their ideas, analysis, and proposals?
  • Do we have sufficient documentation about how to use the data that’s available to all employees?
  • Do we make an effort to train people on using the data so they are as self sufficient as possible?
  • Do we create a culture of sharing and encouraging others to showcase their findings?
  • Do we enable others to reproduce analysis that has been done in the past?


While I like to think we’re better than the portrayal of Snapchat in the article, I’m not 100% satisfied with the answers to the questions above.

Choosing a behavioral analytics system: our journey to Amplitude

As part of my role at HubSpot, I run a team of analysts and data scientists that leverage quantitative analysis to inform our product development and improve the customer experience. It’s our goal to make teams self-sufficient in answering questions like: “How many people are using this feature?” or “What percentage of signups do X?” or “How sticky is this feature?”. In addition, we perform analysis and build models to identify and act upon areas of opportunity. One of the main tools we use on a daily basis is our behavioral analytics system, which helps us understand what our customers are doing inside our product.

I’ve become increasingly obsessed with behavioral analytics over the years. Here’s a brief timeline of my experience with them:

  • 2013:
    • Join HubSpot, start building a new product with Mixpanel
    • Blown away by the type of analysis it enables. Mind. Blown. It revolutionizes how I think about building products
  • 2014:
    • Grow the new product to hundreds of thousands of users, start to get nervous about our Mixpanel bill (this was a huge mistake in hindsight)
  • 2014 – 2015:
    • HubSpot decides to build its own internal behavioral analytics system
    • Rationale:
      • HubSpot is a public company at this stage, it’s a competitive advantage to have complete ownership and control over this system
      • If it costs as much as an engineer’s salary, why not pay someone to build a system customized for us?
      • We could solve our own problem, then turn the solution into a product that could be sold to customers
  • 2016:
    • Perform a vendor assessment of our internal tool vs. a vendor (for a variety of reasons, to be explained in a future post)
    • We choose to go with Amplitude as our new behavioral analytics system
  • 2017:
    • Finish our migration to Amplitude; we currently have 250-300 HubSpotters using it on a monthly basis

Why did we pick Amplitude? Some key reasons:

  • They allowed us to create charts that count by users or by other arbitrary identifiers. Since HubSpot is a B2B company, we want to track active companies, look at the conversion rates for key actions across all users in a company, and look at company retention. Amplitude had the best solution: it allowed us to change one option in an existing chart to toggle between users and organizations. Other vendors could technically support this, but I found their approaches too cumbersome.
  • They had an option to store our data in a SQL database (at the time Redshift, now it’s Snowflake). The important piece is that it allows our business intelligence team to ingest the data at a regular interval so it could be combined with other data sources. We use Looker internally, and we want to take behavioral data and combine it with financial data, CRM data, support data, and any other data loaded into our data warehouse.
  • They were focused on product analytics. We felt that their roadmap aligned perfectly with our priorities and long-term goals.
  • We had a team of 3 engineers and some of a PM’s time devoted to our internal tool. Amplitude has a much bigger engineering team, and we didn’t think the customizations we would build were worth it. We felt the product team’s efforts were better spent generating value for the company, not building a tool that would be, at best, slightly better than Amplitude (and probably not even that).
  • Their dashboard and behavioral cohort features were just what we wanted.
  • It was fast. Our internal system had been plagued by slowness and outages (we had turnover on the team that built the internal tool and then left that team understaffed).
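To make the first reason concrete, here is a small Python sketch of what counting by users versus counting by companies means. It is not how Amplitude works internally; the event dictionaries and the `user_id` / `company_id` field names are hypothetical, chosen only for this example.

```python
def count_active(events, event_name, count_by="user_id"):
    """Count distinct actors that performed event_name.

    events: list of dicts with hypothetical keys "event", "user_id", "company_id".
    count_by: the identifier to count distinct values of.
    """
    return len({e[count_by] for e in events if e["event"] == event_name})

# Made-up sample data: three users across two companies.
sample_events = [
    {"event": "send_email", "user_id": "u1", "company_id": "acme"},
    {"event": "send_email", "user_id": "u2", "company_id": "acme"},
    {"event": "send_email", "user_id": "u3", "company_id": "globex"},
    {"event": "view_report", "user_id": "u1", "company_id": "acme"},
]

active_users = count_active(sample_events, "send_email", count_by="user_id")
active_companies = count_active(sample_events, "send_email", count_by="company_id")
```

The same question ("how many actors did X?") gives different answers depending on the identifier, which is why being able to flip a single option on an existing chart mattered so much for a B2B product.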

No solution is a panacea and I won’t say that Amplitude is perfect in every way, but I have been personally very happy with the decision we made. I’m pretty bullish on all of the companies in this space (I think they’re all powerful and worth the money), and unless there’s a fundamental shift in the technology required for these kinds of systems, I don’t want to be involved in building another one from scratch.

Influencing a Boston Angel: Dharmesh Shah

I took a class in my last semester of business school in 2013 that was about analyzing networks. For our final project, I worked on a team that analyzed Dharmesh Shah’s Twitter followers. The goal was to identify who might be a good candidate to make an introduction. Below is the complete writeup as we submitted it to our professor, Marissa King.


Dharmesh Shah is an influential entrepreneur and angel investor. He is the co-founder and CTO of HubSpot, a technology company headquartered in Boston. Dharmesh invests in many early-stage startup companies each year, and entrepreneurs routinely court him as a mentor and investor. As an incredibly busy executive and investor, Dharmesh is not an easy man with whom to get an audience.

Our group set out to analyze Dharmesh’s network to find the most influential people. By identifying the most connected people in his circles and the networks in which they operate, someone could prioritize their efforts in getting introductions.


A strategy to influence Dharmesh starts with influencing those who can influence him. Therefore, we built our analysis on two hypotheses:

  1. Dharmesh’s network looks similar to our own, in that it has important sub-networks.
  2. Within these communities, there are people who can influence Dharmesh.

If network analysis can identify these influential individuals, one could effectively surround Dharmesh, gaining connections to him from a variety of his networks.


Our analysis uses information gathered from Twitter rather than LinkedIn or Facebook. Twitter differs from these two social networks because it is public by default. Twitter has an asymmetric follower pattern where anyone can subscribe to the updates of another person; both parties do not have to choose to connect. Since many in the technology community use Twitter as a news and information service, it is a good indication of whom someone respects and looks to for interesting and influential information.

To analyze who is influential to Dharmesh, the analysis focused on people Dharmesh currently follows. Through the Twitter API, we downloaded:

  1. The Twitter accounts that Dharmesh follows
  2. The Twitter accounts that follow those accounts

We downloaded over 10 million pieces of follower information as pairs of directed edges (the people that influence Dharmesh, and the people that follow those influencers). We put the data into a relational database so that we could model the edges and query it on an ad-hoc basis.

In order to determine the influencers within the network of people that Dharmesh follows, we created a graph of the mutual connections. We only graphed a connection between two people if they both followed each other. This removed many edges in our graph because many relationships only had a single directed edge. We felt that this was a better indication of a relationship and would highlight communities of influence more effectively.
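The mutual-connection filter described above can be sketched in a few lines of Python. This is an illustration, not the code or database query we actually ran; it assumes each directed edge is a `(follower, followed)` pair, and the function name is made up for the example.

```python
def mutual_edges(directed_edges):
    """Keep only pairs of accounts that follow each other.

    directed_edges: iterable of (follower, followed) pairs (assumed format).
    Returns a set of undirected edges, each stored as a sorted tuple.
    """
    edge_set = set(directed_edges)
    return {
        tuple(sorted((a, b)))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }

# Made-up sample data: only a<->b and c<->d are reciprocal.
sample_edges = [
    ("a", "b"), ("b", "a"),
    ("a", "c"),
    ("c", "d"), ("d", "c"),
    ("b", "c"),
]
mutual = mutual_edges(sample_edges)
```

One-way follows such as `("a", "c")` drop out, which is the pruning that removed so many edges from the graph.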


Looking at Twitter data instead of Facebook or LinkedIn has the advantage of portraying what Dharmesh is currently working on and thinking about, as opposed to his entire personal or professional network. This has the benefit of identifying what will pique his interest, since we assumed that he only follows people who share content that is interesting to him.

In analyzing the network graph, there are clusters that represent the Boston startup community, Silicon Valley, and HubSpot employees and alumni. Contrary to the graphs of our personal network analysis, the groups were highly integrated with one another and were hard to distinguish. We believe this to be the case because anyone can follow anyone else on Twitter; there is no expectation of being friends or having worked together professionally. If someone is sharing interesting content on Twitter, individuals are accustomed to following others they may not have met in real life. We believe that explains the lack of the subgroup separation that is present in the Facebook and LinkedIn network graphs.

Our group expected to see more distinct subgroups in Dharmesh’s network of influencers. In our professional and personal graphs, we each had communities that represented high school, college, professional groups, and graduate school networks. We were only able to identify three separate subgroups in Dharmesh’s graph, with only one company and two regional communities. While surprising at first, we believe it is driven by the interaction that takes place on Twitter. Rather than accumulating contacts, Twitter is about what is interesting to you at the current time, and many users regularly unfollow others based on their tweets. This is very different from Facebook or LinkedIn, where you rarely remove a friend.

It is not surprising that Silicon Valley represented a significant element of Dharmesh’s graph of influencers, since the region is the largest in terms of venture capital and startups. Boston did not make up a larger share of the influencer graph, which may again reflect the fact that Silicon Valley is responsible for a majority of the innovation in the technology and startup industry. There were two subgroups that represented members of the Boston startup community, which is interesting considering that HubSpot has been one of the fastest growing startups in Boston for many years. For any entrepreneur looking to gain access to Dharmesh, it represents two opportunities for identifying individuals. Additionally, it may indicate that there are few influencers in the HubSpot community who have a strong following in Boston or Silicon Valley. That makes sense given that Dharmesh is one of the most highlighted entrepreneurs in Boston and one of the biggest public faces of his company.

In order to identify the people in each subgroup that would be helpful in influencing Dharmesh and the people he follows, we analyzed the betweenness and closeness centrality for each person in the graph. In this analysis, we sought to identify people who could influence Dharmesh, but would be accessible because they are not as popular and sought after as a mentor and investor. As an entrepreneur would, we inspected the highly ranked individuals to determine who would result in the best outcome. If the person were as popular as Dharmesh, it would not make sense to reach out to him or her.
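For readers unfamiliar with the two measures, here is a self-contained Python sketch of both on an undirected, unweighted graph. It is an illustration rather than the tooling used for the project: closeness is computed as the number of reachable nodes divided by the sum of shortest-path distances to them, and betweenness uses Brandes' algorithm without normalization. The function names and the tiny sample graph are made up for the example.

```python
from collections import deque

def build_adjacency(edges):
    """Turn undirected (a, b) pairs into an adjacency dict of sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def closeness_centrality(adj):
    """Reachable-node count divided by total shortest-path distance."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm."""
    scores = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                       # nodes in BFS visiting order
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):        # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in scores.items()}

# Made-up sample: a simple chain a - b - c, where b sits between the others.
graph = build_adjacency([("a", "b"), ("b", "c")])
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
```

In the chain, `b` scores highest on both measures: it is one step from everyone, and the only shortest path between `a` and `c` runs through it. That is the kind of position we were looking for in Dharmesh's graph.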


To get in front of Dharmesh, it is important to look not only at the closeness centrality of the target contact, but also at his or her role in the industry and how likely he or she is to be able to connect you to Dharmesh. The person with the highest closeness centrality is actually an industry analyst for Altimeter Group, a research and advisory firm. Without additional knowledge, a target such as Jeremiah Owyang would likely offer industry insight, but not potential contacts, since many would likely look to him for market research.

Instead, we would recommend combing through the list ranked by closeness centrality and cross-referencing it with information that can be gathered elsewhere. With this approach, Jeremy Levine, David Hauser, Dan Abdinoor and Eric Paley seem to be the best targets. Jeremy Levine and David Hauser are both young entrepreneurs whom Dharmesh follows and who are deeply connected to the Boston startup community. While they have already established a reputation in the Boston VC community, they are likely more open to and available for meetings. Dan Abdinoor has worked at several startups in Boston, and is at the center of the HubSpot subgroup even though he is no longer employed by HubSpot. This is most likely due to the fact that he was one of the first ten employees hired and stayed through tremendous growth.

Potential Contacts that influence Dharmesh:

Dharmesh’s Twitter network graph:

Thanks to Jason Koster, Jake Berliner, and Seth Taft for letting me publish our analysis on my account here.

Running Retention Experiments

In my time working at HubSpot and on our sales products, I’ve been lucky enough to be exposed to high growth products. In growing these products, I’ve had a chance to run many experiments aimed at retaining users.

I spoke at SaaSFest 2015, sharing these experiments and the ways our team breaks down optimizing for these types of metrics. Two great blog posts followed up on my talk and broke it down further, one from ProfitWell and one from Appcues. My talk was based on one Brian Balfour prepared for the Weapons of Mass Distribution 2015 conference.

If you have a chance to head to SaaSFest 2016 (Patrick, you better do the conference again), I highly recommend it. The venue and atmosphere were intimate, and the conversations I had were top notch. It was great to learn from others in the SaaS industry. The accessibility of people like Hiten Shah, David Cancel, Patrick Campbell, and many others was phenomenal.

Check out the recaps:

© 2019 Dan Wolchonok
