How we use Reverse ETL

We use reverse ETL here at Reforge for a bunch of different use cases. They are:

  • Sending data about our customers to our email automation system, Iterable
  • Sending data that is aggregated in our data warehouse (snowflake) to a Google sheet
  • Sending data to our behvaioral analytics system, Amplitude
  • Sending data to our CRM (HubSpot) from our data warehouse
  • Sending data to Slack from our data warehouse

Examples

We use Segment and Personas features to send data to our email system, Iterable.

There are cool features in Personas that allow you to use a GUI to define things like:

  • Counters (example: how many page views that meet a certain criteria in the last X days)
  • Aggregation (example: how much revenue have they generated in the past quarter)
  • Most frequent (example: what’s the most frequent blog post they’ve visited)
  • First (example: what’s the first marketing campaign that got a user to visit your site)
  • Last (example: what’s the last marketing campaign that got a user to visit your site)

They also have the ability to write arbitrary SQL to pull a list of people and any columns in the SQL can be traits associated with a user. So you can use GUI to define basic things or use complicated SQL to pull sophisticated stats.

An example of a user trait we have in Segment:

A SQL user trait in Segment that pulls a list of applicants to Reforge and values associated with their application

Important caveat: it’s on our roadmap to pull all of this logic out of SQL and put it into a DBT model. The benefits there are better testing, understanding of data lineage in a DAG, and we have all of it under source control. In the future the SQL for this trait could be something like applications.user_application_values.

In our data dictionary we define these properties:

Screenshot of our data dictionary covering the same fields

Then we can use the traits to send personalized campaigns in Iterable for many different scenarios:

  • Application acceptance or denial
  • Payment nudges to remind someone they’ve been accepted
  • Stats about how many people from their company have participated

Here’s an example of a campaign in Iterable:

Sample campaign in Iterable that uses the fields from the SQL trait / data dictionary

You can see how we have a dynamic template in the email above that personalizes the email subject to have the company name they select as well as stats about the company with things like:

  • How many people from their company have participated in Reforge in the past
  • How many people from their company are current members
  • How many people from their company have applied recently

I think it’s pretty great that Segment has this functionality especially when you combine it with their other platform tools like Protocols. That said, I have some gripes about the tool as of November 2022:

  • On our current tier the most frequent traits can run is hourly. Maybe I can pay for a higher frequency, but that drives me nuts and I’d have to negotiate with a sales rep for that.
  • It’s really hard to test a trait is working the way you want it, you can’t trigger a sync. This drives me nuts because I always make a simple mistake and then either try to get it to trigger or have to wait an hour to see if an update fixed it.
  • The destinations you can sync to are the typical ones for a CDP. I wish I could sync a result set with a spreadsheet, but that’s not possible.
  • I can’t sync non-contact objects to our CRM. For instance, I can sync contacts and their properties to HubSpot but I can’t sync subscriptions to a custom object or companies and their details to a company object in HubSpot.
  • It feels like Segment has abandoned working on this feature. They’re busy integrating with Twilio and building things like Twilio Engage rather than investing in making Personas great. I might have missed it, but it feels like they haven’t made changes to the product in years.

How we use Hightouch

We use a second ETL provider at Reforge. It sucks that we have two, but my hope is that it’s more of a transitionary period until we ultimately simplify with one. I started using Hightouch for free to sync data to a google sheet / to slack, and it became clear it was the vastly superior product to Segment Personas.

The world runs on spreadsheets, and often times it’s really helpful to give teams a constantly updated resource that has data synced from your data warehouse. It’s a great free path to get using their tools because once you have it connected, it’s so easy to add additional sources.

Here’s an example of how we send paid data into a spreadsheet for the marketing team (we pull paid numbers across all channels into our data warehouse and this helps them understand all of it and play with it in a spreadsheet environment rather than pulling it from a BI tool constantly):

Sending aggregate paid spend numbers into a spreadsheet for the marketing team

The model can be custom SQL, a table in a DB, or a model in dbt or Looker. It makes it really easy.

We also use Hightouch to sync data to our CRM, HubSpot. It’s just so easy.

We also use Hightouch to send customized messages to Slack. I’ve done this using Zapier in the past but it’s nice to be able to have all of these syncs in one place. Here’s an example of a daily ARR report that the team built:

Our daily update that sends data from Snowflake to Slack

Here’s what it looks like in Slack:

An example of a daily summary the team built via Hightouch

I spoke with the co-founders of Hightouch when it was early days and that’s when I found out they were part of the early team that built Personas at Segment. It feels like they were able to address a lot of their frustrations about how the product was built and make big improvements on it. Things that are just so much better than Segment:

  • Their support is so much better than Segment. This should be every startup’s advantage against the company acquired by a big behemoth, and they do a good job. I have been so disappointed with Segment’s support over the years and Hightouch makes the overall experience so much better because you can get an answer to why something is broken quickly. With Segment support, I often times have to wait days which is so painful when you’re working on a deeply technical integration like this.
  • They have a “run now” button where you can run a sync anytime you want. This sounds so simple, but I’m sure it makes the technical implementation much harder. I can’t tell you how many times I’ve messed up a piece of a sync in Segment personas and then I just have to wait until the sync runs again. It baffles me they haven’t changed this.
  • It feels like it’s being worked on all the time. Lots of changes and improvements, new features, and integrations with the latest and greatest tools.
  • The logging and insight into what happened is so much better with Hightouch. This is a real interaction I had with Segment’s support team (no judgement on the support rep, this is a reflection of the state of their product). This type of response is infuriating when you are working to integrate two systems together and you don’t know what’s wrong or when it’ll retry some logic.
A screenshot of my support ticket with Segment. I was very frustrated!

Conclusion

I look forward to moving all of our ETL infrastructure over to Hightouch from Segment at annual contract renewal time. While there are benefits to having the reverse ETL logic and elements all in Segment (we pay for their protocols feature and there are benefits in blocking bad data / being alerted when your data doesn’t adhere to the spec), I would rather use the better tool.

How do others do this type of work at their companies? I’d love to know if others solve their challenges similarly and if there are even better ways.