Thoughts on Fivetran and similar tools
I saw this post on LinkedIn this morning, and it set my thoughts running. The meme has done the rounds in many forms, but what struck me about this incarnation was the text and comments about Fivetran and Airbyte.
As you may know, I have some experience with Airbyte's open-source version but have yet to use their cloud offering. I have written a few connectors for it and have used the open-source version on multiple projects. I have also worked with Fivetran on a few projects. Naturally, I have some opinions on both.
The author of the post correctly states that Airbyte has challenges. The project has many open issues, and most relate to the connectors it provides. Many of these connectors are open-source contributions, so there are some rough edges. Building connectors for many of the targeted SaaS tools is also challenging because you need access to those tools, which usually means paying for them. Few people can pay for them personally, as they are tools made for business use.
I am by no means making excuses for Airbyte. I want to highlight why you will find open issues and encounter rough edges. What is good, though, is that when you run into these, you can look at the code and fix things, typically quite quickly. Again, I say this from experience, having done exactly that for clients.
Fivetran, on the other hand, is a whole other beast. It does a fantastic job of getting your data into your cloud data warehouse, but there are a few gotchas you should be aware of.
It will not load the raw data from the source for you. Instead, it loads data according to its own data model for that source. These models make sense, as they are relational models for what is otherwise messy, nested JSON from the source API. However, it means the data is processed before you receive it and arrives in a structure that won't match the raw data. If you one day decide to move off Fivetran, you have a challenge: you don't have the original raw data, and your existing pipelines depend on the Fivetran relational model, so you cannot simply swap Fivetran out for another tool. You will need to do a fair bit of work to replace it.
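To make the point about data models concrete, here is a minimal, hypothetical sketch of what normalising a nested API response into relational tables looks like. The order payload, the `orders` and `order_lines` tables, and every column name are invented for illustration; this is not Fivetran's actual schema or logic. It shows two things: the original document shape is gone after flattening, and any field the model does not map (here, `tags`) is simply not carried over.

```python
# Hypothetical raw payload, loosely shaped like an e-commerce order from a SaaS API.
# All field names here are illustrative assumptions, not a real vendor schema.
raw_order = {
    "id": 1001,
    "customer": {"id": 42, "email": "a@example.com"},
    "line_items": [
        {"sku": "TSHIRT-M", "quantity": 2, "price": "19.99"},
        {"sku": "MUG", "quantity": 1, "price": "9.50"},
    ],
    "tags": ["vip", "gift"],  # a field the relational model below does not map
}

# Top-level keys the (hypothetical) relational model knows how to handle.
MAPPED_KEYS = {"id", "customer", "line_items"}


def flatten_order(order: dict) -> tuple[list[dict], list[dict]]:
    """Split one nested order into parent 'orders' rows and child 'order_lines' rows."""
    order_rows = [
        {
            "order_id": order["id"],
            # The nested customer object is flattened into columns on the parent row.
            "customer_id": order["customer"]["id"],
            "customer_email": order["customer"]["email"],
        }
    ]
    # The nested array becomes a child table keyed back to the parent order.
    line_rows = [
        {"order_id": order["id"], "line_index": i, **item}
        for i, item in enumerate(order["line_items"])
    ]
    return order_rows, line_rows


orders, order_lines = flatten_order(raw_order)

# Anything outside the model never reaches the warehouse tables.
dropped = set(raw_order) - MAPPED_KEYS

print(orders)
print(order_lines)
print(dropped)
```

Downstream models built on `orders` and `order_lines` are tied to that particular shape, which is why replacing the loader is more than a configuration change.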
Fivetran sometimes does not provide all the data available from a source's API. This has happened to my clients with sources like Shopify and Klaviyo. Sometimes this is a bug, which can take a long time to get fixed, and sometimes it is because support for an API change is stuck somewhere on the Fivetran team's backlog.
Security is another part of this that gives me pause. You effectively have to give a Fivetran account access to your sources and to your cloud data warehouse so that it can write data there. This means your data flows through a third party's systems, and you need to trust that third party not to do anything nefarious with it. Call me paranoid, but there is a big difference between being able to look at the code being used and trusting that a vendor will do the right thing, as specified in its EULA. (Just think about the recent issues with Tesla sharing sensitive images recorded by customers' cars.)
Fivetran's pricing can be somewhat opaque and can become quite expensive as you scale. I say opaque because, while you are charged based on the number of records you sync, you can also *negotiate* pricing with them in certain circumstances.
My experience with Fivetran has made me very cautious of closed-source tools for data ingestion. You get an incredibly fast start when you use Fivetran or similar tools, but it comes at a cost: you give up control of the Extract and Load part of your platform and are left at the mercy of someone else's backlog, priorities, and pricing.
Does this only apply to Fivetran? No, it certainly does not. It applies to any similar tool you use to ingest data, including the SaaS version of Airbyte.
Be vigilant when making your tooling choices. Ensure you understand the price you will pay further down the line for the control you give up. The speed you gain at the start might be dwarfed by the effort needed to change later.