This episode of Building the Backend features Michel Tricot, CEO & Co-Founder of Airbyte. Michel and I talk about data ingestion, the exciting parts of running an open-source data integration solution that allows a community to prosper around data technology, and what an open-source tool like Airbyte brings to the table.

Who is Michel Tricot?

Michel Tricot is the CEO & Co-Founder of Airbyte, an open-source data integration solution for data teams. Before co-founding Airbyte, Michel worked for over 15 years in the data space across different industries. Michel handled financial data, data collection, website integrations, data integrations, IoT data, and engineering. Michel also led a team of over 30 engineers tasked with building, maintaining, and scaling thousands of integrations while moving hundreds of terabytes of data every day.

What is Airbyte?

Airbyte’s main differentiator from other integration tools is that it’s an open-source platform dedicated to working with the community to advance data technology further. Airbyte’s focus is moving data from point A to point B, making it a valuable tool for any organization that collects massive amounts of data from various sources and struggles to centralize that data in a data warehouse or data lake. With Airbyte, you won’t have to worry about the complexity of data centralization, such as creating connectors, building API integrations, etc.

What Are Your Guiding Principles with Designing Modern Data Architecture?

Michel explains the principle of focusing on modularity instead of “finding one solution end to end, that will be your silver . . .” This is crucial to keep your morale high and to keep going.

This means creating building blocks for companies and letting them pick what they most want to prioritize and optimize. In the case of Airbyte:

“. . . one of the things we’re very careful about is we really want to be the best at extract and load. We don’t want to do transformations. There are a lot of things we don’t want to do. We would just want to offer the highest quality open-source connectors. And the area of data movements, that’s it, that’s all. . .”

Where Does Airbyte Come into Modern Data Stack?

Michel explains that Airbyte is an orchestrator of data movements. Michel adds, “Airbyte is here to sequence all these operations and tell you when something is broken or not.”

Airbyte comes in handy because most data teams struggle to fix data errors. The struggle often comes from not knowing where the error is or what the underlying issue is. Many issues are caused by other issues upstream, and knowing which issue caused another makes debugging much easier for data teams. With Airbyte, you know when something is broken and what is broken, and Airbyte also handles most of the data movement side of things.

This makes Airbyte a tool to consider when thinking about your modern data stack, since it can save you considerable time and headaches down the road.
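
To make the idea of sequencing operations and reporting what is broken more concrete, here is a minimal Python sketch. It is not Airbyte's API or implementation; the names `StepResult`, `run_pipeline`, `extract`, `load`, and `notify` are all hypothetical, and the failure is simulated. It only illustrates running steps in order and recording which one failed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    error: Optional[str] = None


def run_pipeline(steps: List[Tuple[str, Callable[[], None]]]) -> List[StepResult]:
    """Run steps in order and stop at the first failure, recording which step broke."""
    results: List[StepResult] = []
    for name, fn in steps:
        try:
            fn()
            results.append(StepResult(name, True))
        except Exception as exc:
            results.append(StepResult(name, False, str(exc)))
            break  # later steps depend on this one, so they are skipped
    return results


# Hypothetical steps standing in for real data movement work.
def extract() -> None:
    pass  # stand-in for reading records from a source


def load() -> None:
    raise RuntimeError("warehouse unreachable")  # simulated failure


def notify() -> None:
    pass  # stand-in for a downstream step


results = run_pipeline([("extract", extract), ("load", load), ("notify", notify)])
failed = [r for r in results if not r.ok]
for r in failed:
    print(f"step '{r.name}' failed: {r.error}")
```

Because the run stops at the first failing step, the report points at the step that actually broke rather than at the downstream steps that would have failed as a consequence.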

Airbyte also has the added benefit of being an open-source tool, which means it has a helpful community that is ready to answer any questions you might have. There are also many experienced data practitioners in Airbyte’s Slack community, which makes it a great place to learn from other people.

How Do You Think Open Source Is Affecting The Data Ecosystem Today? What Trends Are You Seeing?

The biggest problem with non-open-source tools is that you either have to build from scratch or buy the solution. The former is cheaper initially but gets pricier long-term, while the latter is the complete opposite. The biggest strength of open-source tools, however, is that you get the best of both worlds. You can choose to run everything internally and not share data with anyone else, or you can choose to push your changes to the main branch. Open source also benefits from having a community whose members support each other; this creates camaraderie and a place to ask for help when problems arise.

Where Do You See Data Architectures Heading Over the Next 2-5 Years?

Michel sees data warehouses becoming a cornerstone for any data team, but more importantly, access to the organization’s data is becoming broader. He sees many more less-technical people gaining access to these data sets, and says, “Some of the layers that I think are going to really come up in the next five years are going to be around like auditing and logging permission of data on warehouses.”

Michel adds, “I also think that data security governance, auditing, logging and permissioning is going to explode over the next few years.”

Do You Have A Favorite Data Book or Resource That You Recommend To Our Listeners?

Designing Data-Intensive Applications: The Ideas Behind Reliable, Scalable, and Maintainable Systems

Resources