In this episode of Building The Backend, we hear from Prukalpa Sankar, co-founder of Atlan. We talk about data quality and governance, the common issues organizations face when implementing data quality, and much more. Below are the top 3 value bombs:
- Data Governance has a bad reputation. It should not be a bureaucratic controlling process that is pushed from the top down.
- Active Metadata is key to modern data architectures. Essentially, it means bringing all the human-generated and machine-generated metadata together to derive insights.
- One of the most difficult metadata attributes to capture is the context for the data. It almost always requires input from humans, and tribal knowledge is often lost because it is never documented.
What led you to co-found Atlan?
Before this, Prukalpa and her co-founder Varun founded SocialCops, which applied data science to social problems. Organizations like the United Nations, the Gates Foundation, non-profits, and several large governments didn’t have data teams or technology teams. SocialCops therefore became the end-to-end data team for these organizations, working on problems such as national healthcare and poverty alleviation.
Through this experience, Prukalpa learned everything about running data teams and how complex and chaotic they can be. Due to the work SocialCops was doing, they had the chance to work with a wide variety and scale of data.
“At one point we were processing data for 500 million Indian citizens, billions of pixels of satellite imagery, and all that sound like really cool projects. And I guess our dream projects for the data practitioner in some ways. But the reality for us every day was chaos. Slack channel is consistently filled with messages. Like what does this column name mean? What’s the final version of this setting?”
Prukalpa makes a great point: the challenges of managing data teams are usually not technology problems but human collaboration problems. These include communication, onboarding errors, and collaboration in general.
Prukalpa adds: “The unique thing about data teams is the diversity of stakeholders. By nature, data is at this intersection of technical and business, which means you need diverse people, analysts, engineers, scientists, and business users all need to come together and collaborate effectively.”
A diverse set of people brings a diverse set of preferences, skill sets, working styles, and life stories.
This led Prukalpa and her team to build a strong data platform: where they couldn’t buy products to solve their challenges, they built them. A couple of years later, Prukalpa and her team “realized that we’d build tooling, that was more powerful than we had only intended. And we were like, Hey, is there a way we can help the teams around the world with this stuff?”
And that was how Atlan was born.
As for how Prukalpa built their tooling, she suggests not being closed off to the idea of using open-source technologies, since rebuilding something that already exists is pointless. This is why, even today, a large part of Atlan’s success rests on several open-source tools.
Prukalpa explains, “. . . as much as possible, we tried to leverage existing open-source components.”
What are the most common misunderstandings about data governance and data management overall that you see organizations face?
Prukalpa explains that the most common problem is that data governance has a really bad branding problem in the first place. It often sounds like a restrictive, bureaucratic, and controlling process. Unfortunately, in many cases that is also how it ends up being implemented: in a large, top-down way. Prukalpa adds, “The purpose of this is not to control. The purpose of this is creating better data teams.” When businesses and organizations adopt this mindset and implement governance in an agile and collaborative fashion, rather than through a single top-down governance team, the problems and chaos reduce and productivity increases.
What is Atlan?
Atlan is a platform focused on data quality and governance that acts as a central hub for passive and active metadata. It enables teams to create and collaborate effectively across their different data stacks through integrations with Slack, BI tools, and more. Atlan provides a crucial step towards good governance by letting data teams monitor their data infrastructure effectively.
Prukalpa explains that Atlan is a “very holistic tool in the metadata ecosystem to enable better collaboration. So, the best use cases that we enable are the quality single source of truth, documentation, catalog discovery; that’s what we enable the best.”
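To make the idea of combining machine-generated and human-generated metadata a little more concrete, here is a minimal Python sketch. It is not Atlan’s API or implementation; the `ColumnMetadata` class, the `merge_metadata` and `missing_context` helpers, and the sample column names are all hypothetical and exist only for illustration. The sketch assumes machine metadata (types, null rates) arrives as one dictionary and human context (descriptions, owners) as another, then flags columns that still lack human-written context.

```python
from dataclasses import dataclass


@dataclass
class ColumnMetadata:
    """Hypothetical record combining both kinds of metadata for one column."""
    name: str
    # Machine-generated: the kind of thing a schema scan or profiler could produce.
    dtype: str
    null_fraction: float
    # Human-generated: context that tooling cannot infer on its own.
    description: str = ""
    owner: str = ""


def merge_metadata(machine, human):
    """Combine machine-generated stats with human-written context per column."""
    merged = []
    for name, stats in machine.items():
        notes = human.get(name, {})
        merged.append(
            ColumnMetadata(
                name=name,
                dtype=stats["dtype"],
                null_fraction=stats["null_fraction"],
                description=notes.get("description", ""),
                owner=notes.get("owner", ""),
            )
        )
    return merged


def missing_context(columns):
    """Return the names of columns that have no human-written description."""
    return [c.name for c in columns if not c.description.strip()]


# Illustrative sample data (made up for this sketch).
machine = {
    "cust_id": {"dtype": "int64", "null_fraction": 0.0},
    "seg_cd": {"dtype": "string", "null_fraction": 0.12},
}
human = {
    "cust_id": {"description": "Unique customer identifier", "owner": "analytics"},
}

columns = merge_metadata(machine, human)
print(missing_context(columns))  # prints ['seg_cd']
```

The undocumented `seg_cd` column is exactly the kind of “what does this column name mean?” question from the Slack channel: the machine side is easy to fill in automatically, while the description still has to come from a person.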
Suppose you were to start again. What would be the first things you’d tackle?
Prukalpa emphasizes the importance of prioritizing business problems first. If you start by focusing on projects, tooling, and the like, “Your team is probably taking twice or thrice the amount of time that they’re going to take because you have all these trust and accuracy issues and things like that.” Prukalpa suggests prioritizing your top business projects, such as your customer 360 algorithm or recommendation system. Keep it simple; focus on the business.