Data governance used to be a side hustle that data engineers would tackle while doing their “real jobs”: building pipelines, correcting warehouse sizes, or maintaining indexes, views, raw zones, presentation layers, and data contracts. In between, they’d mask some data or throw in a row-level policy. But as data regulations have become stricter, more numerous, and more prominent, data governance has become a real job of its own, with data stewards or compliance teams focused on determining policies.
In addition, data users have proliferated across the enterprise. Now every “line of business” user must have access to data to improve results. This has led to a situation where data is moving from one end of the company to the other, but the rules around it are stuck in silos, with each team moving, touching, or using data without knowing how its work fits into the whole.
Imagine a data engineer in the middle of this data flow, in charge of a warehouse where trucks keep showing up and dropping off data pallets. Where did the data come from? Who sent it? What kind of data is it? What are the requirements for storing and sharing it? Brick-and-mortar warehouses have this down to a science through their supply chains. Enterprises need to apply the same rigor to their data supply chain.