Foxtail Research

This is all about becoming aware, knowledgeable, compliant and protective with your data assets

Data creation, storage, processing and consumption have become easier and more efficient for businesses of all sizes. Businesses plan to use their data assets to build AI-driven organizations, and AI models and LLMs are becoming more accessible through tools released by technology companies. In this scenario, it is becoming essential to know these data assets well: what information is in the data, where the data is sufficient, where it is insufficient, and so on. This knowledge enables businesses to judge the quality of their data, to plan use cases for big data and AI projects, and to fine-tune and interpret model outputs. That in turn facilitates data-driven decision making and helps businesses reap benefits from their AI and big data investments. An MIT study shows 80% of AI projects fail, and one of the reasons mentioned is that businesses do not know their data assets well enough. In this blog, I discuss a fast-growing job role that addresses this business challenge.

In one of my previous blogs on responsible AI (https://www.foxtail-research.org/if-ai-is-first-then-responsible-ai-is-foremost-here-is-why-and-how-to-become-ai-responsible/), I mentioned three approaches to becoming a responsible AI user: (1) careful selection of AI use cases, (2) understanding the basics and workings of AI applications well, and (3) learning to use AI tools appropriately and efficiently. While the latter two approaches align with technical skills, the first requires business (domain) and data knowledge. To frame efficient AI use cases and, even more importantly, to know how to use AI outputs for decision making, it is absolutely essential to know the data.

Data Governance is a growing field that enables businesses to thoroughly understand and evaluate their data assets. Further, an efficient data governance system ensures that the business uses its data securely and complies with regulations. According to a market analysis report from Grand View Research [1], the global data governance market is estimated to grow at a CAGR of 21.7% between 2024 and 2030.
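To put that growth rate in perspective, here is a minimal sketch of how a compound annual growth rate compounds. It assumes 2024 is the base year and 2030 the end year (six compounding periods), and it uses a hypothetical index of 100 instead of any actual market size. The function name is my own, not something from the report.

```python
def compound_growth(base: float, cagr: float, years: int) -> float:
    """Return the value of `base` after `years` of growth at rate `cagr`."""
    return base * (1 + cagr) ** years


# Assumption: 2024 is the base year and 2030 the end year (6 periods).
# The base of 100 is an index, not a real market size.
index_2030 = compound_growth(100.0, 0.217, 6)
print(f"Index in 2030 (2024 = 100): {index_2030:.1f}")
```

Under these assumptions the index lands a little above 300, i.e. the market would roughly triple over the period.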

What is Data Governance for?

Data Governance, in simple words, is creating rules and policies for using and maintaining the data assets that a business possesses. The rules and policies will differ for each organization and even for each division within an organization. These rules take into account business needs, confidentiality and security needs, storage and processing needs, and the compliance needs set forth by regulations such as HIPAA and GDPR. Any breach of compliance can incur a huge penalty and damage the reputation of the business. Clear Data Governance policies allow everyone working with data not only to gain knowledge about the data assets but also to take responsibility for the security, accuracy, usage, sharing and consumption of data. With the increasing usage of AI, and in particular Generative AI and LLMs, it is critical to create an organization that handles its data assets responsibly.

Why is Data Governance gaining importance?

Now comes an interesting question. Weren't such rules and regulations in place for a long time? Yes, they have been. But the growth, sharing and consumption of data were lower. An efficiently designed database system with well-designed integrity constraints was sufficient to maintain and understand the data. Reports and outputs were generated in batch mode at fixed frequencies. Now data is growing at a rapid rate. Many big businesses process streaming data, where the processing is done in real time. And particularly in this AI age, these data are fed into models as training data sets. There may be fields that are confidential and that the model is not supposed to use even for modeling purposes. Moreover, if the business accesses AI and/or LLM models through an application interface, there is a chance of leaking confidential information outside the business. An effective internal Data Governance program avoids such issues by setting rules for data usage. It gives the business a roadmap for using its data assets securely and efficiently.
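As a rough illustration of such a data usage rule, the sketch below drops confidential fields from a record before it is used for modeling or sent to an externally hosted model. The field names and the `CONFIDENTIAL_FIELDS` set are hypothetical; in practice the list would come from the organization's own governance policy.

```python
from typing import Any

# Hypothetical governance rule: fields that must never leave the business
# or be used as model features. The field names are illustrative only.
CONFIDENTIAL_FIELDS = {"ssn", "card_number", "diagnosis"}


def apply_usage_policy(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` without the confidential fields."""
    return {k: v for k, v in record.items() if k not in CONFIDENTIAL_FIELDS}


# Made-up customer record for illustration.
customer = {
    "customer_id": 17,
    "state": "IN",
    "ssn": "000-00-0000",
    "basket_value": 82.5,
}

safe_record = apply_usage_policy(customer)
print(safe_record)  # the "ssn" field is no longer present
```

The original record is left untouched, so the confidential value stays inside the governed system and only the filtered copy moves on.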

A big challenge in AI projects is assessing the model outputs, i.e. whether the model has classified correctly (for a classification model) or predicted values closely (for a regression model). This can be measured with metrics such as accuracy and with several metrics that give error estimates. Overfitting or underfitting is a big problem in AI/ML model building. In such situations, only if we know about the data that was fed into the models will we be able to understand the model output and fine-tune the models. This step is called training the models, and it is THE step in the AI model-building process. It is important to mention that it is we humans who have the natural intelligence about the data; with it we train the machine to learn from the data (Machine Learning) and thereby impart Artificial Intelligence to the machines. Say we submit a prompt to an LLM during the training process and it generates an output. How will we know the output is correct if we do not know the data or the domain?
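For readers who want to see what these metrics look like, here is a small sketch with made-up labels and predictions. It computes accuracy for a classifier, mean absolute error for a regressor, and flags a large gap between training and validation accuracy as a possible sign of overfitting. The 0.2 threshold is an arbitrary assumption for illustration, not a standard value.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions, for illustration only.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
train_acc = accuracy(labels, [1, 0, 1, 1, 0, 1, 0, 0])  # predictions on training data
valid_acc = accuracy(labels, [1, 1, 0, 1, 0, 0, 0, 1])  # predictions on unseen data

# Regression example: true values vs predicted values.
mae = mean_absolute_error([10.0, 12.0, 15.0], [11.0, 12.0, 13.0])

# Hypothetical rule of thumb: a large train/validation gap hints at overfitting.
GAP_THRESHOLD = 0.2
possible_overfit = (train_acc - valid_acc) > GAP_THRESHOLD

print(train_acc, valid_acc, mae, possible_overfit)
```

The numbers only say that something is off. Deciding why the model does well on training data and poorly on new data still needs someone who knows the data and the domain.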

Similarly, data governance plays a crucial role in big data projects. Say an e-commerce company generates several thousand records every day by tracking user browsing details on its site, and then builds a dashboard to display its analysis. Such tracking and analysis can be subject to several policies and restrictions, such as hiding users' personal information, masking details, hiding payment information, preventing the use of certain fields for analysis, restricting access to certain divisions within the company, and many more. Businesses and internal teams have to follow such rules to maintain compliance. Further, if the e-commerce company plans to send text messages with promotions and coupons based on its analysis, it may also have to comply with regulations from other sectors such as telecommunications. Some users will have opted out of receiving text messages, and some carriers charge for sending them. Depending on the country, region or state, there could even be restrictions on the time of delivery of such messages. As data becomes real-time and is captured and processed in streaming mode, tightly integrating such policies and regulations is crucial for sustaining the business.
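The text message scenario can be sketched as a simple eligibility check. The region names, the allowed delivery windows and the `Customer` fields below are all hypothetical placeholders; real rules would come from the applicable regulations and the company's own policies.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Customer:
    customer_id: int
    region: str
    sms_opt_out: bool


# Hypothetical delivery windows per region: (earliest, latest) local send time.
ALLOWED_WINDOW = {
    "region_a": (time(8, 0), time(21, 0)),
    "region_b": (time(9, 0), time(20, 0)),
}


def may_send_sms(customer: Customer, local_time: time) -> bool:
    """Apply opt-out and time-of-delivery rules before sending a promotion."""
    if customer.sms_opt_out:
        return False
    window = ALLOWED_WINDOW.get(customer.region)
    if window is None:
        # No documented rule for this region: do not send by default.
        return False
    earliest, latest = window
    return earliest <= local_time <= latest


customers = [
    Customer(1, "region_a", sms_opt_out=False),
    Customer(2, "region_a", sms_opt_out=True),
    Customer(3, "region_b", sms_opt_out=False),
    Customer(4, "region_c", sms_opt_out=False),
]

send_time = time(20, 30)
eligible_ids = [c.customer_id for c in customers if may_send_sms(c, send_time)]
print(eligible_ids)
```

Defaulting to "do not send" when a region has no documented rule is a design choice in this sketch: it favors compliance over reach.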

Data governance helps define policies and procedures for maintaining data security and compliance. It creates a blueprint containing the rules and policies for using and processing data. While an organization can still manage its data without data governance, doing so may lead to less efficient use of data and to failures in AI and big data projects. A model output that is not interpretable, or that cannot be used for decision making or automation, is a huge loss of investment. In the AI age, data governance is important and is becoming the first step in implementing AI projects. Data Governance helps create a data culture that drives AI success. AI models and AI agents go through many iterations before they reach production quality, and a good understanding of the data helps improve the models' accuracy and predictions.

With data governance policies, a business can create rules for (1) managing metadata (metadata comes in handy when adding or removing features, i.e. feature engineering during the model training process), (2) tracking data lineage, i.e. the transformations of the data or, in other words, the history of the data, (3) cleaning, discarding or masking data as per company rules and business regulations, and (4) defining data access for different uses and for various levels of data users. All these policies provide guidance for using data in an age where data is abundant.
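The four kinds of rules above can be sketched together in a few lines. This is only a toy model under my own assumptions: the column names, roles and the `ColumnPolicy` and `Dataset` classes are hypothetical and do not represent any particular governance tool.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnPolicy:
    description: str         # (1) metadata about the field
    allowed_roles: set[str]  # (4) who may read the real value
    mask: bool = False       # (3) mask the value for everyone else


@dataclass
class Dataset:
    rows: list[dict]
    lineage: list[str] = field(default_factory=list)  # (2) history of the data


# Hypothetical policies for two columns.
POLICIES = {
    "email": ColumnPolicy("Customer contact email", {"support"}, mask=True),
    "order_total": ColumnPolicy("Order value in USD", {"support", "analyst"}),
}


def view_for(dataset: Dataset, role: str) -> Dataset:
    """Return the rows a role may see and record the step in the lineage."""
    visible_rows = []
    for row in dataset.rows:
        visible = {}
        for column, value in row.items():
            policy = POLICIES.get(column)
            if policy is None:
                continue  # undocumented columns are not exposed
            if role in policy.allowed_roles:
                visible[column] = value
            elif policy.mask:
                visible[column] = "***"
            # otherwise the column is dropped for this role
        visible_rows.append(visible)
    return Dataset(visible_rows, dataset.lineage + [f"view_for(role={role})"])


raw = Dataset(
    [{"email": "a@example.com", "order_total": 42.0, "notes": "x"}],
    ["loaded from orders table"],
)
analyst_view = view_for(raw, "analyst")
print(analyst_view.rows)
print(analyst_view.lineage)
```

Here an analyst sees the order value, gets a masked email, and never sees the undocumented `notes` column, while the lineage list records that a role-specific view was derived from the loaded data.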

Growth in Data Governance opportunities

Data Governance can be a good fit for anyone who has (1) the mindset for protecting data, keeping data secure and treating data as an asset, (2) a passion for mining data, (3) a passion for paying attention to detail and understanding data, (4) an interest in keeping abreast of internal policies and external regulations, and (5) an interest in integrating such policies and regulations with the available data assets.

Data governance involves data management, IT governance and information governance, along with knowing the business objectives, understanding the business, and having the data sense to use data as an asset for revenue generation. It essentially requires both business and technical skills; however, knowledge of the business, the data, internal policies and external regulations, together with the ability to communicate with technical professionals, will also help in pursuing a career in Data Governance.

To elaborate further: data management is about collecting, processing and using data, and data governance, as a critical part of it, is about setting the rules, policies and processes for carrying out data management tasks. IT governance is about organizing technology investments to support business objectives, while data governance specializes in the security, usage and distribution of data. Information governance is about managing all information assets, while data governance is about the usage of digital data assets.

The demand for Data Governance is increasing, driven by growing data volumes, AI adoption and strict privacy regulations across various industries. An internet search lists various job roles in data governance, including data governance analyst, data steward, data governance manager, data architect, data specialist and data coordinator. The industries that hire for data governance include finance, insurance, healthcare, technology, software, government, manufacturing, professional services and management. Any industry or business that has huge data volumes and is subject to strict compliance requirements and regulations will have opportunities in data governance.

Conclusion

Data is evolving and abundant. Data is an asset, a form of wealth, that a business holds, and every business tries to derive maximum benefit and revenue from it. At the same time, businesses have to comply with strict regulations when using, storing and distributing data. Data Governance as a field focuses on setting the rules, policies and processes for using data as wealth without compromising on security and compliance. It helps businesses gain a thorough understanding of their data and its value. This understanding is absolutely critical for AI and big data analytics projects, as it helps in crafting effective use cases and in fine-tuning and interpreting model outputs for business decision making and automation. Pursuing a career in Data Governance will be rewarding as more and more businesses plan to adopt AI and to use their data assets for growth and expansion. The recently released America's AI Action Plan [2] places emphasis on building AI capabilities to achieve global dominance in Artificial Intelligence. Data Governance will be the first step in such AI implementations, setting a safe ground for them. Further, Data Governance creates data awareness, enhances data knowledge and sets a data roadmap across the organization.

Bibliography

1. Grand View Research, data governance market analysis report – https://www.grandviewresearch.com/industry-analysis/data-governance-market-report

2. America's AI Action Plan – https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf

Image courtesy: https://www.freepik.com/free-vector/data-compliance-cartoon-banner-template_38345952.htm#fromView=keyword&page=1&position=42&uuid=394a9130-9bba-471a-a86c-1e7edd716936&query=Data+governance


I’m Ramaa

Welcome to Foxtail Research, my cozy corner of the internet dedicated to all things research – market, data and insights. I invite you to join me on a journey of understanding markets, their behaviors, my models, and theories with a touch of mathematics and some computer programming. Let’s get geeky!
