Thoughts from "Datatastic"
- johnhauxwell
- Dec 27, 2024
- 4 min read
Updated: Feb 24
Data is a tricky thing. It’s ubiquitous yet quite often undefined, so even though it is everywhere it remains intangible and vague. This lack of clarity, in both definition and location, inhibits its accuracy and usefulness for most business tasks.
“Data can be a liability”
Data governance and ethics must be fully considered when tackling the complexities of AI readiness. Well-formed, well-structured, verified, ethical data is the cornerstone of LLM ingestion. Failing to provide it may well lead to errors and “hallucinations”. Preventing bias and cognitive errors should be a primary consideration for data users, a point of fixation for data custodians and a sharp point of focus for data councils. Leaving foundational data best practices to anyone else is a very bad idea. Preventing your data from becoming a liability should be your key data objective, and to do this we introduce GoPES and, initially, GoPES Lite.


Let’s examine where data comes from. There are several key questions we must ask:
a) Is it reliable?
b) Can we validate its provenance?
c) Can we trust this data?
If the answer to any of these is “Erm…”, then the quality of the source data, and of anything it is used in, is in doubt. Rectifying the issues caused by poor data quality is a lengthy and costly experience, and one that must be avoided whenever possible.
Let us look at these questions in turn and think about the inherent issues.
a) Reliability
If we think back in time (not that far back), the five V’s of data were all the rage: Volume, Veracity, Velocity, Variety and Value. I believe these are component elements of the reliability equation, and each should have a well-defined case and process. Too much volume: did we miss something? Too little: is this data relevant? Do we know where it came from? What are the data sources? Is it a single source or is it blended? Does the data set contain any synthetic data?
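The reliability questions above can be turned into a simple automated checklist. What follows is a minimal sketch in Python, not a definitive implementation: the `DatasetProfile` class, its field names and the row-count thresholds are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProfile:
    """Hypothetical summary of a data set; all field names are illustrative."""
    row_count: int
    sources: list[str] = field(default_factory=list)
    contains_synthetic: bool = False


def reliability_concerns(profile, min_rows=1_000, max_rows=10_000_000):
    """Return the open reliability questions for a data set.

    The thresholds are placeholders, not recommended values.
    """
    concerns = []
    if profile.row_count > max_rows:
        concerns.append("Too much volume: did we miss something?")
    if profile.row_count < min_rows:
        concerns.append("Too little volume: is this data relevant?")
    if not profile.sources:
        concerns.append("Unknown origin: what are the data sources?")
    elif len(profile.sources) > 1:
        concerns.append("Blended data: more than one source is involved.")
    if profile.contains_synthetic:
        concerns.append("The data set contains synthetic data.")
    return concerns


# Example: a small data set of unknown origin that includes synthetic rows.
profile = DatasetProfile(row_count=500, sources=[], contains_synthetic=True)
for concern in reliability_concerns(profile):
    print(concern)
```

An empty list means none of these particular questions were triggered; it does not prove the data is reliable, only that the basic checks passed.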
b) Provenance
As with our food, we want to know what we are ingesting. Would you take the risk with your diet, or would you rather be assured that what you're putting in your system is “wholesome”? At its heart, data provenance provides a clear, visual representation of the data source and any changes made to it. This is essential for any organisation. It offers a continuously updated record, from which the user can then:
Determine the data’s point of origin (source).
Track data throughout an organisation.
Track its transformations throughout the process and maintain a comprehensive record of where the data is accessed during the journey.
Keep records of other vital metadata.
Most importantly, this is not just about tracking your data but also the intricate processes and dependencies involved in data flow, no matter where it exists. At its most complete, data lineage is a full map of all the direct and indirect relationships between data entities within an organisation. This is the foundation of a modern data stack, providing complete visibility and identifying and addressing data gaps, thus ensuring ethical, compliant and efficient data management.
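To make the idea of a continuously updated provenance record concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than a reference to any particular lineage tool: the `ProvenanceRecord` class, its method names and the example values ("crm_export", "etl_job" and so on) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for one data set."""
    source: str                                    # point of origin
    metadata: dict = field(default_factory=dict)   # other vital metadata
    events: list = field(default_factory=list)     # append-only history

    def _log(self, kind, actor, detail):
        # Every event is timestamped in UTC so the history can be ordered.
        self.events.append({
            "kind": kind,
            "actor": actor,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def record_transformation(self, actor, detail):
        """Note a change made to the data and who or what made it."""
        self._log("transformation", actor, detail)

    def record_access(self, actor, detail=""):
        """Note where the data was accessed during its journey."""
        self._log("access", actor, detail)

    def history(self, kind=None):
        """Return all events, or only those of one kind."""
        return [e for e in self.events if kind is None or e["kind"] == kind]


# Example usage with made-up names.
record = ProvenanceRecord(source="crm_export", metadata={"owner": "sales"})
record.record_transformation("etl_job", "dropped duplicate rows")
record.record_access("analyst", "monthly report")
print(len(record.history()))
print([e["actor"] for e in record.history("access")])
```

A single record like this covers one data set. The fuller lineage map described above would also need to link records to each other, which this sketch deliberately leaves out.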

c) Trust
Trust is a necessary condition of any trade, commerce or use of a service that parties enter into voluntarily. Specifically, trust, within the bounds of commerce, can be defined as the belief held by any party entering a trade or service that the stated intentions of the other parties involved in that interaction are true. In this respect, consent is a subcategory of trust. For most aspects of simple commerce, consent is implicit rather than explicit. For example, if I pick up a bottle of water, bring it to the till and pay the retailer the demanded amount, it is usually a signal that I have consented to this trade. I don’t need to make my consent explicit.
Whether explicit or implicit, consent is almost always a necessary but not sufficient condition for garnering trust. There are other aspects to building trust, such as identity, creditworthiness, and even higher-order factors such as the governance of the infrastructure and systems around those interactions (I might not walk into a certain shop because it looks like a front for criminal activity, for example). But often, consent is the most important factor in garnering trust. If one party does not agree to be part of the trade or activity, then there will be no trust.
Cookies, advertising and the data trade have, for almost two decades now, gotten around consent by effectively issuing “contracts of adherence”: “Sign this contract if you want to use our software”, within an EULA. Ordinary users are not in a position to refuse. On the one hand, they do not have the resources (e.g. time or lawyers) to renegotiate the terms of the contract. On the other, the alternative of simply saying no means being cast into digital oblivion. People are powerless to refuse the terms of the trade, and yet they must take part if they are to maintain their social capital. This is clearly not an equal relationship.
Imagine walking into a shop and asking for a bottle of water, and in return the shop demanded your purchase history for the last thirty days and tracked you for another thirty days. Who would say yes? Of course, if every shop offered water on these terms, eventually you’d give up and say yes. But is that genuine consent? So although consent has technically been given (we all clicked yes), it is not genuine consent. Users need to be able to say no as readily as they can say yes. There needs to be a balance of power.
Data Quality Management (DQM) provides this power. In a sense, DQM tools have always acted in this capacity. Representing large groups of data users in order to leverage the power of their demands as a collective, or as a viable data set, requires a lot of trust, as well as strong data governance, privacy, ethics and security (GoPES).
No one is forced to adhere to a trust-based model. However, companies should, through their fiduciary duties, act in the data owners' best interests. Being part of a trusted data environment therefore becomes an act of rich, genuine consent. This is the first step in building out a sustainable trust-based economy.




