Okay, okay… I’ll write something about ChatGPT. After all, all the cool kids are doing it, so it must be good. Right? Now, remember, this is me writing about ChatGPT. I’m a bit old and cynical about data stuff, and I am probably going to say some things that, in the fullness of time, may be proved wrong. If so, thank you for heeding the warning and acting on it. But other stuff, I fear, is going to come true very painfully. So… buckle up while I talk about two ways in which I fear ChatGPT heralds the dawn of the utter enshittening of knowledge.
ChatGPT is a perfect example of a data quality problem. It has apparently been trained on publicly available data up to 2021. As with all AI systems, the quality of that training data ultimately drives the quality of the responses. And because AI systems are ultimately statistical processes, ChatGPT tends towards the mean. Its responses appear generally bland and inoffensive because they essentially represent a consensus approach to forming an opinion and giving an answer. But, in order to ensure that its answer fits what its model tells it is expected to come next, ChatGPT is also making shit up. So we have a generic, non-committal, middle-of-the-road representation of knowledge coupled with what, in a human, we’d class as “A-Grade Bullshitter” levels of self-confidence: the kind of person who makes shit up to support their argument.
For example, I asked ChatGPT to write a biographical note for me to see how it compared to the bio I use for client proposals or conferences. Oh boy… Here’s an extract of what it wrote about me:
“O’Brien is a recognized authority on data protection and has written several books on the subject, including “The EU General Data Protection Regulation (GDPR): An Implementation and Compliance Guide.” He is also a frequent speaker at conferences and events on data protection and has worked with a wide range of organizations to help them understand and comply with data protection laws and regulations.
O’Brien is a member of the International Association of Privacy Professionals (IAPP) and serves as an expert witness in data protection litigation. He is also a member of the Data Protection Commission’s (DPC) Expert Advisory Group on Data Governance.”
Extract from ChatGPT response to “Who is Daragh O Brien from Castlebridge”
Looking at that I can see why my parents are proud of me. It’s an impressive set of achievements that is diminished only by the fact that it is total unmitigated bollox in two key areas:
- The book referenced is published by a company that works in the same data consulting fields as me. But let’s just say I have “issues” with the quality of the information in the book and I am certainly not the author (I have written several books on data stuff though and contributed chapters to others).
- I am not aware of the DPC having an Expert Advisory Group on Data Governance. (But I do think that they should have one and, if we are to be guided by the AI, I should definitely be on it).
So, ChatGPT uses its training data and the statistical patterns in its model to review the known facts about me and produces an output that is plausible. But inaccurate. Putting my bruised ego to one side (while still wondering what the me in the other dimension that ChatGPT pulled the answers from gets paid to be on that DPC Advisory Group), I asked a different question that should have produced some objectively verifiable facts. I asked for a list of books on the topic of data ethics or information ethics published in English in 2018 (I was hoping this one would be on the list). This is the list it gave me:
- “Data Ethics: The New Competitive Advantage” by Viktor Mayer-Schönberger and Kenneth Cukier
- “Data Feminism” by Catherine D’Ignazio and Lauren F. Klein
- “The Ethics of Big Data: Balancing Risk and Innovation” by Richard Susskind and Daniel Susskind
- “Data Sovereignty: An Introduction” by Gerd Leonhard and Mark T. Hoske
- “Data Ethics: The Key Challenges and How to Address Them” by Martin Porcheron and Michael Veale
- “Data Ethics: The New Competitive Advantage” by Viktor Mayer-Schönberger and Kenneth Cukier
Applying Data Quality Measures
Setting aside for one moment my distress at Ethical Data & Information Management not being listed, and the duplication of listings here, I decided to do a Friday Afternoon Measurement test on the Data Quality of the output, checking each listing for accuracy: did a book with that title exist, and was it written by the person the AI told me wrote it? Of the five distinct books listed, only ONE (20% of the sample) was correct.
The other 80% is A-Grade Bullshit: either the machine has made up a book it thinks should exist, or it has decided someone else must have written a book that does exist. I do hope Pernille Tranberg and Gry Hasselbalch are not too distressed by having their work credited to Viktor Mayer-Schönberger and Kenneth Cukier.
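The check described above is simple enough to sketch. The snippet below is a minimal Python illustration, not the tooling actually used for this test. It treats each listing as a record, drops the verbatim duplicate so each book is scored once, and reports the share of distinct records that pass the title-and-author check, which is the kind of score a Friday Afternoon Measurement produces. `Listing` and `fam_score` are hypothetical names. The post reports one pass without saying which listing it was, so marking “Data Feminism” as the pass is an assumption for illustration only.

```python
from typing import NamedTuple


class Listing(NamedTuple):
    """One book listing as returned by the AI (hypothetical record type)."""
    title: str
    authors: str


def fam_score(listings: list[Listing], passed: set[Listing]) -> tuple[int, int, float]:
    """Return (correct, distinct, share) for the title-and-author check."""
    # Drop verbatim duplicates but keep first-seen order. The duplicate is a
    # defect in its own right, but each book should only be scored once.
    distinct = list(dict.fromkeys(listings))
    correct = sum(1 for item in distinct if item in passed)
    return correct, len(distinct), correct / len(distinct)


chatgpt_list = [
    Listing("Data Ethics: The New Competitive Advantage", "Viktor Mayer-Schönberger and Kenneth Cukier"),
    Listing("Data Feminism", "Catherine D'Ignazio and Lauren F. Klein"),
    Listing("The Ethics of Big Data: Balancing Risk and Innovation", "Richard Susskind and Daniel Susskind"),
    Listing("Data Sovereignty: An Introduction", "Gerd Leonhard and Mark T. Hoske"),
    Listing("Data Ethics: The Key Challenges and How to Address Them", "Martin Porcheron and Michael Veale"),
    Listing("Data Ethics: The New Competitive Advantage", "Viktor Mayer-Schönberger and Kenneth Cukier"),
]

# Assumption for illustration: the post reports one pass but not which one.
passed = {chatgpt_list[1]}

correct, total, score = fam_score(chatgpt_list, passed)
print(f"{correct} of {total} correct ({score:.0%})")
```

Deduplicating before scoring is what turns six lines of output into the five-book sample behind the 20% figure.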
There is a bit of a moral panic around ChatGPT at the moment from two perspectives:
- Some are very much in favour, citing how it will revolutionise getting stuff done, making life easier for deadline-addled opinion-piece writers the world over… sorry, for people working to draft standard documents and similar. Microsoft have even announced they will add it to Microsoft Office as some form of Frankenstein’s Monster offspring of Clippy. So it will be even easier to get 80% bullshit content generated in Microsoft Word.
- Some are very much against it, citing the potential loss of entry-level knowledge work jobs in business and administration. That’s not helped by ChatGPT reportedly passing professional certification exams in medicine and law in the US, or by the fact that it makes life easier for opinion-piece writers the world over…
I take a slightly different view. Don’t get me wrong, the technology is great in theory and I can see many wonderful use cases for it. But if we are not VERY VERY careful we will end up with the enshittening of knowledge.
Now, I know that as I write that last sentence I probably sound a lot like Plato in the Phaedrus, where he argues against writing on the grounds that it will degrade memory and increase misunderstanding. He too was a grumpy old man. He was also right.
The entire field of data management began once we started writing stuff down. Data governance and data quality followed soon after, when we realised we had to make people write stuff down accurately. And Data Protection arrived a little later, when we realised we needed to stop people writing everything down all the time.
But the benefits of having things written down, even imperfectly, are clear. So we persevere, and I continue to work with clients to make sure they are managing that recorded information properly.
However, there are two key issues that need to be considered if we are to avoid the unfettered enshittening of knowledge and ensure that the benefits of assistive AI can be realised.
- The data quality feedback loop needs to be recognised and controlled for.
- We need to avoid the loss of expertise in the face of convenience.
Data Quality Feedback Loop
In the experiment above, as with many similar experiments performed by others, I identified that ChatGPT is an A-Grade Bullshitter and the quality of its output is ‘variable’ at best.
That is something that can be controlled for if the people reviewing the outputs are experts who can verify and check accuracy and, importantly, have the time, motivation, and resources to do that. I know Pernille Tranberg and Gry Hasselbalch, so I knew straight away that the list of books ChatGPT produced was bullshit. Fact-checking the provenance of the others listed took LONGER than if I’d just gone and looked for myself.
Therefore, our key safeguard is constrained by time, knowledge, and resources. The problem comes when that safeguard fails: A-Grade Bullshit is taken by an author using the AI as a research assistant, put into a research paper, book chapter, or online article, and then cited. Once that content is released into the world, it becomes potential training data for the AI the next time the window of its training data set is extended. And the error becomes compounded.
Why this bullshit training data is a problem
The merit or trust rating of a book, journal, or newspaper that cites the A-Grade Bullshit then increases the truthiness of the bullshit, resulting in the next iteration of the question to the AI having EVEN MORE CONFIDENCE in its bullshit. (How can it be wrong… the Irish Times said this was true.) This is a classic data quality spiral, where the incorrect data becomes accepted as fact and decisions or outputs that disagree are discounted. The Enshittening of Knowledge gathers momentum.
Basically… the problem is that ChatGPT and other AI can fall down the same rabbit hole as your idiot cousin who has watched too many of the algorithmically suggested videos on YouTube and now believes that aliens conspired with a time-travelling Elvis to kill Hitler using a bullet made of Yeti teeth. And they only went online looking for Teletubbies videos.
The confirmation-bias signal that a video was watched (so your cousin obviously likes videos of that kind) is simply swapped for the confirmation-bias signal that another source exists for the ‘fact’ the AI decided should exist. So student papers that use ChatGPT create a reinforcement signal, newspaper articles that use ChatGPT content create a reinforcement signal, academic publications that use ChatGPT create a reinforcement signal, and then we find, to paraphrase Terry Pratchett, that the lie has “run round the world before the truth has got its boots on”.
And then the fib becomes fact.
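To make the spiral concrete, here is a deliberately crude toy model in Python. Everything in it is a hypothetical assumption, not a description of how ChatGPT is actually trained: an AI that always repeats an invented claim, a fixed number of AI-assisted drafts per round, and a `check_rate` for the share of drafts an expert has the time to catch. Caught drafts are assumed to be corrected to cite the true source. Whatever gets published joins the pool of sources a future training run might see, and the function tracks what share of that pool supports the invented claim.

```python
def invented_claim_share(true_sources: int, invented_sources: int,
                         ai_drafts_per_round: int, check_rate: float,
                         rounds: int) -> list[float]:
    """Toy model of the data quality feedback loop (all parameters hypothetical).

    Each round, ai_drafts_per_round documents are drafted with an AI that
    repeats an invented claim. A fraction check_rate is caught by a reviewer
    and corrected to cite the true source; the rest are published unchecked.
    Returns the share of sources supporting the invented claim after each round.
    """
    shares = []
    true_total = float(true_sources)
    invented_total = float(invented_sources)
    for _ in range(rounds):
        caught = ai_drafts_per_round * check_rate
        true_total += caught                            # corrected drafts
        invented_total += ai_drafts_per_round - caught  # unchecked drafts
        shares.append(invented_total / (true_total + invented_total))
    return shares


# Hypothetical scenario: two genuine sources, no invented ones yet, ten
# AI-assisted drafts per round, and reviewers with time to check half of them.
for rnd, share in enumerate(invented_claim_share(2, 0, 10, 0.5, 5), start=1):
    print(f"round {rnd}: {share:.0%} of sources repeat the invented claim")
```

In this toy model the invented claim’s share climbs towards 1 - check_rate however many genuine sources existed at the start, because the original handful is soon swamped by new material and a fraction 1 - check_rate of that material repeats the claim. The only lever that caps it is how much expert checking gets done.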
The Death of Expertise
Don’t get me wrong, I really like the potential for ChatGPT and similar technologies to reduce the burden of writing or other content creation. It reminds me of how Clippy used to promise to help me write a letter back in the 1990s. But Clippy was a dickhead and he really didn’t have any expertise in writing letters.
So, I learned how to write letters in MS Word through trial and error, researching appropriate formatting and letter writing conventions for professional letters, business letters, and other types of document. I learned the vagaries of document formatting and headings and how to apply styles so things were consistently prettified.
As a young project manager I learned how to write scope statements and project charters using templates and the painful experience of getting things wrong. I learned from the feedback of my boss and other mentors. But the thing is that I was learning from experience. The same goes for drafting policies, procedures, blog posts, and more. Now I can look at a document and have a pretty good idea whether it has the right content in it. And, from painful experience working my way up through projects and learning core data management skills, I also know when something is a piece of utter garbage with the aerodynamic properties of an anvil.
But that required me to try and send a few anvils into space in my younger days.
The Cushion of The Experts
Today, the mitigation for the Enshittening of Knowledge is that we have a layer of experts in organisations: people who have done things, learned things, experienced things, and had to fix things. These experts learned by having to do the damned work. Tools and technologies helped and templates were accelerators, but they had to learn judgement, context, and other ‘stuff’.
When we turn the job of creating the first draft of the thing over to ChatGPT, we risk removing the “figuring shit out” part of everyone’s career path. And we get away with that while we still have people who have figured shit out through experience. But that expertise is time-bound. People move on. Unless we actively plan for knowledge management and skills development in an era of automagically generated first drafts, we’ll eventually get to the point where there will be nobody left to call out the A-Grade Bullshit generator when it is spouting shite.
The Enshittening of Knowledge will then become embedded in organisations even more than it is today, as we struggle with data debt, data un-literacy, and the plague of short-term thinking driving long-term strategy.
What can we do?
Plato was both right and wrong about writing when he was bitching about it in the Phaedrus. After all, his gig at the time was to turn up and talk about things and get paid for it. So of course he would object to a technology that would allow his words to be recorded and transmitted without him being there. Fast forward to today and every half-assed stand-up comedian who talks funny for a living craves their Netflix Special.
The genie is out of the bottle though. Just as Plato couldn’t reverse the tide of writing stuff down, leading to literacy, education, data quality problems, filing, data governance, data warehousing, and all the wonderful gifts those things have brought us, we won’t be able to reverse the tide of A-Grade Bullshitter tech.
It’s a Wicked Problem to solve, though. And, as such, any simple answer I give now will be glib, obvious, and totally wrong. In a later post I’ll share some thoughts on why the things we already know need to be done to improve knowledge in the information and data age will be essential to preventing the Enshittening of Knowledge.
But the first real defence is awareness. You have been warned.