Have I hardened against LLMs?
The other day a reader of _The Intelligence Illusion_ sent me a short email that outlined their takeaway from the book and ended it with a simple question.
Slightly paraphrased:
> Would it be correct to say that your views on LLMs/Transformers have hardened since you wrote your book?
My answer is below.
* * *
That’s a good question.
My views on the technology itself are roughly the same as when I published the first edition of the book. The downside pretty comprehensively outweighs the upside and, to echo your own summary, the technology is only narrowly useful for a very specific set of use cases, and even then you need to take care.
That’s still my position. What’s hardened are my views on the tech industry, software, management, and influential members of the software developer ecosystem.
This will make more sense if I explain to you what the past few years have been like from my perspective, starting with the time I first began to research this new wave of generative models.
I’ve been somewhat interested in “AI” since my career began. I got my start in multimedia around 2000, and because that sector is somewhat adjacent to games development, I’ve been keeping an eye on “AI” and procedural media generation since the early 2000s, albeit always from an interactivity and media perspective. I’ve long since lost most of the books on the topic I had back then – except for a copy of Norvig’s _Paradigms of Artificial Intelligence Programming_, which I kept because it used Common Lisp and always seemed fun to have around – but that background has meant it’s usually been fairly straightforward for me to dip in once in a while and catch up on what the field has been up to.
I’ve generally made sure to be in the position where if I had to use current tech for something, I’d know enough to be dangerous.
So I wasn’t coming at generative models entirely unfamiliar with the field.
What I discovered during my research appalled me. This was a piece of technology that obviously and seemingly deliberately played into and supported some of the worst elements of the human psyche:
* Deceptive design – playing into anthropomorphism and confirmation biases.
* Political extremism – that is, an all-out assault on labour – baked into the product at the foundation.
* An outright attack on education. Instead of trying to help schools, colleges, and universities navigate the issues introduced by the technology, every vendor seemed (and seems) intent on making it _impossible_ to manage, to the point where it’s now outright threatening our education systems as a whole.
* So much Child Sexual Abuse Material (CSAM). Way more than anybody could reasonably expect. It’s all over the training datasets. It keeps showing up in the output. In at least a couple of cases that seems to be the point: the vendor seems to _want_ the model to be able to generate these materials.
* Nondeterministic behaviour, making the tech unusable from a modern management perspective.
* Insecure on every level.
* Grossly mediocre output.
* Incredibly poor quality overall once you account for security, accuracy, and fabrications.
* Vendors persistently and deliberately ignoring the law, leading to numerous lawsuits, some of which might have liability implications for many end users. (See Grok. Even if you have the sociopathic stomach to look past the ethical and moral implications, the prevalence of CSAM on many of these platforms exposes anybody who uses them to liability.)
And more. So much more.
I lay much of this out in the book. Some of it got a chapter. Some of it only got a paragraph. But the book overall lays out the risks of the tech from the perspective of modern management and software design and, towards the end, I describe ChatGPT as “the opposite of good software”. As in, it’s not just bad software, it’s as if they wrote up an inventory of what makes software good and then decided for each and every entry in the list to implement the exact opposite in their app and service design.
That already isn’t a ‘soft’ view on the technology by any measure.
But, as I wrote the book, I always tried to adopt as neutral a tone as I could. CSAM is obviously bad, so I shouldn’t have to tell people that it is very bad. Frequent fabrications in knowledge work, research, and education are very bad, so pointing out that they’re happening should be enough; I shouldn’t have to hammer home _why_ any of it is bad. Nor should I have to adopt a vulgar tone to make it obvious to the reader that this is all pretty thoroughly bad.
Many of those who read the book and saw the inventory of technical flaws and issues came to roughly the same conclusion as you did. In short, paraphrasing you (if you don’t mind), they decided that:
> This tech is only useful for a couple of very specific use cases that I care about and even then only if I don’t mind the inevitable faults.
Add to that the caveat that this only applies if _the current price point is maintained_, which is not going to be true in the long term.
This is a rational conclusion to arrive at after reading an inventory of harms that includes, among other things, hard-to-detect fabrications and massive software insecurity. This is what I had hoped for when I wrote the book. I don’t expect everybody to be in a position where they can unilaterally reject the use of the tech – many people can’t risk their livelihoods and I’m not one to judge people if that’s their only option for putting food on the table – but I had hoped that people would come away from the book and the essays in my newsletter with about that level of understanding of the implications.
My own conclusion was that the only usable tools to come out of all this are the speech recognition and transcription models. They aren’t great – you need to edit the output a lot to make it usable – but they reduce the work of transcribing audio by a substantial margin, _as long as you don’t use OpenAI’s Whisper_. **OpenAI’s model fabricates in its transcripts.** To this day, it still regularly makes shit up in its transcripts. That it’s being adopted in healthcare around the world should terrify you. That it’s being sold into these sensitive industries _by_ OpenAI, even though they seem well aware of these flaws, should make you question the integrity of the people running that company.
So, transcription models: they save money and are pretty much the only useful thing (IMO) to come out of this, as long as you don’t use OpenAI’s models. There are quite a few alternatives to their models.
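If you’re curious what an alternative looks like in practice, here’s a minimal sketch of running an open, non-OpenAI speech recognition model locally via Hugging Face’s transformers library. The specific model name and audio file are illustrative assumptions on my part, not recommendations from the book, and the raw output still needs the manual editing I mentioned above.

```python
# A minimal sketch: local transcription with an open, non-OpenAI model.
# The model name and audio file below are illustrative assumptions.
from transformers import pipeline

# "automatic-speech-recognition" is a standard transformers pipeline task.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",  # an English-only open ASR model
)

result = transcriber("interview.wav")  # hypothetical audio file
print(result["text"])  # raw transcript; still needs human editing
```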
Your overall take on the book is, roughly, what I had hoped for from a reader when I wrote it. I’m very grateful to hear that.
What I hadn’t expected was the reaction of the tech industry, managers, and most journalists – the people driving online discourse in the field – who read my book or my newsletter essays.
It’s as if I had outlined the risks of using lead paint in consumer products. The outline would include a list – written in neutral language to emphasise that this is institutionally and economically serious writing, not punditry – and it would be almost entirely cons.
Some of the “cons” would include, for example, that lead paint literally makes people so sick that they die, and that it’s children who are most at risk.
But one of the very few “pros” in this hypothetical list would be a short note, included to show that I’d done my homework, saying that using lead paint might make production a few percentage points cheaper, but that this claim came directly from vendors and there would be good reason to be sceptical.
Then imagine that most of the reactions to the “risks of lead paint” piece went: “Five per cent cheaper, you say? Interesting. I need to look at using lead paint in our products.”
Imagine that much of the subsequent discourse then showed a complete disregard for the harms and the cost in terms of human misery, and instead used the piece as an argument for increasing the use of lead paint, just “more safely, because now we’re aware of the issues and the hazards”.
Imagine what that would feel like as the writer of that piece.
The more I wrote about generative models, the more appalled I became at the response from the industry, to both my writing and that of others actively highlighting the risks. Few people who have any influence in tech and software seem to care about the harms, the political manipulation, the outright sabotage of education, the association with extremism, or the _literal_ child abuse.
They _say_ they care, but then continue to support and promote the CSAM machine, the platform that’s insecure by design, the software that’s so psychologically manipulative it’s driven people to suicide, and the generative output that is unsafe and filled with fabrications at every level.
They say “oh, no” even as they keep pressing the “do horrible things with a machine made by horrible people” button again and again, just because they think it’ll boost their productivity by 5-10%.
Every time I lay out the harms in straightforward and neutral language, the response from most in the industry – management especially – has been to ignore the harms and focus either on the hypothetical, unproven benefits advertised by “AI” vendors or on the incremental, subjective benefits they _think_ they’re getting, which would be minor even if they were real. When I explain in unambiguous terms what those harms _mean_, I get labelled an extremist with hardline views.
Tech companies have done everything they can to maximise the potential harms of generative models because in doing so they think they’re maximising their own personal benefit. More use equals more profit. But it also equals more harm.
When I point this out, I get dismissed as a crank. I’m being “unreasonable”.
I am so utterly disappointed in my peers, especially those in web development, a field that has gone for LLMs in a big way.
So my views on LLMs or Transformers haven’t hardened. They’re roughly the same as they were four years ago. The tech is what it is and, while the exact details vary from version to version, its fundamental issues remain the same.
But my views on the tech industry and my peers in the industry have changed. They’ve changed dramatically.
I never had high expectations of this industry, but it still managed to disappoint me.