Since my previous post on AI skepticism, I’ve been keeping my eyes peeled for more posts and articles in that category. Assuming the peeledness of my eyes has remained constant, it seems that the volume has roughly halved.
The reason I’m writing this now is that I saw a very good video the other day by eisfrosch, titled: “AI Singularity Is A Smokescreen” (transcript). The main point of the video is not skepticism of AI as such, but rather that the goal implied by the hype, the so-called “Artificial General Intelligence”, is not a useful one when discussing AI. As a side effect, it is a nice round-up of articles and research that add nuance to the benefits of AI so often touted in the hype. It shows that while AI will definitely leave a lasting mark on the world, it will not be the panacea the large companies are promising: the negative influences of AI need to be investigated, monitored, and kept in check.
The video mentions two interesting articles that I also want to highlight.
In the preprint titled “Hallucination is Inevitable: An Innate Limitation of Large Language Models”, Xu et al. argue that any LLM will always have some degree of hallucinations. On an intuitive level, this does not surprise me: LLM hallucinations can be seen as a way for LLMs to be creative and have new ideas. Similar to human ideas and creativity, these insights are just not always realistic or applicable.
A counterpoint to this conclusion is made in another preprint, where Suzuki et al. argue that, yes, there will always be hallucinations, but the chance of a hallucination occurring can be made arbitrarily small, provided you also improve the algorithm and the training data by a proportional amount. While this might seem to contradict the result by Xu et al., it actually does not: the two results can co-exist. This makes sense if you view both works through the lens of the original breakthrough of deep learning, which was that you can improve the performance of a deep neural net arbitrarily, as long as you don’t mind increasing the model size and the volume of training data accordingly.
Summarizing, if the following are true:

- hallucination is an innate limitation of LLMs that can never be eliminated entirely (Xu et al.), and
- the probability of hallucination can nevertheless be made arbitrarily small by scaling up the algorithm and the training data (Suzuki et al.),

Then it makes sense that, at least in theory, hallucinations can be pushed below any threshold you care about, as long as you keep scaling.
Once you accept that, the next question becomes: how much money do you have available to reach the quality you need? Which perfectly reflects the current state of VC investments in the AI industry.
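To make that concrete, here is a toy back-of-the-envelope sketch. It assumes a hypothetical power-law relationship between training compute and hallucination rate; the formula and the numbers are mine, not from either preprint.

```python
# Toy model (my own assumption, not from Xu et al. or Suzuki et al.):
# suppose the hallucination rate falls off as a power law in training compute,
#   rate(compute) = base_rate * (compute / base_compute) ** -alpha
# Then the compute needed to reach a target rate grows polynomially in 1/target.

def compute_needed(target_rate: float,
                   base_rate: float = 0.10,     # hypothetical: 10% hallucinations at baseline
                   base_compute: float = 1.0,   # baseline compute, in arbitrary units
                   alpha: float = 0.5) -> float:  # hypothetical scaling exponent
    """Invert rate = base_rate * (compute / base_compute) ** -alpha for compute."""
    return base_compute * (base_rate / target_rate) ** (1.0 / alpha)

for target in (0.01, 0.001, 0.0001):
    print(f"target rate {target:.4%} -> {compute_needed(target):>12,.0f}x baseline compute")
```

Under these made-up numbers, every extra order of magnitude of reliability costs two orders of magnitude of compute, which is exactly the kind of curve you need deep pockets for.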
In his blog post titled “I’d Rather Read the Prompt”, Clayton argues that he has never seen an instance of AI-generated media that had more value than the prompt used to generate it. I don’t share this sentiment quite as strongly; a picture is worth a thousand words, after all. However, it is an idea that I have mentioned to colleagues in the past, and that people before me have probably had as well: if we allow students to use AI in their assignments, we might as well just grade their prompts.
I admit that, practically speaking, this will be difficult, especially for easy courses. In addition, almost any prompt will probably include most of the course material. Still, as a thought experiment, I think there is an interesting middle ground. Imagine a student having to submit a prompt, which can consist only of text. The grading criterion is that, when this prompt is given to a deterministic AI, it produces a report that satisfies some rubric. I think most good prompts would probably be good summaries of the key points of the course, and of what the interesting and complicated parts are. The only caveat is that students might start submitting prompts like, “the report has to get a top grade given the rubric of this course at this university”, which could work if the rubric was part of the training data. So for the thought experiment to work, you’d need an LLM that is deterministic, available to all students, smart enough to write reports, but not knowledgeable about course organization details (e.g. rubrics). Of those conditions, the determinism is the only one I’m not sure how to enforce. Maybe have the students choose a seed, as well?
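As a rough illustration of what the determinism part could look like in practice: with greedy decoding there is no sampling step left to be random, and the seed only matters if sampling is switched back on. Here is a minimal sketch, assuming a locally hosted open-weights model served through the Hugging Face transformers library; the model name is a placeholder, and none of this comes from Clayton’s post.

```python
# Sketch of "deterministic grading": the same prompt should always produce
# the same report. Greedy decoding (do_sample=False) removes the sampling
# randomness; the seed only matters if sampling is re-enabled, e.g. because
# students are allowed to pick one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "some-open-weights-model"  # placeholder, not a real model id

def generate_report(prompt: str, seed: int = 0, max_new_tokens: int = 2048) -> str:
    torch.manual_seed(seed)  # the student-chosen seed from the text above
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs,
                            do_sample=False,          # greedy: no randomness
                            max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Even then, the output is only as deterministic as the floating-point kernels underneath, so in practice every grading run would also have to happen on the same hardware and software stack.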
There’s another point Clayton makes, which has been stuck in my head for the past few days. He writes that there are two categories of useless work: the truly useless kind, and the actually useful kind. Paraphrasing from the post, an example of the useless kind is the AI-generated summary in a comment on a Reddit link submission. If the summary captures the message of the linked page, the page probably was not that interesting to begin with. However, it’s more likely that the summary does not capture the message of the linked page, in which case the page is valuable enough to warrant the attention of a real human, and the summary has no added value. You might expect such a summary to be helpful in deciding whether to follow a link or not. But then again, if the AI is hallucinating part of the summary, how useful is it?
The other kind of useless work is the work that is actually useful. Paraphrasing from the post again, a good example is that of students writing essays or reports for a course. The document will be thrown away after the course, or perhaps archived, probably never to be looked at ever again. Yet this is okay, because the true benefit is the brain of the student (hopefully) having grown from the experience. If students outsource the effort to an LLM, and only inspect the result, they will never, or at best very slowly, learn how to write properly themselves.
If after that you still want more articles to read, here’s the rest of this period’s list. If nothing else, have a look at the last two items on the list.