“One of the most salient features of our culture is that there is so much bullshit.” These are the opening words of the short book On Bullshit, written by the philosopher Harry Frankfurt. Fifteen years after the publication of this surprise bestseller, the rapid progress of research on artificial intelligence is forcing us to reconsider our conception of bullshit as a hallmark of human speech, with troubling implications. What do philosophical reflections on bullshit have to do with algorithms? As it turns out, quite a lot.
In May this year the company OpenAI, co-founded by Elon Musk in 2015, introduced a new language model called GPT-3 (for “Generative Pre-trained Transformer 3”). It took the tech world by storm. On the surface, GPT-3 is like a supercharged version of the autocomplete feature on your smartphone; it can generate coherent text based on an initial input. But GPT-3’s text-generating abilities go far beyond anything your phone is capable of. It can disambiguate pronouns, translate, infer, analogize, and even perform some forms of common-sense reasoning and arithmetic. It can generate fake news articles that humans can barely detect above chance. Given a definition, it can use a made-up word in a sentence. It can rewrite a paragraph in the style of a famous author. Yes, it can write creative fiction. Or generate code for a program based on a description of its function. It can even answer queries about general knowledge. The list goes on.
GPT-3 is a marvel of engineering due to its breathtaking scale. It contains 175 billion parameters (the weights in the connections between the “neurons” or units of the network) distributed over 96 layers. It produces embeddings in a vector space with 12,288 dimensions. And it was trained on hundreds of billions of words representing a significant subset of the Internet—including the entirety of English Wikipedia, countless books, and a dizzying number of web pages. Training the final model alone is estimated to have cost around $5 million. By all accounts, GPT-3 is a behemoth. Scaling up the size of its network and training data, without fundamental improvements to the years-old architecture, was sufficient to bootstrap the model into unexpectedly remarkable performance on a range of complex tasks, out of the box. Indeed, GPT-3 is capable of “few-shot” learning, picking up a new task from just a handful of examples, and even, in some cases, “zero-shot” learning, performing a new task without being given any example of what success looks like.
Interacting with GPT-3 is a surreal experience. It often feels like one is talking to a human with beliefs and desires. In the 2013 movie Her, the protagonist develops a romantic relationship with a virtual assistant, and is soon disillusioned when he realizes that he was projecting human feelings and motivations onto “her” alien mind. GPT-3 is nowhere near as intelligent as the film’s AI, but it could still find its way into our hearts. Some tech startups like Replika are already working on creating AI companions molded on one’s desired characteristics. There is no doubt that many people would be prone to anthropomorphize even a simple chatbot built with GPT-3. One wonders what consequences this trend might have in a world where social-media interactions with actual humans have already been found to increase social isolation.
OpenAI is well aware of some of the risks this language model poses. Instead of releasing the model for everyone to use, it has only granted beta access to a select few—a mix of entrepreneurs, researchers, and public figures in the tech world. One might wonder whether this is the right strategy, especially given the company’s rather opaque criteria in granting access to the model. Perhaps letting everyone rigorously test it would better inform how to handle it. In any case, it is only a matter of time before similar language models are widely available; in fact, it is already possible to leverage open services based on GPT-3 (such as AI Dungeon) to get a sense of what it can do. The range of GPT-3’s capacities is genuinely impressive. It has led many commentators to debate whether it really “understands” natural language, reviving old philosophical questions.1
Gone are the days of “good old-fashioned AI” like ELIZA, developed in the 1960s by Joseph Weizenbaum at the Massachusetts Institute of Technology. ELIZA offered an early glimpse of the future. Using carefully crafted “scripts,” ELIZA could exploit superficial features of language, latching onto keywords to produce predetermined answers in written conversations with humans. Despite its rudimentary, programmer-created ruleset, ELIZA was surprisingly effective at fooling some people into thinking that it could actually understand what they were saying—so much so that Weizenbaum felt compelled to write a book cautioning people not to anthropomorphize computer programs. Yet talking with ELIZA long enough could reveal that it was merely parroting human prose. ELIZA couldn’t parse natural language, let alone understand it, beyond simple and repetitive keyword-based tricks.
Computer science has made staggering progress since then, especially in recent years, and the subfield of natural language processing has been at the forefront. Rather than relying on a set of explicit hand-crafted instructions, modern algorithms use artificial neural networks loosely inspired by the mammalian brain. These learn how to perform tasks by training on large amounts of data. The core of this process, known as machine learning, is to optimize a mathematical function—a loss—that scores how good or bad each output of the model, each attempt to complete the task over some part of the data, is. While artificial neural networks performed poorly when they first came onto the stage in the 1950s, the availability of ever-increasing amounts of computational power and training data eventually vindicated their superiority over traditional algorithms.
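To make the idea concrete, here is a deliberately minimal sketch of what optimizing a loss looks like: gradient descent on a one-weight model with made-up data. Nothing here resembles the scale or architecture of a real language model; it only shows the shape of the training loop.

```python
# A one-parameter "model" y = w * x, fit by gradient descent on a
# squared-error loss. Toy data invented for illustration: y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0              # initial weight, deliberately wrong
learning_rate = 0.05

for step in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # nudge w downhill on the loss

print(round(w, 3))  # → 2.0, the weight that best fits the data
```

Conceptually, training a language model is this same loop scaled up enormously: billions of weights instead of one, and a loss that scores next-word predictions instead of a line fit.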
Giving machines speech has, of course, long been considered a significant landmark on the winding path to developing human-level artificial intelligence. Many of the intelligent-seeming things we do, like engaging in complex reasoning and abstract problem-solving, we do using a natural language, such as English.
An old idea, the distributional hypothesis, guided the machine-learning revolution in the realm of natural language processing. Words that occur in a similar context, according to this idea, have a similar meaning. This means that, in principle, an algorithm might learn to represent the meaning of words simply from their distributions in a large amount of text. Researchers applied this insight to machine-learning algorithms designed to learn the meaning of words by predicting the probability of a missing word, given its context (the sentence or group of words in which it appears).
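The distributional hypothesis can be illustrated in miniature. In the sketch below, the tiny corpus and one-word context window are invented for the example; real systems learn dense vectors from billions of words rather than tallying raw counts.

```python
from collections import Counter

# A made-up mini-corpus. Even at this toy scale, words used in
# similar contexts ("dog" and "cat") acquire similar profiles.
corpus = ("the dog chased the ball . the cat chased the ball . "
          "the dog ate meat . the cat ate meat . the car needed fuel").split()

def context_counts(word, window=1):
    # Count the words appearing within `window` positions of `word`.
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

# In this corpus, "dog" and "cat" share identical context counts,
# while "car" does not: distributionally, dogs resemble cats.
print(context_counts("dog"))
print(context_counts("cat"))
print(context_counts("car"))
```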
In 2013, one such algorithm called “word2vec” was trained on a large corpus of news articles. During training, each word from the corpus was turned into a vector (also called an embedding) in a high-dimensional vector space. Words that occurred in similar contexts ended up with neighboring embeddings in that space. As a result, the similarity between two word embeddings (measured by the cosine of the angle between them) intuitively reflected the semantic similarity between the corresponding words: The more related the meanings of two words, the closer their embeddings in the space.
After training, word2vec’s embeddings appeared to capture interesting semantic relationships between words that could be revealed through simple arithmetic operations on the vectors. For example, the embedding for “king” minus the embedding for “man” plus the embedding for “woman” was closest to the embedding for … “queen.” (Intuitively, “king” is to “man” as “queen” is to “woman.”)
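The analogy arithmetic can be reproduced in miniature. The four-dimensional vectors below are hand-made stand-ins, not real word2vec embeddings (which have hundreds of learned dimensions), but they show the mechanics of the vector arithmetic and the cosine measure.

```python
import math

# Hand-made 4-dimensional "embeddings," invented purely for illustration.
vectors = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.1, 0.8, 0.2],
    "man":   [0.1, 0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9, 0.1],
    "apple": [0.0, 0.0, 0.0, 0.9],
}

def cosine(u, v):
    # Cosine of the angle between two vectors: 1 means same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# "king" - "man" + "woman", computed component-wise.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# The nearest remaining word to the result, by cosine similarity.
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(target, vectors[w]))
print(best)  # → queen
```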
GPT-3 is significantly more complex than word2vec. It is based on an artificial neural network architecture called “Transformer,” introduced in 2017. Neural networks based on this architecture can be “pre-trained” on an enormous amount of text to learn general properties of natural language. Then they can simply be “fine-tuned” on a smaller corpus to improve performance on a specific task—for example, classifying news articles by topic, summarizing paragraphs, or predicting the sentences that follow a given input. While GPT-3 does not revolutionize the Transformer architecture, it is so large, and was trained on so much data, that it can achieve performance near or above previous fine-tuned models, without any fine-tuning.
Weizenbaum’s old worries about people anthropomorphizing ELIZA are all the more pressing when it comes to GPT-3’s vastly superior abilities. But does GPT-3 understand what it says? The answer largely depends on how much we build into the notion of understanding.
GPT-3 seems to capture an impressive amount of latent knowledge about the world, knowledge that is implicitly encoded in statistical patterns in the distribution of words across its gargantuan training corpus. Nonetheless, there are good reasons to doubt that GPT-3 represents the meaning of the words it uses in a way that is functionally similar to humans’ word representations. At the very least, children learn language through a rather different process, mapping words to concepts that embed knowledge acquired not only through reading text, but also crucially through perceiving and exploring the world.
Consider how you learned what the word “dog” means. You presumably did not learn it merely by reading or hearing about dogs, let alone remembering the statistical distribution of the word “dog” in sentences you read or heard, but by seeing a real dog or a picture of one, and being told what it is. Your lexical concept dog does not merely encode the similarity between the meaning of the word “dog” and that of other words like “cat.” It embeds structured knowledge about dogs partially grounded in perceptual experience, including the knowledge that dogs have four legs, eat meat, and bark—all things you probably observed.
GPT-3’s word embeddings are not perceptually grounded in the world, which explains why it often struggles to consistently answer common-sense questions about visual and physical features of familiar objects. It also lacks the kind of intentions, goals, beliefs, and desires that drive language use in humans. Its utterances have no “purpose.” It does not “think” before speaking, insofar as this involves entertaining an idea and matching words to the components of a proposition that expresses it. Yet its intricate and hierarchically-structured internal representations allow it to compose sentences in a way that often feels natural, and display sophisticated modeling of the relationships between words over whole paragraphs.
If the family of GPT language models had a motto, it could be “Fake it till you make it.” GPT-3 is certainly good at faking the semantic competence of humans, and it might not be an exaggeration to say that it has acquired its own form of semantic competence in the process.
In the first season of the TV show Westworld, the human protagonist visits a dystopian amusement park populated by hyper-realistic androids. Greeted by a human-like android host, he asks her, incredulous, whether she is real. She replies in a mysterious voice: “If you can’t tell, does it matter?” Whether or not GPT-3 understands and uses language like we do, the mere fact that it is often good enough to fool us has fascinating—and potentially troubling—implications.
This is where Frankfurt’s notion of bullshit is helpful. According to Frankfurt, bullshit is speech intended to persuade without regard for truth. In that sense, there is an important difference between a liar and a bullshitter: The liar does care about the truth insofar as they want to hide it, whereas the bullshitter only cares about persuading their listener. Importantly, this does not entail that bullshitters never tell the truth; in fact, good bullshitters seamlessly weave accurate and inaccurate information together. For this very reason, as Frankfurt puts it, “Bullshit is a greater enemy of truth than lies are.”
At its core, GPT-3 is an artificial bullshit engine—and a surprisingly good one at that. Of course, the model has no intention to deceive or convince. But like a human bullshitter, it also has no intrinsic concern for truth or falsity. While part of GPT-3’s training data (Wikipedia in particular) contains mostly accurate information, and while it is possible to nudge the model toward factual accuracy with the right prompts, it is definitely no oracle. Without independent fact-checking, there is no guarantee that what GPT-3 says, even if it “sounds right,” is actually true. This is why GPT-3 shines when writing creative fiction, where factual accuracy is less of a concern. But GPT-3’s outputs are distinct enough from human concerns and motivations in language production, while being superficially close enough to human speech, that they can have potentially detrimental effects on a large scale.
First, the mass deployment of language models like GPT-3 has the potential to flood the Internet, including online interactions on social media, with noise. This goes beyond obvious worries about the malicious use of such models for propaganda. Imagine a world in which any comment on Twitter or Reddit, or any news article shared on Facebook, has a non-trivial probability of being entirely written by an algorithm that has no intrinsic concern for human values.
That scenario is no longer science fiction. Just a few weeks ago, a self-help blog post written by GPT-3 reached the top of Hacker News, a popular news aggregation website.2 Hardly anyone suspected the hoax. We have to come to terms with the fact that recognizing sentences written by humans is no longer a trivial task. As a pernicious side-effect, online interactions between real humans might be degraded by the lingering threat of artificial bullshit. Instead of actually acknowledging other people’s intentions, goals, sensibilities, and arguments in conversation, one might simply resort to a reductio ad machinam, accusing one’s interlocutor of being a computer. As such, artificial bullshit has the potential to undermine free human speech online.
GPT-3 also raises concerns about the future of essay writing in the education system. For example, I was able to use an online service based on GPT-3 to produce an impressive philosophical essay about GPT-3 itself with minimal effort (involving some cherry-picking over several trials). As several of my colleagues commented, the result is good enough that it could pass for an essay written by a first-year undergraduate, and even get a pretty decent grade. The Guardian recently published an op-ed on artificial intelligence produced by stitching together paragraphs from several outputs generated by GPT-3. As they note, “Editing GPT-3’s op-ed was no different to editing a human op-ed”—and overall, the result is coherent, relevant and well-written. Soon enough, language models might be to essays as calculators are to arithmetic: They could be used to cheat on homework assignments, unless those are designed in such a way that artificial bullshit is unhelpful. But it is not immediately obvious how one could guarantee that.
To conclude this article, I prompted GPT-3 to complete the first sentence of Frankfurt’s essay. Here is one of several outputs it came up with: “Bullshitting is not always wrong, though sometimes it can be harmful. But even when it is harmless, it still has some serious consequences. One of those consequences is that it prevents people from being able to distinguish between what’s real and what isn’t.” That’s more bullshit, of course; but it fittingly rings true.
Raphaël Millière is a Presidential Scholar in Society and Neuroscience in the Center for Science and Society at Columbia University, where he conducts research on the philosophy of cognitive science. Follow him on Twitter @raphamilliere.
1. According to philosopher John Searle’s “Chinese room argument,” no computer could ever understand a language by running a program. This is because such a computer would be analogous to a human operator in a room following a set of English instructions for manipulating Chinese symbols on the basis of their syntax alone, taking Chinese characters as input and producing other Chinese characters as output, without understanding Chinese in the process. Searle’s argument was originally aimed at old-fashioned symbolic algorithms like ELIZA. It could be adapted to modern language models (but the thought experiment would be all the more difficult to conceive).
In any case, many philosophers rejected Searle’s conclusion for various reasons, including the suggestion that the human operator in the room is merely analogous to a specific component of the computer (the central processing unit, or CPU), and that a complete natural-language processing system—including not only the CPU but also the instructions it follows, and the memory containing intermediate states of its computations—could genuinely understand Chinese. Nonetheless, those who reject the conclusion of Searle’s argument still have room to disagree on which system would qualify as understanding natural language, and whether a computer specifically running GPT-3 would make the grade.
2. Porr, L. Feeling unproductive? Maybe you should stop overthinking. https://adolos.substack.com/ (2020).