Has ChatGPT refuted Noam Chomsky?

Following the shock release of the latest version of the ChatGPT text generator, the cognitive psychologist Steven Piantadosi has been quick to claim that this large language model is a theory of language that disproves Noam Chomsky’s hypotheses about the nature of human language and mind. I show that such a claim is unfounded: for it to make any sense, ChatGPT would have to be a model of how the human brain acquires, produces, and understands language, and it is nothing of the sort.

I don’t mean to detract from this language generator, which is certainly a sophisticated and potentially very useful piece of engineering (as well as, as has also been pointed out, a potentially dangerous one). For better and for worse, ChatGPT looks set to be a tipping point in natural language processing technology, but I don’t think the same will be true of our understanding of human language and mind.

The computational linguist Emily Bender has popularized the term stochastic parrots for large language models such as ChatGPT, because the expression precisely captures the two essential features of how they operate: they do not understand what they say, and they work, in essence, by calculating the probability of one word appearing after another. And indeed, as Stephen Wolfram has explained in detail, what ChatGPT actually does is calculate the probability of the next word given the words the system has already produced.
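To make the idea concrete, here is a deliberately crude sketch in Python of a word-level “stochastic parrot”: it only counts which words follow which in a tiny made-up corpus and then chains words together by sampling from those counts. It is a caricature for illustration only; ChatGPT does nothing so simple, since it uses a huge neural network over tokens rather than a table of counts.

```python
# A toy "stochastic parrot": estimate which word tends to follow which from raw
# counts in a tiny corpus, then generate text by sampling from those counts.
import random
from collections import defaultdict

corpus = "the sea is deep and the sea is blue and the sky is blue".split()

# Record, for every word, the words observed immediately after it.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start: str, length: int = 8) -> str:
    """Chain words together using nothing but co-occurrence statistics."""
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        # Duplicates in the list make this sampling proportional to frequency.
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("the"))  # e.g. "the sea is blue and the sky is blue"
```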

In fact, unlike human language, ChatGPT does not work with real words (linguistic entities that pair a form, the signifier, with a meaning, the signified), but with tokens, that is, very frequent sequences of characters (a token corresponds to roughly three quarters of a graphic word in English). The tokens ChatGPT handles are therefore the written-language equivalent of the syllables of spoken language: recurrent sequences of basic units with no systematic correlation with meaning.
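For readers who would rather see tokens than read about them, the following sketch uses OpenAI’s tiktoken library to split a sentence into token ids and back into the character sequences they stand for (the choice of the “cl100k_base” encoding, documented for GPT-3.5/GPT-4 models, is an assumption here):

```python
# Sketch: a GPT-style tokenizer splits text into tokens, not words.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding name assumed

text = "Colorless green ideas sleep furiously"
token_ids = enc.encode(text)                    # a list of integers
pieces = [enc.decode([t]) for t in token_ids]   # the character chunk behind each id

print(token_ids)
print(pieces)  # the chunks need not coincide with words or morphemes
print(len(text.split()), "words ->", len(token_ids), "tokens")
```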

Technical details aside (they can be enjoyed in Wolfram’s essay), what ChatGPT knows how to do is create (seemingly) coherent text from the user’s prompt. To guess how to continue a piece of text, the program uses a model (a huge mathematical function) to estimate the probability of certain token sequences appearing after certain other token sequences. In reality, ChatGPT is a mathematical model of the distribution of tokens in the huge corpus on which it has been trained. Its core is an enormous neural network trained on a text corpus of about 500 billion words.
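Since ChatGPT’s own weights are not public, a smaller relative such as GPT-2 can stand in to illustrate what such a “huge mathematical function” does: given a token sequence, it returns a probability distribution over every possible next token. A sketch using the Hugging Face transformers library, under that stand-in assumption:

```python
# Sketch: an autoregressive language model as a function from a token sequence
# to a probability distribution over the next token (GPT-2 as a stand-in).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The deep blue", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, vocab_size)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over ~50k tokens
top = torch.topk(next_token_probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}  {p.item():.3f}")
```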

ChatGPT operates with a repertoire of about 50,000 tokens (a magnitude comparable to the lexicon of a language like English), and each token is represented as a vector of numbers. When interacting with the user, ChatGPT takes the prompt it is given and converts it into number vectors that it then processes. Since there are about 50,000 units in its dictionary, what it obtains at each step is a list of about 50,000 values corresponding to the probabilities of each possible next token. Quite ingeniously, ChatGPT does not always use the most probable token but chooses, from time to time, one of lower probability, which makes the texts less “flat” and more original, and contributes to the “more human” feel of a conversation with ChatGPT compared with previous versions.
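That “choose a less probable token from time to time” trick is usually implemented as sampling with a temperature. The sketch below, with invented scores for four candidate tokens, shows how a moderate temperature occasionally picks lower-probability tokens, while a near-zero temperature collapses into always choosing the most probable one:

```python
# Sketch: temperature sampling over candidate tokens (scores are invented).
import numpy as np

rng = np.random.default_rng(0)

def sample(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Turn raw scores into probabilities and sample one token id."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())   # softmax, numerically stable
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = np.array([3.2, 3.0, 1.5, 0.2])     # pretend scores for 4 candidate tokens
print([sample(logits, temperature=0.8) for _ in range(10)])     # mostly 0 or 1, sometimes 2
print([sample(logits, temperature=0.0001) for _ in range(10)])  # ~greedy: always 0
```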

GPT stands for Generative Pre-trained Transformer, and the last part names the newest piece of technology. The transformer is the neural network architecture that allows the prediction of the next token to take into account the tokens processed before it. Note that ChatGPT does not know what it is talking about, nor does it have any communicative intent, so the trick that lets it maintain some coherence is, as it were, to “go back” over what has already been produced and recalculate the probabilities of the next graphic word.
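A minimal NumPy sketch of the mechanism behind that “going back”, causal self-attention, is given below; real transformers add learned projections, many attention heads, and many stacked layers, so this is only the bare skeleton of the idea:

```python
# Sketch: causal self-attention lets the prediction at each position "look
# back" at every earlier token (and only earlier ones).
import numpy as np

def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """x: (seq_len, d_model) token vectors -> contextualized vectors."""
    seq_len, d = x.shape
    q, k, v = x, x, x                               # real models use learned W_q, W_k, W_v
    scores = q @ k.T / np.sqrt(d)                   # affinity of each position with each other
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf                          # no attending to later tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over earlier positions
    return weights @ v                              # weighted mix of earlier token vectors

x = np.random.default_rng(0).normal(size=(5, 8))    # 5 tokens, 8-dimensional embeddings
print(causal_self_attention(x).shape)                # (5, 8)
```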

It is curious to hear it said that generative pre-trained transformers refute Noam Chomsky’s generative-transformational grammar. Terminological ironies aside, an essential conclusion is that ChatGPT does not operate with linguistic units but with groups of characters. It therefore has no way of linking sequences of letters with meanings of any kind. The only thing ChatGPT “knows” is how often certain groups of characters appear next to other groups of characters, and it relies on this to generate “plausible” strings of characters. Any meaning its responses may have is contributed by the users who read them (and who, inevitably, tend to attribute it to their silicon interlocutor).

Linguists and psychologists who claim that ChatGPT is a model of human language seem to overlook the fact that all the enormous training and computational effort ChatGPT involves (including a considerable carbon footprint) is aimed at generating coherent text without understanding a word of what is being said!

In order to claim that ChatGPT is a model of human language, one would have to believe that humans speak by constructing sequences of words based solely on the probability of their co-occurrence and without regard to their meaning. The reasoning would run as follows: since ChatGPT does produce grammatical sentences, and since all it knows about is the probability of words co-occurring, the grammar of a language must be a matter of the probability of some words occurring next to others.

That is, one would have to believe that human beings are also stochastic parrots. Of course, there may be a level at which we are (ultimately, brains are made up of individual cells incapable of knowing anything), but in reality any model of human language would be drastically incomplete if it did not include the human ability to create complex meanings (thoughts) from simpler meanings (and to intend to communicate those thoughts).

Although ChatGPT and its relatives are called language models, in reality they are not, nor do they claim to be, models of human language understood as a cognitive capacity, but rather (mathematical) models of the probability that certain groups of characters occur alongside other groups of characters. Perhaps what actually underlies the erroneous conclusion that they are models of language is the failure of many authors to distinguish between language as a capacity (the internalist view) and language as a product (the externalist view).

From the internalist point of view, any language is a knowledge system that exists in the brains of the people who speak it. Chomsky called this the I-language (i for internal and intensional). An I-language consists, minimally, of a repertoire of concepts or meanings associated with strings of phonemes (words in the strict sense) and a set of generative mechanisms for constructing sentences (= thoughts) with them. E-language (e for external and extensional) would be the sum of all sentences produced (orally or in writing) by the speakers of a given language.

Internalist linguists believe that the object of study of linguistics as a cognitive science is the I-language (the system of knowledge in people’s brains), while the E-language is an incoherent and unmanageable object. Where a language really exists is in the brains of its speakers, not outside them in grammars, texts or dictionaries. For their part, externalist linguists believe that what really exists is the E-language, a kind of cultural institution that is transmitted from generation to generation, while the I-language would be an imperfect copy of the E-language in people’s brains. Leaving aside who is right in this “chicken and egg” dispute, what seems clear is that ChatGPT is a model of the E-language, and not of the I-language.

ChatGPT is very efficient at creating E-language fragments from other (written) E-language fragments, but it is not a model of human language, because what humans do is use the I-language to create the very E-language fragments on which ChatGPT has been trained.

We know that human beings can create and understand hundreds of thousands of different sentences that they have never heard before, and we know that (unlike ChatGPT) they have not had the time or opportunity to learn them. According to calculations mentioned by Steven Pinker, even if sentences of a human language were limited to a maximum length of twenty words (the one you are reading now is already well past that limit), the number of sentences a person could understand and produce would be on the order of 10²⁰ (a hundred million trillion), which, at five seconds to learn each one, would require a childhood of about a hundred trillion years. Given the average human life expectancy, and considering the far smaller number of neurons in our brains, to explain this capacity we must assume that people possess a recursive generative mechanism that would allow us to create those 10²⁰ different sentences, if only we had the time to do so.
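The arithmetic behind that estimate is easy to reproduce (the figures are Pinker’s; the rounding is mine):

```python
# Back-of-the-envelope version of Pinker's calculation: on the order of 10**20
# possible sentences of up to twenty words, at five seconds per sentence.
SENTENCES = 10**20
SECONDS_PER_SENTENCE = 5
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

years = SENTENCES * SECONDS_PER_SENTENCE / SECONDS_PER_YEAR
print(f"{years:.1e} years")  # ~1.6e13: trillions of years of childhood
```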

The relevant question now is whether ChatGPT has that capacity. There is no evidence that it does. The generative mechanism of human language produces sentences that are not linear sequences of words (although they appear so in speech or writing) but hierarchical groupings of words into constituents that determine their meaning. The phrase Deep blue sea is ambiguous because the same linear sequence of words can reflect either the hierarchical structure [[deep blue] sea], in which deep modifies blue (and the sea might not be deep), or the structure [deep [blue sea]], in which the adjective deep modifies blue sea (in which case the sea might be light blue).
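The two readings correspond to two different constituent trees over the same string. A small sketch using NLTK (with deliberately simplified category labels) makes the contrast visible:

```python
# Sketch: the two hierarchical structures behind the ambiguous string
# "deep blue sea", written as bracketed trees (labels simplified).
from nltk import Tree

dark_blue_sea = Tree.fromstring("(NP (AdjP (Adj deep) (Adj blue)) (N sea))")     # deep modifies blue
deep_and_blue_sea = Tree.fromstring("(NP (Adj deep) (NP (Adj blue) (N sea)))")   # deep modifies "blue sea"

dark_blue_sea.pretty_print()
deep_and_blue_sea.pretty_print()
```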

Syntactic structure is inaudible and invisible, but it is crucial for interpreting the meaning of the sequences of words that reach us and for constructing the complex thoughts and propositions that we humans create and (sometimes) communicate. There is no way to explain these facts by analyzing syntax as a linear sequence based on the probabilistic stringing together of words, which is what ChatGPT does. Chomsky’s famous example Colorless green ideas sleep furiously, semantically absurd but syntactically well formed, was not intended to show that syntax bears no relation to meaning, as is usually said, but precisely to argue, as early as 1957, that human language is not a system for stringing together the most expected word given the previous ones, but includes formal mechanisms of combination that are independent of the meaning of each word. It is this design feature that explains the ability of humans to create and communicate new ideas and to entertain thoughts free from the control of external stimuli or context. Such a syntax is necessary to create different meanings (that there are seas that are dark blue, and seas that are blue and also deep), but not to string words together from left to right (or in any other direction). The reason engineers haven’t installed a generator of constituent syntactic structure in ChatGPT is that it is simply not necessary, since ChatGPT has nothing to say.

Humans create character sequences as a result of previously combining concepts using a hierarchical recursive syntax, while ChatGPT creates character sequences from other character sequences created by humans. ChatGPT is therefore a mathematical model of the product of (written) human language use, not a model of the ability to produce human language.

It could be concluded, therefore, that to say that ChatGPT speaks (or writes) English (or any other language it has been trained on) is like equating the procedure Velázquez used with his brush to paint Las Meninas with that of a photographer reproducing the famous painting. The results may be superficially similar, but the processes of creation are radically different, and we will not learn much about Velázquez’s creative technique by analyzing how a photograph of Las Meninas is developed.
