An equation for Plato’s problem

Some linguists are envious of physics. We love showing other practitioners of the so-called humanities how we can make our field an empirical natural science that uses the hypothetico-deductive method. But, at the same time, we know that things in linguistics (as is usually the case in the cognitive sciences) are very different from how they are in physics, in chemistry, or even in biology. Our objects of study are much more elusive and abstract, the evidence for their existence is more convoluted, and our capacity for observation and measurement is (even) more limited and indirect than in the so-called hard sciences.

One of the shortcomings of our science, when held up to the mirror of physics, is that our theories are not formulated mathematically, with equations (biology also has difficulties here, although it has been using equations for some time). Maybe that is why Ernest Rutherford said that all science is either physics or stamp collecting.

But in recent years the linguist Charles Yang (who trained as a computer scientist at MIT and is now a professor at the University of Pennsylvania), an influential and widely recognized specialist in the mechanisms of language acquisition, has given us a beautiful equation that, unlike others (such as the famous Zipf’s law), helps us better understand some aspects of the faculty of language. Although Yang’s equation appears in several earlier articles, the most complete and rounded presentation of his model is in his excellent book The Price of Linguistic Productivity (2016), which this year deservedly received the Linguistic Society of America’s Leonard Bloomfield Book Award.

Before going into details, it is worth remembering that an equation expresses a mathematical equality between two expressions, and that science uses equations to state laws precisely. The equation that concerns us lacks the beauty and relevance of some others, such as the famous ones by Newton, Maxwell, Einstein or Schrödinger, but it is nonetheless highly significant for the science of language. It can help us better understand how it is possible for children between the ages of zero and four to discover the productive rules of their language, overcoming numerous exceptions and an incomplete, unsystematic exposure to the necessary data (that is, what Chomsky called Plato’s problem: how we come to know so much from so little information).

Yang’s equation expresses what he calls the principle of tolerance. In simple terms, it establishes with surprising precision how many exceptions a child’s language acquisition mechanisms can tolerate while still inducing a productive rule.

There is no doubt that children are especially gifted at finding regularities in the linguistic input that surrounds them (more pedantically, we might say that they try to build the most efficient grammar for understanding and using the language of their environment). But this cognitive task is far from simple, especially when the rules must be induced from a limited and contradictory sample. So, if in Spanish we have como ‘I eat’ from comer ‘to eat’ and bebo ‘I drink’ from beber ‘to drink’, it is not surprising that children produce rompo ‘I break’ from romper ‘to break’ even if they have never heard it before. They seem to have detected that verbs of the Spanish second conjugation form the first person of the present indicative by adding -o to the root (here, what remains after removing -er from the infinitive). The problem is that alongside bebo, como or rompo, the learner’s environment also contains quepo (from caber ‘to fit in’, not *cabo), sé (from saber ‘to know’, not *sabo), muerdo (from morder ‘to bite’, not *mordo), traigo (from traer ‘to bring’, not *trao) and he (from haber ‘to have’, not *habo), all of them irregular forms that undermine the hypothesis that the present is formed simply by adding -o to the root of the verb. The question posed by Yang (following a long tradition of discussion and controversy in this area, usually centered on English irregular verbs) is how the child nevertheless manages to formulate a productive rule.
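To make the situation concrete, here is a minimal Python sketch of a toy grammar with a single rule (drop -er, add -o) plus a table of memorized exceptions. The function name and the two exception tables are hypothetical illustrations, not Yang’s model; the point is only that a learner who has the rule but has not yet memorized an exception will regularize it.

```python
def first_person_present(infinitive: str, exceptions: dict[str, str]) -> str:
    """First person singular present indicative of a Spanish -er verb.

    Memorized exceptions take priority; otherwise the productive rule
    applies: remove the -er ending and add -o.
    """
    if infinitive in exceptions:
        return exceptions[infinitive]
    if not infinitive.endswith("er"):
        raise ValueError(f"not a second-conjugation verb: {infinitive}")
    return infinitive[:-2] + "o"


# Adult-like exception table (the irregular verbs mentioned in the text).
adult_exceptions = {
    "caber": "quepo",
    "saber": "sé",
    "morder": "muerdo",
    "traer": "traigo",
    "haber": "he",
}

# Hypothetical child who has the rule but has not yet memorized quepo or sé.
child_exceptions = {"morder": "muerdo", "traer": "traigo", "haber": "he"}

for verb in ["comer", "beber", "romper", "caber", "saber"]:
    print(verb,
          first_person_present(verb, adult_exceptions),
          first_person_present(verb, child_exceptions))
```

With the adult table, caber and saber give quepo and sé; with the child’s shorter table, the rule fills the gap and produces the overregularized cabo and sabo.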

It might seem a simple question: since regular forms are much more abundant than irregular ones, the most common pattern, that is, the regular one, wins out. But things are not like that at all: children do not have access to an unlimited amount of data, let alone to corpora and statistical tools that would allow them to reach such a conclusion. Quite the contrary: as Yang has shown by analyzing in detail both the input children receive and their own production, their exposure to the data is incomplete and necessarily limited. In fact, although irregular verbs are, by definition, fewer than regular verbs, individually they tend to be used much more frequently (which is precisely why they remain irregular: they are used a lot and learned very quickly). As many as 54 of the 100 most frequent verbs in the English corpus used by Yang are irregular. If the reader thinks of the Spanish verbs ser ‘to be’ and haber ‘to have’ (to stay with the second conjugation), it is clear that this is so without any need for statistics.

But, of course, the child’s input must contain enough regular examples to trigger the formation of the rule. What is that number? Or, better put, how many exceptions or irregular examples can the child tolerate and still build a productive rule?

This is where the principle of tolerance comes in, as expressed in the aforementioned equation. Let us see it in all its glory:

$$e \le \theta_N = \frac{N}{\ln N}$$

The tolerance principle predicts that for a rule R to be productive, the number of exceptions (e) must be less than or equal to the threshold θ_N = N/ln N, where N is the total number of items the rule applies to (for instance, the verbs the child knows, exceptions included) and ln is the natural logarithm. The equation thus establishes precisely how many exceptions, out of the total number of items, our instinct to formulate productive rules can tolerate.
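The principle itself fits in a few lines of Python. This is a sketch: the function names are our own (hypothetical), and N is taken, as above, to count the items in the rule’s domain with the exceptions included.

```python
import math


def tolerance_threshold(n: int) -> float:
    """theta_N = N / ln N: the largest number of exceptions a rule
    defined over N items can tolerate and still be productive."""
    if n < 2:
        raise ValueError("N must be at least 2 (ln 1 = 0)")
    return n / math.log(n)


def is_productive(n: int, e: int) -> bool:
    """Tolerance principle: a rule over N items with e exceptions
    is productive if and only if e <= N / ln N."""
    if not 0 <= e <= n:
        raise ValueError("need 0 <= e <= N")
    return e <= tolerance_threshold(n)


print(round(tolerance_threshold(100), 1))  # 21.7
print(is_productive(100, 20))              # True
print(is_productive(100, 25))              # False
```

For a domain of 100 items, a rule survives about 21 exceptions but not 25.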

Another way to understand this is to consider that, although a productive rule is generally more efficient than memorizing every form (which seems like common sense, otherwise there would be no verbal conjugation), the learner will formulate the rule only if it pays off, that is, only if the resulting grammar is actually more efficient than memorizing each form. And for this to happen, certain conditions must be met. Note that once the rule is part of the child’s internal grammar, it will have to be applied to every form not memorized as irregular. But to know whether a verb is irregular, the child must first check it against all the memorized irregular verbs. If that list is very long, formulating the productive rule no longer offers any computational advantage. Yang’s equation determines with amazing precision the critical length of the list up to which children formulate productive rules (and, of course, it also explains when they do not).
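The cost argument can be made concrete with a toy serial search. This sketch assumes, as a simplification rather than Yang’s exact model (which also takes word frequencies into account), that the learner scans the memorized exceptions one by one before falling back on the rule; counting comparisons shows that every regular form pays for the whole list. The names `produce` and the `irregular_{i}` filler entries are hypothetical.

```python
def produce(verb: str, exceptions: list[tuple[str, str]]) -> tuple[str, int]:
    """Return (form, comparisons) for a Spanish -er verb.

    The memorized exceptions are scanned in order; only if none of them
    matches is the productive rule (drop -er, add -o) applied.
    """
    comparisons = 0
    for infinitive, form in exceptions:
        comparisons += 1
        if infinitive == verb:
            return form, comparisons
    return verb[:-2] + "o", comparisons


exceptions = [("haber", "he"), ("saber", "sé"), ("traer", "traigo"),
              ("morder", "muerdo"), ("caber", "quepo")]

print(produce("saber", exceptions))   # ('sé', 2)
print(produce("romper", exceptions))  # ('rompo', 5)

# A regular verb always costs len(exceptions) comparisons, so the
# advantage of having a rule shrinks as the exception list grows.
for size in (5, 50, 500):
    # Hypothetical filler entries standing in for further irregular verbs.
    padded = exceptions + [(f"irregular_{i}", "?")
                           for i in range(size - len(exceptions))]
    print(size, produce("romper", padded)[1])
```

The last loop prints 5, 50 and 500 comparisons for the same regular verb, which is the intuition behind a maximum tolerable list length.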

Yang (2016) brings together an impressive number of real empirical acquisition studies in morphology, phonology and syntax across various languages in which the formula works with mathematical precision. To avoid boring the reader, I will focus only on the star case, the English past tense. Imagine that a child acquiring English knows 120 irregular verbs (that is, e = 120), which is more or less what Yang finds in the English child-directed speech of the CHILDES corpus, about five million words. Using the equation, we can then infer that N (the total number of regular and irregular verbs the child has to know) must be roughly 800. This implies that for children to form the rule of regular past formation in English (roughly, the “add -d” rule), they must know at least some 680 regular verbs. And that is exactly how it is: the corpus contains about 900 regular verbs conjugated according to the “add -d” rule, comfortably above that threshold. It is precisely at that point that children learning English come up with the rule and begin to apply it productively (even erroneously to irregular verbs, as in *holded instead of held). Until the critical threshold of tolerance is reached, children are conservative and limit themselves to repeating the forms they hear, that is, they operate only by associative memorization. Children who say holded (or cabo or sabo in Spanish), forms they cannot have memorized from their input, have already passed the critical threshold: they have stopped memorizing words unnecessarily and have come to rely on productive rules. That is also why they correctly conjugate verbs they have never heard before.
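The arithmetic of this example can be checked by searching for the smallest N that satisfies e ≤ N/ln N. A minimal sketch (the function name is hypothetical); note that N/ln N grows with N for N ≥ 3, so a simple upward search finds the threshold.

```python
import math


def min_domain_size(e: int) -> int:
    """Smallest N (N > e) such that e <= N / ln N, i.e. the smallest
    domain in which a rule with e exceptions becomes productive."""
    n = max(e + 1, 2)
    while e > n / math.log(n):
        n += 1
    return n


e = 120                  # irregular verbs known (figure from the text)
n = min_domain_size(e)
print(n, n - e)          # 803 683
```

The exact solution is N = 803, so a little over 680 regular verbs, which agrees with the rounded figures of about 800 and 680 given above.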

In the following table, adapted from Yang, we see some specific examples of the application of the formula that will allow us to capture another remarkable property of the principle of tolerance:


In the first column we have the total number of cases (N), and in the second the value of N/ln N, which defines the tolerance threshold (remember: the number of exceptions, e, must be less than or equal to that number). The third column is striking: it gives the percentage of exceptions that the rule tolerates in each case. Note that this proportion is not constant but depends on the size of N (it is simply 1/ln N): the smaller N is, the higher the percentage of exceptions tolerated. Thus, for N = 10 up to 40% of cases can be irregular, while for N = 5,000 only 11.7% can. This is most remarkable.
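The values behind such a table are easy to recompute. The following sketch uses our own choice of rows (not necessarily Yang’s) and rounds N/ln N down to a whole number of exceptions, since e counts words.

```python
import math


def max_exceptions(n: int) -> int:
    """Largest whole number of exceptions e satisfying e <= N / ln N."""
    return math.floor(n / math.log(n))


print(f"{'N':>6} {'N/ln N':>8} {'max e':>6} {'% tolerated':>12}")
for n in (10, 20, 50, 100, 200, 500, 1000, 5000):
    e = max_exceptions(n)
    print(f"{n:>6} {n / math.log(n):>8.1f} {e:>6} {100 * e / n:>11.1f}%")
```

N = 10 gives 4 tolerated exceptions (40%) and N = 5,000 gives 587 (11.7%), matching the figures quoted above.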

In fact, it is very tempting to interpret this as an adaptation for early language acquisition, that is, as an explanation of how productive rules such as those briefly reviewed here can be acquired by the age of two or three, when life experience and the amount of input received are necessarily limited. It could be said that the language faculty is designed not only to learn the most efficient grammar possible given the data, but also to take advantage of the window of opportunity defined by the critical period for language acquisition. The principle of tolerance allows the maturing brain (at the stage when it can still fully acquire a native language) to overcome the reduced input that inevitably comes with an early age. As Yang says, sometimes less is more.

In a later work, Yang has gone further, proposing that the human capacity to count to infinity (instantiated in the successor function) has a linguistic basis, something already suggested by Chomsky (2007) when he related the operation Merge (which underlies syntax) to this function. Yang postulates that the successor function develops in children together with the acquisition of number words, that is, of the morphosyntactic system that allows us to productively construct the word for the next number. And in this context, the principle of tolerance again makes accurate predictions.

Reviewing the literature on the subject, Yang noted that studies of counting in English-speaking children show a qualitative leap at 72. Children develop this ability in several phases, slowly and with regressions, but once they can count to 72 there is no longer an upper limit (that is, they have discovered the successor function).

But why 72? The answer, of course, is in the principle of tolerance. Consider the first hundred numbers in English (writing irregular words with letters and regular words with numbers):

one two three four five six seven eight nine ten eleven twelve thirteen 14 fifteen 16 17 18 19 twenty 21 22 23 24 25 26 27 28 29 thirty 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 fifty 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 etc.

There are a total of 17 unpredictable forms that must be learned by heart (the rest are built compositionally from these, such as twenty-nine for 29). Applying the principle of tolerance, this predicts that if e is 17, the lowest value for N is 73. So, if a child learns to count in English up to 73, he or she will no longer have any limit. Yang points out that in a more regular language, such as Chinese, only the first 12 numbers have to be memorized, so children learning Chinese will be able to count without limit once they have acquired 46 number words. This explains the previously observed precocity in counting of Chinese-speaking children compared with English-speaking ones.
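Both predictions can be reproduced with the same threshold search used for the past tense. In this sketch the list of 17 English forms follows the author’s classification in the sequence above, and the figure of 12 memorized Chinese forms is taken from the text; the names are hypothetical.

```python
import math

# Number words up to 72 that, following the classification in the text,
# cannot be built compositionally and must be memorized.
ENGLISH_IRREGULAR = ["one", "two", "three", "four", "five", "six", "seven",
                     "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                     "fifteen", "twenty", "thirty", "fifty"]


def min_domain_size(e: int) -> int:
    """Smallest N (N > e) such that e <= N / ln N."""
    n = max(e + 1, 2)
    while e > n / math.log(n):
        n += 1
    return n


e_english = len(ENGLISH_IRREGULAR)  # 17
e_chinese = 12                      # figure given in the text
print(e_english, min_domain_size(e_english))  # 17 73
print(e_chinese, min_domain_size(e_chinese))  # 12 46
```

With 17 exceptions, 72/ln 72 is about 16.8, just short, while 73/ln 73 is about 17.01, which is why the predicted threshold is 73.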

Of course, this does not put linguistic theory on a par with physics in its use of mathematics and its predictive power, but we must recognize it as an important step forward on the ideal path of scientific development, which is none other than integration. As Chomsky (2005) has pointed out (and it could not be otherwise), the human faculty of language is the result of three fundamental factors: biological endowment, the influence of the linguistic environment, and general principles of nature, including principles of simplicity and computational efficiency (the so-called third factor, which the Chomskyan minimalist program emphasizes). Yang’s work is undoubtedly a first-rate contribution to the arduous task of unraveling how these three factors interact to produce the faculty of language that each speaker possesses, the essential goal of linguistic theory.

And since I have mentioned the third factor, it is even more intriguing to note that, although Yang does not mention it (to my knowledge), the function N/ln N approximates the number of primes less than or equal to N (this is the prime number theorem). Thus, going back to the previous table, we can predict that there are about 4 primes up to 10 (there are exactly 4: 2, 3, 5 and 7) and about 7 up to 20 (there are in fact 8: 2, 3, 5, 7, 11, 13, 17 and 19), and so on. I am not qualified to assess the importance of this coincidence (or even whether it is a coincidence at all), but I can appreciate the beauty behind it. Who knows whether the statistical balance between regularity and irregularity that favors language acquisition by maturing brains has anything to do with the essential mystery of prime numbers, but it would certainly be nice to think so.
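For readers who want to check the comparison, here is a short sketch that counts primes with the classic sieve of Eratosthenes and sets the result beside N/ln N. This is textbook number theory, not part of Yang’s proposal; the function name is our own.

```python
import math


def prime_count(n: int) -> int:
    """pi(n): the number of primes <= n, via the sieve of Eratosthenes."""
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return sum(is_prime)


for n in (10, 20, 100, 1000, 10000):
    print(n, prime_count(n), round(n / math.log(n), 1))
```

Up to 20 there are 8 primes against an estimate of 6.7, and up to 10,000 there are 1,229 against about 1,085.7. From 20 onwards in this sample N/ln N falls short of the true count; the prime number theorem only says that the ratio between the two tends to 1 as N grows, and it does so slowly.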

[Author’s English version of a blog post published in Spanish on 10 October 2018.]


4 thoughts on “An equation for Plato’s problem”

  1. thanks for sharing this! One thing though.. 1 is not a prime number (cause a prime number must always be greater than one), but 5 and 19 are (so the equation works out nonetheless 😉


    1. Thank you for your message and for your observation. Of course, you’re right. I have corrected it for the benefit of future readers.

