On the incompleteness of syntax

Chomsky’s program for linguistics

It is no coincidence that Chomsky’s first two published books (Syntactic Structures and Aspects of the Theory of Syntax) both allude to syntax in the title. He has recently pointed out that the essential basis of his approach to language (in addition to the naturalistic approach) is that each language “makes available an unbounded array of hierarchically structured expressions that have determinate interpretations at the interfaces with other internal systems”, and that “we may call this core feature of language its Basic Principle” (Chomsky 2015: vii). As he explicitly argues, the Basic Principle encompasses the computational aspect of language “including the narrow syntax that provides the expressions mapped to the interfaces and the mappings themselves, and of course the lexical atoms of computation and their various configurations” (Chomsky 2015: vii). Thus, the basic principle of language is syntax because syntax mediates the leap from the finite to the infinite or, to use Wilhelm von Humboldt’s expression, because human language makes an infinite use of finite means. In fact, it is tempting to think that syntax is not only the basic principle of human language, but the basic principle of human cognition. As Chomsky (2007) has also hypothesized, the link between the conceptual-intentional system and the computational system forms an “internal language”, a system of thought exclusive to our species that manifests itself not only in language (in the usual sense of the languages that people speak), but also in logical reasoning and in mathematical ability.
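The leap from finite means to an unbounded array of hierarchical expressions can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the three-word lexicon, the names `merge`, `generate` and `depth_of`, and the use of ordered pairs (a combinatory operation of this kind is often modelled as unordered set formation, and real grammars constrain what may combine with what).

```python
from itertools import product

# Hypothetical toy lexicon: a finite set of atoms.
LEXICON = ("the", "screen", "shines")


def merge(x, y):
    """Combine two syntactic objects into one binary-branching object."""
    return (x, y)


def generate(rounds):
    """Return every object buildable with at most `rounds` applications of merge."""
    objects = set(LEXICON)
    for _ in range(rounds):
        # Pair every object built so far with every other one (and itself).
        objects |= {merge(x, y) for x, y in product(objects, repeat=2)}
    return objects


def depth_of(obj):
    """Hierarchical depth: 0 for an atom, 1 + the deeper daughter otherwise."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth_of(obj[0]), depth_of(obj[1]))


if __name__ == "__main__":
    for rounds in range(3):
        print(rounds, len(generate(rounds)))
```

With three atoms, the number of available objects grows from 3 to 12 to 147 in two rounds, and nothing in the procedure sets an upper limit on depth. The sketch overgenerates wildly, of course; it only shows that one finite operation applied to its own output is enough to yield an unbounded, hierarchically structured set.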

The main source of opposition that Chomskyan linguistics has attracted over the years comes from those theories that show a preference for the semantic (symbolic) dimension of language. Indeed, one common property of the various functionalist/cognitivist traditions that oppose the generativist research program (or that simply ignore it) is the preference for placing semantics and communication at the essential core of language, considering syntax as ancillary or secondary. In fact, even within generative linguistics, Jackendoff (2002) censures the excessively syntactocentric character of mainstream Chomskyan linguistics, although in this case not to deny the basic property, but to point out that the generative character that underlies this property is not exclusive to syntax.

In my opinion, part of this confrontation (although certainly not all of it) originates in a disagreement about what ‘syntax’ means. Simply put, when Chomsky and other generative linguists say ‘syntax’, they are often referring to what for other linguists is ‘semantics’. In fact, as can be seen in the previous quote from Chomsky, according to recent developments of this model we could say that whenever there is compositionality, there is syntax. Therefore, syntax is not limited to the construction of phrases and sentences, but also builds (or underlies) words, morphemes, and any other entities that are not semantically atomic (whether truly atomic entities exist is a difficult issue, and one yet to be resolved).

Clearly, this is not just a matter of labels: what is really involved is the empirical hypothesis that what some authors believe to be semantics is actually syntax. Note that from a naive point of view the linking of the utterance The screen shines in the dark with its meaning is mysterious, magical, like the feeling we might have had in the 4th century if we had been able to turn a switch and see a light bulb light up. Of course, as we analyze the components of the utterance and their structure and syntactic relations, the mystery decreases, and our knowledge of how such linking occurs increases. The Chomskyan tenet, then, is to squeeze syntax to the full in order to explain something of semantics. For that reason, I would go so far as to claim that Chomsky’s program for linguistics could be understood as a program for the reduction of natural semantics (which is essentially diffuse and intuitive) to natural syntax (which is essentially discrete and formal). Within an approach that adopts the cognitive perspective and that proposes the construction of an explicit model of the cognitive anatomy and physiology of the language organ, advances in the reduction of semantics to syntax would represent a sign of progress. Perhaps an analogy with the evolution of modern mathematics can make this more understandable and convincing.

Hilbert’s program for mathematics

In 1930 the German mathematician David Hilbert brought together the elite of his discipline to formalize what would be called Hilbert’s program for mathematics. This program was based on two principles: all of mathematics follows from a correctly chosen finite system of axioms, and such a system of axioms can be proved to be consistent. In other words, what Hilbert proposed with his formalist program was that any truth of a mathematical theory should be demonstrable from the axioms by a process of reasoning, the validity of which can be verified mechanically in a finite number of steps, that is, algorithmically. The motto that summarizes this attitude (and which appears on Hilbert’s epitaph) is Wir müssen wissen. Wir werden wissen (‘we have to know and we will know’).

It is important to bear in mind that Hilbert’s formalist program was proposed against the so-called ‘intuitionists’ or ‘constructivists’, who rejected the existence of numbers that cannot be generated algorithmically (such as Cantor’s actual infinity). Hilbert’s maneuver, then, was clear: by requiring that every demonstration be algorithmic, he could justifiably introduce mathematical entities such as Cantor’s infinity while overcoming the reluctance of the intuitionists, who made what amounted to an official surrender at the Königsberg conference in 1930.

However, at the same conference, a young Kurt Gödel raised his hand and claimed that he had proved a theorem showing that, if demonstrations were to be algorithmic, then it was impossible to give axioms for arithmetic that would allow all the truths of the theory to be demonstrated. More specifically, Gödel’s first incompleteness theorem states that for every consistent, recursively axiomatized system that is powerful enough to describe the arithmetic of natural numbers there are always true propositions that cannot be proved from the axioms. Hilbert had persuaded the intuitionists by demanding that the proof of theorems always be algorithmic, and that is precisely what Gödel did to convince Hilbert: he provided an algorithmic demonstration of his famous incompleteness theorem, that is, a demonstration that scrupulously fulfilled Hilbert’s program.

Let us remember now that Chomsky’s program for linguistics implies that the right theory of language must be “perfectly explicit”, that is, a “generative grammar”. And it is here, in the generative (algorithmic) character of Chomskyan syntactic theory, that we can find the connection between the history of mathematical logic and theoretical linguistics. Let us see how.

(Almost all) Semantics is Syntax

In mathematical logic, the semantic-syntactic duality is used in the following sense: a concept related to a sequence of symbols is ‘syntactic’ if it only refers to symbols without their meaning being relevant (for example, if we say that in ktlvd there are five letters or that the first is a k), while it is ‘semantic’ if it depends on the meaning that the sequence transmits (for example if we ask what it refers to or if it is true). In this context it could be said that in mathematical logic what is ‘syntactic’ is what is algorithmic (computable) and what is ‘semantic’ is what is vague and imprecise (not computable). As Piñeiro (a mathematician and historian of mathematics) has pointed out, “the fundamental premise of Hilbert’s program was to demand that the validity of the semantic aspects of mathematics is controlled by syntactic methods. Syntax, clear and indubitable, should put a stop to semantics, prone to paradoxes” (Piñeiro 2012: 98, my translation). Although the use of the terms ‘semantics’ and ‘syntax’ in mathematical logic and in linguistics is different, it should not be overlooked that the difference between what linguists call semantics and syntax has to do with the generative (computable) nature of syntax and the non-computable (and to some extent ‘magical’ or ‘mysterious’) character of semantics. As Penrose puts it, “if Hilbert’s hope could be realized, this would even enable us to dispense with worrying about what the propositions mean altogether!” (Penrose 1989: 144).
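The logical sense of ‘syntactic’ can be made concrete with a short sketch. The first two functions answer questions about ktlvd that require no knowledge of what, if anything, the string means. The third is a toy proof checker of the kind Hilbert’s requirement presupposes: it verifies a demonstration by comparing shapes of symbols only. The formula encoding (strings for atoms, `("->", A, B)` tuples for implications), the single rule (modus ponens) and all the function names are assumptions chosen for illustration, not a reconstruction of any particular formal system.

```python
def symbol_count(sequence):
    """A syntactic property: how many symbols the sequence contains."""
    return len(sequence)


def first_symbol(sequence):
    """Another syntactic property: which symbol comes first."""
    return sequence[0]


def check_proof(axioms, proof):
    """Return True if every line is an axiom or follows by modus ponens.

    A line B follows by modus ponens when some earlier line A and some
    earlier line ("->", A, B) are both already derived. The check never
    asks what A or B mean, only whether the symbols match.
    """
    derived = []
    for line in proof:
        follows = any(
            earlier == ("->", other, line)
            for earlier in derived
            for other in derived
        )
        if line not in axioms and not follows:
            return False
        derived.append(line)
    return True


if __name__ == "__main__":
    axioms = {"p", ("->", "p", "q")}
    print(symbol_count("ktlvd"), first_symbol("ktlvd"))
    print(check_proof(axioms, ["p", ("->", "p", "q"), "q"]))
    print(check_proof(axioms, ["q"]))
```

The checker accepts the three-line derivation of `q` and rejects the bare assertion of `q`, and it does so in a finite number of mechanical steps. Whether `q` is true is a question it never raises, which is exactly the division of labour between ‘syntax’ and ‘semantics’ described above.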

It is very tempting to relate this crucial episode of mathematical logic to the tension in modern linguistics about the relative weight of semantics and syntax. I have suggested a parallel between Hilbert’s program for mathematics and what I have called Chomsky’s program for linguistics because generative grammar can be characterized as an attempt to reduce semantics to syntax, in the sense that in Chomsky’s model syntax is not at the service of transmitting meaning or ordering symbols, as is usually understood, but syntax actually creates meaning. Syntax, understood as an unrestricted computational system, constructs meanings that simply would not exist without it and, what is fundamental, does so mechanically, algorithmically, in an unambiguous way.

But let’s not forget Gödel. After all, his incompleteness theorems proved that the ‘syntactic method’ is necessarily incomplete: there will always be true statements that are not demonstrable from the axioms. It seems that in mathematical logic we have to choose: either we have reliable (‘syntactic’) reasoning methods, but then we cannot prove all the truths, or we can know all the truths (using ‘semantic’ methods), but without the certainty that our reasoning is correct. The problem is undoubtedly complex and relates to the controversy as to the very nature of free will and the debate about strong artificial intelligence. For some authors (Penrose among them) the fact that we can have certainty of mathematical truths without a possible algorithmic demonstration is a proof of the qualitative difference between the human mind and a computer. However, as Piñeiro points out, what Gödel showed is that we cannot be sure of being cognitively superior to a computer, since “we can never be certain that our semantic reasoning is correct” (Piñeiro 2012: 161, my translation). In any case, what matters now is that the development of syntactic theory in the last 60 years has shown that there are many more computable aspects in human language than was previously thought.

Gödel, Hale & Keyser and the incompleteness of syntax

If we focus on language, it might seem that Gödel supports those who approach language as an essentially semantic (not computable, analogical, so to speak) phenomenon. However, the history of generative grammar is an indication that the future often holds surprises for us. Given the conclusion that the human mind (apparently) is not fully algorithmic, Piñeiro ends with a question of crucial importance in our discussion:

“Is there an intermediate level between purely syntactic reasoning and freely semantic reasoning that allows us to overcome the incompleteness of Gödel’s theorems while ensuring consistency? Is there really such a sharp difference between ‘syntax’ and ‘semantics’ or are what we call semantic concepts nothing more than more sophisticated syntactic concepts (in which you work with groups of symbols instead of individual symbols)?”

Piñeiro (2012: 162, my translation)

It is unlikely that any generativist linguist who reads this text, although in fact it deals with mathematical logic, can avoid thinking about the controversy of recent decades between lexicalism and anti-lexicalism. The tendency initiated by Hale & Keyser’s works on ‘lexical syntax’ in the 1990s has been continued by Marantz’s distributed morphology, Borer’s ‘exoskeletal’ approach, and Starke’s nanosyntax. What all these models have in common is that they are advanced instances of the Chomskyan program of reducing the immeasurable, vague and fluctuating territory of semantics to the computable, algorithmic and unambiguous domain of syntax. The decomposition of lexical meaning in terms of units and principles of syntax is an instance of the possibility mentioned in the previous quotation: trying to avoid the ‘incompleteness of syntax’ by decomposing the ‘semantic symbols’ into computable (that is, syntactic) structures.

Consider, for example, the seminal article by Hale & Keyser (1993) in which they demonstrated (though not in the mathematical sense!) that the argument structure of verbal predicates is actually an ordinary syntactic structure. What is relevant now is that thanks to this hypothesis we can predict why there are certain verbs but not others, and why they mean what they mean but not something else (thus, we bottle the wine, in the sense that we put the wine in bottles, but we do not, with the same sense, wine the bottles). In fact, Hale & Keyser ask a pertinent semantic question (‘why are there so few semantic roles?’), and offer an interesting syntactic response (‘because semantic roles are actually syntactic relations’).

Indeed, even the most detailed studies in this regard tend to agree that languages use a very small number of semantic roles (agent, experiencer, theme, location, etc.), a number that tends to range from two in the most restrictive theories to a dozen in the most extensive. Most of the approaches available in the 1990s tended to propose a hierarchically ordered list of such roles and did not ask relevant questions such as: Why are there so few semantic roles? Why do they tend to be the same in all languages? Why are they ordered hierarchically (in the sense that it does not happen that a verb has a patient as a subject and an agent as a direct object)? Why do the same semantic roles tend to appear in the same positions (agents as subjects, themes as objects, etc.)?

Since it does not seem to be a problem for the human mind to learn two dozen semantic roles, Hale & Keyser suggested that perhaps the answers have to do with syntax, and not with semantics. What is relevant in the context of our discussion is that the restrictive and predictable nature of semantic roles (which are part of the meaning of predicates) would actually be a consequence of the unambiguous and limited nature of syntactic relations. Thus, the cocktail that is produced, on the one hand, by the asymmetry of the binary and endocentric syntactic projections (in which, for example, the relation between head and complement is not reversible) and, on the other hand, by the fact that there is a very limited number of grammatical categories, leads Hale & Keyser to formulate a specific answer: semantic roles are not primitives of the theory but finite descriptive labels of restricted syntactic relations. In general, the analysis that current syntactic models offer of the complex syntactic structure that underlies seemingly simpler linguistic units such as words and morphemes themselves (once considered as authentic syntactic atoms and, therefore, units of the mysterious realm of semantics) is a clear instance of this process of reducing semantics to syntax.
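The idea that semantic roles are labels of positions rather than primitives can be sketched in a few lines. What follows is a deliberately crude illustration and not Hale & Keyser’s formalism: the `Projection` class, the `ROLE_LABELS` table, the function names and the schematic structure for putting the wine in bottles are all assumptions made for the example. In particular, the restriction that only the innermost complement can surface as the verb is simply stipulated here, whereas in their account it is meant to follow from independent syntactic principles.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Projection:
    """A binary, endocentric projection: a head category with one
    complement and, optionally, one specifier. The relation between
    head and complement is asymmetric and cannot be reversed."""
    category: str
    complement: Union["Projection", str]
    specifier: Optional[str] = None


# Assumed descriptive labels for (head category, position) pairs.
ROLE_LABELS = {
    ("V", "specifier"): "agent",
    ("P", "specifier"): "theme",
    ("P", "complement"): "location",
}


def roles(tree):
    """Read role labels off the structural positions of the arguments."""
    found = {}
    if tree.specifier is not None:
        key = (tree.category, "specifier")
        found[tree.specifier] = ROLE_LABELS.get(key, "unlabelled")
    if isinstance(tree.complement, Projection):
        found.update(roles(tree.complement))
    else:
        key = (tree.category, "complement")
        found[tree.complement] = ROLE_LABELS.get(key, "unlabelled")
    return found


def conflatable_noun(tree):
    """Return the noun allowed to surface as the verb in this sketch:
    the innermost complement, never a specifier (a stipulation here)."""
    node = tree
    while isinstance(node.complement, Projection):
        node = node.complement
    return node.complement


# Schematic structure for 'we put the wine in bottles'.
bottling = Projection("V", Projection("P", "bottle", specifier="wine"), specifier="we")

if __name__ == "__main__":
    print(roles(bottling))
    print(conflatable_noun(bottling))
```

In this toy structure the roles are not listed anywhere as properties of the verb; they fall out of where each noun sits. The same asymmetry that fixes the labels also singles out bottle, and not wine, as the noun that can become the verb, which mirrors the contrast between bottling the wine and the impossible verb meaning ‘to wine the bottles’.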

But let’s not forget Gödel 

But, of course, none of this implies that Gödel is defeated. Far from it. It is relatively easy to imagine that there will always come a point at which a truly ‘atomic’ type of meaning has to be postulated as simply ‘known’ by the mind, a point at which ‘a true axiom’ cannot be algorithmically proved, as the ‘Hilbert-Chomsky program’ would require. But, just as mathematicians have indeed sought to address the question of demonstrating all postulated theorems, we could say that formalist linguists are similarly obliged to develop the syntactic program initiated in the 1950s to the extreme, as a scientifically legitimate and irrefutable objective.

It seems, then, that the warning Chomsky made in Aspects is still valid today:

“In general, one should not expect to be able to delimit a large and complex domain before it has been thoroughly explored. A decision as to the boundary separating syntax and semantics (if there is one) is not a prerequisite for theoretical and descriptive study of syntactic and semantic rules.”

Chomsky (1965: 150)

It is possible that there is an ‘irreducible semantics’, but it would not be prudent to overlook the fact that the theorem that demonstrates the ‘incompleteness of syntax’ did, after all, have a ‘syntactic’ demonstration. So you never know . . .

[Adapted from a paper written in Spanish: Mendívil-Giró, José-Luis (2017): “Qué es la sintaxis y por qué es el principio básico del lenguaje humano”. In Ángel J. Gallego Bartolomé, Yolanda Rodríguez Sellés & Javier Fernández-Sanchez (eds.): Relaciones sintácticas: Homenaje a Josep M. Brucart i M. Lluïsa Hernanz, Universitat Autònoma de Barcelona, Barcelona, pp. 519-530.]

2 thoughts on “On the incompleteness of syntax”

  1. I am by no means a specialist, but I have to note that what you say about semantic roles (“Indeed, even the most detailed studies in this regard tend to agree that languages use a very small number of semantic roles (agent, experimenter, theme, location, etc.), a number that tends to oscillate between two in the most restrictive theories to a dozen in the most extensive.”) is not uncontroversial. First of all, in Yuri Apresjan’s work up to 50 or 60 or so semantic roles are posited (though I know this from Valentina Apresjan and I can’t give any references, and his work is mainly in Russian), but this is not so important. What is more important is that the very notion of semantic roles falls apart under scrutiny (or at least is very problematic and far from figured out), the reference for which being Beth Levin and Malka Rappaport Hovav’s 2005 book “Argument Realization”, in particular the second chapter which is an overview of various approaches that posit semantic role lists. (Of course, this doesn’t make the general point of this post invalid.)

  2. Thank you very much for reading my text and especially for your interesting comment, which clarifies and enriches the text. It remains here as an interesting footnote!

