1 The Nature of Universal Grammar

The idea of Universal Grammar (UG) put forward by Noam Chomsky has been a crucial driving force in linguistics. Whether linguists agree with it or not, they have defined themselves by their reactions to it, not only in terms of general concepts of language and language acquisition, but also in how they carry out linguistic description. From the 1960s to the 1980s, UG became a flash-point for disciplines outside linguistics such as psychology, computer parsing of language and first language acquisition, even if these areas have tended to lose contact in recent years. The aim of this book is to convey why Chomsky's theories of language still continue to be stimulating and adventurous and why they have important consequences for all those working with language.

This book is intended as an introduction to Chomsky's UG Theory for those who want a broad overview with sufficient detail to see how its main concepts work, rather than for those who are specialist students of syntax, for whom technical introductions such as Adger (2003) and Hornstein et al. (2005) are more appropriate. Nor does it cover Chomsky's political views, still as much a thorn in the side of the US establishment as ever, for example Chomsky (2004a). While the book pays attention to the current theory, called the Minimalist Program, it concentrates on providing a background to the overall concepts of Chomsky's theory, which have unfolded over six decades. Where possible, concepts are illustrated through Chomsky's own words. The distinctive feature of the book is the combination of Chomsky's general ideas of language and language acquisition with the details of syntax.

This opening chapter sets the scene by discussing some of the general issues of Chomsky's work on the notion of UG. Following this, chapter 2 discusses central concepts of the framework and how these relate to Chomsky's views on language acquisition.
The next two chapters provide an introduction to the syntax of Government/Binding Theory in terms of structure and of movement respectively. Chapter 5 looks at Chomskyan approaches to first language acquisition, chapter 6 at second language acquisition. Then chapters 7 and 8 outline the current Minimalist Program, again separating structure and movement.

Two conventions followed in this book need briefly stating. As usual in linguistics books, an asterisk indicates an ungrammatical sentence. Example sentences, phrases and structures are numbered for ease of reference, e.g.:

(1) *That John left early seemed.

While much of the discussion is based upon English for convenience, the UG Theory gains its power by being applied to many languages. Indeed the past twenty years have seen a proliferation in the languages studied, which will be drawn on when possible. It should perhaps be pointed out that the sentences used in this book are examples of particular syntactic issues rather than necessarily being based on complete recent analyses of the languages in question.

1.1 The early development of Universal Grammar Theory

The approach adopted in this book is to look at the general ideas of the Chomskyan theory of UG without reference to their historical origins. Nevertheless some allusions have to be made to the different versions that have been employed over the years, and the history of the theory needs to be briefly sketched, partly so that the reader is not confused by picking up a book with other terminology. Development has taken place at two levels. On one level are the general concepts about language and language acquisition on which the theory is based. The origins of such ideas as competence and performance or the innateness of language can be traced back to the late fifties or mid-sixties. These have grown continuously over the years rather than being superseded or abandoned.
On this level the UG Theory is recognizable in any of its incarnations and the broad outlines have remained substantially the same despite numerous additions. On another level come ideas about the description of syntax, which fall into definite historical phases. Different periods in the Chomskyan description of syntax have tended to become known by the names of particular books. Each was characterized by certain concepts, which were often rejected by the next period; hence the statements of one period are often difficult to translate into those of the next. Unlike the continuity of the general ideas, there are shifts in the concepts of syntax, leading to a series of apparent discontinuities and changes of direction.

The original model, Syntactic Structures, took its name from the title of Chomsky's 1957 book, which established the notion of 'generative grammar' itself, with its emphasis on explicit 'generative', formal description through 'rewrite rules' such as S → NP VP, as described below. It made a separation between phrase structure rules that generated the basic structures, called 'kernel sentences', and transformations which altered these in various ways by turning them into passive or negative sentences etc.; hence its popular name was 'transformational generative grammar' or 'TGG'. Its most memorable product was the sentence:

(2) Colourless green ideas sleep furiously.

intended to demonstrate that sentences could be grammatical but meaningless and hence that syntax is independent of semantics. This sentence became so widely known that attempts were made to create poems that included it naturally (after all, Andrew Marvell wrote of 'a green thought in a green shade').

This theory was superseded by the model first known as the Aspects Model after Chomsky's 1965 book Aspects of the Theory of Syntax, later as the Standard Theory.
This was distinguished by the introduction of the competence/performance distinction between language knowledge and language use and by its recognition of 'deep' and 'surface' structure in the sentence. Two classic test sentences were:

(3) John is eager to please.

which implies that John pleases other people, and:

(4) John is easy to please.

which implies that other people please John. This difference is captured by claiming that the two sentences have the same surface structure but differ in deep structure, where John may act as the subject or object of the verb please. Again, sentence (4) was so widely known it featured in graffiti on London street walls in the 1970s and was used as a book title (Mehta, 1971).

During the 1970s the Standard Theory evolved into the Extended Standard Theory (EST), which refined the types of rules that were employed. This in turn changed more radically into the Government/Binding (GB) Model (after Lectures on Government and Binding, Chomsky, 1981a), which substantially underpins this book. The GB Model claimed that human languages consisted of principles that were the same for any grammar and parameters that allowed grammars to vary in limited ways, to be illustrated in the next chapter. It also refined deep and surface structure into the more technical notions of 'D-structure' and 'S-structure', to be discussed below. The GB version of UG was presented most readably in Knowledge of Language (Chomsky, 1986a). Though 'Government and Binding Theory' was the common label for this model, Chomsky himself found it misleading because it gave undue prominence to two of its many elements: 'these modules of language stand alongside many others ... Determination of the nature of these and other systems is a common project, not specific to this particular conception of the nature of language and its use' (Chomsky, 1995a, pp. 29-30).
Hence the label of Principles and Parameters (P&P) Theory has come to be seen as closer to its essence, and can still be applied to the contemporary model.

Since the late eighties a further major model of syntax has been undergoing development, a model called the Minimalist Program (MP), again reflected in the title of Chomsky's first publication in this framework (Chomsky, 1993) and his later book (Chomsky, 1995a). So far this has had three phases. In the first phase, up till about 1996, the MP concentrated on the general features of the model, simplifying knowledge of language to invariant principles common to all languages, and, by attaching parameters to the vocabulary, making everything that people have to acquire in order to know a particular language part of the lexicon. From about 1996 the second phase embarked on a programme of radically rethinking syntax, eliminating much of the apparatus of GB Theory in favour of a minimal set of operations and ideas and exploring whether the central 'computational system' of language interfaces 'perfectly' with phonology and cognition. Since 2000 a new model has been emerging, known as the Phases Model. A current view is presented in chapters 7 and 8 below. A readable set of lectures by Chomsky on this framework is The Architecture of Language (Chomsky, 2000a).

Starting date  Model                                       Key terms                      Key book/article
1957           Transformational generative grammar (TGG)   Rewrite rules; Transformation; Chomsky, 1957
                                                           Kernel sentence
1965           Aspects Model, later Standard Theory        Competence/performance;        Chomsky, 1965
                                                           Deep/surface structure
c. 1970        Extended Standard Theory (EST)                                             Chomsky, 1970
1981           Government/Binding Theory (GB)              Principles; Parameters;        Chomsky, 1981a
                                                           D- and S-structure; Movement
post-1990      Minimalist Program (MP)                     Computational system;          Chomsky, 1993
                                                           Interface conditions;
                                                           Perfection

Figure 1.1 Phases in the development of Chomsky's Universal Grammar
1.2 Relating 'sounds' and 'meanings'

The focus of all Chomsky's theories has been what we know about language and where this knowledge comes from. The major concern is always the human mind. The claims of UG Theory have repercussions for how we conceive of human beings and for what makes a human being. Language is seen as something in the individual mind of every human being. Hence it deals with general properties of language found everywhere rather than the idiosyncrasies of a particular language such as English or Korean - what is common to human beings, not what distinguishes one person from another. Everybody knows language: what is it that we know and how did we come to acquire it?

As well as this invisible existence within our minds, language also consists of physical events and objects, whether the sounds of speech or the symbols of writing: language relates to things outside our minds. The fundamental question for linguistics since Aristotle has been how language bridges the gap between the intangible interior world of knowledge and the concrete physical world of sounds and objects; 'each language can be regarded as a particular relationship between sounds and meaning' (Chomsky, 1972a, p. 17). So a sentence such as:

(5) The moon shone through the trees.

consists on the one hand of a sequence of sounds or letters, on the other of a set of meanings about an object called 'the moon' and a relationship with some other objects called 'trees'. Similarly the Japanese sentence:

(6) Ohayoh gozaimasu.

is connected to its Japanese pronunciation on the one side and to its meaning 'Good morning' on the other. The sounds and written symbols are the external face of language, its contact with the world through physical forms; they have no meaning in themselves. Moon means nothing to a monolingual speaker of Japanese, gozaimasu nothing to a monolingual English speaker.
The meanings are the internal face of language, its contact with the rest of cognition; they are abstract mental representations, independent of physical forms. The task of linguistics is to establish the nature of this relationship between external sounds and internal meanings, as seen in figure 1.2.

External (E)                                             Internal (I)
physical world  'sounds'  <---------------->  'meanings'  mental world

Figure 1.2 The sound-meaning link

If language could be dealt with either as pure sounds or as pure meanings, its description would be simple: moon is pronounced with phonemes taken from a limited inventory of sounds; moon has meanings based on a limited number of concepts. The difficulty of handling language is due to the complex and often baffling links between them: how do you match sounds with meanings? How does The moon shone through the trees convey something to an English speaker about a particular happening? According to Chomsky (for instance Chomsky, 1993), the human mind bridges this gap via a 'computational system' that relates meanings to sequences of sounds in one direction and sequences of sounds to meanings in the other.

External (E)                                                          Internal (I)
physical world  'sounds' <--> computational system <--> 'meanings'  mental world

Figure 1.3 The computational system

The sheer sounds of language, whether produced by speakers or perceived by listeners, are linked to the meanings in their minds by the computational system. What speakers of a language know is not just the sounds or the meanings but how to connect the two. The complexity of language resides in the features of the computational system, primarily in the syntax.
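The point that one system mediates between sounds and meanings in both directions can be caricatured in a few lines of code. This is a deliberately toy sketch, not anything in Chomsky's theory: the dictionary and the glosses are invented for illustration, and real comprehension obviously involves far more than word lookup.

```python
# Toy illustration only: a single system that maps 'sounds' (word forms)
# to 'meanings' (concepts) and back, showing that what a speaker knows
# is the connection between the two, not just the two ends.
LEXICON = {"moon": "earth's satellite", "shone": "gave out light (past)"}

def sounds_to_meanings(word_forms):
    """Comprehension direction: external forms in, concepts out."""
    return [LEXICON[w] for w in word_forms if w in LEXICON]

def meanings_to_sounds(concepts):
    """Production direction: concepts in, external forms out."""
    inverse = {meaning: form for form, meaning in LEXICON.items()}
    return [inverse[c] for c in concepts if c in inverse]

print(sounds_to_meanings(["moon", "shone"]))
print(meanings_to_sounds(["earth's satellite"]))  # ['moon']
```

The same table serves both directions; the asymmetry lies only in which side is given and which is computed, which is the sense in which the computational system is a two-way bridge.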
Since the late 1990s, as part of the MP, Chomsky has been interested in exploring the connections between the central computational system and, on the one side, the physical expressions of language, on the other, the mental representation of concepts: what happens at the points of contact between the computational system and the rest? At the point of contact with sounds, the mind needs to change the internal forms of language used by the computational system into actual physical sounds or letters through complex commands to muscles, called by Chomsky (2001a) the 'sensorimotor system'; i.e. the moon is said using the appropriate articulations. In reverse, the listener's mind has to convert sounds into the forms of representation used by the computational system, so that what is heard is perceived as the phrase the moon. At the point of contact with meanings, the mind needs to change the representation of language used by the computational system into the general concepts used by the mind, called 'the conceptual-intentional system' (Chomsky, 2001a); i.e. moon is connected to the concept of 'earth's satellite'. Going in the opposite direction, while speaking the mind has to convert the concepts into linguistic representation for the computational system, i.e. 'earth's satellite' is converted into moon. Figure 1.4 incorporates these interfaces into the bridge between sounds and meanings.

External (E)                                                                         Internal (I)
sensorimotor   'sounds' <-interface-> computational system <-interface-> 'meanings'  conceptual-
system                                                                               intentional system

Figure 1.4 The interfaces with the computational system

The points of contact are the interfaces between the computational system, that is to say language knowledge, and two other things which are not language - the outside world of sounds and the inside world of concepts. For the computational system to work, it must be able to interface in both directions, to have two 'access systems' (Chomsky, 2000a, p. 9).
Or, put another way, language is shaped by having to be expressed on the one hand as sounds or letters that can be handled by the human body, on the other as concepts that can be conceived by the human mind.

Let us now add some of the apparatus of principles and parameters to this picture. To describe a sentence such as:

(7) Gill teaches physics.

the grammar must show how the sentence is pronounced - the sequence of sounds, the stress patterns, the intonation, and so on; what the sentence actually means - the individual words - Gill is a proper name, usually female, and so on; and how these relate to one another via syntactic devices such as the subject (Gill) coming before the verb (teaches), with the object (physics) after. The term 'grammar' is used generally to refer to the whole knowledge of language in the person's mind rather than just to syntax, which occasionally leads to misinterpretation by outsiders. The linguist's grammar thus needs a way of describing actual sounds - a phonetic representation; it needs a way of representing meaning - a semantic representation; and it needs a way of describing the syntactic structure that connects them - a syntactic level of representation. Syntactic structure plays a central mediating role between physical form and abstract meaning. Principles and Parameters Theory captures this bridge between sound and meaning through the technical constructs Phonetic Form (PF), realized as sound sequences, and Logical Form (LF), representations of certain aspects of meaning, connected via the computational system, as shown in figure 1.5:

physical world  'sounds'  Phonetic Form <--> computational system <--> Logical Form  'meanings'  mental world

Figure 1.5 The computational system

PF and LF have their own natures, for which distinct PF and LF components are needed within the model.
They form the contact between the grammar and other areas, at the one end physical realizations of sounds, at the other further mental systems: 'PF and LF constitute the "interface" between language and other cognitive systems, yielding direct representation of sound on the one hand and meanings on the other as language and other systems interact' (Chomsky, 1986a, p. 68).

Most principles-and-parameters-based research has concentrated on the central computational component rather than on PF or LF. If syntax is a bridge, independent theories of PF or LF are beside the point: however elegant the theories of PF or LF may be in themselves, they must be capable of taking their place in the bridge linking sounds and meanings. The same is true of language acquisition: the central linguistic problem is how the child acquires the elements of the computational system rather than the sounds or meanings of the language. PF and LF are treated in this book as incidentals to the main theme of syntax. Throughout the development of Chomsky's models, a key aspect has been their 'syntactocentrism' (Jackendoff, 2002): syntax has always been the key element of knowledge of language. Perhaps this is why the word 'grammar' is often extended in Chomskyan theories to encompass the whole knowledge of language in the individual's mind. Nevertheless this does not mean that considerable work on theories of LF and PF has not been carried out over the years. The PF component for example grew from The Sound Pattern of English (Chomsky and Halle, 1968) into a whole movement of generative phonology, as described in Roca (1994) and Kenstowicz (1994).

The bridge between sounds and meanings shown in figure 1.5 is still not complete in that LF represents essentially 'syntactic' meaning. 'By the phrase "logical form" I mean that partial representation of meaning that is determined by grammatical structure' (Chomsky, 1979, p. 165).
LF is not in itself a full semantic representation but represents the structurally determined aspects of meaning that form one input to a semantic representation, for example the difference in interpreting the direction:

(8) It's right opposite the church.

as:

(9) It's [right opposite] the church.

meaning 'exactly opposite the church', or as:

(10) It's right [opposite the church].

meaning 'turn right when you are opposite the church'.

The simplest version of linguistic theory needs two levels to connect the computational system with the physical articulation and perception of language on the one hand and with the cognitive semantic system on the other. The MP indeed claims that 'a particularly simple design for language would take the (conceptually necessary) interface levels to be the only levels' (Chomsky, 1993, p. 3).

1.3 The computational system

The make-up of the computational system is then the subject-matter of linguistics. One vital component is the lexicon in the speaker's mind containing all their vocabulary, analogous to a dictionary. This knowledge is organized in 'lexical entries' for each word they know. While the meaning of the sentence depends upon the relationships between its various elements, say how the moon relates to shone, it also depends upon the individual words such as moon. We need to know the diverse attributes of the word - that it means 'satellite of a planet', that it is pronounced [mu:n], that it is a countable noun, a moon, and so on. Each lexical entry in the mental lexicon contains a mass of information about how the word behaves in sentences as well as its 'meaning'. Our knowledge of language consists of thousands of these entries encoding words and their meanings. The computational system relies upon this mental lexicon.
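The kind of information bundled together in a lexical entry can be sketched as a simple record. The sketch below is purely illustrative, not part of the theory: the field names are invented here, and a real entry contains far more information about the word's behaviour.

```python
# Illustrative sketch of a mental lexical entry for 'moon': one record
# bundling pronunciation, meaning and grammatical behaviour
# (the field names are invented for this example).
moon_entry = {
    "form": "moon",
    "pronunciation": "[mu:n]",
    "meaning": "satellite of a planet",
    "category": "noun",
    "countable": True,   # hence 'a moon', 'two moons'
}

def needs_determiner(entry):
    """A singular countable noun demands a determiner:
    'the moon shone', not '*moon shone'."""
    return entry["category"] == "noun" and entry["countable"]

print(needs_determiner(moon_entry))  # True
```

The point of the sketch is that facts like the need for a determiner follow from properties stored in the entry itself, which is why the next paragraph can say that the choice of lexical items drives the syntax.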
Since the early 1980s Chomskyan theories have tied the structure of the sentence in with the properties of its lexical items rather than keeping syntax and the lexicon in watertight compartments. To a large extent the choice of lexical items drives the syntax of the computational system, laying down what structures are and aren't possible in a sentence having those lexical items: if you choose the verb read, you've got to include an object (read something); if you choose the noun newspaper in the singular, you've got to have a determiner (the newspaper).

The second vital component in the computational system is the principles of UG. Knowledge of language is based upon a core set of principles embodied in all languages and in the minds of all human beings. It doesn't matter whether one speaks English, Japanese or Inuktitut: at some level of abstraction all languages rely on the same set of principles. The differences between languages amount to a limited choice between a certain number of variables, called parameters. Since the early 1980s the Chomskyan approach has concentrated on establishing more and more powerful principles for language knowledge, leading to the MP. The central claim that language knowledge consists of principles universal to all languages and parameters whose values vary from one language to another was the most radical breach with the view of syntax prevailing before 1980, which saw language as 'rules' or 'structures' and believed language variation was effectively limitless.

Principles apply across all areas of language rather than to a single construction and they are employed wherever they are needed. Knowledge of language does not consist of rules as such but of underlying principles from which individual rules are derived. The concept of the rule, once the dominant way of thinking about linguistic knowledge, has now been minimized.
'The basic assumption of the P&P model is that languages have no rules at all in anything like the traditional sense, and no grammatical constructions (relative clauses, passives, etc.) except as taxonomic artifacts' (Chomsky, 1995b, p. 388). The change from rules to principles was then a major development in Chomskyan thinking, the repercussions of which have not necessarily been appreciated by those working in psychology and other areas, who still often assume Chomskyan theory to be rule-based. 'There has been a gradual shift of focus from the study of rule systems, which have increasingly been regarded as impoverished, ... to the study of systems of principles, which appear to occupy a much more central position in determining the character and variety of possible human languages' (Chomsky, 1982, pp. 7-8). The information stated in rules has to be reinterpreted as general principles that affect all rules rather than as a property of individual rules. Rules are by-products of the interaction between the principles and the lexicon.

UG Theory is not concerned with specific syntactic constructions such as 'passive' or 'relative clause' or 'question', or the rules which linguists can formulate to express regularities in them, which are simply convenient labels for particular interactions of principles and parameters. The passive is not an independent construction so much as the product of a complex interaction of many principles and parameter settings, each of which also has effects elsewhere in the syntax: 'a language is not, then, a system of rules, but a set of specifications for parameters in an invariant system of principles of Universal Grammar (UG)' (Chomsky, 1995b, p. 388). Figure 1.6 then incorporates the principles and the lexicon into the computational system.
The lexicon is the key starting point for a sentence; the principles combine with the properties of the lexical items chosen to yield a representation that is capable of connecting with the sounds and meanings outside the computational system.

External (E)                                                                          Internal (I)
sensorimotor   'sounds'  PF <-- Lexicon + UG principles --> LF  'meanings'  conceptual-
system                          (computational system)                      intentional system

Figure 1.6 The computational system

EXERCISE 1.1
Here are some 'rules' for everyday human behaviour in different parts of the world. Could these be seen as examples of more general principles? How are these different from rules and principles of language?
- Wash your hands in between the courses of meals.
- Drive on the left of the road.
- Add salt only after potatoes have boiled.
- Do not use a capital letter after a colon.
- The evening meal should be eaten about 11 p.m.
- Children should be seen but not heard.
- Women must wear a head scarf.
- For examinations students must wear a dark suit, a gown and a mortar board.

physical world  'sounds'  PF <--> computational system <--> LF  'meanings'  mental world
- The interfaces allow the linguistic system to access other systems.
- PF: phonetic representation, showing the sounds of speech in sequence.
- The computational system: the system for relating the sound sequences and the meanings, i.e. principles and the lexicon.
- LF: semantic representation, showing the grammatical aspects of the meanings of speech.

The computational system:
- provides the link between sounds, the physical production of language (sensorimotor system), and meanings, the mental representation of meaning (conceptual-intentional system)
- via Phonetic Form (PF) (how the abstract phonological representation gets a pronunciation) and Logical Form (LF) (how the abstract syntactic representation gets a meaning)
- relying on the lexicon, which stores all the properties of words, and principles, which dictate what structures can be used.
Figure 1.7 Explanatory diagram of the computational system

1.4 Questions for linguistics

To sum up what has been said so far, UG is a theory of knowledge that is concerned with the internal structure of the human mind - how the computational system links sounds to meaning. Since the early 1980s it has claimed that this knowledge consists of a set of principles that apply to all languages and parameters with settings that vary between languages. Acquiring language therefore means learning how these principles apply to a particular language and which value is appropriate for each parameter for that language. Each principle or parameter that is proposed is a substantive claim about the mind of the speaker and about the nature of language acquisition, not just about the description of a single language. UG Theory is making precise statements about properties of the mind based on specific evidence, not vague or unverifiable suggestions. The general concepts of the theory are inextricably tied to the specific details; the importance of UG Theory is its attempt to integrate grammar, mind and language at every moment.

The P&P approach is not, however, tied to a particular model of syntactic description; 'it is a kind of theoretical framework, it is a way of thinking about language' (Chomsky, 2000a, p. 15). Historically speaking it arose out of Chomsky's work of the early 1980s, notably Chomsky (1981a). The MP of the 1990s and 2000s has remained within an overall P&P 'framework' despite abandoning, modifying or simplifying many of its details. P&P Theory is a way of thinking about knowledge of language as consisting of certain fixed and constant elements and some highly restricted variable elements, and so it can be implemented in different ways. The MP intends to cut down the number of operations and assumptions, making it in the end simpler than past theories.
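The contrast between invariant principles and language-particular parameter settings can be caricatured in a few lines of code. This is a deliberately crude sketch, not the theory itself; it uses the often-cited head-direction contrast between English, where the verb precedes its object (read books), and Japanese, where it follows (hon o yomu, literally 'books read'). The function name and the boolean flag are invented for illustration.

```python
# Crude sketch (not the theory itself): one invariant combination
# operation plus a per-language parameter setting for head position.
def build_phrase(head, complement, head_first):
    """Invariant 'principle': a phrase combines a head with its complement.
    'Parameter': whether the head precedes or follows the complement."""
    return [head, complement] if head_first else [complement, head]

# English sets the parameter one way: 'read books'
english_vp = build_phrase("read", "books", head_first=True)
# Japanese sets it the other way: 'hon (o) yomu', with the verb last
japanese_vp = build_phrase("yomu", "hon", head_first=False)

print(english_vp)   # ['read', 'books']
print(japanese_vp)  # ['hon', 'yomu']
```

The combination operation is the same for both languages; all that differs is one setting, which is the sense in which a language is 'a set of specifications for parameters in an invariant system of principles'.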
The aims of linguistics are often summed up by Chomsky, for example Chomsky (1991a), in the form of three questions:

(i) What constitutes knowledge of language?

The linguist's prime duty is to describe what people know about language - whatever it is that they have in their minds when they know English or French or any language, or, more precisely, a grammar. Speakers of English know, among many other things, that:

(11) Is Sam the cat that is black?

is a sentence of English, while:

(12) *Is Sam is the cat that black?

is not, even if they have never met a sentence like (12) before in their lives.

(ii) How is such knowledge acquired?

A second aim for linguistics is to discover how people acquire this knowledge of language. Inquiring what this knowledge is like cannot be separated from asking how it is acquired. Anything that is proposed about the nature of language knowledge begs for an explanation of how it came into being. What could have created the knowledge in our minds that sentence (11) is possible, sentence (12) is impossible? Since the knowledge is present in our minds, it must have come from somewhere; the linguist has to show that there is in principle some source for it. This argument is expanded on a grander scale in chapter 5. Logically speaking, explaining the acquisition of language knowledge depends on first establishing what the knowledge that is acquired actually consists of, i.e. on first answering question (i).

(iii) How is such knowledge put to use?

A third aim is to see how this acquired language knowledge is actually used. Sentence (11) presumably could be used to distinguish one cat out of many in a photograph. Again, investigating how knowledge is used depends on first establishing what knowledge is, i.e. on the answer to question (i).

Sometimes a fourth question is added, as in Chomsky (1988, p.
3):

(iv) What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge?

This mental knowledge must have some physical correlate; in other words there must be some relationship between the computational system in the mind and the physical structures of the brain. The principles and parameters of UG themselves must be stored somewhere within the speaker's brain. Though our understanding of the physical basis for memory is advancing rapidly with the advent of modern methods of brain scanning, Chomsky (2005b, p. 143) points out 'current understanding falls well short of laying the basis for the unification of the sciences of the brain and higher mental faculties, language among them'. Useful accounts of such brain research can be found in Fabbro (1999) and Paradis (2004). It will not be tackled in this book.

Recently Chomsky has begun to pose more difficult questions about the nature of language and the extent to which this is determined by restrictions imposed on it from the other systems which interpret it: 'How good a solution is language to certain boundary conditions that are imposed by the architecture of the mind?' (Chomsky, 2000a, p. 17). Pre-theoretically, the linguistic system might be a completely independent aspect of the human mind which happens to be made use of by the sensorimotor system and the conceptual-intentional system, but which has distinct properties all of its own. On the other hand, the linguistic system might be an optimal solution to the problem of bridging the gap between the interpretative systems, one which has no properties other than those necessary for the completion of its function, this being to deliver objects which are 'legible' at the interface levels, i.e. which meet the 'legibility conditions' imposed by the sensorimotor and conceptual-intentional systems.
Chomsky discusses this in terms of a concept of perfection, the discovery of which he sees as being at the heart of science: 'language is surprisingly close to perfect in that very curious sense; that is, it is a near-optimal solution to the legibility conditions' (Chomsky, 2000a, p. 20). That is to say, a system that consists of mechanisms motivated entirely by the requirement that it links meanings and sounds can be called 'perfect'.

CHOMSKY'S QUESTIONS FOR LINGUISTICS
(i) What constitutes knowledge of language?
(ii) How is such knowledge acquired?
(iii) How is such knowledge put to use?
(iv) What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge?

1.5 General ideas of language

Let us now put all this in the context of different approaches to linguistics. Chomsky's work distinguishes externalized (E-)language from internalized (I-)language (Chomsky, 1986a; 1991b). E-language linguistics, chiefly familiar from the American structuralist tradition such as Bloomfield (1933), aims to collect samples of language and then to describe their properties. An E-language approach collects sentences 'understood independently of the properties of the mind' (Chomsky, 1986a, p. 20); E-language research constructs a grammar to describe the regularities found in such a sample: 'a grammar is a collection of descriptive statements concerning the E-language' (p. 20), for example the statement that English questions involve moving auxiliaries or copulas to the beginning of the sentence. The linguist's task is to bring order to the set of external facts that make up the language.
The resulting grammar is described in terms of properties of such data through 'structures' or 'patterns', as in say Pattern Grammar (Hunston and Francis, 2000). I-language linguistics, however, is concerned with what a speaker knows about language and where this knowledge comes from; it treats language as an internal property of the human mind rather than something external: language is 'a system represented in the mind/brain of a particular individual' (Chomsky, 1988, p. 36). Chomsky's first question for linguistics - discovering what constitutes language knowledge - sets an I-language goal. Chomsky claims that the history of generative linguistics shows a shift from an E-language to an I-language approach; 'the shift of focus from the dubious concept of E-language to the significant notion of I-language was a crucial step in early generative grammar' (Chomsky, 1991b, p. 10). I-language research aims to represent this mental state; a grammar describes the speaker's knowledge of the language, not the sentences they have produced. Success is measured by how well the grammar captures and explains language knowledge in terms of properties of the human mind. Chomsky's theories thus fall within the I-language tradition; they aim at exploring the mind rather than the environment. 'Linguistics is the study of I-languages, and the basis for attaining this knowledge' (Chomsky, 1987). Indeed Chomsky is extremely dismissive of the E-language approaches: 'E-language, if it exists at all, is derivative, remote from mechanisms and of no particular empirical significance, perhaps none at all' (Chomsky, 1991b, p. 10).
E-LANGUAGE AND I-LANGUAGE
E-language - 'externalized':
- consists of a set of sentences
- deals with sentences actually produced - 'corpora'
- describes properties of such data
- is concerned with what people have done
I-language - 'internalized':
- consists of a system of principles
- deals with knowledge of potential sentences - 'intuitions'
- describes the system in an individual's mind
- is concerned with what they could do

The E-language approach includes not only theories that emphasize the physical manifestations of language but also those that treat language as a social phenomenon, 'as a collection (or system) of actions or behaviours of some sort' (Chomsky, 1986a, p. 20). The study of E-language relates a sentence to the sentences that preceded it, to the situation at the moment of speaking, and to the social relationship between the speaker and the listener. It concentrates on social behaviour between people rather than on the inner psychological world. Much work within the fields of sociolinguistics and discourse analysis comes within an E-language approach in that it concerns social rather than mental phenomena. An I-language is whatever is in the speaker's mind, namely 'a computational procedure and a lexicon' (Chomsky, 2000b, p. 119). Hence the more obvious man-in-the-street idea of language is rejected: 'In the varieties of modern linguistics that concern us here, the term "language" is used quite differently to refer to an internal component of the mind/brain' (Hauser et al., 2002, p. 1570): the things that we call the English language or the Italian language are E-languages in the social world; I-languages are known by individuals. To Chomsky the study of language as part of the mind is quite distinct from the study of languages such as English or Italian: 'a language is a state of the faculty of language, an I-language, in technical usage' (Chomsky, 2005a, p. 2).
The opposition between these two approaches in linguistics has been long and acrimonious, neither side conceding the other's reality. It resembles the perennial divisions in literature between the Romantics who saw poetry as an expression of individuality and the Classicists who saw it as part of society or, indeed, the personality difference between introverts concentrating on the world within and extraverts engaging in the world outside. It has also affected the other disciplines related to linguistics. The study of language acquisition is broadly speaking divided between those who look at external interaction and communicative function and those who look for internal rules and principles; computational linguists roughly divide into those who analyse large stretches of text and those who write rules. An E-linguist collects samples of actual speech or actual behaviour; evidence is concrete physical manifestation: what someone said or wrote to someone else in a particular time and place, say the observation that in March 2006 a Microsoft advertisement claims Children dream of flight, to soar or that a crossword clue in the Guardian newspaper is Sleeping accommodation with meals or any other sentence taken at random from the billions occurring every day: if something happens, i.e. someone produces a sentence, an E-linguist has to account for it. An I-linguist invents possible and impossible sentences; evidence is whether speakers know if the sentences are grammatical - do you accept That John left early seemed is English or not? The E-linguist despises the I-linguist for not looking at 'real' facts; the I-linguist derides the E-linguist for looking at trivia. The I-language versus E-language distinction is as much a difference of research methods and of admissible evidence as it is of long-term goals.
The influential distinction between competence and performance, first drawn in Chomsky (1965), partly corresponds to the I-language versus E-language split. Competence is 'the speaker-hearer's knowledge of his language', performance 'the actual use of language in concrete situations' (Chomsky, 1965, p. 4). Since it was first proposed, this distinction has been the subject of controversy between those who see it as a necessary idealization and those who believe it abandons the central data of linguistics. Let us start with Chomsky's definition of competence: 'By "grammatical competence" I mean the cognitive state that encompasses all those aspects of form and meaning and their relation, including underlying structures that enter into that relation, which are properly assigned to the specific subsystem of the human mind that relates representations of form and meaning' (Chomsky, 1980a, p. 59). The grammar of competence describes I-language in the mind, distinct from the use of language, which depends upon the context of situation, the intentions of the participants and other factors. Competence is independent of situation. It represents what the speaker knows in the abstract, just as people may know the Highway Code or the rules of arithmetic independently of whether they can drive a car or add up a column of figures. Thus it is part of the competence of all speakers of English that movements must be local. The description of linguistic competence then provides the answer to the question of what constitutes knowledge of language. To do this, it idealizes away from the individual speaker to the abstract knowledge that all speakers possess. So differences of dialect are not its concern - whether an English speaker comes from Los Angeles or Glasgow. Nor are differences between genres - whether the person is addressing a congregation in church or a child on a swing. Nor is level of ability in the language - whether the person is a poet or an illiterate.
Nor are speakers who know more than one language, say Japanese in addition to English. This is summed up in the classic Aspects definition of competence (Chomsky, 1965, p. 3): 'Linguistic theory is concerned primarily with an ideal speaker-listener in a completely homogeneous speech community.' Ever since it was first mooted, people have been objecting that this is throwing the baby out with the bathwater: the crucial aspects of language are not this abstract knowledge but the variations and complexities that competence deliberately ignores. For example chapter 6 will argue that most human beings in fact know more than one language and that the theory cannot be arbitrarily confined to people who know only one. Chomsky's notion of competence has sometimes been attacked for failing to deal with how language is used, and the concept of communicative competence has been proposed to remedy this lack (Hymes, 1972). The distinction between competence and performance does not deny that a theory of use complements a theory of knowledge; I-language linguistics happens to be more interested in the theory of what people know; it claims that establishing knowledge itself logically precedes studying how people acquire and use that knowledge. Chomsky accepts that language is used purposefully: 'Surely there are significant connections between structure and function; this is not and has never been in doubt' (Chomsky, 1976, p. 56). As well as knowing the structure of language, we have to know how to use it. There is little point in knowing the structure of: (13) Can you lift that box? if you can't decide whether the speaker wants to discover how strong you are (a question) or wants you to move the box (a request). Indeed in later writings Chomsky has introduced the term pragmatic competence - knowledge of how language is related to the situation in which it is used.
Pragmatic competence 'places language in the institutional setting of its use, relating intentions and purposes to the linguistic means at hand' (Chomsky, 1980a, p. 225). It may be possible to have grammatical competence without pragmatic competence. A schoolboy in a Tom Sharpe novel Vintage Stuff (Sharpe, 1982) takes everything that is said literally; when asked to turn over a new leaf, he digs up the headmaster's camellias. But knowledge of language use is different from knowledge of language itself; pragmatic competence is not linguistic competence. The description of grammatical competence explains how the speaker knows that: (14) Why are you making such a noise? is a possible sentence of English, but that: (15) *Why you are making such a noise? is not. It is the province of pragmatic competence to explain whether the speaker who says: (16) Why are you making such a noise? is requesting someone to stop, or is asking a genuine question out of curiosity, or is muttering a sotto voce comment. The sentence has a structure and a form that is known by the native speaker independently of the various ways in which it can be used: this is the responsibility of grammatical competence. Chomsky's acceptance of a notion of pragmatic competence does not mean, however, that he agrees that the sole purpose of language is communication: Language can be used to transmit information but it also serves many other purposes: to establish relations among people, to express or clarify thought, for creative mental activity, to gain understanding, and so on. In my opinion there is no reason to accord privileged status to one or the other of these modes. Forced to choose, I would say something quite classical and rather empty: language serves essentially for the expression of thought. (Chomsky, 1979, p.
88). The claim that human language is a system of communication devalues the importance of other types of communication: 'Either we must deprive the notion "communication" of all significance, or else we must reject the view that the purpose of language is communication' (Chomsky, 1980a, p. 230). Though approached from a very different tradition, this echoes the sentiments of the 'British' school of linguistics from Malinowski (1923) onward that language has many functions, only one of which is communication. Chomsky claims then that 'language is not properly regarded as a system of communication. It is a system for expressing thought' (Chomsky, 2000b, p. 76). That is to say, it is a means of using the concepts of the mind via a computational system; for some purposes it might not matter if there were no interface to sounds - we can use language to organize our thoughts; it may not matter if there is no listener other than ourselves - we can talk to ourselves, write lecture notes or keep a diary. The social uses of language are in a sense secondary: 'The use of language for communication might turn out to be a kind of epiphenomenon' (Chomsky, 2002, p. 107), an accidental side-effect. Hence question (iii) 'how is knowledge put to use?' can only be tackled after we have described the knowledge itself, question (i). In all Chomskyan models a crucial element of competence is its creative aspect; the speaker's knowledge of language must be able to cope with sentences that it has never heard or produced before. E-language depends on history - pieces of language that happen to have been said in the past. I-language competence must deal with the speaker's ability to utter or comprehend sentences that have never been said before - to understand: (17) Ornette Coleman's playing was quite sensational. even if they are quite unaware who Ornette Coleman is or what is being talked about.
It must also reflect the native speakers' ability to judge that: (18) *Is John is the man who tall? is an impossible sentence, even if they are aware who is being referred to, and can comprehend the question; 'having mastered a language, one is able to understand an indefinite number of expressions that are new to one's experience, that bear no simple physical resemblance to the expressions that constitute one's linguistic experience' (Chomsky, 1972a, p. 100), whether a sentence from a radio presenter such as: (19) If you have been, thank you for listening. or today's newspaper headline: (20) Bomb death day before birthday. Creativity in the Chomskyan sense is the mundane everyday ability to create and understand novel sentences according to the established knowledge in the mind - novelty within the constraints of the grammar. 'Creativity is predicated on a system of rules and forms, in part determined by intrinsic human capacities. Without such constraints, we have arbitrary and random behaviour, not creative acts' (Chomsky, 1976, p. 133). It is not creativity in an artistic sense, which might well break the rules or create new rules, even if ultimately there may be some connection between them. The sentence: (21) There's a dog in the garden. is as creative as: (22) There is grey in your hair. in this sense, regardless of whether one comes from a poem and one does not. It is then a characteristic of human language that human beings can produce an infinite number of sentences. I-language has to deal with 'the core property of discrete infinity' of language (Hauser et al., 2002, p. 1571). The ability of language to be infinite is ascribed by Chomsky to recursion, meaning that some rules of language can have other versions of themselves embedded within them indefinitely, rather like fractals.
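The kind of self-embedding involved can be sketched in a few lines of code (a purely illustrative toy, not part of the linguistic theory; the mini-grammar here is invented for the purpose):

```python
# Toy illustration of recursion: a rule for building a phrase that
# contains another instance of itself, so outputs of any depth exist.
# The 'grammar' is invented for illustration, not a claim about English.

def relative_clause(depth):
    """Embed 'the dog that saw ...' inside itself depth times."""
    if depth == 0:
        return "the cat"
    return "the dog that saw " + relative_clause(depth - 1)

print(relative_clause(1))  # one level of embedding
print(relative_clause(3))  # three levels of embedding
```

Because the rule invokes itself, there is no longest output: each extra level of embedding yields a new phrase, which is the sense in which a finite system can generate a discrete infinity of expressions.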
Or indeed the example in The Mouse and his Child (Hoban, 1967), where a can of dog-food has a picture of a dog looking at a can of dog-food with a picture of a dog..., and so on till the last visible dog. Recursion will be discussed in chapter 3. Let us now come back to performance, the other side of the coin. One sense of performance corresponds to the E-language collection of sentences. In this sense performance means any data collected from speakers of the language - today's newspaper, yesterday's diary, the improvisations of a rap singer, the works of William Shakespeare, everything anybody said on TV yesterday, the 100 million words in the British National Corpus. Whether it is W. B. Yeats writing: (23) There is grey in your hair. or a band leader saying: (24) Write down your e-mail address if you would like to be let know about our next CD. it is all performance. An E-language grammar has to be faithful to a large sample of such language, as we see from recent grammars such as Biber et al. (1999), based on a corpus of 40 million words. An I-language grammar does not rely on the regularities in a collection of data; it reflects the knowledge in the speaker's mind, not their performance. However, 'performance' is used in a second sense to contrast language knowledge with the psychological processes through which the speaker understands or produces language. Knowing the Highway Code is not the same as being able to drive along a street; while the Code in a sense informs everything the driver does, driving involves a particular set of processes and skills that are indirectly related to knowledge of the Code. Language performance has a similar relationship to competence.
Speakers have to use a variety of psychological and physical processes in actually speaking or understanding that are not part of grammatical competence, even if they have some link to it; memory capacity affects the length of sentence that can be uttered but is nothing to do with knowledge of language itself. Samples of language include many phenomena caused by these performance processes. Speakers produce accidental spoonerisms: (25) You have hissed my mystery lectures and will have to go home on the town drain. and hesitations and fillers such as er and you know: (26) I, er, just wanted to say, well, you know, have a great time. They get distracted and produce grammatically odd sentences: (27) At the same time do they see the ghost? They start the sentence all over again: (28) Well it was the it was the same night. One reason for the I-linguist's doubts about using samples of language as evidence for linguistic grammars is that they reflect many other psychological processes that obscure the speaker's actual knowledge of the language. The other half of Chomsky's Aspects definition of competence (Chomsky, 1965, p. 3) is that the ideal speaker-hearer is: 'unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance'. We need to see through the accidental properties of the actual production of speech to the knowledge lying behind it. Again, censoring all these performance aspects from competence has not been an easy concept for many people to accept: 'I believe that the great majority of psycholinguists around the world consider the competence-performance dichotomy to be fundamentally wrong-headed' (Newmeyer, 2003, p. 683).

COMPETENCE AND PERFORMANCE
Competence: 'the speaker-hearer's knowledge of his language' (Chomsky, 1965, p.
4)
Performance: 'the actual use of language in concrete situations' (Chomsky, 1965, p. 4)
Pragmatic competence: 'places language in the institutional setting of its use, relating intentions and purposes to the linguistic means at hand' (Chomsky, 1980a, p. 225)

EXERCISE 1.2
Here is an example of performance - a transcript of an ordinary recorded conversation. What aspects of it would you have to ignore to produce a version that more accurately reflected the speaker's competence?
A: After the the initial thrill of being able to afford to stay in a hotel wore off I found them terrible places. I mean perhaps if one can afford a sufficiently expensive one, it's not like that but er these sort of smaller hotels where it's like a private house but it's not your private house and you can't behave as you would at home.
B: Surely that's dying out in England now? ...
A: London is absolutely full of them - small private hotels and you're supposed to be in at a particular time at night, let's say half past eleven, and breakfast will be served between half past seven and half past eight and if you come downstairs too late, that's it.

1.6 Linguistic universals

Let us now consider the notion of a linguistic universal a little more closely. One might think that something that is universal should be present in all linguistic systems. However, this is not necessarily the case. Consider the notion of movement for example. Movement plays an important role in Chomskyan theory and is employed to describe a number of constructions ranging from passives to questions. In English question words typically begin with the letters 'wh', i.e. who, where, and so on, and are therefore called wh-elements. A question may be formed by moving a wh-element to the front of the sentence. Suppose a person called Mike knows a person called Bert. This can be expressed by the sentence: (29) Mike knows Bert.
Suppose now that, although we know that Mike knows someone, we don't know who that person is. We might then ask the question: (30) Who does Mike know? Note here the wh-element who stands for the object of the verb (the one who is known). Objects in English usually follow the verb, as in sentence (29), yet in the question (30) the wh-element occupies a position at the front of the sentence. We might therefore propose that forming the question involves moving the object wh-element from its position behind the verb to its interrogative position at the front of the sentence: (31) Who does Mike know -? English wh-questions then involve movement; some element moves from its usual position to the front of the sentence. But not every movement apparent in one language is apparent in all. In Japanese for example the statement: (32) Niwa-wa soko desu. (garden there is) The garden is there. differs from the question: (33) Niwa-wa doko desu ka? (garden where is) Where is the garden? by adding the element ka at the end and having the question word doko in the place of soko. The question-word doko is not moved to the start, as must happen in English (except for echo questions such as You said what?). Japanese does not use syntactic movement for questions, though it may need other types of movement. Other languages also lack movement for questions. In Bahasa Malaysia for example they can be formed by adding the question element kah to the word that is being asked about (King, 1980): (34) Dia nak pergi ke Kuala Lumpurkah? (he is going to Kuala Lumpur) Is he going to Kuala Lumpur? without shifting it to the front. The presence or absence of syntactic movement is then a parameter of variation between languages; English requires certain movements, Bahasa Malaysia and Japanese do not. Parameters such as presence or absence of question movement vary in setting from one language to another.
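As a rough analogy (a toy sketch only, with invented rules and names, not a claim about how grammars are represented in the mind), the effect of such a parameter can be pictured as a single switch that yields different surface question forms:

```python
# Toy illustration of a question-formation 'parameter' (a hypothetical
# simplification, not actual UG machinery): one switch, two surface orders.

def form_question(words, q_index, fronts_wh):
    """words: the sentence as a list; q_index: the questioned element;
    fronts_wh: the parameter setting (True = English-like movement,
    False = Japanese-like in-situ plus a final question particle)."""
    if fronts_wh:
        # English-style: move the wh-element to the front of the sentence
        return [words[q_index]] + words[:q_index] + words[q_index + 1:]
    # Japanese-style: leave the question word in place and append 'ka'
    return words + ["ka"]

# English-like setting: the object wh-element is fronted, cf. (30)
print(form_question(["does", "Mike", "know", "who"], 3, True))
# Japanese-like setting: in-situ question word plus particle, cf. (33)
print(form_question(["niwa-wa", "doko", "desu"], 1, False))
```

The point of the analogy is merely that a single parameter value produces systematically different surface patterns from a shared underlying system.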
If knowledge of language were just a matter of fixed principles, all human languages would be identical; the variation between them arises from the different ways that they handle certain parameterized choices, such as whether or not to form questions through movement. The setting of this movement parameter has a chain of consequences in the grammar. For example, as will be elaborated on in the next chapter, there are restrictions on movements which hold for all languages. One of these is that movements have to be short, a result of a principle (or principles) known as the Locality Conditions. But clearly a language which does not have wh-movement has no need for the locality conditions placed on such movements since there is nothing for them to affect. In what sense can a universal that does not occur in every language still be universal? Japanese does not break any of the requirements of syntactic movement; it does not need locality for question movement because question movement itself does not occur. Its absence from some aspect of a given language does not prove it is not universal. Provided that the universal is found in some human language, it does not have to be present in all languages. UG Theory does not insist all languages are the same; the variation introduced through parameters allows principles to be all but undetectable in particular languages. UG Theory does not, however, allow principles to be broken. This approach to universals can be contrasted with the longstanding attempts to construct a typology for the languages of the world by enumerating what they have in common, leading to what are often called 'Greenbergian' universals, after the linguist Joseph Greenberg. One example is the Accessibility Hierarchy (Keenan and Comrie, 1977).
All languages have relative clauses in which the subject of the relative clause is related to the Noun as in the English: (35) Alexander Fleming was the man who discovered penicillin. A few languages do not permit relative clauses in which the object in the relative clause relates to the Noun. For example the English: (36) This is the house that Jack built. would not be permitted in Malagasy. Still more languages do not allow the indirect object from the relative clause to relate to the Noun. The English sentence: (37) John was the man they gave the prize to. would be impossible in Welsh. Further languages cannot have relative clauses that relate to the Noun via a preposition, as in: (38) They stopped the car from which the number plate was missing, or via a possessive; the English sentence: (39) He's the man whose picture was in the papers. would not be possible in Basque. Unlike many languages, English even permits, though with some reluctance, the relative clause to relate via the object of Comparison as in: (40) The building that Canary Wharf is taller than is St Paul's. The Accessibility Hierarchy is represented in terms of a series of positions for relativization: (41) Subject > Object > Indirect Object > Object of Preposition > Genitive > Object of Comparison. All languages start at the left of the hierarchy and have subject relative clauses; some go one step along and have object clauses as well; others go further along and have indirect objects; some go all the way and have every type of relative clause, including objects of comparison. It is claimed that no language can avoid this sequence; a language may not have, say, subject relative clauses and object of preposition relative clauses but miss out the intervening object and indirect object clauses. 
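The implicational claim behind the hierarchy can be made concrete with a small sketch (hypothetical code for illustration; the function and the sample inputs are invented): the positions a language can relativize must form an unbroken initial stretch of (41).

```python
# Sketch of the Accessibility Hierarchy's implicational claim:
# once a language disallows a position, it must disallow every
# position further to the right. (Illustrative code, not from the text.)

HIERARCHY = ["Subject", "Object", "Indirect Object",
             "Object of Preposition", "Genitive", "Object of Comparison"]

def obeys_hierarchy(allowed):
    """True if the allowed positions form a contiguous prefix of HIERARCHY."""
    gap_seen = False
    for position in HIERARCHY:
        if position in allowed:
            if gap_seen:      # an allowed position after a disallowed one
                return False  # violates the hierarchy
        else:
            gap_seen = True
    return True

# English relativizes every position, down to objects of comparison
print(obeys_hierarchy(set(HIERARCHY)))                        # True
# A language with only subject and object relatives is also predicted
print(obeys_hierarchy({"Subject", "Object"}))                 # True
# But skipping Object while allowing Object of Preposition is ruled out
print(obeys_hierarchy({"Subject", "Object of Preposition"}))  # False
```

The check treats the hierarchy as an implicational scale rather than a list of independent options, which is exactly what distinguishes it from a mere inventory of relative-clause types.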
The Accessibility Hierarchy was established by observations based on many languages; it is a Greenbergian universal. There is as yet no compelling reason within UG Theory why this should be the case, no particular principle or parameter involved: it is simply the way languages turn out to be. Greenbergian universals such as the Accessibility Hierarchy are data-driven; they arise out of observations; a single language that was an exception could be their downfall, say one that had object of preposition relative clauses but no object relative clauses. Universals within UG Theory are theory-driven; they may not be breached but they need not be present. There may indeed be a UG explanation for a particular data-driven universal such as the Accessibility Hierarchy; this would still not vitiate the distinction between the theory-driven UG type of universal and the data-driven Greenbergian type. So it is not necessary for a universal principle to occur in dozens of languages. UG research often starts from a property of a single language, such as locality in English. If the principle can be ascribed to the language faculty itself rather than to experience of learning a particular language, using an argument to be developed in chapter 4, it can be claimed to be universal on evidence from one language alone; 'I have not hesitated to propose a general principle of linguistic structure on the basis of observations of a single language' (Chomsky, 1980b, p. 48). Newton's theory of gravity may have been triggered by an apple but it did not require examination of all the other apples in the world to prove it. Aspects of UG Theory are disprovable; a principle may be attributed to UG that further research will show is in fact peculiar to Chinese or to English; tomorrow someone may discover a language that clearly breaches locality.
The nature of any scientific theory is that it can be shown to be wrong; 'in science you can accumulate evidence that makes certain hypotheses seem reasonable, and that is all you can do - otherwise you are doing mathematics' (Chomsky, 1980b, p. 80). Locality is a current hypothesis, like gravity or quarks; any piece of relevant evidence from one language or many languages may disconfirm it. But of course, the evidence does have to be relevant: not just the odd observation that seems to contradict the theory but a coherent alternative analysis. Producing a single sentence that contradicts a syntactic theory, say, is not helpful if we cannot also provide an account for the problematic sentence. There are anomalous facts that do not fit the standard theory in most fields, whether physics or cookery, but they are no particular use until they lead to a broader explanation that subsumes them and the facts accounted for by the old theory.

UNIVERSALS
Universals consist of principles such as locality.
Chomskyan universals do not have to occur in all languages, unlike Greenbergian universals.
No language violates a universal principle (the language simply may not use the principle in a particular context).
Universals are part of the innate structure of the human mind and so do not have to be learnt (to be discussed later).

1.7 The evidence for Universal Grammar Theory

How can evidence be provided to support these ideas? The question of evidence is sometimes expressed in terms of 'psychological reality'. Language knowledge is part of the speaker's mind; hence the discipline that studies it is part of psychology. Chomsky has indeed referred to 'that branch of human psychology known as linguistics' (Chomsky, 1972a, p. 88). Again, it is necessary to forestall overly literal interpretations of such remarks. A movement rule such as: (42) Move a wh-element to the front of the clause.
does not have any necessary relationship to the performance processes by which people produce and comprehend sentences; it is solely a description of language knowledge. To borrow a distinction from computer languages, the description of knowledge is 'declarative', consisting of static relationships, rather than 'procedural', consisting of procedures for actually producing or comprehending speech. The description of language in rules such as these may at best bear some superficial resemblance to speech processes. When knowledge of language is expressed as principles and parameters, the resemblance seems even more far-fetched; it is doubtful whether every time speakers want to produce a sentence they consider in some way the interaction of all the principles and parameters that affect it. Conventional psychological experiments with syntax tell us about how people perform language tasks but nothing directly about knowledge. They can provide useful indirect confirmation of something the linguist already suspects and so give extra plausibility perhaps. But they have no priority of status; the theory will not be accepted or rejected as a model of knowledge because of such evidence alone. Chomsky insists that the relevant question is not 'Is this psychologically real?' but 'Is this true?' He sees no point in dividing evidence up into arbitrary categories; 'some is labelled "evidence for psychological reality", and some merely counts as evidence for a good theory. Surely this position makes absolutely no sense...?' (Chomsky, 1980a, p. 108). The linguist searches for evidence of locality and finds that questions such as: (43) Is John the man who is tall? are possible, but questions such as: (44) *Is John is the man who tall? are impossible.
The linguist may then look for other types of confirming evidence: how speakers form questions in an experiment, the sequence in which children acquire types of question, the kinds of mistake people make in forming questions. All such evidence is grist to the mill in establishing whether the theory is correct; 'it is always necessary to evaluate the import of experimental data on theoretical constructions, and in particular, to determine how such data bear on hypotheses that in nontrivial cases involve various idealizations and abstractions' (Chomsky, 1980b, p. 51). Psychological experiments provide one kind of data, which is no more important than any other. A speaker's claim that:

(45) Is John the man who is tall?

is a sentence of English and:

(46) *Is John is the man who tall?

is not provides a concrete piece of evidence about language knowledge. It doesn't matter whether the sentence has ever actually been said, only whether a sentence of that form could be said and how it would be treated if it were. Such evidence may well be incomplete and biased; in due course it will have to be supplemented with other evidence. But we have to start somewhere. The analysis of this easily available evidence is rich enough to occupy generations of linguists. It is simplest to start from the bird in the hand, our own introspections into the knowledge of language we possess; when that has been exhausted, other sources can be tapped.

However, it should not be thought that the UG Theory attempts to deal with everything about language. UG Theory recognizes that some aspects of language are unconnected to UG. For example in English the usual form of the past tense morpheme is '-ed', pronounced in three different ways as /d/ planned, /t/ liked, and /id/ waited.
But English also has a range of irregular past forms, some that form definite groups such as vowel change bite/bit, or lack of change hit/hit, others that have no pattern go/went, am/was. These irregular forms are learnt late by children in the first language, give problems to learners of a second language, and vary from one region to another, UK dived versus US dove. There is no need for UG to explain this odd assortment of forms; they are simply facts of English that any English speaker has to learn, unconnected to UG. The fact that such forms could be learnt in associationist networks has been used as a test case by advocates of connectionism to show that 'rules' are not needed (Rumelhart and McClelland, 1986) (though this often involves a confusion of the spoken and written language - /peid/ is regular in speech, irregular in writing (Cook, 2004)). As these forms are marginal for UG, whether they are learnable or not by such means has no relevance to the claims of the UG Theory.

The knowledge of any speaker contains masses of similar oddities that do not fit the overall pattern of their language. 'What a particular person has in the mind/brain is a kind of artefact resulting from the interplay of accidental features' (Chomsky, 1986a, p. 147). UG Theory avoids taking them all on board by drawing a distinction between core and periphery. The core is the part of grammatical competence covered by UG where all the principles are maintained, all the parameters set within the right bounds. The periphery includes aspects that are not predictable from UG. It is unrealistic to expect UG Theory to account for myriads of unconnected features of language knowledge.
It deals instead with a core of central language information and a periphery of less essential information; 'a core language is a system determined by fixing values for the parameters of UG, and the periphery is whatever is added on in the system actually represented in the mind/brain of a speaker-hearer' (Chomsky, 1986a, p. 147). The theory of UG is far from a complete account of the speaker's entire knowledge of language; it deals with the core aspects that are related to UG, not with the periphery that is unrelated to UG.

EXERCISE 1.3
Here are some sentences of Gateshead English, as represented in a novel. Setting aside differences in accent, try to allocate them to core grammar, to performance or to peripheral grammar.
1 'How we gonna gan and see them?'
2 'Ah cannet wait, man Gerry.'
3 'Ah've always wanted a dog, me.'
4 'Cannet hang around here all night.'
5 'When was the last time ye went to a match like?'
6 'Neeone's gonnae burn doon nee hoose.'
7 'Me mate Sewell here doesn't like being conned.'
8 'Bet even Gazza couldn't get a penalty past Rusty.'
Source: Jonathan Tulloch (2000), The Season Ticket, Jonathan Cape

1.8 Conclusion

To sum up, the distinctive feature of Chomsky's I-language approach is that its claims are not unverifiable but checkable; it supports unseen mental ideas with concrete evidence. The theory can easily be misconceived as making woolly abstract statements unconnected to evidence, which can be countered by sheer assertion and argument. Most criticism of Chomskyan concepts has indeed attempted to refute them by logic and argument rather than by attacking their basis in data and evidence. A case in point is Chomsky's well-known argument that children are born equipped with certain aspects of language, based on the fact that children know things about language that they could not have learnt from the language they have heard, as developed in chapters 2 and 5.
This argument could be refuted by showing that the alleged fact is incorrect: either children could learn everything about language from what they hear or adults do not have the knowledge ascribed to them. 'An innatist hypothesis is a refutable hypothesis' (Chomsky, 1980a, p. 80). But the argument cannot be dismissed by pure counter-argument unsupported by evidence. The discussion in this book describes the actual syntactic content of the theory at length because Chomsky's general ideas cannot be adequately understood without looking at the specific claims about language on which they are based. A principle of language is not a proposal for a vague abstraction but a specific hypothesis about the facts of human language, eventually coming down to precise claims about the grammaticality or ungrammaticality of particular sentences. The UG Theory claims to be a scientific theory based on solid evidence about language. As such, it is always progressing towards better explanations for language knowledge, as later chapters will demonstrate.

Discussion topics
1 To what extent do you think Chomsky's theories as presented in this chapter are in fact scientific?
2 Is describing the core of language as the computational system a change of label or something more profound?
3 Can you accept Chomsky's claim that not all universal principles will be found in every language?
4 Do Chomsky's questions for linguistics touch on any issues that the person in the street would recognize?
5 Is it really creative to be able to produce new sentences? Couldn't a computer do the same?
6 What do you think could count as evidence against Chomsky's theories?
7 How acceptable do you find a scientific approach that can dismiss any counterexamples as peripheral grammar or performance? Or is it inevitable that any scientific theory has to be able to discard some of its apparent data?
8 For those who know Chomsky's political views in books such as Chomsky (2004a), what connection do you see between them and his overall ideas of language?

2 Principles, Parameters and Language Acquisition

This chapter takes a closer look at some of the ideas that have been driving Chomskyan linguistics for more than twenty-five years. It explains the notions of principles and parameters, introduced briefly in the last chapter, and shows how these are integrated with ideas about language acquisition.

2.1 Principles and parameters

2.1.1 Rules in early generative grammar

To understand grammatical principles and parameters means looking at certain linguistic phenomena that they account for and sketching what these notions replaced.

2.1.2 Phrase structure and rewrite rules

A major assumption in linguistics since the 1930s has been that sentences consist of phrases - structural groupings of words: sentences have phrase structure. Thus the sentence (S):

(1) The child drew an elephant.

breaks up into a Noun Phrase (NP) the child and a Verb Phrase (VP) drew an elephant. The VP in turn breaks up into a Verb (V) drew and a further Noun Phrase an elephant:

(2) [tree diagram]
Sentence
  Noun Phrase: the child
  Verb Phrase
    Verb: drew
    Noun Phrase: an elephant

These Noun Phrases also break up into smaller constituents; the NP the child consists of a Determiner (Det or D) the and a Noun (N) child, while the NP an elephant consists of a Determiner an and an N elephant. The final constituents are then items in the lexicon:

(3) [tree diagram]
Sentence
  Noun Phrase
    Determiner: the
    Noun: child
  Verb Phrase
    Verb: drew
    Noun Phrase
      Determiner: an
      Noun: elephant

Phrase structure analysis thus breaks the sentence up into smaller and smaller grammatical constituents, finishing with words or morphemes when the process can go no further. A sentence is not just a string of words in a linear sequence but is structured into phrases, all of which connect together to make up the whole.
A sentence is then defined by the phrases and lexical items into which it expands. A tree diagram such as (3) is one way to represent the phrase structure of a sentence: each constituent of the structure is represented by a node on the tree, which is labelled with its name; elements which are grouped into a constituent are linked to the node by branches. Another way of representing structure commonly found in linguistic texts is through labelled brackets, where paired brackets are used to enclose the elements that make up a constituent and the label on the first of the pair names the constituent. Thus the structure in (3) might equally be represented as the following labelled bracketing without any change in the analysis:

(4) [S [NP [Det the] [N child]] [VP [V drew] [NP [Det an] [N elephant]]]]

One of Chomsky's first influential innovations in linguistics was a form of representation for phrase structure called a rewrite rule (Chomsky, 1957), seen in:

(5) S → NP VP

In this the 'rewrite' arrow → can be taken to mean 'consists of'. The rule means exactly the same as the phrase structure tree:

(6)
S
  NP
  VP

or as the bracketing:

(7) [S NP VP]

namely that the Sentence (S) 'consists of' a Noun Phrase (NP) and a Verb Phrase (VP). The next rewrite rule for English will then be:

(8) VP → V NP

Again, this means that the VP 'consists of' a Verb (V) and an NP, as shown in the tree:

(9)
VP
  V
  NP

or as the bracketing:

(10) [VP V NP]

More rewrite rules like (5) and (8) can be added to include more and more of the structure of English, for example to bring in the structure of Noun Phrases:

(11) NP → Det N

showing that NPs consist of a determiner (Det) and a noun (N). The sentence is generated through the rewrite rules until only lexical categories like Noun or Verb are left that cannot be further rewritten.
At this point an actual word is chosen to fit each category out of all the appropriate words stored in the lexicon, say child for N and draw for V. Rewrite rules, then, take the subjective element out of traditional grammar rules. Understanding a rule like 'Sentences have subjects and predicates' means interpreting what you mean by a subject or predicate. To understand rule (5) you don't need to know what an S is because the very rule defines it as an NP and a VP; you don't need to know what an NP is because rule (11) defines it as a Determiner and a Noun; and you don't need to know what a Noun is because you can look up a list or a dictionary. Rewrite rules are formal and explicit, sometimes called 'generative' as we see below. Hence they can easily be used by computers. Rewrite rules can be written directly into the computer language Prolog, so that you can type in a set of rules like (5) and get a usable parser for generating English sentence structures (Gazdar and Mellish, 1989).

PHRASE STRUCTURE
Phrase structure analysis divides sentences into smaller and smaller constituents until only words or morphemes are left, usually splitting into two constituents at each point, most commonly represented as a tree diagram:

S
  NP
    Det: my
    N: cat
  VP
    V: fears
    NP
      N: foxes

or as labelled brackets:

[S [NP [Det my] [N cat]] [VP [V fears] [NP [N foxes]]]]

Rewrite rules formally define a set of possible structures which model those of any particular language, thus making a grammar of that language. We might therefore expect natural languages to be made up of such rewrite rules. However, the rules given here are specific to particular structures in particular languages. So, for example, rule (5) S → NP VP does not necessarily imply that all sentences of all languages consist of an NP subject followed by a VP. In some languages the subject comes last, as in Malagasy:

(12) Nahita ny mpianatra ny vehivavy. (saw the student the woman) The woman saw the student.
thus requiring another rewrite rule:

(13) S → VP NP

Even in English, few sentences have the bare NP V NP structure seen in (3). They may for instance contain other elements, such as adverbs often or auxiliary verbs will. The NPs and VPs may differ from the ones seen so far by containing adjectives old or having intransitive verbs with no objects runs. To work for English, the rewrite rules in (5), (8) and (11) need to be massively expanded in number. Malagasy would similarly need vast expansion from the single rule given in (13). The grammars of the two languages have to be spelled out specifically in rewrite rules and have very little in common. The grammars of languages differ in terms of the actual rewrite rules even if they can be described in the same formal way. Rewrite rules are specific to a particular language, so they can never achieve the overall goal of universality for all languages.

A MINI-GRAMMAR FOR ENGLISH USING REWRITE RULES
S → NP VP
VP → V NP
NP → Det N
Lexicon
Ns: child, elephant, truth, Winston Churchill . . .
Vs: draw, fly, cook, pay, invigilate . . .
Dets: the, a, an . . .

EXERCISE 2.1
1 In the early days of rewrite rules, people tried to use them for a variety of other purposes such as classification of cattle-brands (Watt, 1967). Try to write a system of rewrite rules like the one in the box above for one or more of the following:
- a three course meal
- building a house
- yesterday's television programmes
- the structure of The Lord of the Rings, or another novel
- dancing the salsa
- making a cup of tea
2 Below is a description of the English sentence The policeman arrested the man. in traditional grammatical terms. Can you convert it into a tree diagram? Into rewrite rules? English sentences have a subject and a predicate; predicates may have transitive verbs with an object; singular nouns may have a definite article.
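The explicitness of rewrite rules can be demonstrated by running them on a computer, in the spirit of the Prolog parsers mentioned above. The following is only an illustrative sketch in Python, not part of the theory: it uses just the rules and lexicon of the mini-grammar box, choosing expansions and words at random (the grammar has no agreement, so it may output 'a elephant'); the names RULES, LEXICON and generate are my own.

```python
import random

# The mini-grammar expressed as rewrite rules: each phrasal
# category maps to its possible expansions.
RULES = {
    "S": [["NP", "VP"]],    # S -> NP VP
    "VP": [["V", "NP"]],    # VP -> V NP
    "NP": [["Det", "N"]],   # NP -> Det N
}

# Lexical categories rewrite directly to words from the lexicon.
LEXICON = {
    "N": ["child", "elephant", "truth"],
    "V": ["drew", "cooked", "paid"],
    "Det": ["the", "a", "an"],
}

def generate(symbol="S"):
    """Expand a category by its rewrite rules until only words remain."""
    if symbol in LEXICON:                     # lexical category: pick a word
        return [random.choice(LEXICON[symbol])]
    expansion = random.choice(RULES[symbol])  # phrasal category: rewrite it
    words = []
    for part in expansion:
        words.extend(generate(part))
    return words

# Prints a randomly generated NP V NP sentence, e.g. 'the child drew an elephant'.
print(" ".join(generate()))
```

Nothing here relies on knowing what a 'subject' or 'sentence' is: the rules alone determine every structure the grammar generates, which is exactly the sense of 'explicit' discussed in the next section.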
2.1.3 Movement

Phrase structure rewrite rules are not enough for natural languages. A major assumption in most linguistic theories, harking back to traditional grammars, is that elements of the sentence appear to move about, as we discussed in the last chapter. A sentence such as:

(14) What did the child draw?

makes us link the question-word what to the element in the sentence that is being questioned, namely the object:

(15) The child drew something. (What was that something?)

as if what had been moved to the beginning of the sentence from its original position after the verb drew:

(16) The child drew what. → What did the child draw - ?

This can be termed the 'displacement property' of language: 'phrases are interpreted as if they were in a different position in the expression' (Chomsky, 2000b, p. 12). The relationship between the displaced elements and the original position is called 'movement': a phrase is said to 'move' from one position in the structure to another. This does not mean actual movement as we process the sentence, i.e. the speaker thinking of an element and then mentally moving it, even if this mistaken implication at one stage led to a generation of psycholinguistic research; movement is an abstract relationship between two forms of the sentence, which behave as if something moved. In other words movement is a relationship in competence - the knowledge of language - not a process of performance.

The kind of rule that deals with displacement phenomena is called a transformation. In general such rules take a structure generated by the rewrite rules and transform it into a different structure in which the various constituents occupy different positions. Again, such rules are specific to a particular language and specific to particular constructions rather than applying to all constructions in all languages.
Other English constructions involving displacements include the passive:

(17) The elephant was drawn.

In this sentence the NP in subject position, the elephant, is interpreted as the object and hence we might propose that it involves a movement from object to subject position (with some other changes):

(18) [movement diagram: the NP the elephant moves from the object position in drew the elephant to the subject position in The elephant was drawn]

This is clearly very different from questions where a wh-element moves to a position in front of the subject, not to the subject position itself. We might capture these facts by claiming that English grammar consists of many such transformational rules, one for each construction that involves displacement. Hence these rules are specific to the constructions that they are involved in.

Furthermore, while English has a transformation rule which moves an interrogative word like what to the beginning of a question-sentence, Japanese does not demonstrate this displacement:

(19) a. Niwa-wa soko desu. (garden there is) The garden is there.
b. Niwa-wa soko desu ka? (garden there is Q) Is the garden there?

As can be seen from the examples in (19), a Japanese question is formed by the addition of an interrogative particle ka at the end of the sentence; the other elements remain in the same positions they occupy in the declarative. On the other hand, movements occur in other languages which do not happen in English. Hungarian, for example, moves a focused element in front of the verb where English would use stress or a particular sentence type, known as a cleft sentence (seen in the English translation in (20)), to express the same thing:

(20) Janos tegnap ment el. (John yesterday left) John left YESTERDAY. It was yesterday that John left.

Since languages use different transformation rules, their grammars must obviously differ in terms of the transformation rules they contain.

EXERCISE 2.2
How could you treat the following sentences as examples of movement in English? What has moved and where?
Where have all the flowers gone?
Do you think I'm sexy?
What's new pussycat?
How many roads must a man walk down?
Why, why, why Delilah?
Are you lonely tonight?
When the saints go marching in.
In the town where I was born lived a man who sailed to sea.
Is that all there is?

The picture of grammar that emerges from these observations is a collection of construction-specific phrase structure and transformational rules which differ from language to language. This view prevailed in generative circles throughout the 1960s and 1970s, until it was challenged by Principles and Parameters (P&P) Theory in the 1980s, which claimed that natural language grammars are not constructed out of these kinds of rules, but of something far more general. Principles state conditions on grammaticality that are not confined to a specific construction in a specific language, but are in principle applicable to all constructions in all languages. We will exemplify a possible principle later in the chapter.

2.1.4 A brief word on the meaning of 'generative'

Before going further into principles and parameters, it is worth considering the notion of generative used in generative grammar, first to avoid a common misunderstanding and secondly to see what has happened to the term since the onset of the P&P approach. 'Generative' means that the description of a language given by a linguist's grammar is rigorous and explicit: 'when we speak of the linguist's grammar as a "generative grammar" we mean only that it is sufficiently explicit to determine how sentences of the language are in fact characterised by the grammar' (Chomsky, 1980a, p. 220). The chief contrast between traditional grammar statements and the rules of generative grammar lay not in their content so much as in their expression; generative rules are precise and testable without making implicit demands on the reader's knowledge of the language.
One of the famous traps people fall into, called the Generative Gaffe by Botha (1989), is to use the term 'generative' as a synonym for 'productive' rather than for 'explicit and formal'. A rewrite rule like (5) is intended as a model not directly for how people produce sentences but for what they know. Thus a generative grammar is not like an electrical generator which 'generates' (i.e. produces) an electrical current: when we say that a grammar generates a language, we mean that it describes the language in an explicit way. A set of phrase structure and transformational rules form a generative grammar, as they state precisely what the structures are in a language and how those structures may be transformed into other structures. This contrasts with traditional grammars which might tell us, for example, that 'the subject comes before the verb' without defining what 'the subject' is or stating explicitly where 'before the verb' is.

However, the replacement of rules by principles has consequences for the interpretation of the term 'generative'. Principles do not readily lend themselves to the same formal treatment as rules. The rival syntactic theory of Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994) claims firmly to be part of generative grammar on the grounds that it uses formal explicit forms of statement, but challenges the right of recent Chomskyan theories to be called 'generative'; generative grammar 'includes little of the research done under the rubric of the "Government Binding" framework, since there are few signs of any commitment to the explicit specification of grammars or theoretical principles in this genre of linguistics' (Gazdar et al., 1985, p. 6). One introduction to 'generative grammar' (Horrocks, 1987) devoted about half its pages to Chomskyan theories, while a survey chapter on 'generative grammar' (Gazdar, 1987) dismissed Chomsky in the first two pages.
Thus, though the Universal Grammar (UG) Theory still insists that grammar has to be stated explicitly, this is no longer embodied in the formulation of actual rules. The rigour comes in the principles and in the links to evidence. Although Chomsky later claimed 'true formalization is rarely a useful device in linguistics' (Chomsky, 1987), the theory nevertheless insists on its scientific status as a generative theory to be tested by concrete evidence about language. It sees the weakness of much linguistic research as its dependence on a single source of data - observations of actual speech - when many other sources can be found. A scientific theory cannot exclude certain things in advance; the case should not be prejudged by admitting only certain kinds of evidence. 'In principle, evidence ... could come from many different sources apart from judgments concerning the form and meaning of expression: perceptual experiments, the study of acquisition and deficit or of partially invented languages such as Creoles, or of literary usage or language change, neurology, biochemistry, and so on' (Chomsky, 1986a, pp. 36-7). Some of these disciplines may not as yet be in a position to give hard evidence; neurology may not be able to show clearly how or where language is stored physically in the brain: in principle it has relevant evidence to contribute and may indeed do so one day. Fodor (1981) contrasts what he calls the 'Wrong View' that linguistics should confine itself to a certain set of facts with the 'Right View' that 'any facts about the use of language, and about how it is learnt . . . could in principle be relevant to the choice between competing theories'.
When UG Theory is attacked for relying on intuitions and isolated sentences rather than concrete examples of language use or psycholinguistic experiments, its answer is to go on the offensive by saying that in principle a scientific theory should not predetermine what facts it deals with. The term 'generative grammar' is, however, now generally used by many linguists simply as a label for research within the UG paradigm of research, for example the conference organizations Generative Linguists of the Old World (GLOW) and Generative Approaches to Second Language Acquisition (GASLA). It is often now little more than a vaguely worthy term distinguishing some Chomsky-derived linguistics approaches from others rather than having precise technical content.

2.1.5 An example of a principle: locality

To show what is meant by a grammatical principle we can develop the idea of movement. The previous section introduced the view that certain elements move about in a structure. As might be expected, movement has limits; many possible movements result in a structure that is ungrammatical. One of the first limitations to be noticed was that movements have to be short, i.e. not span too much of the sentence. Much work on movement since the late 1960s has tried to account for this observation by proposing different principles, which crop up in subsequent chapters. For now, however, this limitation can be called the Locality Principle: movements must be within a 'local' part of the sentence from which the moved element originates.

Some examples will serve to put the idea across. To form a yes-no question in English (i.e. one that can be answered with either a 'yes' or a 'no') an auxiliary verb, such as will, can, may, have or be, moves from its normal position behind the subject to a position in front of the subject:

(21) The manager will fire Beckham.
(22) Will the manager fire Beckham?
This is known as subject-auxiliary inversion. This example is simple as there is just one auxiliary verb which moves to form the question. However, other sentences have more than one auxiliary verb:

(23) The manager will have fired Beckham.

The issue is whether either of the two auxiliaries will and have may move to form the question:

(24) a. Will the manager have fired Beckham?
b. *Have the manager will fired Beckham?

In this case the first auxiliary can move, but the second cannot, due to the Locality Principle. Comparing the distances that the two auxiliaries must move in these examples, moving the first auxiliary will clearly involves a shorter movement than moving the other auxiliary have:

(25) a. Will the manager - have fired Beckham?
b. *Have the manager will - fired Beckham?

In other words, the shorter movement is grammatical and the longer ungrammatical. This observation applies not only to the movement of auxiliary verbs but also to other movements. The following example involves moving a subject of an embedded clause the manager to a higher subject position in the sentence:

(26) a. It seems [the manager has fired Beckham].
b. The manager seems [ - to have fired Beckham].

This is known as subject raising. If there is more than one embedded clause, movement from one subject position to the next is possible, but not movement over the top of a subject:

(27) a. It seems [the manager is likely [ - to fire Beckham]].
b. *The manager seems [it is likely [ - to fire Beckham]].

Again, the shorter of the two movements is grammatical, the longer one is ungrammatical, conforming to the Locality Principle. Let us consider another type of movement. As discussed in the previous section, certain questions are formed by displacing question words, known as wh-elements, to the front of the sentence:

(28) Who did the manager fire - ?

This is called wh-movement.
Although the issue will need more discussion in chapter 4, wh-movements also have to be short. As seen in (28), a wh-element is allowed to move out of a clause to a position at the front of the clause. But moving another wh-element to the front of an even higher clause gives an ungrammatical result:

(29) a. David asked [who the manager fired - ].
b. *Who did David ask [who - fired - ]?

Thus, while a wh-element can move to the front of its own clause, it cannot move directly to the front of a higher clause, which would obviously involve a longer movement. Again, wh-movement obeys the Locality Principle.

The Locality Principle extends to other linguistic phenomena as well as movements. Take the case of English pronouns. These are said to 'refer to' nouns in the sentence. In:

(30) Peter met John and gave him the message.

the personal pronoun him refers to the same person as John. Reflexive pronouns such as myself and herself, however, obey strict conditions on what they can refer to, which differ from those for personal pronouns such as me and her. For one thing, a reflexive pronoun must refer to some other element in the same actual sentence and cannot refer to something directly identified in the discourse situation outside the sentence, as personal pronouns can:

(31) a. George talks to himself.
b. *George talks to herself.
c. George talks to her.

In (31a) the reflexive pronoun himself obviously refers to George and no one else. (31b), however, is ungrammatical as the reflexive herself cannot refer to anything in the sentence and, unlike the personal pronoun her in (31c), it cannot refer to someone not mentioned in the sentence either, with the minor, well-known exception of George Eliot, the Victorian woman novelist. However, if there are two possible antecedents, i.e. elements that a pronoun can refer to, one inside the clause containing the reflexive and one outside this clause, only the nearest one can actually be referred to:
(32) The regent thinks [George talks to himself].

In this sentence only George can act as the antecedent of himself, although in other sentences the regent would be a possible antecedent. The reason is that George is a closer antecedent than the regent and hence the referential properties of reflexive pronouns are subject to the Locality Principle: a reflexive pronoun can only refer to an antecedent within a limited area of the sentence known as a local domain.

The Locality Principle is clearly not just a rule which tells us how to form a particular construction in English; it is something far more general, which applies to phenomena such as reference as well as to movements. It doesn't lead to the formation of any specific construction, but applies to many constructions. The key difference between a rule and a principle is that, while a rule is construction specific, a principle applies to constructions across the board. Chomsky's claim is that human grammars are constructed entirely of principles, not of rules. Specific constructions and the rules for them, such as those we have looked at, are the result of the complex interaction of a number of general principles, including the Locality Principle. Moreover, principles are universal and so applicable in all human languages. The Locality Principle can indeed limit movement and reference in all languages (with some degree of parameterization - as seen in the next section). For example inversion phenomena, whereby a verbal element moves to the front of a clause, can be found in numerous languages, including German and French. Constructions involving inversion in these languages conform to the Locality Principle:

(33) a. Liest Hans das Buch? (reads Hans the book) Does Hans read the book?
b. Hat Hans das Buch gelesen? (has Hans the book read) Has Hans read the book?
c. *Gelesen Hans das Buch hat?
In these German examples, the verb moves to the front of the sentence to form a yes-no question (33a) and, when there is an auxiliary and a main verb, the auxiliary undergoes the movement (33b). However, in this case, the main verb cannot move, as the movement of the auxiliary is the shorter. Hence, verb movement in German conforms to the Locality Principle. The same is true of French: (34) a. Quand lit-il le livre? (when reads-he the book) When does he read the book? b. Quand a-t-il lu le livre? (when has-he read the book) c. *Quand lu-t-il a le livre? As in English, when a wh-element moves to the front of the clause to form a question, we get inversion of the verb as well. The main verb can invert with the subject, as in (34a), when there is no auxiliary. When there is an auxiliary, it is this that inverts (34b) and the main verb cannot (34c). The movement of the auxiliary is the shorter one and hence verb movement in French also conforms to the Locality Principle. Subject raising also occurs in other languages. But it is always a short movement, in accordance with the Locality Principle. Wh-movement is similarly restricted in other languages, demonstrating the universality of the Locality Principle. In Hungarian a wh-element moves to the front of the verb, but it cannot move out of a clause that starts with a relative pronoun: (35) a. Janos talalkozott Peterrel. (John met Peter-with) John met Peter. b. Janos kivel talalkozott - ? (John wh-with met) Who did John meet? (36) a. Janos talalkozott az emberrel [aki latta Petert]. (John met the man-with who saw Peter) John met the man who saw Peter. b. * Janos kit talalkozott az emberrel [aki latta - ]? (John who(m) met the man-with who saw) * Who did John meet the man who saw? Clearly in Hungarian short movements are grammatical while long movements are ungrammatical, in accordance with the Locality Principle.
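As a rough illustration (our own toy sketch, not part of the theory itself), the generalization running through these examples can be caricatured in a few lines of code: a movement is acceptable when it stays within the clause where it starts, and is ruled out when it crosses into a higher clause.

```python
# Toy sketch of the Locality Principle for movement; not a real parser,
# just a caricature of the generalization in the examples above.

def movement_is_local(clause_boundaries_crossed: int) -> bool:
    """An element may move to the front of its own clause, but not
    directly to the front of a higher clause."""
    return clause_boundaries_crossed == 0

# (29a)-style: 'who' fronts within its own embedded clause -> fine
print(movement_is_local(0))   # True
# (29b)/(36b)-style: fronting into the higher clause crosses an
# embedded-clause boundary -> ruled out
print(movement_is_local(1))   # False
```

The point of the caricature is only that the constraint is stated once, over any movement whatsoever, rather than once per construction.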
Finally, the effects of the Locality Principle in reference phenomena can also be seen in other languages than English. In the following French example, the reflexive se can refer to Jean but not to Pierre: (37) Pierre dit que Jean se regarde dans la glace. (Peter says that John himself looks-at in the mirror) Peter says John looks at himself in the mirror. Similarly in the following Arabic example, the reflexive can refer to the nearby Zaydun but not to the distant Ahmed: (38) Qala Ahmed ?anna zaydun qatala nafsahu. (said Ahmed that Zaid killed himself) Ahmed said that Zaid killed himself. The Locality Principle applies in this case too. The Locality Principle is then a universal principle that applies to a wide variety of constructions in many languages: it is part of UG. Of course, the discussion given so far has been necessarily superficial and locality phenomena are more properly analysed as the result of other universal principles, as we will see. The point is, however, that whatever principles are involved in accounting for the phenomena discussed above, they are not construction specific, like grammatical rules, and they are universal, that is to say, applicable to all languages. THE LOCALITY PRINCIPLE Principles: these are general conditions that hold for many different constructions. The claim of P&P Theory is that human languages consist of principles with no construction-specific rules. Locality: this principle is a property of linguistic processes which restricts their application to a limited part of the sentence. This then forces movements in all languages to be local: they must be short: i) It seems he is likely - to leave. ii) * He seems it is likely - to leave. The principle also limits the reference of reflexive pronouns to a local domain universally: iii) Qala Ahmed ?anna zaydun qatala nafsahu.
(said Ahmed that Zaid killed himself) 2.1.6 An example of a parameter: the head parameter Locality seems common to all languages. Yet languages obviously differ in many ways; if knowledge of language consisted solely of invariant principles, all human languages would be identical. To see how the theory captures variation between languages, let us take the example of the head parameter, which specifies the order of certain elements in a language. A crucial innovation to the concept of phrase structure that emerged in the early 1970s (Chomsky, 1970) was the claim that all phrases have a central element, known as a head, around which other elements of the phrase revolve and which can minimally stand for the whole phrase. Thus the VP drew an elephant has a head Verb drew; the NP the child has a head Noun child; a PP such as by the manager has a head Preposition by; and so on for all phrases. This enabled the structure of all phrases to be seen as having the same properties of head plus other elements, to be expanded in chapter 3. The aim behind this, as always, is to express generalizations about the phrase structure of all human languages rather than features that are idiosyncratic to one part of language or to a single language. An important aspect of language variation concerns the location of the head in relationship to other elements of the phrase, called complements. The head of the phrase can occur on the left of a complement or on its right.
So in the NP: (39) education for life the head Noun education appears on the left of the complement for life: (40) Noun Phrase → Noun (education) + complement (for life) In the VP: (41) opened the door the head Verb opened appears on the left of the complement the door: (42) Verb Phrase → Verb (opened) + complement (the door) Similarly in the PP: (43) in the car the head Preposition in appears on the left of the complement the car: (44) Preposition Phrase → Preposition (in) + complement (the car) Japanese is very different. In the sentence: (45) E wa kabe ni kakatte imasu. (picture wall on is hanging) The picture is hanging on the wall. the head Verb kakatte imasu occurs on the right of the Verb complement kabe ni: (46) Verb Phrase → complement (kabe ni) + Verb (kakatte imasu) and the Postposition ni (on) comes on the right of the PP complement kabe: (47) Preposition Phrase → complement (kabe) + Postposition (ni) There are thus two possibilities for the structure of phrases in human languages: head-left or head-right: (48) Any Phrase (XP) → Head (X) + complement VERSUS Any Phrase (XP) → complement + Head (X) Chomsky (1970) suggested that the relative position of heads and complements needs to be specified only once for all the phrases in a given language, creating the 'X-bar syntax' to be described in the next chapter. Rather than a long list of individual rules specifying the position of the head in each phrase type, a single generalization suffices: 'heads are last in the phrase' or 'heads are first in the phrase'. If English has heads first in the phrase, it is unnecessary to specify that verbs come on the left in Verb Phrases, as in: (49) liked him or Prepositions on the left in Preposition Phrases, as in: (50) to the bank Instead the order of the elements in all English phrases is captured by a single head-first generalization.
Japanese can be treated in the same way: specifying that Japanese is a head-last language means that the Verb is on the right: (51) Nihonjin desu. (Japanese am) (I) am Japanese. and that it has Postpositions that come after the complements rather than prepositions: (52) Nihon ni (Japan in) in Japan And the same for other languages. Human beings know that phrases can be either head-first or head-last (alternatively known as head-initial and head-final); an English speaker has learnt that English is head-first; a Japanese speaker that Japanese is head-last, and so on. The word order variation between languages can now be expressed in terms of whether heads occur first or last in the phrase. The variation in the order of elements amounts to a choice between head-first and head-last. UG captures the variation between languages in terms of a limited choice between two or so possibilities - the head parameter. The settings for this parameter yield languages as different as English and Japanese. 'Ideally we hope to find that complexes of properties differentiating otherwise similar languages are reducible to a single parameter, fixed in one or another way' (Chomsky, 1981a, p. 6). The argument here first showed that the speaker of a language knows a single fact that applies to different parts of the syntax, e.g. the phrases of the language consistently have heads to the left. Then it postulated a parameter that all languages have heads either to the left or to the right of their complements. Unlike the universal necessity for locality in movement, the head parameter admits a limited range of alternatives: 'head-first' or 'head-last', depending on the particular language. Thus alongside the unvarying principles that apply to all languages, UG incorporates 'parameters' of variation; a language 'sets' or 'fixes' the parameters according to the limited choice available.
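The single-setting idea can be caricatured in a few lines of code (again our own illustration, not anything from the theory itself): one parameter value fixes the order of head and complement in every phrase type at once, reproducing the orders in (39)-(47).

```python
# Toy sketch of the head parameter: one setting per language determines
# word order in every phrase type. Phrase material follows the English
# and Japanese examples in the text.

def phrase(head, complement, head_first):
    """Linearize a phrase according to the head parameter setting."""
    return f"{head} {complement}" if head_first else f"{complement} {head}"

ENGLISH_HEAD_FIRST = True     # English sets the parameter to head-first
JAPANESE_HEAD_FIRST = False   # Japanese sets it to head-last

print(phrase("opened", "the door", ENGLISH_HEAD_FIRST))         # VP: opened the door
print(phrase("in", "the car", ENGLISH_HEAD_FIRST))              # PP: in the car
print(phrase("kakatte imasu", "kabe ni", JAPANESE_HEAD_FIRST))  # VP: kabe ni kakatte imasu
print(phrase("ni", "kabe", JAPANESE_HEAD_FIRST))                # PP: kabe ni
```

Notice that nothing in the sketch mentions Verb Phrases or Preposition Phrases separately; that is the force of replacing many construction-specific rules with one parameter.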
English sets the head parameter in a particular way so that heads of phrases come on the left; Japanese sets the parameter so that they come on the right. This account of the head parameter inevitably simplifies a complex issue; alternative approaches to the word order within the phrase are discussed in more detail in chapter 4. In particular it should be noted both that there are some exceptions to the notion that all phrases have the same head direction in a particular language (for example Hungarian has postpositions but head-initial NPs), and that there are claims that head direction may link to the general performance requirement for language processing that 'lighter' elements precede 'heavier' (Hawkins, 2003). THE HEAD PARAMETER Nature: a parameter of syntax concerning the position of heads within phrases, for example Nouns in NPs, Verbs in VPs, etc. Definition: a particular language consistently has the heads on the same side of the complements in all its phrases, whether head-first or head-last. Examples: English is head-first: in the bank: Preposition head first before the complement NP in a PP amused the man: Verb head first before the complement NP in a VP Japanese is head-last: Watashi wa nihonjin desu (I Japanese am): V (desu) head last in a VP Nihon ni (Japan in): P (ni) head last in a PP EXERCISE 2.3 Knowing the head direction of a language can provide you with important information about its structure. Here is some vocabulary in different languages; make trees for possible phrases and sentences in these languages.
Arabic (head-first): rajol (man); khobz (bread); ahabba (liked) Chinese (head-final): zai (in); chuan (ship); shang (on); gou (dog) Japanese (head-final): on'na no ko (girl); okaasan (mother); shiawasena (happy); miru (see) Italian (head-first): squadra (team); scudetto (championship); vincere (win) (Note that to get trees for whole sentences you may need information about where the subject occurs, not given here.) Does the fact that you can write trees for languages you may not know show that the head direction parameter is in fact universal? 2.2 Language acquisition 2.2.1 The language faculty Already lurking in the argument has been the assumption that language knowledge is independent of other aspects of the mind. Chomsky has often debated the necessity for this separation, which he regards as 'an empirical question, though one of a rather vague and unclear sort' (Chomsky, 1981b, p. 33). Support for the independence of language from the rest of the mind comes from the unique nature of language knowledge. The Locality Principle does not necessarily apply to all aspects of human thinking; it is not clear how UG principles could operate in areas of the mind other than language. People can entertain mathematical or logical possibilities that are not 'local'; they can even imagine linguistic constructions that do not conform to locality by means of their logical faculties, as the asterisked sentences of linguists bear witness. Nor do the principles of UG seem to be a prerequisite for using language as communication; it might be as easy to communicate by means of questions that reverse the linear order of items as by questions that are based on allowable movement; a language without restrictions on the positions of heads in phrases might be easier to use.
Further arguments for independence come from language acquisition; principles such as Locality do not appear to be learnable by the same means that, say, children learn to roller-skate or to do arithmetic; language acquisition uses special forms of learning rather than those common to other areas. Chomsky does not, however, claim that the proposal to integrate language with other faculties is inconceivable, simply that the proposals to date have been inadequate: 'since only the vaguest of suggestions have been offered, it is impossible, at present, to evaluate these proposals' (Chomsky, 1971, p. 26). In the absence of more definite evidence, the uniqueness of language principles such as Locality points to an autonomous area of the mind devoted to language knowledge, a 'language faculty', separate from other mental faculties such as mathematics, vision, logic, and so on. Language knowledge is separate from other forms of representation in the mind; it is not the same as knowing mathematical concepts, for example. Thus the theory divides the mind into separate compartments, separate modules, each responsible for some aspect of mental life; UG is a theory only of the language module, which has its own set of principles distinct from other modules and does not inter-relate with them. This contrasts with cognitive theories that assume the mind is a single unitary system, for example 'language is a form of cognition; it is cognition packaged for purposes of interpersonal communication' (Tomasello, 1999, p. 150). The separation from other faculties is also reflected in its attitude to language acquisition; it does not see language acquisition as dependent on either 'general' learning or specific conceptual development but sui generis.
Thus it conflicts with those theories that see language development as dependent upon general cognitive growth; Piaget for instance argues for a continuity in which advances in language development arise from earlier acquired cognitive processes (Piaget, 1980). In some ways the theory's insistence on modularity resembles the nineteenth-century tradition of 'faculty' psychology, which also divided the mind into autonomous areas (Fodor, 1983). The resemblance is increased by a further step in the argument. We speak of the body in terms of organs - the heart, the lungs, the liver, etc. Why not talk about the mind in terms of mental organs - the logic organ, the mathematics organ, the common-sense organ, the language organ? 'We may usefully think of the language faculty, the number faculty, and others as "mental organs", analogous to the heart or the visual system or the system of motor coordination and planning' (Chomsky, 1980a, p. 39). The mistake that faculty psychology made may have been its premature location of these organs in definite physical sites, or 'bumps', rather than its postulation of their existence. On the one hand, 'The theory of language is simply that part of human psychology that is concerned with one particular "mental organ", human language' (Chomsky, 1976, p. 36); on the other, 'The study of language falls naturally within human biology' (Chomsky, 1976, p. 123). For this reason the theory is sometimes known as the biological theory of language (Lightfoot, 1982); the language organ is physically present among other mental organs and should be described in biological as well as psychological terms, even if its precise physical location and form are as yet unknown. 'The statements of a grammar are statements of the theory of mind about the I-language, hence statements about the structures of the brain formulated at a certain level of abstraction from mechanisms' (Chomsky, 1986a, p. 23).
The principles of UG should be relatable to physical aspects of the brain; the brain sciences need to search for physical counterparts for the mental abstractions of UG - 'the abstract study of states of the language faculty should formulate properties to be explained by the theory of the brain' (Chomsky, 1986a, p. 39), i.e. question (iv) on page 12 above; if there are competing accounts of the nature of UG, a decision between them may be made on the basis of which fits best with the structure of brain mechanisms. The language faculty is concerned with an attribute that all people possess. All human beings have hearts; all human beings have noses. The heart may be damaged in an accident, the nose may be affected by disease; similarly a brain injury may prevent someone from speaking, or a psychological condition may cause someone to lose some aspect of language knowledge. But in all these cases a normal human being has these properties by definition. 'This language organ, or "faculty of language" as we may call it, is a common human possession, varying little across the species as far as we know, apart from very serious pathology' (Chomsky, 2002, p. 47). Ultimately the linguist is not interested in a knowledge of French or Arabic or English but in the language faculty of the human species. It is irrelevant that some noses are big, some small, some Roman, some hooked, some freckled, some pink, some spotty; the essential fact is that normal human beings have noses that are functionally similar. All the minds of human beings include the principle that movement is local; it is part of the common UG. It is not relevant to UG Theory that English employs locality in one way, French in another, German in another: what matters is what they have in common. The words 'human' or 'human being' have frequently figured in the discussion so far.
The language faculty is indeed specific to the human species; no creature apart from human beings possesses a language organ. The evidence for this consists partly of the obvious truth that no species of animal has spontaneously come to use anything like human language; whatever apes do in captivity, they appear not to use anything like language in the wild (Wallman, 1992). Some controversial studies in recent years have claimed that apes in particular are capable of being taught languages. Without anticipating later chapters, it might be questioned whether the languages used in these experiments are fully human-like in incorporating principles such as Locality; they may be communication systems without the distinctive features of human language. It may be possible to learn them via faculties other than language at the animal's disposal; in a human being some aspects of language may be learnable by some other means than the language faculty; a linguist may know that movement in German is local without having any knowledge of German simply because movement in all languages is local. Similarly patterns of learning used by the animal for other purposes may be adapted to learning certain aspects of language. The danger is that this argument could evade the issue: how is it possible to tell 'proper' language knowledge gained via the language faculty from 'improper' language knowledge gained in some other way, an issue that is particularly important to second language acquisition? (This will become important in chapter 6.) Presumably only by returning to the first argument: if it embodies principles of UG and has been acquired from 'natural' evidence, then it is proper. None of the systems learnt by animals seems proper in this sense, either because they fail to reflect abstract features of language or because they are artificially 'taught'.
Or, indeed, because they fail to be creative in the sense seen in the last chapter: 'animal communication systems lack the rich, expressive and open-ended power of human language' (Hauser et al., 2002, p. 1570). The species-specificity of UG nevertheless raises difficult questions about how it could have arisen during evolution; Piaget, for instance, claims 'this mutation particular to the human species would be biologically inexplicable' (Piaget, 1980, p. 31). However, 'the fact that we do not know how to give serious evolutionary explanation of this is not surprising; that is not often possible beyond simple cases' (Chomsky, 2000a, p. 50). While the possession of language itself clearly confers an immense advantage on its users over other species, why should Locality confer any biological advantage on its possessor? Indeed one puzzle is why there are different human languages: it would seem advantageous if the whole species spoke the same language. Presumably our lack of distance from human languages makes them appear so different to us; the differences between Japanese and English might seem trivial to a non-human alien. 'The Martian scientist might reasonably conclude that there is a single human language, with differences only at the margins' (Chomsky, 2000b, p. 7). In the 2000s Chomsky has been developing a slightly different way of thinking of the language faculty in association with evolutionary biologists (Hauser et al., 2002). He now makes a distinction between the broad faculty of language (FLB) and the narrow faculty of language (FLN). The FLN 'is the abstract linguistic computational system alone, independent of the other systems with which it interacts and interfaces' while the FLB includes as well 'at least two other organism-internal systems, which we call "sensory-motor" and "conceptual-intentional"' (Hauser et al., 2002, pp. 1570-1).
This proposal seems to restrict the language faculty severely, leaving little that is unique to language, possibly only the core property of recursion, which allows rules to call upon themselves and will be discussed in chapter 3. The distinction between FLB and FLN has proved highly controversial in that it seems to concede much of the ground to Chomsky's opponents and leave very little that is peculiar to language. The narrow language faculty is now a small unique area; the broad language faculty that includes all the language-related systems is no longer unique and much of it may be shared with the animal kingdom. Indeed, if recursion is all that is left, some have pointed out that this too is shared with other faculties - even Hauser et al. (2002) mention it is a property of natural numbers - and that not all human languages have it (Everett, 2005). On the other hand the FLN seems to have thrown the baby out with the bathwater if it casts vocabulary and phonology out of the core language faculty. For a taste of the raging debate over this issue, readers are referred to the critique by Pinker and Jackendoff (2005) and the answer by Fitch et al. (2005). EXERCISE 2.4 1 Compare from your experience the faculties of language and mathematics - what knowledge you have in them, how you learnt them, and how you use them in everyday life. Does this convince you that they are separate or that they overlap? 2 Here are some of the 27 faculties proposed by the phrenologist Franz Gall in the 1790s. Which might be seen as faculties today? Which might interface with the faculty of language?
impulse to propagation; tenderness for the offspring; murder, carnivorousness; sense of cunning; faculty of language; sense for sounds, musical talent; arithmetic, counting, time; metaphysical perspicuity; poetic talent; recollection of persons; mimic; perseverance, firmness THE LANGUAGE FACULTY is where the knowledge of language is stored in the individual mind; is common to all human beings; is independent of other faculties such as mathematics; has unique properties of its own like Locality or recursion not shared with other faculties; is unique to the human species, at least in the narrow sense; can be thought of as a 'mental organ' that 'grows'. 2.2.2 States of the language faculty Let us now relate the language faculty to the acquisition of language, always central to the UG Theory. The language faculty can be thought of as a state of the mind, containing whatever the speaker knows at a particular point in time, the sum of all their knowledge of language, variously called a grammar or an I-language: 'The internal language, in the technical sense, is a state of the faculty of language' (Chomsky, 2002, p. 48). The language faculty then comprises a computational system with principles, parameters and a lexicon all fleshed out for a particular language, as seen on the right of figure 2.1. Figure 2.1 The states model of the development of the language faculty: from the initial state (Universal Grammar) to the steady state (I-language) But children are not born with the knowledge of all the lexical items in the language. In the initial state, the parameters have not been set, the lexical items have not been learnt etc., as seen in the left of figure 2.1; this is the language faculty with the minimal contents - whatever aspects of language are intrinsic to the human mind, that is to say UG.
The two extreme states of the language faculty are then the final state, when the mind knows a complete I-language, and the initial state, when it knows only the principles. Language acquisition comes down to how the human language faculty changes from the initial to the final state, how children acquire all the knowledge of language seen in the adult. Let us sum up this section in Chomsky's own words: 'the language organ is the faculty of language (FL); the theory of the initial state of FL, an expression of the genes, is universal grammar (UG); theories of states attained are particular grammars; the states themselves are internal languages, "languages" for short' (Chomsky, 2005b, p. 145). The child's mind starts in the initial state of the language faculty (alias UG); this contains only whatever is genetically determined about language - principles such as Locality etc. This in itself takes a controversial position about the innate elements in language, as we shall see. The language faculty achieves adult knowledge of language, complete with parameter settings and lexicon for a particular language, by getting certain types of information about the structures and vocabulary of the language it is exposed to. Put another way, UG gets instantiated as the knowledge of a particular language; the language faculty still has the same structure but the syntactic peculiarities and lexical items of a particular language have become attached to it. In between the beginning and end positions the language faculty evolves through a number of states, each of them a possible language of its own. So language acquisition amounts to the child's mind fleshing out the skeleton of language knowledge already present in its mind with the material provided by the environment.
This 'states' view of acquisition sees the whole language faculty as involved: the language faculty incorporates the information about the specific language within itself to get a grammar of a particular language. We can now elaborate this in figure 2.2. The grammar is a state of UG, the faculty of language in the human mind, not a product of UG. Figure 2.2 The development of the language faculty: zero to final states. Initial state S0 (Universal Grammar): Principles (Locality etc.); Parameters (head-initial/-final, unset, etc.); Lexicon (unfilled). Intervening states S1, S2, S3, S4 ... Sn. Steady state Ss (I-language): Principles (Locality etc.); Parameters (head-initial/-final, set, etc.); Lexicon (filled). In the beginning of language acquisition is the mind of the new-born baby who knows no language, termed the initial (zero) state or S0, containing nothing but UG itself. At the end is the mind of the adult native speaker with full knowledge of the language, including the principles, parameter settings and lexicon. This final state is, to all intents and purposes, static; the speaker may become more or less efficient at using language or may add or lose a few vocabulary items - step-change suddenly entered British people's vocabulary after a single politician's remark - but linguistic competence is essentially complete and unchanging once it has been attained. To sum up in Chomsky's words, the faculty of language 'has a genetically-determined initial state S0, which determines the possible states it can assume' (Chomsky, 2001b, p. 1). While for many purposes it is convenient to look at just the initial and final states of the acquisition process, the language faculty goes through many intervening states while children are acquiring the language. 'The initial state changes under the triggering and shaping effect of experience, and internally determined processes of maturation, yielding later states that seem to stabilise at several stages, finally at about puberty' (Chomsky, 2002, p. 85).
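As a purely schematic sketch (our own, and only a metaphor for the states model, with every name in it illustrative rather than part of the theory), the two end states can be written as data: the principles are present throughout, while parameter settings and lexical items are filled in by experience.

```python
# Schematic sketch of the states model of the language faculty; the
# values ('Locality', 'head-first', the sample words) are invented
# placeholders for illustration only.

from dataclasses import dataclass

@dataclass
class FacultyState:
    principles: tuple = ("Locality",)   # invariant from S0 to Ss
    head_parameter: str = "unset"       # unset in the initial state
    lexicon: frozenset = frozenset()    # empty in the initial state

s0 = FacultyState()  # initial (zero) state: UG alone

# Experience of English sets the parameter and fills the lexicon,
# yielding (after many intervening states) the steady state Ss.
ss = FacultyState(head_parameter="head-first",
                  lexicon=frozenset({"door", "open", "in"}))

print(s0.principles == ss.principles)  # True: the principles never change
```

The sketch makes one claim of the model concrete: acquisition changes the settings and the lexicon, not the principles.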
Figure 2.2 therefore shows a series of states - S1, S2, S3, S4 ... Sn - intervening between the initial state S0 and the steady state Ss, each of them a possible state of the language faculty incorporating a knowledge of principles, parameters and the lexicon. Acquiring language means progressing from not having any language, S0, to having full competence, Ss. 2.2.3 Behaviourism To some extent, Chomsky's ideas are now so taken for granted that their originality has been obscured. For, prior to Chomsky's work of the late fifties, language was not considered to be what people knew but how they behaved, as incorporated in the structuralist linguistics tradition best known from Bloomfield's book Language (Bloomfield, 1933). Bloomfield saw language acquisition as initiated by the child more or less accidentally producing sounds such as da; these sounds become associated with a particular object such as a doll because of the parents' reactions, so that the child says da whenever a doll appears; then the child learns to talk about doll even when one is not present - 'displaced speech'. The adult is a crucial part of the process; the child would never learn to use da for 'doll' without the adult's reaction and reinforcement. This Bloomfieldian version of language acquisition was the commonplace of linguistics before Chomsky. What Chomsky specifically repudiated, however, was the more sophisticated behaviourist theory of B. F. Skinner, put forward in Verbal Behavior (Skinner, 1957), a sympathetic account of which can be found in Paivio and Begg (1981). Skinner rejected explanations for language that were inside the organism in favour of explanations in terms of outside conditions. Language is determined by stimuli consisting of specific attributes of the situation, by responses the stimuli call up in the organism, and by reinforcing stimuli that are their consequences.
Thus the object 'doll' acts as a stimulus for the child to respond Doll, which is reinforced by the parents saying Clever girl! or by them handing the child the doll. Or the child feels thirsty - the stimulus of 'milk-deprivation' - responds by saying Milk, and is reinforced with a glass of milk. As with Bloomfield, language originates from a physical need and is a means to a physical end. The parents' provision of reinforcement is a vital part of the process. Chomsky's classic critique of Skinner in his review of Verbal Behavior (Chomsky, 1959) presaged many of his later ideas. Chapter 1 here introduced the key Chomskyan notion of creativity; people regularly understand and produce sentences that they have never heard before, say Daeman faxed into solidity near Ada's home (Simmons, 2004). How could they be acting under the control of stimuli, when, outside science fiction, they have never encountered the concept of a human being being faxed or even seen faxed used as an intransitive verb? To take Chomsky's examples, you do not need to have experienced the situation before to take appropriate action if someone says The volcano is erupting. Nor is a stimulus usually as simple and unambiguous as milk-deprivation or a volcano erupting. Chomsky imagines the response of a person looking at a painting. They might say Dutch or Clashes with the wallpaper, I thought you liked abstract art, Never saw it before, Tilted, Hanging too low, Beautiful, Hideous, Remember our camping trip last summer?, or anything else that comes to mind. One stimulus apparently could have many responses. There can be no certain prediction from stimulus to response.
One of the authors went to a supermarket in Kassel, prepared with language-teaching clichés for conversations about buying and selling; the only German that was addressed to him was:

(53) Könnten Sie mir bitte den Zettel vorlesen, weil ich meine Brille zu Hause vergessen habe?
     Could you please read the label to me because I've left my glasses at home?

In other words, human language is unpredictable from the stimulus. The important thing about language is that it is stimulus-free, not stimulus-bound - we can say anything anywhere without being controlled by precise stimuli. It is also hard to define what reinforcement means in the circumstances of children's actual lives, rather than in the controlled environment of a laboratory. The child rarely encounters appropriate external rewards or punishment; 'it is simply not true that children can learn language only through "meticulous care" on the part of adults who shape their verbal repertoire through careful differential reinforcement' (Chomsky, 1959, p. 42). The burden of Chomsky's argument is, on the one hand, that Skinnerian theory cannot account for straightforward facts about language, on the other, that the apparently scientific nature of terms such as 'stimulus' and 'response' disguises their vagueness and circularity. How do we know the speaker was impressed by the 'Dutchness' of the painting rather than by some other quality? Only because they said Dutch: we are discovering the existence of a stimulus from the response rather than predicting a response from a stimulus. This early demolition of Skinner still remains Chomsky's main influence on psychology, rather than his later work; introductions to psychology seldom mention post-1965 writing.

EXERCISE 2.5
People have often distinguished language-like behaviour from language behaviour. If you took the following activities, which could you say were true language, which language-like?
- riding a bicycle
- doing mental arithmetic
- reading a map
- improvising an epic poem
- praying
- miming how to do a task
- exclaiming 'ouch' when you hit your finger
- finding your way in a maze

Do you feel this shows a clear distinction between language and non-language processing by the individual, as Chomsky claims, or that the same processes are involved in language as in other cognitive operations?

2.2.4 The Language Acquisition Device and levels of adequacy

A more familiar way of thinking about language acquisition put forward by Chomsky is in terms of a Language Acquisition Device (LAD). Chomsky (1959, p. 58) provided the germ out of which this model grew: 'in principle it may be possible to study the problem of determining what the built-in structure of an information-processing (hypothesis-forming) system must be to enable it to arrive at the grammar of a language from the available data in the available time'. Chomsky (1964) put this metaphorically as a black box problem. Something goes into a black box, something comes out. By looking at the input and the output, it is possible to arrive at some understanding of the process concealed inside the box itself. Suppose we see barley and empty bottles going in one door of a Speyside distillery, crates of Scotch whisky coming out the other; we can deduce what is going on inside by working out what must be done to the barley to get whisky. Given a detailed analysis of the whisky and of the barley, we could deduce the processes through which one is transformed into the other. Children hear a number of sentences said by their parents and other caretakers - the primary linguistic data; they process these within their black box, the LAD, and they acquire linguistic competence in the language - a generative grammar in their minds.
We can deduce what is going on inside the child's LAD by careful examination and comparison of the language input that goes in - the material out of which language knowledge is constructed - and the knowledge of language that comes out - the generative grammar. 'Having some knowledge of the characteristics of the acquired grammars and the limitations on the available data, we can formulate quite reasonable and fairly strong empirical hypotheses regarding the internal structure of the language acquisition device that constructs the postulated grammars from the given data' (Chomsky, 1972a, p. 113). In the case of the whisky, we could go into the distillery to check on our reasoning and see just what is going on inside; it is not of course possible to open the child's mind to confirm our deductions in the same fashion; the black box of the mind cannot be opened. The model of this process proposed by Chomsky (1964) was as shown in figure 2.3, adapted slightly.

Input (primary linguistic data) → Language Acquisition Device → Output (a generative grammar)
Figure 2.3 The Language Acquisition Device model of first language acquisition

The LAD is 'a procedure that operates on experience acquired in an ideal community and constructs from it, in a determinate way, a state of the language faculty' (Chomsky, 1990, p. 69). Since essentially all human children learn a human language, it has to be capable of operating for any child anywhere; it must tackle the acquisition of Chinese as readily as the acquisition of English, the acquisition of Russian as readily as that of Sesotho. The LAD conceptualization was a powerful metaphor for language acquisition within UG theory. McCawley (1992) indeed insists that it is true of any theory of language acquisition, not just UG. It embodied the central tenet of the theory by treating language acquisition as the acquisition of knowledge.
While this may now seem obvious, it was nevertheless at odds with the accounts of language acquisition based on behaviour that had been provided before the 1960s. The LAD metaphor said that it was not how children behaved that mattered; it was not even what they actually said: it was what they knew. The LAD led to a neat way of putting the goals of linguistics in terms of three 'levels of adequacy' (Chomsky, 1964), foreshadowing the goals of linguistics described in chapter 1.

• Observational adequacy is the first level that a linguistic theory has to meet: a theory is observationally adequate if it can predict grammaticality in samples of language, that is to say in the primary linguistic data of adult speech as heard by the child, otherwise called the input to the LAD.
• Descriptive adequacy is the second level: a theory achieves descriptive adequacy if it deals properly with the linguistic competence of the native speaker, i.e. the generative grammar output from the LAD.
• Explanatory adequacy is the third level: a theory is explanatorily adequate if the linguistic theory can provide a principled reason why linguistic competence takes the form that it does, i.e. if it can explain the links between linguistic competence and primary linguistic data that are concealed within the LAD itself.

Explanatory adequacy was presented as a method of deciding between two descriptions of linguistic competence both of which seem descriptively and observationally adequate - mathematically speaking a not unlikely event given that an infinite number of descriptively adequate grammars is possible. The preferred description of the output grammar, whenever there is a choice, is the one that children can learn most easily from the language data available to them. A proper linguistic theory has then to meet 'the condition of explanatory adequacy: the problem of accounting for language acquisition' (Chomsky, 2000a, p. 13).
The 1964 LAD model can be accommodated within P&P Theory to some extent. The LAD itself is synonymous with the language faculty, i.e. UG. 'In general, the "language acquisition device" is whatever mediates between the initial state of the language faculty and the states it can attain, which is another way of saying it is a description of the initial state' (Chomsky, 2000a, p. 55). The linguistic competence output that emerges from the LAD consists of a grammar couched in principles and parameters form; the knowledge that the child needs to acquire consists of the setting of the head parameter, etc. The grammar contains the appropriate parameter settings and has thousands of lexical entries specifying how each word can behave in the sentence. The translation into UG terms can be expressed in figure 2.4, now conventional in slight variations in the literature, for example Haegeman (1994, p. 15) and Atkinson (1992, p. 43).

Input (primary linguistic data) → Universal Grammar → Output (a grammar consisting of principles, parameters and lexicon)
Figure 2.4 The Universal Grammar model of first language acquisition

The UG revision to the LAD model alters the relative importance of the levels of adequacy. Explanatory adequacy had seemed an ideal but fairly distant goal; most energy had gone into the descriptions themselves. Acquisition seldom had a real role in deciding on the right linguistic theory. The UG version with principles and parameters, however, integrated acquisition with the description of grammar by making explanatory adequacy central; the description of the grammar goes hand in hand with the explanation of how it is learnt. In principle any element in the grammar has to be justified in terms of acquisition; any principle or parameter that is proposed for the speaker's knowledge of syntax has to fit into an account of acquisition.
So all the technical apparatus of P&P Theory must ultimately be integrated with the theory of language acquisition. Chomsky and other researchers essentially switch between these two metaphors of the changing state and the input/output box, assuming them to be equivalent. However, the changing state metaphor sees UG itself as the initial grammar, which evolves over time; the input/output metaphor implies an unchanging UG producing a series of grammars distinct from itself. As we see below, the discussion of second language acquisition reveals some contradictions between the two metaphors (Cook, 1993).

LEVELS OF ADEQUACY
• Observational adequacy: faithfulness to the primary linguistic data of adult speech
• Descriptive adequacy: faithfulness to the linguistic competence of the native speaker
• Explanatory adequacy: faithfulness to the acquisition of linguistic competence

2.2.5 The poverty-of-the-stimulus argument

The black box LAD model led to further interesting ideas about acquisition. To return to the distillery, barley is going in and whisky is coming out, but where does the water come from that makes up 43 per cent of distillery-strength Scotch? Anything that comes out of the distillery that has not been seen to go in must have originated within the distillery itself. So there is presumably a source of water inside the distillery that the observer can't see from the outside. Suppose, however, something comes out of the LAD that didn't go in: where could this ingredient come from? This is the conundrum called 'Plato's problem', which is at the heart of Chomskyan ideas of language acquisition: 'How do we come to have such rich and specific knowledge, or such intricate systems of belief and understanding, when the evidence available to us is so meagre?' (Chomsky, 1987). The answer is that much of our linguistic knowledge must come from the internal structure of the mind itself - in terms of the distillery metaphor, a hidden well.
If the adult's grammar Ss incorporates principles that could not be constructed from the primary linguistic data then they must have been added by the mind itself. The things that are missing from the input are added by the mind: the black box is not just processing the input but contributing things of its own. Our knowledge of language is complex and abstract; the experience of language we receive is limited. Human minds could not create such complex knowledge on the basis of such sparse information. It must therefore come from somewhere other than the limited evidence they encounter. Plato's own solution was to say that the knowledge originated from memories of prior existence; Chomsky's solution is to invoke innate properties of the mind: it doesn't have to be learnt because it is already there. This argument has a clear and simple form: on the one hand there is the complexity of language knowledge, on the other there are the impoverished data available to the learner; if the child's mind could not create language knowledge from the data in the surrounding environment, given plausible conditions on the type of language evidence available, the source must be within the mind itself. This is therefore known as the poverty-of-the-stimulus argument, meaning that the data in the stimulus are too meagre to justify the knowledge that the mind builds out of them. Let us go over the poverty-of-the-stimulus argument informally before putting it in more precise terms. Part of the linguistic competence of a native speaker is demonstrably the principle of Locality, sketched above. Research reported in Cook (2003) asked for grammaticality judgements on sentences involving different parameters from first language (L1) speakers of English and L2 speakers from several backgrounds.
Native speakers of English indeed rejected sentences involving rules that violated Locality in questions, such as:

(54) *Is Sam is the cat that black?

99.6 per cent of the time. But how could they have learnt this judgement from their parents? What clues might children hear that would tell them that English obeys Locality? Children never hear examples of English sentences that violate this principle since these do not exist outside the pages of linguistics books. Nor is it likely that parents correct them when they get such things wrong, partly because children do not produce such errors, partly because parents would probably not know what they were if they did. Perhaps it is just the sheer unfamiliarity of the sentence that offends them. Yet native speakers encounter new and strange sentences all the time, which they immediately accept as English, even if they do not fully understand them, say:

(55) On later engines, fully floating gudgeon pins are fitted, and these are retained in the pistons by circlips at each end of the pin. (Haynes, 1971, p. 29)

It is not that a sentence that breaches grammatical principles is novel or necessarily incomprehensible: we know it is wrong. The child has been provided with no clues that the Locality Principle exists - but nevertheless knows it. The source of the Locality Principle is not outside the child's mind since the environment does not provide any clues to its existence. As this constraint is part of the grammar that came out of the black box but did not go in as part of the input, it must be part of the black box itself: the Locality Principle must already have been present in the child's mind. Thus it is an innate aspect of the human language faculty. There are four steps to the poverty-of-the-stimulus argument (Cook, 1991):

Step A: A native speaker of a particular language knows a particular aspect of syntax.
The starting point is then the knowledge of the native speaker. The researcher has to select a particular aspect of language knowledge that the native speaker knows, say the Locality Principle, or any other aspect of language knowledge.

Step B: This aspect of syntax could not have been acquired from the language input typically available to children.

The next step is to show that this aspect of syntax could not have been acquired from the primary linguistic data, the speech that the child hears. This involves considering possible sources of evidence in the language the child encounters and in the processes of interaction with parents. These will be itemized in greater detail below.

Step C: We conclude that this aspect of syntax is not learnt from outside.

If all the types of evidence considered in Step B can be eliminated, the logical inference is that the source of this knowledge does not lie outside the child's mind.

Step D: We deduce that this aspect of syntax is built in to the mind.

Hence the conclusion is that the aspect of syntax must originate within the child's mind. Logically, as the aspect did not enter from without, it must come from within, i.e. be a built-in part of the human language faculty. Steps C and D are kept distinct here because there could be explanations for things which are known but not learnt other than the innate structure of the mind. Plato's memories of previous existence are one candidate, telepathy and morphogenesis modern ones.

Step A: A native speaker of a particular language knows a particular aspect of syntax.
Step B: This aspect of syntax could not have been acquired from the language input typically available to children.
Step C: We conclude that this aspect of syntax is not learnt from outside.
Step D: We deduce that this aspect of syntax is built in to the mind.
Figure 2.5 Steps in the poverty-of-the-stimulus argument for first language acquisition

The steps of this argument can be repeated over and over for other areas of syntax. Whatever the technical details of the syntax, the argument still holds. It may be, as Rizzi (1990) has argued, that Locality derives from other general principles of syntax. But the point is still true: if no means can be found through which the child can acquire language knowledge from the usual evidence he or she may receive then it must be built in to the mind, however controversial or uncertain the linguist's analysis itself may be. The poverty-of-the-stimulus argument is fundamentally simple; whenever you find something that the adult knows which the child cannot in principle acquire, it must already be present within the child. Indeed the form of the poverty-of-the-stimulus argument has been used in areas other than language. Several religions for example claim that the world is so beautiful or so complex that it could not have come into existence spontaneously and must therefore be due to a creator; this 'argument by design' was used by Paley (1802, quoted in Gould, 1993) as a stick with which to beat evolutionary theories, revived to some extent by the recent arguments for intelligent design. The crucial steps in the argument are: first that some aspect of language is indeed part of the native speaker's linguistic competence; second that the child does not get appropriate evidence. In a critical review, Pullum and Scholz (2002, p.
9) have teased out a series of separate claims within the poverty-of-the-stimulus argument, put in their terms as:

- Step A addresses the 'acquirendum' - establishing what the speaker knows;
- Step B concerns the 'lacunae' - what sentences are missing from the language input;
- Step C involves 'inaccessibility' - evidence that the lacuna sentences are not actually available to the learner - and 'positivity' - the unavailability of indirect negative evidence;
- Step D relies on 'indispensability' - the argument that the acquirendum could not be learnt without access to the lacunae.

In other words 'if you know X, and X is underdetermined by learning experience, then the knowledge of X must be innate' (Legate and Yang, 2002, p. 153). Hence the apparent simplicity and clarity of Chomsky's poverty-of-the-stimulus argument may in fact depend on a number of sub-issues that need to be discussed separately.

EXERCISE 2.6
Try to devise poverty-of-the-stimulus arguments for other areas of human development such as:

- knowledge of gravity
- knowledge of mathematics
- knowledge of cookery
- knowledge of vision

Does this convince you that the argument works for language?

2.2.6 The Principles and Parameters Theory and language acquisition

The overall model of language acquisition proposed by Chomsky can be put quite simply. UG is present in the child's mind as a system of principles and parameters. In response to evidence from the environment, the child creates a core grammar Ss that assigns values to all the parameters, yielding one of the allowable human languages - French, Arabic, or whatever. To start with, the child's mind is open to any human language; it ends by acquiring one particular language. What the language learner must do 'is figure out on the basis of his experience, what options his language has taken at every choice point' (Rizzi, 2004, p. 330). The principles of UG are principles of the initial state, S0.
No language breaches them; since they are underdetermined by what the child hears, they must be present from the beginning. They are not learnt so much as 'applied'; they are automatically incorporated into the child's grammatical competence. The resemblances between human languages reflect their common basis in principles of the mind; Japanese incorporates Locality, as do English and Arabic, because no other option is open to the child. While the discussion here concentrates on the acquisition of syntax, Chomsky extends the argument to include 'fixed principles governing possible sound systems' (Chomsky, 1988, p. 26) and 'a rich and invariant conceptual system, which is prior to any experience' (Chomsky, 1988, p. 32); these guide the child's acquisition of phonology and vocabulary respectively. Acquiring a language means setting all the parameters of UG appropriately. As we have seen, these are limited in number but powerful in their effects. To acquire English rather than Japanese, the child must set the values for the head parameter, and a handful of other parameters. The child does not acquire rules but settings for parameters, which, interacting with the network of principles, create a core grammar. 'The internalised I-language is simply a set of parameter settings; in effect, answers to questions on a finite questionnaire' (Chomsky, 1991b, p. 41). Rather than a black box with mysterious contents, Chomsky is now proposing a carefully specified system of properties, each open to challenge. In addition to the core grammar, the child acquires a massive set of vocabulary items, each with its own pronunciation, meaning and syntactic restrictions. While the acquisition of core grammar is a matter of setting a handful of switches, the child has the considerable burden of discovering the characteristics of thousands of words.
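The 'finite questionnaire' metaphor can be pictured computationally. The following is a minimal sketch of our own, not part of the theory itself: it treats a core grammar as a small set of parameter values and shows how one setting of the (simplified) head parameter yields verb-object order and the other yields object-verb order.

```python
# Illustrative sketch, not UG's actual inventory: a core grammar as answers
# to a 'finite questionnaire' of parameter settings. The parameter name
# 'head' and its two values are our simplified assumptions.

PARAMETERS = {"head": {"initial", "final"}}

def set_parameters(head):
    """Fix a value for each parameter, yielding one possible grammar."""
    assert head in PARAMETERS["head"]
    return {"head": head}

def order_vp(grammar, verb, obj):
    """Linearize a verb and its object according to the head parameter."""
    return [verb, obj] if grammar["head"] == "initial" else [obj, verb]

english = set_parameters("initial")    # head-initial: verb before object
japanese = set_parameters("final")     # head-final: object before verb

print(order_vp(english, "read", "books"))   # ['read', 'books']
print(order_vp(japanese, "read", "books"))  # ['books', 'read']
```

The point of the sketch is only that a finite, discrete set of choices, once fixed, has pervasive effects on the sentences a grammar generates.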
'A large part of "language learning" is a matter of determining from presented data the elements of the lexicon and their properties' (Chomsky, 1982, p. 8). So the child needs to learn entries that specify that sleep is a Verb which requires a subject; that give is a Verb that requires a subject, an object and an indirect object; the referential properties of himself; and so on for all the items that make up the mental lexicon of a speaker of English. As well as those aspects derived from UG principles, the child acquires parts of the language that depart from the core in one way or another, for example the irregular past tense forms in English such as broke and flew. Grammatical competence is a mixture of universal principles, values for parameters, and lexical information, with an additional component of peripheral knowledge for, say, constructions like the more the merrier. Some of it has been present in the speaker's mind from the beginning; some of it comes from experiences that have set values for parameters in particular ways and led to the acquisition of lexical knowledge. To sum up in Chomsky's words: 'what we "know innately" are the principles of the various subsystems of S0 and the manner of their interaction, and the parameters associated with these principles. What we learn are the values of the parameters and the elements of the periphery (along with the lexicon to which similar considerations apply)' (Chomsky, 1986a, p. 150).

Discussion topics

1 What could count as evidence that something is or isn't part of the initial state of the language faculty built in to the human mind?
2 What do you think is the proper goal of linguistics?
3 Feyerabend (1975) sees science as based on powerful arguments rather than evidence. Is there a way out of Chomsky's poverty-of-the-stimulus argument?
4 Is any major type of evidence overlooked in the discussion that could account for the child's acquisition of principles of UG?
5 To what extent are the arguments that Chomsky proposed against Skinnerian behaviourism valid against modern psychological approaches to language acquisition such as usage-based acquisition (Tomasello, 2003) or connectionism (Rumelhart and McClelland, 1986)?
6 Recent work has tended to extend the areas of language that animals can cope with, for example showing rats are capable of distinguishing Dutch from Japanese (Toro et al., 2005). Does this affect Chomsky's insistence that language is peculiar to human beings? Or is it simply part of the FLB?

3 Structure in the Government/Binding Model

The previous chapter introduced three key concepts which have been present in all the stages of development of Chomsky's linguistics since the early 1960s:

• the lexicon, which stores all idiosyncratic information about the words of the language, in the form of lexical entries;
• phrase structure rules, which combine lexical elements to form basic structures; and
• movement rules, which shift elements about in structures (displacement).

The immediate consequence of having movement rules is the recognition that there are two levels at which the structure of any sentence can be described: the level before the movement takes place and the level formed after the movement has happened. So the sentence:

(1) What film did you see?

clearly requires an underlying level:

(2) You saw what film?

with what film in its original position. At the level after things have been moved about, seen in sentence (1), the elements are sitting in positions which are in closer accord with the linear order in which the sentence is actually pronounced; i.e. after movement what film occurs at the beginning of the sentence as it does in speech. For this reason this level was initially called surface structure.
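The relation between the two levels can be pictured as an operation on the underlying string. The following toy sketch is our illustration, not the book's formalism: the wh-phrase is generated in object position and fronted to give the pronounced order, and the auxiliary did (do-support, with saw decomposing into did + see) is simply supplied by hand rather than derived.

```python
# Toy sketch of wh-movement (our illustration): the wh-phrase starts in
# its underlying object position and is moved to the front. Do-support
# ('did' + bare 'see' for 'saw') is hand-coded here, not derived.

def front_wh(subject, verb, wh_phrase, aux):
    """Move the wh-phrase to the front and insert the auxiliary."""
    return wh_phrase + [aux] + subject + [verb]

underlying = ["you", "saw", "what", "film"]                 # (2) You saw what film?
surface = front_wh(["you"], "see", ["what", "film"], "did")
print(" ".join(surface) + "?")                              # what film did you see?
```

The sketch captures only the displacement itself: one constituent, pronounced in one position, interpreted in another.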
The level before movement seen in sentence (2) is in some ways more abstract than surface structure, with elements sitting in positions different from those indicated by their usual pronounced order, and as such it represents the analysis of structure deeper in the system. It was therefore originally called deep structure. However, the connotations of the terms 'deep' and 'surface' were misleading and the more neutral terms D-structure and S-structure were later adopted. The basic form of the grammar, incorporating the relationship of D-structure to S-structure via movement, is represented in figure 3.1.

lexicon + phrase structure rules → D-structure → movement rules → S-structure
Figure 3.1 The basic form of the grammar, 1980s-90s

This then attempts to model the computational system that the mind uses to bridge sounds and meanings, as seen in the models in chapter 2. Its core remained virtually unchanged from the mid-1960s till the 1990s.

3.1 The heart of the Government/Binding Model

This simple model of the grammar brings out ideas that were to have a profound effect on its development in the 1980s. In particular the grammar is split up into parts that have their own specific roles to play, in Government/Binding (GB) Theory known as the modules, two of which are shown in figure 3.1. Perhaps confusingly, some of these modules were called theories, e.g. Binding Theory. This chapter and the next introduce the modules of GB Theory and the phenomena that each was proposed to account for. This chapter considers those modules relevant for D-structure; the next chapter moves on to those relevant to S-structure, i.e. those which interact with movement. First we need to outline the modular nature of the grammatical system as a whole and the reasons for it being named Principles and Parameters (P&P) Theory, briefly outlined in chapters 1 and 2.
3.2 Modules, principles and parameters

Prior to GB Theory, at the stage of development around 1970 often referred to as Standard Theory, all grammatical rules were thought to belong to one of the three components of the grammar outlined in figure 3.1. Specifically there were phrase structure rules (= structure rules) and transformational rules (= movement rules), familiar in slightly different forms from the early days of Syntactic Structures (Chomsky, 1957). However, work throughout the 1970s indicated that other types of rule were needed. These 'rules' exert a moderating effect on the existing rules, constraining their actions and allowing them to be greatly simplified.

3.2.1 Phrase structure rules and their restrictions

Let us start with phrase structure rules. To recap chapter 2, Chomsky's breakthrough in the 1950s was to devise a way of representing grammatical rules as rewrite rules, i.e. instructions about how to generate structures for sentences. A typical basic set of rewrite rules was presented in chapter 2:

(3) (i) S → NP VP
    (ii) VP → V NP
    (iii) NP → Det N

Following these rules, we can generate the structure in (4):

(4) [S [NP Det N] [VP V [NP Det N]]]

At the bottom of the tree comes a set of category labels, namely Det, N and V, known as terminal nodes, to which we can attach elements from the lexicon of the appropriate category, i.e. an N might be dog or politician, a V chew or bribe, a Det a or the. So this structure can be associated with sentences such as:

(5) [S [NP [Det the] [N dog]] [VP [V chewed] [NP [Det the] [N slipper]]]]

These three rules describing the structure (4) allow for a vast number of sentences, limited only by the number of Nouns, Verbs and Determiners in the lexicon of the language.
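The generative character of rewrite rules is easy to see in miniature. Here is a short sketch of our own (the toy lexicon is our choice for illustration): each non-terminal is expanded by its rule until only terminal nodes remain, and a word of the appropriate category is then attached to each terminal node.

```python
import random

# Sketch of rules (3i-iii) as rewrite rules: expand each non-terminal
# until only terminal categories remain, then insert words of the right
# category from a toy lexicon (words chosen for illustration only).

RULES = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Det", "N"]}
LEXICON = {"Det": ["the", "a"], "N": ["dog", "slipper"], "V": ["chewed"]}

def expand(symbol):
    if symbol in RULES:                        # non-terminal: rewrite it
        return [w for part in RULES[symbol] for w in expand(part)]
    return [random.choice(LEXICON[symbol])]    # terminal node: lexical insertion

print(" ".join(expand("S")))  # e.g. 'the dog chewed a slipper'
```

Even these three rules and seven words generate sixteen distinct sentences; enlarging the lexicon multiplies the output without touching the rules, which is the sense in which the rules 'allow for a vast number of sentences'.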
We saw in chapter 2 that these rules (3i-iii) are language specific, in as much as they generate the structures of English - other languages need different sets of rewrite rules - and construction specific - each one describes the structure of a particular phrase, such as the Verb Phrase, rather than of all phrases of English - Verb Phrases, Noun Phrases and all the other phrases. It might be a good idea to collapse some of these rules to produce something more general. For example, alongside the VP rule (3ii), which requires the Verb to have an object, i.e. to be transitive, we also need to deal with intransitive verbs, which lack objects, for example:

(6) The baby sleeps.

The following rule describes sentences where the verb is the only element in the VP:

(7) VP → V

We now have two VP rules, (3ii) and (7). However, these two rules can be collapsed into a single, more general, rule by introducing a notation to show that the object in the VP is optional by enclosing it in brackets. The brackets round the NP in (8) therefore indicate its optionality; the object NP may, or may not, be present in the VP:

(8) VP → V (NP)

Two rules (3ii) and (7) have been collapsed into one (8) through the concept of optional elements, yielding two possible outputs of the rule, with or without an NP. Obviously, whether or not the VP has an object depends on which verb is selected to be inserted into the V terminal node: chew requires an object, sleep does not. This is handled by a lexical insertion rule, which takes note of the structural context in which a particular lexical element can appear. 'Base rules generate D-structures (deep structures) through insertion of lexical items into structures generated by [phrase structure rules], in accordance with their feature structure' (Chomsky, 1981a, p. 5).
Thus, if there is an object following the verb in the structure, the lexical insertion rule will only insert transitive verbs such as chew into the V terminal node and, if there is no object, the insertion rule will only insert intransitive verbs such as sleep:

(9) [S [NP Det N] [VP V [NP Det N]]]   [S [NP Det N] [VP V]]

The lexical entries for verbs must therefore indicate whether they are intransitive or transitive. This is done by the subcategorization frame of the lexical entry in the lexicon, which simply states the contextual conditions under which the lexical item may be inserted into a structure:

(10) chew: [__ NP]
     sleep: [__ Ø]

The lexical entries in (10) state that chew is a transitive verb and hence appears in a position preceding an NP, shown in square brackets with an underlined space to indicate the position of the item [__ NP]. Sleep, however, is an intransitive verb and appears in positions in which there is no following complement (indicated by Ø). We can see that the grammar needs to take into account not only the category of each item - Nouns, Verbs, etc. - but also the subcategories of each category that link it to particular constructions - whether a Verb is followed by an object or not, and so on. Accounting for the behaviour of particular lexical items in sentences then became more and more crucial to the model. As Chomsky (1970) pointed out, this system contains a number of redundancies and it is also incapable of capturing generalizations across different constructions. For one thing, the subcategorization frames of lexical verbs duplicate the information in the phrase structure rule (8) that VPs can contain verbs and NPs or just verbs: on the one hand is a rule saying a VP contains a Verb and an optional NP; on the other lexical entries that specify chew is followed by an NP, sleep is not.
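The context-sensitivity of lexical insertion can be put in a few lines of code. This is a simplification of our own, not the theory's machinery: each verb carries a frame like those in (10), and insertion into a V node succeeds only when the categories following that node match the frame.

```python
# Sketch of lexical insertion checking subcategorization frames like (10):
# a verb may be inserted into a V node only if the categories following
# that node match its frame. The encoding is our own simplification.

FRAMES = {"chew": ["NP"], "sleep": []}   # chew: [__ NP]; sleep: [__ Ø]

def can_insert(verb, following_categories):
    """True if the verb's frame matches the categories after the V node."""
    return FRAMES[verb] == following_categories

print(can_insert("chew", ["NP"]))   # True: transitive context
print(can_insert("sleep", ["NP"]))  # False: sleep takes no complement
print(can_insert("sleep", []))      # True: intransitive context
```

Written this way, the redundancy Chomsky (1970) complained of is visible: the frames in FRAMES and the optional NP of rule (8) encode the same fact twice.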
It is unavoidable that this information be included in the lexicon, as whether or not a particular lexical element subcategorizes for an object is an idiosyncratic property of that lexical item and not a property of the grammar as such. For this reason, we should look to the phrase structure rules for a way to eliminate the redundancy. Moreover, the context sensitivity of the lexical insertion rules is applicable not only to transitive and intransitive verbs, but to verbs with any kind of complement, and indeed to any category which has complements, i.e. nouns, adjectives and prepositions as well as verbs. In English, the complement always follows the relevant lexical element, as pointed out in the discussion of the head parameter in chapter 2. But the kinds of rewrite rules discussed here cannot capture such generalizations as they are construction specific - i.e. they concern VPs, NPs, APs and PPs separately rather than all phrases. The fact that rule (8) says Verbs precede Noun Phrases in the Verb Phrase tells us nothing about the structure of Noun Phrases.

Chomsky (1970) introduced a structural notation called the X-bar notation which addressed both of these issues. The idea was to take all the lexical material out of the phrase structure rules. When a lexical item was put into a structure, it would bring along all the information from its lexical entry: lexical information would be projected into the structure from the lexicon. 'In general, the phrase structure rules expressing head-complement structure can be eliminated apart from order by recourse to a projection principle, which requires that lexical properties be represented by categorial structure in syntactic representations: if claim takes a clausal complement as a lexical property, then in syntactic representations it must have a clausal complement' (Chomsky, 1986a, p. 82).
The phrase structure rules that are left after this change are very general as they do not need to rely on lexically specific material, such as category and subcategorization information. They also make the general statement that a head of a phrase, the lexical element which projects its categorial nature onto the phrase, precedes its complement, i.e. the element determined by the head's subcategorization frame:

(11) X′ → X YP

This rewrite rule says that a constituent of type X′ is made up of a head, which is also of type X, and its following complement, YP. We will return to the significance of the prime (′), pronounced 'bar', following the X in (11) a little later, but for now take X′ merely to be the phrasal expansion of X. X and Y can then stand for any of the four lexical categories used in the theory, namely N (Noun), V (Verb), A (Adjective) or P (Preposition). The values for X and Y in any particular structure will depend on which lexical element is inserted into the structure: if a transitive verb is inserted, then X will be V, and consequently X′ will be V′ (a verb phrase), and Y will be N as transitive verbs have nominal complements in the form of NPs.

As the X-bar rule in (11) makes no reference to lexically specific information, this notation eliminates the redundancy of the previous system. It also enables us to make structure-general statements, as anything which is stated about X is also stated about N, V, A and P simultaneously. So now all we need to do is state that English complements always follow the head X, and this statement will be applicable to all nouns, verbs, adjectives and prepositions; rather than having four rewrite rules for four different phrases, one general statement about all phrases suffices. This single powerful rule (11) now includes all the syntactic information that was included in rules (3i) and (3ii), and many others besides, regardless of whether an NP or a VP, etc. is concerned.
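The generality of rule (11) can be seen by instantiating the schema for each category in turn: one rule does the work of four construction-specific rewrite rules. A minimal sketch (the tuple encoding of rules is our own):

```python
def x_bar(category, complement):
    """Instantiate the schema X' -> X YP for any lexical category.
    The same single rule covers V, N, A and P alike."""
    return (category + "'", [category, complement])

# One schema, four phrase types:
for cat, comp in [('V', 'NP'), ('N', 'PP'), ('A', 'PP'), ('P', 'NP')]:
    print(x_bar(cat, comp))
```

Stating the head-complement order once in the schema is exactly the economy the text describes: nothing category-specific remains in the rule itself.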
Chomsky's motivation for the 'bar' part of X-bar Theory was to be able to make further statements that would apply to all constructions in general. The rule in (11) introduces two projection levels: structural elements which receive specific properties projected from lexical material. The first level is the head X, also known as the zero-level projection or X°. Above this we have the X′, the first projection of the head. The bar, then, indicates the level of the projection. Chomsky claimed that there is a further projection level required, an X″ (pronounced 'X double bar'). This is introduced by the following rule:

(12) X″ → specifier X′

Thus the X″ contains the X′ and a preceding element known as the specifier of X′. The full structure produced by rules (11) and (12) is as follows:

(13) [X″ specifier [X′ X YP]]

X″ is the last projection level of the head, equivalent to what we have been referring to as the phrase (XP). As there are no further projections after this, X″ is also called the maximal projection. To take some examples of the different types of phrases, the structure captured in rules (11-12) and seen in (13) could be a Verb Phrase:

(14) [V″ [specifier has] [V′ eaten the peas]]

a Noun Phrase:

(15) [N″ [specifier the] [N′ peas on the plate]]

an Adjective Phrase:

(16) [A″ [specifier very] [A′ fond of peas]]

(In the original tree diagrams a triangle is a convention to show that a structure has not been given in full.) Or it could be a Preposition Phrase:

(17) [P″ [specifier just] [P′ [P with] [NP a fork]]]

The specifier position was assumed to be the place for determiner-like elements - determiners in NP, perhaps auxiliary verbs in VP, and degree adverbials in AP:

(18) a. the slipper
     b. has slept
     c. too far

Specifiers differ from complements in that they precede rather than follow the head in English.
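The three projection levels of (13) can be sketched as a small program that builds the X-bar skeleton as nested lists (a labelled bracketing); the function and its encoding are our own illustration, using ASCII primes for the bar levels:

```python
def project(head_cat, head, specifier=None, complement=None):
    """Build the full X-bar skeleton of (13): X'' dominates a specifier
    and an X', which dominates the head X and its complement YP."""
    x0 = [head_cat, head]                # zero-level projection, the head X
    x1 = [head_cat + "'", x0]            # first projection, X'
    if complement is not None:
        x1.append(complement)            # complement is sister to the head
    x2 = [head_cat + "''", x1]           # maximal projection, X''
    if specifier is not None:
        x2.insert(1, specifier)          # specifier precedes X'
    return x2

# (17): just with a fork
print(project('P', 'with', specifier='just', complement=['NP', 'a fork']))
```

Running this yields the bracketing ["P''", 'just', ["P'", ['P', 'with'], ['NP', 'a fork']]], mirroring the Preposition Phrase example above.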
They are also not subcategorized elements in that they can appear with any head of the relevant type and are not restricted by the head's lexical requirements. Other elements were claimed to occupy the specifier position, notably the possessor in the NP, which is in complementary distribution with the determiner, i.e. you can have one or the other but not both - there is, for instance, never an NP starting the his - and hence they seem to occupy the same position:

(19) a. the flight to London
     b. his flight to London
     c. *the his flight to London

The main point of the structural hierarchy introduced by the X″ is that it allows structural generalizations to extend even further. Compare the two structures below:

(20) [S he [VP ...]]   [N″ his [N′ ...]]
(tree diagrams in the original, reduced here to labelled strings)

The creation of a specifier position highlights the similarity between the subject of the clause, he, and the possessor of the N, his, which can be captured in a structural way: both are directly under the maximal node of the structure itself (S in the first case and N″ (NP) in the second) and both are excluded from the rest of the structure (VP/V′ in the first and N′ in the second), which contains the main semantic element of the construction (the verb in the first and the noun in the second). Throughout the 1960s it had often been noted that subjects and possessors behave similarly; introducing the specifier into the theory allowed the statement of general rules which could affect subjects and possessors alike.

During the 1970s, X-bar Theory defined the possible types of rules for the phrase structure part of the grammar: the phrase structure rules were essentially retained from earlier models, subject to the restrictions of X-bar Theory (Jackendoff, 1977). It was still supposed that certain structure-specific facts required structure-specific rewrite rules to account for them, such as the fact that English nouns and adjectives never have NP complements.
Thus, the structure of the grammar was believed to be as in figure 3.2.

Figure 3.2 Government/Binding Standard Theory (X-bar Theory added)

While GB Theory was to change this view, it nevertheless provides a good example of how the components that were added to the grammar throughout the 1970s restricted the kinds of rules that had previously been envisaged.

RELATIONSHIPS AND CONFIGURATIONS IN X-BAR SYNTAX
This general diagram labels some of the structural relationships that are useful in the discussion of GB Theory.
Phrase: the maximal projection from a lexical head X (i.e. N, V, etc.)
Specifier: various elements including determiners, possessors, etc., not selected by the head
Head: a lexical category, i.e. N, V, P, A
Projection: the lexical head N, V, etc. projects its properties onto other elements in the phrase

EXERCISE 3.1
1 Try to represent the following phrases as X-bar syntax trees using the descriptions made so far, identifying the head and the specifiers and complements if present.
the coat with a missing button
offended many people
his grasp of politics
with grace
too good to miss
has destroyed the immune system
very interesting
scatter to the winds
What problems did you find?
2 Identify the heads, their complements and specifiers in the following phrases:
John's picture of Mary
so fond of a soft bed
right on the top shelf
was thinking about it
too certain that he will win
several cups of tea

3.2.2 Transformational rules and their restrictions

A similar development happened to the transformational rules in the grammar. During the 1960s few restrictions were placed on what could be a possible transformation, other than what was demanded by empirical considerations and how the description of these empirical facts could best be distributed between the existing components of the grammar.
However, once again this led to a situation in which the transformational component contained language- and construction-specific rules, so that it was impossible to make generalizations that applied to all transformations. For example, one idea that emerged was that all movements are short, introduced as the Locality Principle in chapter 2: 'Transformations cannot move a phrase "too far" in a well-defined sense' (Chomsky, 1986a, p. 72). Let us look further at how the English wh-elements such as who, why or what behave in interrogative constructions. These elements move to the front of the interrogative clause and hence we may conceive of an interrogative wh-movement rule:

(21) D-structure: I asked - Mary likes who.
     S-structure: I asked who Mary likes -.

For the time being we will ignore the issue of where the wh-element moves to in order to concentrate on the length of the movement itself. The movement demonstrated in (21) moves the wh-element from its D-structure position as object of the verb to a position at the front of the embedded clause:

(22) D-structure: I asked - Mary likes who.
     (an arrow in the original shows who moving to the position marked -)

The next example shows an apparently longer movement:

(23) D-structure: I asked [ - Bill thinks [Mary likes who]].
     S-structure: I asked [who Bill thinks [Mary likes - ]].

i.e.:

(24) I asked [ - Bill thinks [Mary likes who]].

Here the wh-element is moved out of two embedded clauses. In fact, it appears that a wh-element can be extracted out of any number of embedded clauses and hence that in principle the interrogative wh-movement rule is unrestricted in terms of how far it can move wh-elements.
But this turns out to be false, as the ungrammaticality of the following example shows:

(25) *I asked who Bill wondered why Mary likes -.

This observation was first made by Ross (1967), who accounted for it by introducing a constraint on transformations that prevented a wh-element being extracted out of a clause which already had a wh-element moved to its initial position. This constraint Ross called the wh-Island Constraint: clauses which start with wh-elements are 'islands' on which other elements are stranded. There is a more general way to view this phenomenon, however, which assumes that movement is always short, as we argued in chapter 2: movement is short hops, not mammoth leaps. Note that the difference between the grammatical movement in (24) and the ungrammatical movement in (25) is that the position at the front of the second embedded clause is vacant in the first but occupied in the second. If the grammatical case involves not one long movement, but two short ones which make use of the vacant wh-position, this accounts for the ungrammaticality of (25) by claiming that long movements are not possible:

(26) I asked who Bill thinks - Mary likes -.
     *I asked who Bill wondered why Mary likes -.

Thus, the reason why clauses that start with a wh-element are islands is that the wh-element which sits in the initial position prevents all other wh-elements from moving to this position and, given that all movements are short, no other wh-element can move out of the clause. The short hop is prevented because there is nowhere for it to land. The advantage of the short movement account over islands is that the former explains other observations as well. For example, raising takes a subject of a lower clause and moves it to the subject position of a higher clause:

(27) D-structure: - seems [John to like Mary].
     S-structure: John seems [ - to like Mary].
Raising exhibits similar effects to wh-movement: the subject can be moved over long distances only if all intervening subject positions are available to be moved into. If an intervening position is filled, however, long-distance movement is impossible:

(28) John seems - to be certain - to like Mary.
     *John seems that it is certain - to like Mary.

The nature of this condition will be detailed in chapter 4. For now the important point to note is that there seems to be a general condition that movements are short, and as such we can envisage a general condition which applies to all movements, which is part of a separate module known as Bounding Theory. 'Bounding theory poses locality conditions on certain processes and related items' (Chomsky, 1981a, p. 5). This has a similar relationship to the transformational component to that which X-bar Theory has to the phrase structure component, as seen in figure 3.3.

Figure 3.3 Government/Binding Standard Theory (Bounding Theory added)

Just as with X-bar Theory, the introduction of Bounding Theory allowed the grammar to become more general and simple, as movement rules could now be stated in far more general ways. During the 1970s this component of the grammar became much reduced, containing very general rules for moving elements such as interrogative pronouns and NPs which were not construction specific (Chomsky, 1977). The general point of this section has been how the grammar was broken down into modules in order to achieve generality and simplicity within each specific module. In GB Theory this process continued as more modules were introduced, partly to extend the grammar's coverage of grammatical phenomena and partly to simplify the approach to issues already addressed. The next sections will provide more details concerning the modules of the grammar that are specific to D-structure.
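The 'short hops' account of both wh-islands and raising can be given a deliberately simplified computational sketch: an element escapes a sequence of clauses only if every intervening landing site is vacant. The encoding of positions as strings is our own illustration, not the theory's formalism:

```python
def can_extract(landing_sites):
    """Movement proceeds by successive short hops, so extraction
    succeeds only if every intervening landing site is vacant."""
    return all(site == 'vacant' for site in landing_sites)

# (24): I asked [ - Bill thinks [ - Mary likes who]]
# Both clause-initial positions are free, so two short hops succeed:
print(can_extract(['vacant', 'vacant']))   # True

# (25): *I asked who Bill wondered [why Mary likes -]
# 'why' occupies the inner wh-position, so 'who' is stranded:
print(can_extract(['vacant', 'occupied']))  # False
```

The same check covers the raising facts in (28): a filled intermediate subject position blocks the chain of short movements, just as a filled wh-position does.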
BOUNDING THEORY
A generalized theory of restrictions placed on movements. Its principles (to be discussed in the next chapter) ensure that all movements are short. In this way it accounts for why certain constructions (= islands) are impossible to move out of, as these would involve movements longer than allowed:
*who did you ask [why [Mary kissed -]]
This sentence demonstrates a wh-island, a clause beginning with a wh-element. The ungrammaticality is due to the movement of the object who out of the wh-island. Bounding Theory explains this by claiming that the movement is too long, with the allowable shorter movement being blocked as there is already a wh-element (why) in the relevant position. Bounding Theory allows transformational rules themselves to be simplified by extracting from them restrictions that would be complicated when stated with respect to particular movements, but are simplified when stated as general conditions on all movements.

EXERCISE 3.2
Identify the moved elements in the following sentences:
He had been watched.
Will that fit through the door?
A man who no one knew arrived at the party.
This suggestion, we will consider next week.
A picture was found of the suspect at the scene of the crime.
Never had they seen such a performance.

3.3 X-bar Theory in Government and Binding

One of the main consequences of the modularization of the grammar discussed above was that each module could be made simple and general, partly as a result of modules dealing with widespread phenomena rather than specific constructions and partly as a result of extracting complexities from one module and relocating them more generally in a separate module of their own. One of the major simplifications in GB Theory concerned the status of X-bar Theory. As we saw above, X-bar Theory was taken to be a constraining module acting on the phrase structure component of the grammar.
In GB Theory, X-bar Theory replaced the phrase structure component altogether, and so took on the role of a constraint on actual structures rather than on the rules responsible for those structures. To see how this is possible, we need to see why the X-bar schema was originally viewed as a constraint.

Although X-bar Theory was designed to capture generalities seen in cross-categorial structures, say the fact that the English head always precedes its complement (the head parameter from chapter 2), certain facts nevertheless seem to be specific to particular phrases. For example, while verbs and prepositions can appear with NP complements, nouns and adjectives cannot:

(29) chew the slipper
     in the kennel
     *the picture the dog
     *fond the dog

This is something that X-bar Theory does not predict, as its general rule for introducing complements simply states that a head of any category can be followed by a complement of any category:

(30) X′ → X YP

The specific heads that can appear with specific complements are a matter of lexical information. Yet it cannot just be lexical information that all nouns and adjectives do not take NP complements, as this is not the idiosyncratic behaviour of individual words but something far more general about structures. Thus the need was felt to maintain phrase structure rules to capture generalizations which hold across categories. X-bar Theory then had to be included in conjunction with phrase structure rules. This was obviously not an optimal solution because of the amount of redundancy between phrase structure rules and X-bar rules. Furthermore it introduces an extra level of complexity to the descriptive content of the grammar: lexical information describes idiosyncratic facts about individual words, phrase structure rules describe facts specific to syntactic categories, and X-bar rules describe facts which are general across languages.
Yet the fact that nouns and adjectives do not take NP complements seems to be as much a fact about the distribution of NPs as it does about the complement-taking abilities of nouns and adjectives. Indeed, it is generally accepted that adjectives and nouns can select for NP complements as a lexical property, but that these can never surface as bare NPs as they have to undergo a transformation which inserts the preposition of in front of them:

(31) the picture the dog → the picture of the dog
     fond the dog → fond of the dog

In GB Theory these facts were ascribed to an entirely different module of the grammar, dealing with the phenomenon of Case. More will be said about this in the following chapter, but we need to sketch an outline here. In many languages Case concerns the form that nominal elements bear which realizes information about the semantic or grammatical role of the nominal element. Latin, for example, has different forms of the noun for different cases: rex is the Nominative form for the Subject of the sentence, regem is the Accusative form for the Object, regis is the Genitive form to show possession, etc. While Old English used to have many cases, modern English only shows case in the pronouns: he, him, his, etc. Case is then associated with certain structural relationships in the sentence; certain positions require the categories that fill them to be in a particular Case. Often subjects are in the Nominative Case while objects are in the Accusative Case. Case is not always obvious from the surface structure; as pointed out above, it is only visible in English through the forms of the pronouns, unlike languages like Latin or Finnish that make it visible in nouns throughout the sentence. Take:

(32) He saw him.

He is the Nominative Case form of the pronoun and is associated only with the subject of finite clauses, that is to say those having tense, number, etc.
as opposed to non-finite clauses that lack these features. Him is the Accusative Case form and, although this has other uses, its main function is to mark the object of verbs. The difference between the two is the structural position they occupy in the sentence. In GB Theory, following fairly traditional assumptions, the form of the object is 'governed' by the verb: verbs assign an Accusative Case to their objects. Prepositions also assign Accusative Case to their objects; therefore any NP complement of a preposition or verb will bear Accusative Case, for instance:

(33) I gave the book to him.

Unlike verbs and prepositions, it can be assumed that nouns and adjectives do not assign any Case to their complements and hence the complement of a noun or an adjective is a Caseless position. Finally, a general principle of Case Theory, called the Case Filter, claims that all NPs have to occupy Case positions. Thus no NP can surface in the complement position of a noun or an adjective, even if, as a lexical property, individual nouns and adjectives select for NP complements. However, the insertion of the preposition of provides a way of getting round the Case Filter: the preposition assigns the missing Case and renders the position a Case position. That is, in:

(34) fond him

him cannot have Case because it is in the complement position of the adjectival phrase. Inserting an of:

(35) fond of him

allows him to be in the Accusative Case because it is now governed by a preposition. Facts about the surface distribution of NPs therefore fall out from the Case module of the grammar and do not have to be stated as special rules in the phrase structure component. Once all category-specific facts have been accounted for in this way, the phrase structure rules themselves are no longer needed and only X-bar rules and lexical information remain. The grammar can therefore be simplified as in figure 3.4.
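The interaction of Case assignment and of-insertion can be sketched as a short program: V and P assign Accusative Case, N and A assign none, and an NP in a Caseless position is rescued by inserting of. The table and function are our own illustrative encoding:

```python
# Which head categories assign Case to their complements (per the text):
CASE_ASSIGNERS = {'V': 'accusative', 'P': 'accusative', 'N': None, 'A': None}

def surface_form(head_cat, np):
    """Return the surface form of an NP complement: unchanged under a
    Case assigner, rescued by of-insertion under N or A (Case Filter)."""
    if CASE_ASSIGNERS[head_cat] is not None:
        return np            # NP receives Case: the Case Filter is satisfied
    return 'of ' + np        # Caseless position: insert the pleonastic 'of'

print(surface_form('V', 'him'))      # chew him
print(surface_form('A', 'him'))      # *fond him -> fond of him
print(surface_form('N', 'the dog'))  # *picture the dog -> picture of the dog
```

The category-specific distribution facts of (29) thus fall out of the Case module rather than needing their own phrase structure rules.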
Figure 3.4 Government/Binding Theory (X-bar replacing phrase structure)

CASE THEORY
A module of the grammar that determines the distribution of NPs through a requirement that all NPs must be in Case positions at S-structure, known as the Case Filter. Case positions are those governed by certain Case assignors:
- Verbs and Prepositions govern and Case-mark objects as accusative:
e.g. John saw him. John showed the picture to him.
- Nouns and adjectives do not Case-mark their objects and hence they do not have bare NP complements at S-structure. However, by insertion of the pleonastic preposition of, the NP complements of nouns and adjectives are allowed to surface:
e.g. *a picture him   a picture of him
     *fond him   fond of him   etc.

Completely replacing phrase structure rules with X-bar principles gets rid of the redundancy of categorial information being stated both in the lexicon and in the phrase structure component. Under this view, the X-bar principles regulate a category-neutral structure and categorial information enters as lexical items are inserted. In this way it is the lexicon that determines the specific properties of actual phrases through the notion of projection. We can imagine a process in which we first construct an X-bar structure, such as the following:

(36) [X″ [X′ X]]

We then insert a lexical item into the head position:

(37) [X″ [X′ [X sleep]]]

As this lexical item is a verb, its verbal category will be projected first to the X node, making it a V:

(38) [X″ [X′ [V sleep]]]

As X is the head of X′, the projection will continue to this node, making it a V′:

(39) [X″ [V′ [V sleep]]]

Finally, as X′ is the direct head of X″, the projection will continue further up, making the X″ a V″:

(40) [V″ (VP) [V′ [V sleep]]]

As this is the maximal projection, the projection of the verbal category ends here; other lexical items inserted into the larger structure will determine other aspects of the sentence as a whole. It is clearly important that structures be regulated by lexical information: categorial features projected into a structure must be anchored to the properties of the inserted lexical items. 'An X-bar structure is composed of projections of heads selected from the lexicon' (Chomsky, 1993, p. 8). Thus a general principle governing the relationship between inserted lexical elements and features projected into the structure is needed, called the Projection Principle:

(41) Projection Principle
The categorial properties of structures are projected from the lexicon.

Unfortunately the projection approach tends to produce cumbersome trees that spread across the page since every phrase is projected upwards maximally, losing the clear advantages of the visual presentation of structures as trees. The Minimalist Program (MP) approach to structure using a process called Merge, in which the trees are built up out of pairs of categories, tends to produce a starker tree and will be outlined in chapter 7.

THE PROJECTION PRINCIPLE
A principle which ensures that lexical information remains constant at all levels of syntactic representation (e.g. D-structure and S-structure). Lexical information enters at D-structure with the insertion of lexical elements. This information is then projected into the structure in accordance with X-bar principles. The next chapter shows how the Projection Principle also serves to prevent this information from being changed by the action of transformations.

Some elements of structure, however, cannot be analysed as complements or specifiers but can nonetheless accompany the head within a phrase. Consider the following examples:

(42) a. John slept.
     b. John slept very soundly.

The verb sleep is intransitive and hence does not select for a complement, so the adverbial phrase very soundly in (42b) cannot be a complement. This element is known as an adjunct. Adjuncts are non-selected modifiers of heads and as such they can appear fairly freely, being both optional and able to be included in a structure in indefinite numbers (unlike complements). Thus, in addition to the examples in (42), we can also have:

(43) John slept very soundly in his bed all night dreaming beautiful dreams.

There has been a certain amount of controversy about how adjuncts are to be incorporated into a structure within the X-bar framework. One idea is that they are introduced by a recursive rule, which introduces another instance of the projection level it expands and thus may apply again to its own output, creating a loop that can go on indefinitely. In general, recursion is a property of a rule that can call on itself; Chomsky sees it as one of the unique capabilities of the language faculty (Hauser et al., 2002). An instance of a recursive rule might look like the following:

(44) X′ → X′ YP

This rule produces the following kind of structure:

(45) [X′ [X′ ...] YP]

Now the lower X′ could in turn be expanded into an X′ recursively, yielding:

(46) [X′ [X′ [X′ ...] YP] YP]

Obviously this could go on for ever and hence an indefinite number of adjuncts could be added to the structure, which accords with adjunct properties, as seen by the limitless recursion of very in English:

(47) The prime minister was very very very very ... wrong.
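The recursion of rule (44) can be sketched directly: each application wraps another X′ node around the existing one, so any number of adjuncts can be stacked. The list encoding is our own illustration, using an ASCII prime:

```python
def adjoin(x_bar_structure, adjunct):
    """One application of the recursive rule X' -> X' YP: a new X'
    node dominates the old X' and the adjoined YP."""
    return ["X'", x_bar_structure, adjunct]

phrase = ["X'", 'slept']             # the core X' containing just the head
for adj in ['very soundly', 'in his bed', 'all night']:
    phrase = adjoin(phrase, adj)     # the rule applies to its own output
print(phrase)
```

Because adjoin feeds its own output back in, nothing limits the number of iterations, matching the unbounded adjunct stacking seen in (43) and (47).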
Furthermore, if adjuncts are inserted by this kind of rule, we predict that they should be further from the head than are complements, as complements are sisters to the head, i.e. alongside it in the tree beneath the same node. In many cases this prediction is borne out:

(48) a. The dog chewed the slipper vigorously.
     b. *The dog chewed vigorously the slipper.

The VP structure in (48a) would therefore look like the following:

(49) [V″ [V′ [V′ chewed the slipper] vigorously]]

However, this treatment of adjuncts is not without problems, the main one being the redundancy introduced by the structural approach to the recursive nature of adjuncts. Given that adjuncts are not selected elements, their inclusion in a structure is predicted to be unrestricted on lexical grounds; it is therefore redundant to have this fact follow from the mechanisms which regulate the structure. We will return to the treatment of adjuncts later when we will be in a better position to appreciate alternatives.

3.4 Theta Theory

The view of D-structure that developed out of the 1960s was as a structural representation of certain semantic relationships between elements. 'Phrase structure rules of a very simple kind generate an infinite class of D-structures that express semantically relevant grammatical functions and relations' (Chomsky, 1981a, p. 67). To take the example of the passive structure, the element that sits in the subject position at S-structure is interpreted as the object of the verb and hence is assumed to sit in object position at D-structure:

(50) D-structure: - was chewed the slipper.
     S-structure: The slipper was chewed -.

However, this raises questions about the definition of object and subject positions and why certain elements sit in them but not others. When we compare subjects and objects, one of the most obvious facts is their relationship with the verb. In:

(51) The dog chewed the slipper.
the dog is interpreted as the one doing the chewing and the slipper as the thing getting chewed. This is not because of our pragmatic knowledge - knowing that dogs tend to chew things and slippers are the likely objects of a dog's attentions. If the subject and object are switched round, we get:

(52) The slipper chewed the dog.

This sounds a pragmatically anomalous sentence in which the slipper is doing the chewing and the dog getting chewed. Yet people can still interpret the sentence even if they find it ridiculous. This means that the subject and object positions are associated with certain interpretations that are forced by the syntax. From a semantic point of view, what is involved here is the relationship between elements known as arguments and predicates. A predicate is something which expresses a state or a relationship and an argument is something that plays a role in that state or relationship. Thus in (51) the predicate is the verb chew, expressing a relationship, and the arguments are the NPs the dog and the slipper, the things involved in the relationship. Different arguments play different roles with respect to the predicate. Thus, the subject of chew plays the role of the 'chewer' and the object plays the role of the 'chewee', the thing chewed. More generally, the subject of a large number of verbs is the one that deliberately and consciously carries out the action described by the verb, a semantic role known as agent. The dog is then the agent in (51). The argument that is acted upon by the agent, i.e. the one sitting in object position, is called the patient, exemplified by the slipper in (51). Roles such as agent and patient are known generally as thematic roles, or θ-roles (theta roles) for short. To take some examples, in:

(53) Pete drank a pint of Adnams.

Pete is the agent in subject position, who deliberately drank something; a pint of Adnams is the patient in object position that the agent drank.
In:

(54) The government banned speeches fomenting terrorism.

the subject the government is the agent that banned something; the object speeches fomenting terrorism is the patient that the agent banned. However, not all subjects are interpreted as agents, and not all objects as patients. For example, in:

(55) John sent a letter to Mary.

John is the agent of send and a letter is the patient. Mary has the role of recipient, the receiver of something, indicated by the preposition to. Alternatively, we might think of Mary as the goal, i.e. the end point of the action described by the verb. In:

(56) Mary received a letter from John.

even though Mary is subject, this element is still interpreted as recipient and certainly not as agent: Mary is not the one who deliberately and consciously performed the act of receiving - you can rarely choose to receive, something e-mail users will bear witness to. Next, consider the following:

(57) a. The dog chewed the slipper.
     b. The dog saw the slipper.

As we have said, the object the slipper is interpreted as patient in (57a) but not in (57b), where nothing actually happens to the slipper as a result of the dog seeing it. We might call this semantic role theme. Moreover, the subject of see in (57b) is not an agent, as there is no action performed in this case. We call this argument type an experiencer. Clearly the semantic role that an argument bears depends on the predicate: the subject of chew is an agent and its object is a patient, while the subject of receive is a recipient and the object of see is a theme.
This must then be lexical information about how individual verbs behave, showing that chew is different from see, and so it must be stored in the lexical entry for each predicate:

(58) chew:    [__ NP]    <agent, patient>
     receive: [__ NP PP] <recipient, theme, source>
     see:     [__ NP]    <experiencer, theme>

These lexical entries therefore not only include the subcategorization frames of the verbs, detailing what complements they take, but also a theta grid supplying information about the roles that their arguments take. Although we have now determined that different predicates take different types of arguments, we have yet to say how this affects the specific interpretation of particular arguments in particular positions. In GB Theory it is claimed that predicates assign their θ-roles to specific structural positions and that any element sitting in those positions will be interpreted as bearing the assigned roles. So the agent of chew is assigned to the subject position and the patient is assigned to the object position:

(59) The dog chewed the slipper.
     agent               patient

The verb chew has then affected the structure of the sentence by assigning its two θ-roles to the appropriate positions. We can distinguish two types of θ-role, as seen in:

(60) a. John broke the window.
     b. The hammer broke the window.

The subject of break can in fact have several possible interpretations: either as agent, as in the more natural interpretation of (60a), or as instrument, i.e. something through which an action is carried out, as in (60b). There is clearly some interaction between the role the subject is interpreted as bearing and the subject itself, as hammers are not the sort of thing that can be agents (except of course in a metaphorical sense: The hammer of God), though John could be interpreted as instrument in the case that someone throws him through the window!
However, the interpretation of the object remains the same in both the sentences in (60): the window is patient no matter what choice of subject we make. This suggests that the interpretation of the object is solely determined by the verb itself, contrasting strikingly with the interpretation of the subject, as seen in:

(61) a. John broke the window.
     b. John broke his leg.

In (61a) the subject can be interpreted as agent or, less naturally, instrument. But in (61b) the subject has a different set of possible interpretations. In the most natural interpretation John is seen as the one whose leg is broken rather than the one that actually does something, though the agent interpretation is possible: John could deliberately break his own or, perhaps more likely, someone else's leg. The instrument interpretation is not available in this instance. Thus the range of interpretations differs in the two cases, showing that the interpretation of the subject is determined not solely by the verb, but also by the choice of the object. For these reasons the object is known as the internal argument, suggesting that it is closer to the verb both semantically and structurally, while the subject is known as the external argument, suggesting a greater distance between it and the verb. We can now consider the principles that govern θ-role assignment. The internal argument is always the complement of the verb and structurally speaking is the verb's sister, i.e. is one of two constituents immediately dominated by the V', their mother:

(62)        V"
           /  \
      (John)   V'
              /  \
             V    N"
             |     |
           broke  his leg

The internal θ-role is therefore assigned to the sister of the assigning head.
This not only explains why the internal argument is close to the head that θ-assigns it, but also accounts for the fact that only the verb determines the range of θ-roles available to the object: inside the V' there is nothing but the verb to influence the internal argument. The case of the external argument is different. In particular, both the verb and its complement play a role in determining the θ-roles available to the subject. This suggests that what actually assigns the external θ-role is the combination of the head and its complement: in other words, the V'. If a similar restriction to that for the internal θ-role applies to the assignment of the external θ-role, we might expect it to be assigned to the sister of the V', which is the specifier position:

(64)      VP
         /  \
       NP    V'

Towards the end of the 1980s, it became increasingly popular to hypothesize that the subject actually originates inside the VP - the VP-Internal Subject Hypothesis. Koopman and Sportiche (1991) argued extensively in favour of this on both theoretical and empirical grounds. For now the proposal will be accepted on the grounds that it simplifies the principles governing θ-role assignment. The general principle governing θ-role assignment is therefore that θ-roles are assigned to the sister positions of the θ-role assigning element, known as the Sisterhood Condition:

(65) Sisterhood Condition
     A θ-role assigning element assigns its θ-role to its sister.

'θ-marking meets a condition of "sisterhood" that is expressible in terms of X-bar theory ...: a zero-level category α directly θ-marks β only if β is the complement of α in the sense of X-bar theory' (Chomsky, 1986b, p. 13). This condition, along with restrictions asserted by X-bar Theory, plays a role in the definition of a well-formed D-structure, putting the relevant arguments into appropriate structural positions.
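The division of labour just described - a theta grid stored with each verb, the internal role going to the verb's sister and the external role to the sister of V' - can be made concrete with a small programming sketch. This is purely illustrative and not part of GB theory's formal machinery; the function name and the dictionary representation are invented here, and the verb entries simply restate the examples given above.

```python
# Toy sketch of theta-role assignment (not GB theory's own formalism).
# Each verb carries a theta grid <external, internal>; the internal role
# goes to the verb's sister (its complement) and the external role to the
# sister of V' (the specifier), assuming the VP-Internal Subject Hypothesis.

LEXICON = {  # theta grids restating the entries in (58)
    "chew": ("agent", "patient"),
    "see": ("experiencer", "theme"),
}

def assign_theta_roles(verb, specifier, complement):
    """Return argument-to-role pairings for a VP [specifier [verb complement]]."""
    external, internal = LEXICON[verb]
    return {specifier: external, complement: internal}

print(assign_theta_roles("chew", "the dog", "the slipper"))
# {'the dog': 'agent', 'the slipper': 'patient'}
print(assign_theta_roles("see", "the dog", "the slipper"))
# {'the dog': 'experiencer', 'the slipper': 'theme'}
```

Note how the sketch mirrors the point made in the text: the complement's role depends only on the verb's own grid, while swapping the verb changes the role the same subject ends up with.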
The Sisterhood Condition is one of the principles of a module called Theta Theory (θ-Theory), which applies directly to D-structure:

Figure 3.5 Government/Binding Theory (Theta Theory added)

Another principle of Theta Theory regulates how θ-roles are assigned. For example, an argument cannot be inserted into a structure without having a legitimate θ-role. If there is an object with an intransitive verb, the result is ungrammatical:

(66) *Mary smiled John.

(apart from a few exceptions where the object reiterates the verb: She smiled a big smile). In this case, John receives no θ-role and the ungrammaticality indicates that arguments must bear θ-roles. Moreover, a single θ-role cannot be distributed over several arguments:

(67) *The dog chewed the slipper the bone.

The conclusion is then that every argument must have at least one θ-role assigned to it and that every θ-role can be assigned to one argument at most. Indeed, it appears that an argument can only bear one θ-role:

(68) *The dog chewed the slipper disintegrated.

The slipper cannot be both patient of chewing and experiencer of disintegration at the same time, without adding a substantially more complex structure. This suggests that θ-roles and arguments are in a one-to-one correspondence with each other, captured by a principle known as the Theta Criterion (θ-Criterion):

(69) Theta Criterion
     All theta roles must be assigned to one and only one argument.
     All arguments must bear one and only one theta role.

Together, the Sisterhood Condition and the Theta Criterion constitute Theta Theory.

THETA THEORY
A module of the grammar dealing with the assignment of semantic roles (θ-roles), such as agent, patient and goal, to arguments in a sentence. It consists of two basic principles:
- The Sisterhood Condition states that θ-roles are assigned to sisters of the assigning element.
Internal θ-roles are assigned directly from a head to its complement, while external θ-roles are assigned compositionally by the head and its complement, via the X' to its sister, the specifier. This assumes the VP-Internal Subject Hypothesis, which claims that the subject originates in the specifier of the VP and then moves to the specifier of IP at S-structure.
- The Theta Criterion states that θ-roles can be assigned to only one argument and arguments can bear only one θ-role. Thus θ-roles and arguments are in a one-to-one correspondence.

EXERCISE 3.3
1 Devise a sentence for each of the theta roles that have been mentioned in this section and then devise a single sentence that exemplifies all, or as many as possible, of them:
  patient  recipient  experiencer  source  agent  theme
2 The subject of a verb such as seem may be realized by the semantically empty pronoun it: It seems John likes fishing. However, there is another way to express the meaning of this sentence without using a meaningless subject: John seems to like fishing. How can this sentence mean virtually the same thing as the one with the meaningless subject if θ-roles are assigned to sister positions?

3.5 Control Theory and null subjects

So far we have been concentrating on the structure of phrases and have said relatively little about the higher structure of the sentence. The next section will investigate sentence structures in more detail, but certain aspects of the treatment of sentential elements, particularly the subject, need to be introduced first. Consider the following sentence:

(70) There arrived a mysterious package.

Given the Theta Criterion's requirement that arguments and θ-roles be in a one-to-one correspondence, the verb arrive takes a single argument: the one who arrives. In (70) this is obviously the mysterious package, which bears the θ-role assigned by arrive. This means that the subject of this sentence, there, appears to lack a θ-role.
Indeed, this element is essentially meaningless and is referred to as an expletive or pleonastic subject, both terms meaning 'meaningless' in this context. So the subject there does not actually violate the Theta Criterion, as it is not an argument of any predicate and so does not need a θ-role. However, one might wonder what it is doing there in the subject position at all. Indeed the subject is obligatory in English; leaving it out is unthinkable:

(71) *Arrived a mysterious package.

Since virtually all sentences of English have subjects, this suggests strongly that a grammatical principle is involved, namely the Extended Projection Principle (EPP), which states that all clauses must have a subject. 'The two principles - the projection principle and the requirement that clauses have subjects - constitute what is called the extended projection principle (EPP)' (Chomsky, 1986a, p. 116). Thus, in situations where there is no semantic subject, an expletive subject has to be inserted, as in (70). The EPP does, however, appear to have several exceptions. One typically involves the subject position of non-finite clauses, which appear to be subjectless in most languages:

(72) The dog tried [to chew the slipper].

Although there is no apparent syntactic subject of the non-finite clause enclosed in brackets in (72), [to chew the slipper], semantically the subject of the main clause the dog clearly acts as the missing subject: the dog is the one both doing the trying and the chewing. However, allowing both verbs to assign their external θ-roles to this subject would mean that one argument would end up with two θ-roles, in violation of the Theta Criterion. Nor can the subject be the sister of two V's at the same time, and so it would not satisfy the Sisterhood Condition either. Semantically, the situation is very similar to the following:

(73) The dog thinks [he chewed the slipper].
One obvious interpretation of this sentence has the dog doing both the thinking and the chewing. But this would involve two distinct arguments: the dog and he. We can interpret both of these arguments as the same because the pronoun he may be referentially dependent on the higher subject the dog. The situation where two elements are co-referential is often indicated by giving them the same index, shown in subscripts:

(74) The dog_i thinks [he_i chewed the slipper].

The problem of the subjectless non-finite clause could be solved if we were to suppose that there is a pronoun-like element which sits in the subject position and which is referentially dependent on the higher subject:

(75) The dog_i tried [pronoun_i to chew the slipper].

Under this assumption, the non-finite clause has a subject, thus satisfying the EPP, and there are, moreover, two independent subjects to bear one θ-role each, satisfying the Theta Criterion. Of course, the problem with this suggestion is that no such pronoun is visible in the actual sentence. But, put another way, perhaps the situation does not sound so hopeless. What sentence (75) claims is that there is a syntactic and semantic subject of the non-finite clause, but that it is not phonetically realized. It is not new to talk about there being phonologically empty syntactic elements. Most grammarians accept the existence of null morphemes which fill out paradigms to keep the description as regular as possible. For example, Hungarian intransitive verbs show the following conjugation:

(76) sétálok    I walk
     sétálsz    you (sing.) walk
     sétál      he/she/it walks
     sétálunk   we walk
     sétáltok   you (pl.) walk
     sétálnak   they walk

Only when the subject is third person singular (sétál) does the verb have no inflection. However, the alternative to assuming the absence of any inflection is to assume that there is actually a morpheme here which is unpronounced.
The same could be claimed for the null pronoun subject of the non-finite clause: it too is simply unpronounced. The null pronoun subject of non-finite clauses is often called PRO and has a number of properties peculiar to itself. First, while it is obviously a nominal element, it has a far more restricted distribution than other NPs. PRO is only ever found in subject position in non-finite clauses and is banned from object position and finite clause subject position:

(77) a. *The dog chewed PRO.
     b. *The dog thinks [PRO chewed the slipper].

These positions are said to be governed, government being a crucial technical relationship in GB Theory. Essentially a governed position has a relationship to a particular element called a governor. Lexical heads, such as nouns, verbs, adjectives and prepositions, are governors, along with the inflection of the finite clause. The former all govern their complement positions whilst the latter governs the subject position. To take some examples, in (78) the bold heads are all governors and the arrows show the elements that they govern. As PRO is unable to occupy any of these positions, it appears that it cannot sit in a governed position and, since nothing governs the subject position of the non-finite clause, the non-finite marker to not being a governor, this is the only position in which this element can occur. This is known as the PRO Theorem:

(79) PRO Theorem
     PRO can only sit in ungoverned positions.

This theorem will figure in the next chapter, where an attempt to derive it from more basic principles will be discussed. The referential capabilities of PRO also distinguish it from other elements. As seen in (74), a pronoun like he can be referentially dependent on another element in a higher clause. However, clearly this is not necessary and the pronoun could refer to someone else.
Hence the following representation, where the fact that the NP and the pronoun have different indices (i, j) indicates disjoint reference, is perfectly possible:

(80) The dog_i thinks [he_j chewed the slipper].

i.e. it was someone unnamed, such as the owner, that chewed the slipper. PRO, on the other hand, is less flexible and often its reference is fixed to a particular element. For example, consider the following sentences:

(81) a. John persuaded the dog_i [PRO_i to stop chewing his slipper].
     b. John_i promised the dog [PRO_i to stop chewing his slipper].

In the first example, it is the dog who will stop chewing the slipper, but in the second it is John who will desist from slipper-chewing! The element which fixes the reference of PRO is the controller: 'Control theory determines the potential for reference of the abstract pronominal element PRO' (Chomsky, 1981a, p. 6). The determination of which element is the controller is complex. As can be seen in examples like (81), the properties of the verb which selects the non-finite clause as a complement have some part to play. The verb persuade is referred to as an object control verb since its object acts as the controller. The verb promise, on the other hand, is a subject control verb since its subject is the controller. Further complications arise with other examples:

(82) a. John asked the dog [PRO to stop chewing his slipper].
     b. John asked the doctor [how PRO to stop chewing his slipper].

As (82a) shows, ask appears to be an object control verb. However, in (82b) it is not the object which controls PRO. In this case, the subject could be the controller (i.e. John wants to know how he can stop chewing slippers) or there might not be any controller at all (i.e. John wants to know how it is possible for anyone to stop chewing slippers). In the second instance, PRO has arbitrary reference, similar to the generic pronoun one:

(83) John asked the doctor how one can stop chewing slippers.
Finally, in:

(84) [PRO to be or not to be], that is the question.

we see another instance of PRO with arbitrary reference; being or not being is not ascribed to anyone in particular. The principles governing control phenomena are clearly complex. They form another module of the grammar, known as Control Theory, which also applies to D-structure. This can now be added to our model:

Figure 3.6 Government/Binding Theory (Control Theory added)

CONTROL THEORY
A module concerning the reference of the empty subject PRO, which has either controlled reference or arbitrary reference. In controlled cases, the controller is either the subject or the object, depending on the verb:
  John promised Bill PRO to leave.
  John persuaded Bill PRO to leave.
In arbitrary cases, PRO is interpreted as having generic reference, similar to the reference of the pronoun one: PRO to leave would be impolite (for one to leave would be impolite).

There is yet another instance where the EPP appears not to hold, though this phenomenon is more restricted cross-linguistically than control phenomena. In some languages a pronoun subject of a finite clause can be 'dropped'. For example, the Hungarian verbs in (85) can all be taken as complete sentences whether or not there is an overt subject:

(85) (Én) leülök.    I sit down.
     (Te) leülsz.    You sit down.
     (Ő) leül.       He/she/it sits down.
     (Mi) leülünk.   We sit down.
     (Ti) leültök.   You (pl.) sit down.
     (Ők) leülnek.   They sit down.

This possibility is not available for all languages, as can be seen from the following English verbs, which by themselves cannot constitute a complete sentence:

(86) *(I) sit down.
     *(He) sits down.

Once again, however, the fact that these subjectless sentences are interpreted as though they do have pronominal subjects in some languages indicates another instance of a phonologically null element.
The phenomenon is often referred to as 'pro-drop'. Languages which allow such null-subject sentences are called pro-drop languages and those which do not are called non-pro-drop languages. The pro-drop parameter seems to be one of the major parameters of variation between languages, sometimes cutting across language families. For example, French is a non-pro-drop language, but all other Romance languages, including Italian, Spanish, Portuguese and Romanian, are pro-drop languages. The Germanic languages are all non-pro-drop while other language families, such as Slavic, are all pro-drop. Indeed, across the world the vast majority of languages appear to be pro-drop. The box gives some examples of languages that are pro-drop and non-pro-drop.

Some pro-drop languages (with null subjects, i.e. allowing the empty element pro to be subject of the sentence): Italian, Chinese, Arabic, Greek, Portuguese, Spanish, Hebrew, Japanese
Some non-pro-drop languages (without null subjects, i.e. not allowing pro as subject): German, French, English, Dutch

There is some controversy concerning how to analyse pro-drop. On the one hand it is tempting to include it with control phenomena, accounting for the parameterization in terms of greater and lesser restrictions on the distribution of PRO. However, the standard approach is to separate pro-drop and control phenomena as involving two different phonologically empty pronouns. The empty pronoun in pro-drop phenomena goes by the name of 'little pro', i.e. pro, as opposed to its big cousin PRO. One advantage of having two distinct empty pronouns is that the difference between pro-drop and non-pro-drop languages shows up as the presence or absence of pro: pro-drop languages have pro, non-pro-drop languages do not. Another advantage is that we can more easily describe differences in the behaviour of the two pronouns if they are treated as separate.
For example, the referential possibilities of pro tend to be very similar to those of an overt personal pronoun, unlike PRO, which has controlled referential possibilities. The analysis of the parametric difference between pro-drop and non-pro-drop languages also has its controversies. One of the first analyses was Rizzi (1982), who claimed that whether a language licenses pro in subject position depends on its agreement system. Above we mentioned the notion of government, claiming lexical heads and finite inflection to be governors. Another, stronger notion is proper government, which claims there is a set of proper governors restricted to lexical heads, and excluding the finite inflection. Many pro-drop languages differ from non-pro-drop languages in terms of the richness of their agreement systems. Compare the Hungarian example to the English one:

(87)            Hungarian   English
     1st sing.  sétál-ok    walk
     2nd sing.  sétál-sz    walk
     3rd sing.  sétál       walk-s
     1st pl.    sétál-unk   walk
     2nd pl.    sétál-tok   walk
     3rd pl.    sétál-nak   walk

Hungarian has a different agreement form for each of the six members of the agreement paradigm whereas English only has two forms in all. The richness of the agreement systems of pro-drop languages, Rizzi claimed, allows their finite inflections to be treated as proper governors, and therefore the condition which licenses pro is that it must be properly governed. This in turn is part of a more general condition, to be discussed in the next chapter, called the Empty Category Principle. The positive aspect of Rizzi's original analysis of the pro-drop parameter was that the proposed small difference between languages - whether or not finite inflection is a proper governor - accounts for a wider range of phenomena than just whether or not the language allows a pro subject.
Amongst other properties, Rizzi listed the following as being typical of pro-drop languages:
• not having pleonastic subjects
• the ability to 'invert' subjects and VPs
• the ability to extract wh-elements more freely out of certain clauses.
'[An] interesting topic ... is the clustering of properties related to the pro-drop parameter, whatever this turns out to be. In pro-drop languages (e.g. Italian), we find among others the [above listed] clustering of properties.... Non-pro-drop languages (e.g. French and English) lack all of these properties, characteristically' (Chomsky, 1981a, p. 240). We can demonstrate these properties with the following Italian examples:

(88) Sembra che Gianni sia ammalato.
     (seems that John is ill)
     It seems that John is ill.

While the English sentence in (88) needs an expletive subject, there is no such element in the Italian version.

(89) Ha telefonato Gianni.
     (has telephoned John)
     John has telephoned. (*has phoned John)

In this case the subject follows the VP, which is generally not possible in English. Finally, (90) shows an interrogative clause which, as the English translation demonstrates, is not possible in English.

(90) Chi credi che verrà?
     (who believe-2nd sing. that come)
     *Who do you believe that will come?

The grammatical English version obligatorily leaves out the word that, known as a complementizer:

(91) Who do you believe will come?

The point is that sentences (88-90) are all grammatical in Italian, a pro-drop language, but are ungrammatical in English, a non-pro-drop language. What is more, the grammaticality always has to do with the subject of a finite clause. In Rizzi's analysis this followed from the assumption that the Italian finite inflection properly governs the subject whereas the English one does not.
In the early days of GB Theory it was hoped that this sort of analysis would provide plenty of explanatory content, as it suggested that small grammatical differences could be responsible for quite wide-ranging differences between languages in terms of the surface phenomena that they demonstrate. In turn this could play a role in accounting for how the process of language acquisition could be completed so quickly and so effortlessly by the child. All a child needs to learn is whether the finite inflections of their language are proper governors or not, and knowledge of a great many other things follows. 'When this parameter is set one way or another, the clustering of properties should follow. The language learner equipped with the theory of UG as part of the initial state requires evidence to fix the parameter and then knows the other properties of the language that follow from this choice of value' (Chomsky, 1981a, p. 241). Rizzi (1986) modified his analysis slightly to accommodate the fact that in some languages pro seems to be allowed in positions other than the subject of a finite clause. For example, Hungarian pronominal possessors may be dropped as well as subjects, as in:

(92) Láttam az (te) anyukádat.
     (saw-1st sing. the (you) mother-2nd sing.-acc.)
     I saw your mother.

And in Italian objects can sometimes go missing:

(93) Un dottore serio visita - nudi.
     (a doctor professional visits naked)
     A professional doctor examines his patients when (they are) naked.

Rizzi claimed that pro needs to be licensed by certain elements, the identity of which is open to parametric variation, and moreover its content must be recovered from the licensor. English selects no licensor for pro and hence pro cannot appear. Standard pro-drop languages, on the other hand, select finite inflection as a licensor of pro and hence they have pro subjects of finite clauses. Hungarian also has a nominal licensor and Italian a verbal one.
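Rizzi's two-part idea - pro must be licensed by a designated head, and its content must then be recoverable - can be caricatured in a few lines of code. This is a rough sketch only, not Rizzi's formal proposal: the function, the attribute names and the per-language entries are invented here, simplifying the claims made in the text (German's entry reflects the expletive-only pattern in (96)).

```python
# Caricature of pro licensing and recovery (invented names, simplified facts):
# a language lists which heads license pro; a referential pro additionally
# needs rich agreement on the licensor so that its content can be recovered,
# while an expletive pro has no content to recover.

LANGUAGES = {
    "English":   {"licensors": set(),          "rich_agreement": False},
    "Italian":   {"licensors": {"Infl", "V"},  "rich_agreement": True},
    "Hungarian": {"licensors": {"Infl", "N"},  "rich_agreement": True},
    "German":    {"licensors": {"Infl"},       "rich_agreement": False},
}

def allows_pro(language, head, referential=True):
    """pro is possible if the head is a licensor and, when pro is
    referential, the licensor's agreement is rich enough to recover it."""
    lang = LANGUAGES[language]
    if head not in lang["licensors"]:
        return False                      # no licensor: no pro at all
    return lang["rich_agreement"] or not referential

print(allows_pro("Italian", "Infl"))                    # True
print(allows_pro("English", "Infl"))                    # False
print(allows_pro("German", "Infl", referential=False))  # True (expletive pro)
print(allows_pro("German", "Infl"))                     # False
```

The point of the sketch is the separation of the two conditions: German fails only the recovery condition, so expletive pro squeezes through, whereas English fails licensing outright.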
Thus the range of licensors differs from one language to another. As for the recovery of the content of pro, a morphologically rich inflection allows this in a straightforward fashion; that is to say, you may be able to tell the subject's number and person from the verb forms even if the subject is actually absent. For instance, in the Hungarian sentence:

(94) Lemegyünk a kocsmába.
     (down-go the pub-to)

the inflection shows the verb is first person plural and that it agrees in number and person with the pro subject, which is therefore interpreted as meaning 'we'; the sentence means We're going down the pub. Note further that the Hungarian nominal also agrees with the possessor, facilitating the recovery of the content of this element. Something special must be assumed for the content of the pro object in Italian, as it is clear that the content cannot be recovered directly from the verb. The content of the missing object is also special in that it is always generic, similar to the interpretation of the pronoun one in the following:

(95) One has to put up with such things.

Building on the fact that verbs assign θ-roles to their objects, Rizzi suggests that a pro which is θ-marked by its licensor will receive a generic interpretation. As the subject is not thematically related to the inflection, this will obviously not be true for pro in subject position. This development helps account for further facts that were problematic for the original theory. For example, while German is not normally considered a pro-drop language, it can drop expletive subjects, as in:

(96) Gestern wurde pro lange diskutiert.
     (yesterday was-3rd sing. long discussed)
     Yesterday it was discussed/the discussion went on until late.

So German finite inflection is a licensor for pro, but it is not rich enough to allow referential content to be recovered from it. A non-referential, expletive pro has no content and therefore is possible. However, this theory faces a number of problems.
For one thing, there are some non-pro-drop languages which demonstrate similar phenomena to those exemplified in (88-90). Scandinavian languages, for example, allow wh-movement out of clauses which start with a complementizer, as shown by the following Norwegian example:

(97) Hvem tror du at har stjålet sykkelen?
     (who think you that has stolen bike-the)
     Who do you think (*that) has stolen the bike?

but, like all Germanic languages, they are not generally pro-drop:

(98) *pro glitrer som diamanter.
     (glitters like diamonds)
     It glitters like diamonds.

Furthermore, some pro-drop languages do not display the same set of effects as Italian. Japanese, for example, is a pro-drop language, but does not demonstrate any overt wh-movement. Worse still, there are pro-drop languages which do not have rich agreement systems. Chinese is a pro-drop language, but has no verbal agreement whatsoever:

(99) Wo/Ni/Ta/Women/Nimen/Tamen ai Lisi.
     (I/you (sing.)/he/she/it/we/you (pl.)/they like(s) Lisi)

Essentially the pro-drop parameter can be set independently of the other phenomena, disappointingly from the point of view of the theory of parameter setting. Another approach has attempted to explain why languages with rich inflection systems, such as Italian and Hungarian, and languages with no inflections at all, such as Chinese, should be pro-drop, whereas languages with patchy inflection systems, such as English, are not. Comparing the three types of language shows what those with rich inflections and those with no inflections have in common:

(100)            Italian    Chinese   English
      1st sing.  parlo      shuo      speak
      2nd sing.  parli      shuo      speak
      3rd sing.  parla      shuo      speaks
      1st pl.    parliamo   shuo      speak
      2nd pl.    parlate    shuo      speak
      3rd pl.    parlano    shuo      speak

While Italian has a different form for each part of the present tense paradigm, Chinese has the same form throughout.
In English, however, while most parts of the paradigm are the same, one form, speaks, is different from the others. Thus what connects Italian and Chinese is that they both have uniform paradigms: uniformly different or uniformly the same. The English paradigm is non-uniform, with speaks being the exception that destroys uniformity of both types. Jaeggli and Safir (1989) propose that the notion of morphological uniformity explains pro-drop and that a language's inflections may be analysed as + or - uniform. A +uniform inflection is what licenses pro. Although this theory explains why languages at the opposite ends of the inflectional spectrum should behave similarly to each other, it raises a number of questions. First, there is no obvious connection between morphological uniformity and the ability to drop pronoun subjects, and second, it is not entirely clear what counts as a uniform paradigm. A possible solution to the first problem may have something to do with the recovery of the identity of a null pronoun such as pro. A rich agreement system obviously helps us to recover the content of a null subject, as does the context in which the sentence is uttered: if we are talking about John and we say went home, it would be natural to assume that it was John who went home rather than someone else. For example, in English many people keep diaries in which they omit the first person I (Haegeman, 1990):

(101) Got up. Had breakfast. Went to work.

simply because it is blindingly obvious who the subject of a diary is. If a language has no agreement morphology, this reliance on the context of situation would be the only way to recover the content of a null subject. It may be then that recovering the content of the null subject needs either inflection or discourse context, but that the latter is only available in the absence of distinct agreement inflection.
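The morphological uniformity criterion lends itself to a simple illustration. The sketch below is a toy model, not part of Jaeggli and Safir's formal machinery; the function name and the dictionary layout for paradigms are invented for the purpose. It classifies a present tense paradigm as +uniform when its forms are either all identical or all distinct:

```python
def is_uniform(paradigm):
    """Jaeggli and Safir's criterion: a paradigm is +uniform if its
    forms are uniformly the same or uniformly different."""
    forms = list(paradigm.values())
    all_same = len(set(forms)) == 1
    all_different = len(set(forms)) == len(forms)
    return all_same or all_different

# The three present tense paradigms compared in the text
italian = {"1sg": "parlo", "2sg": "parli", "3sg": "parla",
           "1pl": "parliamo", "2pl": "parlate", "3pl": "parlano"}
chinese = {p: "shuo" for p in ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]}
english = {"1sg": "speak", "2sg": "speak", "3sg": "speaks",
           "1pl": "speak", "2pl": "speak", "3pl": "speak"}

print(is_uniform(italian))  # True: uniformly different, predicted pro-drop
print(is_uniform(chinese))  # True: uniformly the same, predicted pro-drop
print(is_uniform(english))  # False: speaks breaks uniformity, predicted non-pro-drop
```

The predicate captures the prediction but, as the surrounding discussion shows, it leaves open exactly which slots (and which tense's paradigm) should be fed into it.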
Recovery of the content of a null subject in a language with agreement inflections can only happen if all possibilities are marked: on hearing the form speak we are not able to tell if the subject is first or second person, singular or plural. Huang (1984) suggests that languages come in 'subject prominent' and 'topic prominent' types and that in the latter the notion of discourse topic plays a central role in the recovery of the identity of null elements. He points out that Chinese, which he claims is topic prominent, can have both null subjects and null objects fairly freely and that, unlike Italian, the null object is not interpreted generically but is dependent on the topic:

(102) Zhangsan shuo Lisi bu renshi.
      (Zhangsan say Lisi not know)
      Zhangsan says that Lisi does not know him.

In this case, the missing object would be interpreted as identical to the topic, so if the conversation was about Bill, (102) would be interpreted as Zhangsan says that Lisi doesn't know Bill. The null object in this case, however, could not be interpreted as co-referential with the main clause subject, which is unexpected if it is simply a null pronoun, as an overt pronoun could refer to this subject. We might refer to this phenomenon as topic-drop rather than pro-drop. In Huang's theory, however, these sentences do not involve pro, but contain an empty topic operator which is moved to the front of the sentence; it is the reference of this operator that is fixed by the topic. As such the sentence is more like the English:

(103) Bill, John says Mary doesn't know.

with Bill unpronounced. The second problem is more troubling. The verb paradigm we have been examining contains six entries for the present tense, being first, second and third person in singular and plural. But is it essential that there be uniformity across exactly these six?
There are some languages which show gender distinctions as well as person and number; some have more than a singular and plural contrast, say the dual form found in Old English or three- or even four-way number distinctions. Is uniformity in these languages defined with respect to the complete paradigm, or only with reference to the six forms mentioned so far? Perhaps not all slots in more complex paradigms have to be taken into consideration. For example, Moroccan Arabic has nominally a 14-way agreement paradigm in the imperfect tense, including a dual form for all three persons and a masculine/feminine distinction in second and third persons singular and second and third persons plural, as shown in (104). From the start this does not appear to be uniform, as gender distinctions are not uniformly made in all instances. Moreover, not all of the 14 parts are morphologically distinct; items (f) first dual, (j) first plural and (l) second plural feminine are all the same form tkəlm-na:

(104) b. 2nd sing. masc.  tkəlm-ta
      d. 3rd sing. masc.  tkəlm
      f. 1st dual         tkəlm-na
      g. 2nd dual         tkəlm-tuma
      h. 3rd dual masc.   tkəlm-a
      i. 3rd dual fem.    tkəlm-ata
      j. 1st pl.          tkəlm-na
      k. 2nd pl. masc.    tkəlm-tum
      l. 2nd pl. fem.     tkəlm-na

By all accounts, this does not appear to be a uniform paradigm, but Moroccan Arabic is nevertheless a pro-drop language. However, the six cases in (104) that ignore the dual show uniformity in that each is morphologically marked. While it appears that three person and two number distinctions are important for determining uniformity in a paradigm, it is not clear why this should be. Another question concerns which paradigm counts when determining morphological uniformity. The examples so far all involve the present tense. Why should this take priority over other tenses?
If the past tense paradigm is crucial, then English would come out as morphologically uniform:

(105) I/you/she/he/it/we/they looked.

(with the exception of the was/were contrast). Moreover, strictly speaking Hungarian is not morphologically uniform in its intransitive paradigm, as the third person singular form is the same as the base, for example sétál (walk). However, the paradigm for verbs such as lát (see) when they take a definite object is uniform in agreement:

(106)            'walk' (intrans.)   'see it' (def. object)
      1st sing.  sétál-ok            lát-om
      2nd sing.  sétál-sz            lát-od
      3rd sing.  sétál               lát-ja
      1st pl.    sétál-unk           lát-juk
      2nd pl.    sétál-tok           lát-játok
      3rd pl.    sétál-nak           lát-ják

Given that Hungarian is pro-drop, it must be the definite object paradigm that counts. But why should this be? Finally, in some languages pro is allowed in some instances but not in others, as in modern Hebrew, where pro-drop is possible in main clauses in past and future tenses with first and second person subjects, for example:

(107) ani axal-ti / pro axal-ti
      (I ate-1.sing.)

but not in present tense main clauses or main clauses with third person subjects (Borer, 1984):

(108) a. ani/at/hi . . . oxelet / *pro oxelet
         (I/you/she eats)
      b. hem axlu / *pro axlu
         (they ate)

As the third person is unmarked in all tenses and the persons are not distinguished in the present tense, it seems that the possible appearance of pro in Hebrew is directly related to the richness of the particular inflection rather than to the uniformity of whole paradigms.

ALTERNATIVE APPROACHES TO PRO-DROP

Rizzi (1982): Rich agreement systems allow finite inflections to be proper governors and pro subjects are licensed by proper governors. Therefore pro-drop languages are those with rich agreement.

Rizzi (1986): What licenses pro is parameterized and languages select different possibilities (inflection, nouns, verbs, etc.).
The content of pro must also be recoverable from its licensor, so rich agreement allows all null subjects to be recovered and poor agreement allows only expletive null subjects. Thematic licensors allow a generic interpretation of pro objects.

Huang (1984): Languages are either subject or topic prominent. Subject prominent languages (such as Italian) can license pro with rich agreement, but topic prominent languages (such as Chinese) can have empty topic operators associated with subject or object positions, which are dependent on discourse conditions.

Jaeggli and Safir (1989): Languages have either morphologically uniform inflections (uniformly different or uniformly the same) or non-uniform inflections (some different, some the same). +uniform inflections license pro, -uniform inflections do not.

EXERCISE 3.4

1 Write down the present tense and past tense forms for a language you know other than English (if you know no other language, find a grammar book for one). According to morphological uniformity, is this language likely to be pro-drop or non-pro-drop?

2 Here are the present tense forms for verbs in different languages. Assuming that the spelling reflects the pronunciation, do you think they are pro-drop or non-pro-drop languages?

       Persian          Dutch    Icelandic  Finnish   German
       (a.k.a. Farsi)
       'read'           'work'   'bite'     'ask'     'play'
    1. mikhanam         werk     bit        kysyn     spiele
    2. mikhani          werkt    bitur      kysyt     spielst
    3. mikhanad         werkt    bitur      kysyy     spielt
    4. mikhanim         werken   bitum      kysymme   spielen
    5. mikhanid         werken   biti       kysytte   spielen
    6. mikhanand        werken   bita       kysyvät   spielen

We shall see in chapter 5 that pro-drop has provided a rich area for research and speculation about children's acquisition of their first language.

3.6 Further developments in X-bar Theory

This final section of the chapter introduces the developments in X-bar Theory during the 1980s that led to a still more general theory of structure.
3.6.1 IP and CP

Until now, the structural issues discussed have only concerned thematic elements such as nouns, verbs, adjectives and prepositions, i.e. the four lexical categories. But what about the other categories that have been mentioned only in passing, such as determiners, inflections and complementizers? These functional categories differ from the thematic categories so far discussed in that they play no direct role in assigning or receiving θ-roles. In fact, the semantic interpretation of such elements is often secondary to that of the thematic elements which express the basic content of the proposition. For example, consider the role played by the auxiliary verb must and the complementizer that in the following sentence:

(109) John said that he must leave.

The main content of the embedded clause asserts that someone identified by the pronoun he (perhaps John or perhaps someone else unnamed) is involved in an action expressed by the predicate leave. The modal auxiliary must adds the extra overtone that his leaving is an obligation. It is not, however, clear that the complementizer that actually adds very much in the way of semantic content. To make this clearer, consider another sentence:

(110) John asked if he must leave.

The complementizer that in (109) demonstrates that the embedded sentence is declarative, whereas the if in (110) shows that it is interrogative, i.e. based on a question. There is then a contrast between that and if as complementizers. As thematic categories apparently play a more important semantic role in the meaning of the sentence, they are traditionally called major categories and the functional categories are often referred to as minor categories. Until the 1980s it was assumed, following traditional grammar, that the syntactic importance of functional categories reflected their minor semantic importance and therefore was secondary to lexical categories. Hence X-bar Theory was about the thematic categories of nouns, verbs, etc.
and had little to say about functional categories such as complementizers. As no theory of the syntactic treatment of functional categories was developed, they were treated in a rather off-hand way, and were included in the positions where they seemed to fit best. At the same time, there was one major structure which seemed to be outside the X-bar system altogether, namely the sentence itself. The beginning of this chapter represented the clause as an 'S' dominating the subject NP and the VP:

(111) [S N" V"]

But the fact that this S lacks a head means it clearly breaches the X-bar principle that every phrase must have a head. In fact, structure (111) is already a simplification, as by the end of the 1970s a further position had been introduced to accommodate those elements that demonstrate finiteness or non-finiteness in the clause. These range from modal auxiliaries to the non-finite marker to and the finite inflection -s:

(112) a. John will leave.
      b. I expect [John to leave].
      c. John always leaves.

As every sentence needs one of these elements and they are in complementary distribution with each other, i.e. there is no possibility of:

(113) * John will leaves.

it was assumed that they form a single category making up a third obligatory part of the sentence. This category became known as inflection or INFL for short, later abbreviated simply to I:

(114) [S N" I V"]

It is obvious how this I position accommodates modal auxiliaries and the infinitival to, as they appear in between the subject and the VP. It is more difficult to accommodate the inflections of tense and agreement, which attach to a verbal element. As this issue involves movement, it recurs in the next chapter, so we will ignore for now the fact that inflectional bound morphemes do not appear between the subject and the VP in the same place as other I elements. I is obviously a functional category rather than a lexical category.
Hence it stands outside the X-bar conventions applying to lexical categories. But note that structure (114) includes a word-like element, the I, that is not the head of any phrase, and a phrase-like element, the S, which has no head. Given that clauses are categorized as finite or non-finite - exactly what the I element marks - it is easy to jump to the conclusion that I is in fact the head of the clause. At the start of the GB period the usual belief was as follows: 'Let us assume further that VP is a maximal projection and that the S-system [i.e. the clause] is not a projection of V but rather of INFL' (Chomsky, 1981a, p. 164). Only in the mid-1980s did it become regularly expressed in X-bar terms, using the structure below:

(115) [IP N" [I' I V"]]

This proposal claims not only that the I element is the head of the clause, and hence that the clause is the maximal projection of the inflection, an IP, but also that the VP is the complement of the inflection and the subject its specifier:

(116) [IP N" (spec) [I' I V" (comp)]]

This offers a straightforward account of the word order of the English sentence in terms of X-bar syntax, as the complement follows the head while the specifier precedes it. There is also structural evidence that the inflection and the VP form a single constituent in the sentence, as they may coordinate with other I + VP sequences:

(117) He may attend the conference but won't present a paper.

Furthermore, only VPs can follow inflections, indicating a restrictive relationship between them and supporting the claim that the VP is the complement of the inflection. As soon as one functional element is seen to take part in the X-bar system, the pressure is on to assume that they all do, leading to a general theory of structure in which all elements conform to X-bar Theory, whether functional or lexical.
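The geometry of the IP analysis can be made concrete with a small data structure. This is a programmer's sketch rather than standard linguistic notation, and the class and field names are invented for illustration: every phrase is a projection of a head, with an optional specifier and complement, and the clause is simply the maximal projection of I.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class XP:
    """An X-bar phrase: a head plus optional specifier and complement."""
    category: str              # label of the head, e.g. "I", "V", "N"
    head: str                  # the lexical item occupying the head position
    spec: Optional["XP"] = None
    comp: Optional["XP"] = None

    def label(self):
        # The maximal projection is named after its head's category
        return self.category + "P"

# 'John will leave' as an IP: subject in the specifier of IP,
# the VP as the complement of the inflectional head 'will'
vp = XP(category="V", head="leave")
subject = XP(category="N", head="John")
clause = XP(category="I", head="will", spec=subject, comp=vp)

print(clause.label())       # IP
print(clause.spec.head)     # John
print(clause.comp.label())  # VP
```

Because specifier and complement are themselves phrases of the same type, the same structure extends directly to the CP and DP analyses discussed below: only the category of the head changes.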
Around the same time that the inflection was accommodated into the X-bar framework, similar ideas were therefore proposed for the complementizer. As pointed out earlier, complementizers are elements like that which introduce embedded clauses, and they usually carry features distinguishing finite and non-finite as well as declarative and interrogative. For example, the complementizer that introduces finite declarative clauses while for introduces non-finite declaratives:

(118) a. I think that he may know.
      b. I was anxious for him to know.

As we have seen in (110), the complementizer if introduces finite interrogatives. During the 1970s, it had been established that the complementizer (C) forms a constituent with the clause that it introduces; however, C was not included with the subject, the inflection and the VP as part of the basic clause structure. If the basic clause is in fact an IP, this indicates the following structure:

(119) [? C IP]

There are three options for the identity of the constituent labelled with the question mark in (119): it might not be a headed constituent at all, in which case it does not fall within the remit of the X-bar system; it might be headed by the IP; or it might be headed by the complementizer. The second of these options is the least feasible, as it would either place the complementizer in an adjunction position, adjoined to the IP, or would require extending the IP to a third projection level:

(120) a. [I" that [I" he will leave]]
      b. [I''' that [I" he will leave]]

As the complementizer is not recursive, the adjunction proposal in structure (120a) of having one I" within another I" is improbable, while the proposal in structure (120b) that clauses extend to a triple bar level I''' undermines the claim that the inflection conforms to the same theory of phrase structure developed for the thematic elements: nouns and verbs, etc., project only two levels according to standard X-bar Theory.
What remains are the proposals that the constituent containing the complementizer is either headed by the complementizer or not headed at all. If it isn't headed, then we recreate the situation faced in the analysis of the 'S' node: on the one hand there is a word-like category, the complementizer, which heads no phrase; on the other a phrase-like element, the constituent containing the complementizer and the IP - let's call it S' (as it was called during the 1970s) - which has no head:

(121) [S' C IP]

Once more this configuration strongly suggests that the complementizer should be taken as the head of the constituent, as is also suggested by the fact that the complementizer plays a role in determining the declarative or interrogative status, what may be referred to as the force, of the entire clause. Thus, we assume the following structure:

(122) [CP [C' C IP]]

For the time being, we will skip over what counts as the specifier of the CP, returning to it in the next chapter. Note that structure (122) claims that the IP functions as the complement of the complementizer; this claim is supported by the observation that only an IP can follow a complementizer, indicating a restrictive relationship between them. The full structure of the clause that we end up with is then as below:

(123) [CP [C' C [IP N" [I' [I will] [VP [V' [V chew] [NP the slipper]]]]]]]

The clause is now structured into three hierarchical parts, all of which conform to usual X-bar Theory. At the bottom is the VP supplying the basic thematic elements which make up the proposition. Recall that under the VP-Internal Subject Hypothesis, the underlying position of the subject is in the specifier of the VP. So the VP contains the verbal predicate and all its arguments at the level of D-structure, though the subject moves out of the VP at S-structure. Next above the VP is the inflectional IP system, which provides the distinctions of finiteness through the introduction of an inflectional head I.
Its specifier is the surface position of the subject, as represented in (123). Finally, at the top comes the complementizer CP system, which introduces the force of the clause: i.e. whether it is declarative or interrogative.

3.6.2 The DP Hypothesis

Another functional element to be reanalysed within a more general X-bar Theory was the determiner. The standard assumption within phrase structure theory was always that determiners such as the and a sit in the specifier of the NP:

(124) [NP the [N' [N dog]]]

Once again we encounter a word-like element that is not the head of a phrase and therefore seems not to take part in X-bar Theory. There is also something strange about it sitting in the specifier of the NP. This position is the one which the possessor of the NP is thought to occupy, e.g. John in John's dog; the fact that determiners and possessors are in complementary distribution in English supports the assumption that they occupy the same position. However, the possessor is a phrase while the determiner is a word, and in other structures it seems that word positions and phrase positions are strictly demarcated. Indeed, in other languages the determiner and the possessor are not in complementary distribution. For example, in Hungarian it is quite common to find both together in the same NP:

(125) a Józsi kutyája
      (the Józsi dog-3rd sing.)
      Józsi's dog

But, if the determiner does not sit in the specifier of the NP, where does it sit? Moreover, if the determiner conforms to X-bar Theory, there must be a phrase that it heads, but what is that phrase? These puzzles were solved by Abney (1987) with the DP Hypothesis, which claims that the determiner is the head of the nominal phrase, not the noun.
From this perspective, the noun heads a phrase which acts as the complement of the determiner:

(126) [DP [D' [D the] [NP [N' [N dog]]]]]

Again, there are good reasons to consider the determiner as the head of the nominal phrase. For example, it is often the determiner and not the noun which contributes the property of definiteness or indefiniteness to the whole phrase:

(127) a. the dog
      b. a dog

The phrases in (127a) and (127b) are definite and indefinite respectively, but the noun is identical in both cases. The distinction between the two lies with the determiners, and therefore it is the determiner which projects its properties of definiteness to the phrase rather than the noun. A number of advantages follow from the DP analysis. One is that the structure of the nominal phrase exactly mirrors that of the IP:

(128) [IP subject [I' I VP]]
      [DP possessor [D' D NP]]

We commented above on the similarity between the subject of the clause and the possessor of the nominal phrase, which was one of the motivations for the development of X-bar Theory in the first instance. Under the DP Hypothesis, the structural parallelism between the two is exact: both sit in the specifier of a functional element which selects as a complement the phrase headed by the thematic element to which the subject and possessor are related. This analysis also increases the number of structural positions within the nominal phrase, another advantage: not only do we separate the possessor from the determiner, but we also introduce a second specifier position - that of the noun. Abney (1987) argued that this position is needed to accommodate what are traditionally called the post-determiners - determiner-like elements which typically come after standard determiners:

(129) those many/few/several ideas

These elements cannot be attached within the traditional NP analysis, as the only place for them would be adjoined to the N'. But they are not recursive and so are not well analysed as adjuncts.
Furthermore, they always precede adjectival modifiers, which are adjoined to N'. The NP specifier position within the DP analysis provides us with an ideal place to site post-determiners:

(130) [DP [D' [D those] [NP [AP many] [N' [AP stupid] [N' [N ideas]]]]]]

While the DP Hypothesis accounts for those languages in which possessors and determiners are not in complementary distribution, it does of course raise the problem of accounting for this pattern in languages where they are. As the possessor and the determiner do not occupy the same structural position, there must be some other reason why they cannot co-occur. Abney suggests that an abstract determiner obligatorily accompanies the possessor, a possessive determiner (pos), and that this is the element that is in complementary distribution with other determiners:

(131) a. his pos dog
      b. the dog
      c. * his pos the dog

A final advantage of the DP Hypothesis is that it allows a principled account of the complementary distribution between determiners and pronouns. If, as is usually assumed, the noun is the head of the nominal phrase, one might have thought that pronouns replace the noun when they pronominalize an NP. But in this case, we might expect that pronouns could appear with an accompanying determiner, like other nouns. This is not so:

(132) * the him

However, this is not the same kind of restriction as with certain classes of nouns, such as proper nouns for example, which usually have no determiners but may appear with them under special circumstances:

(133) a. * The Mary left.
      b. She's not the Mary that I used to know.
      c. There's a Mary at the door.
      d. I know three Marys.

Pronouns, however, cannot occur with determiners under any circumstance:

(134) a. * The she left.
      b. * She's not the her that I used to know.
      c. * There's a her at the door.
      d. * I know three hers.
One possible account of this pattern would be to assume that pronouns replace not the noun but the determiner. But, under the NP analysis, this would involve proposing an abstract noun head, which has no motivation at all given that the entire content of the NP is provided by the pronoun in the specifier position:

(135) [NP [Det her] [N' [N e]]]

The DP Hypothesis, on the other hand, maintains the assumption that pronouns replace determiners without the need to postulate an abstract noun. Given that the determiner is the head of the nominal phrase, it is the only element that is required in the DP. In other words, we analyse pronouns as 'intransitive' determiners:

(136) [DP [D' [D her]]]

EXERCISE 3.5

Provide possible trees for the following DPs in Hungarian:

   Csaba háza (Csaba house-3rd sing.) Csaba's house
   Csilla minden síbotja (Csilla every ski-stick-3rd sing.) all of Csilla's ski sticks
   Attilának a tehene (Attila-dative the cow-3rd sing.) Attila's cow
   a Józsi sok hibája (the Józsi many mistake-3rd sing.) Józsi's many mistakes

Which of these is problematic for the analysis discussed above? Could this problem be solved if we supposed that the Hungarian DP had a further level of functional projection similar to the CP of the clause?

3.6.3 The Split INFL Hypothesis

A further important development within X-bar Theory again concerns the inflectional elements. In English the inflections of the verb indicate tense and subject agreement features, though, as we have pointed out, distinctions for subject agreement are particularly poor in English. Importantly, the verb has just one morpheme attached to it which expresses either past tense or present tense with third person singular agreement:

(137) a. He/I/you/ . . . jumped.
      b. He jumps.

This corresponds to the claim that there is just one inflection node in the clause.
However, in languages other than English tense and agreement are represented by independent morphemes. Consider the following Hungarian example, where not only is past tense represented by '-t' but also the first person singular agreement is shown as '-am':

(138) Ugrál-t-am.
      (jump-past-1.sing.)
      I jumped.

If there is only one inflection node in a sentence, where do the separate inflectional morphemes fit? One possibility is that they occupy their own head positions and project their own phrases, i.e. AgrP (Agreement Phrase) and TP (Tense Phrase):

(139) [AgrP pro [Agr' [Agr -am] [TP [T -t] [VP [V ugrál]]]]]

Pollock (1989) argued that a structure similar to this is in fact universal, even for languages without separate tense and agreement morphemes. His evidence comes from French. Assuming inflections are generated as separate elements from the verb and that these come together as the result of some syntactic process, there are two possible surface positions in which the verb might occur: in the verb position, with the inflection adjoined to the verb, or in the inflection position, with the verb adjoined to the inflection. This seems to be a parametric difference between languages. For example, the following sentences suggest that English verbs remain inside the VP while French verbs move to the inflection position:

(140) a. John [VP often [VP kisses Mary]].
      b. Jean embrasse [VP souvent [VP - Marie]].

Assuming that the adverb is adjoined to the VP, in the English example the verb kisses sits between the adverb often and the object Mary, showing that it remains in the verb position. In French, however, the verb embrasse precedes the adverb souvent, indicating that it is no longer within the VP but in the position posited for the inflection, namely between the VP and the subject.
Essentially the same facts concerning the verb seem to hold in negative sentences, though things are slightly more complicated in both English and French: English verbs remain inside the VP and French verbs sit in the inflection position:

(141) a. John did not [VP kiss Mary].
      b. Jean n'embrasse pas [VP - Marie].

In the English example, the verb kiss sits between the negative element not and the object Mary, indicating a VP-internal position. However, the dummy auxiliary do is inserted into the inflection position. We will discuss this phenomenon in the next chapter; we will overlook it for the moment as it has little bearing here. In the French example, the verb embrasse is to the left of the negative element pas, which we might suppose is the equivalent of the English not; again the indications are that the verb is in the inflection position. We also have a second negative morpheme, the n', which acts as a clitic stuck to the front of the verb (not to be expanded on here). The reason for introducing the negative sentences is to demonstrate that there is another possible position for the verb to occupy between the verb and inflection positions. This can be seen in French infinitival clauses, in which the verb never sits in the inflection position but may occupy a position to the left of a VP-adjoined adverb:

(142) Ne pas embrasser [VP souvent - Marie] c'est triste.
      (not to-kiss often Mary is sad)
      To not often kiss Mary is sad.

In this example, the verb embrasser is to the right of the negative ne pas, showing that it is not in the inflection position, as it is in the finite clause in (141b). However, it is to the left of the adverb, showing that it is not in the VP either. But if the verb embrasser is neither in I nor in V, where is it?
Pollock argued for the existence of another head position between the I and V positions, which he took to be the agreement head position, even though French does not have separate tense and agreement morphemes. Thus, he argued that something similar to (139) is applicable to French and, indeed, universally. The structure that Pollock actually argued for is slightly different to (139), however, in that he positioned the AgrP below the TP:

(143) [TP DP [T' T [AgrP [Agr' Agr VP]]]]

Pollock's argument was that the difference between French finite and infinitival clauses on which he based his analysis is a difference in the tense head T, not the agreement. In the finite clause, the verb is in a position that it cannot occupy in the infinitive clause, and this suggests that French finite tense and infinitive tense have different properties. However, as the verb is in the highest position in the finite clause, but not in the highest position in the infinitival, this suggests that tense is the highest inflectional head. Other linguists, on the other hand, argued equally strongly for the structure in (139). For example, the order of the Hungarian morphemes suggests that the verb attaches first to the tense and then to the agreement, indicating that the tense node is nearer to the verb than the agreement node. A solution was proposed by Chomsky (1995a), who claimed that both parties were right: there is an agreement head above tense and one below tense. The highest agreement element is associated with agreement with the subject and the lowest one is associated with agreement with the object, which some languages show morphologically. Thus, Chomsky's solution is as follows:

(144) [AgrSP AgrS [TP T [AgrOP AgrO VP]]]

Towards the end of the 1980s the idea that there was a rich system of functional heads built on top of the VP gave rise to much research. The idea became known as the Articulated INFL Hypothesis. Currently the status of such structures is arguable.
Some researchers still assume such a complex system of functional projections, while others have retreated to a more minimal set. Chomsky (1995a) himself has argued that agreement, both subject and object, should be viewed as features rather than structural nodes, and has essentially reverted to a version of the CP/IP analysis of the clause. 'As matters stand here, it seems reasonable to conjecture that Agr does not exist and that [agreement]-features of a predicate P ... are added optionally as P is selected from the lexicon. Note that this carries us back to something like the analysis that was conventional before Pollock's (1989) highly productive split-I theory' (Chomsky, 1995b, p. 377).

ARTICULATED INFL
Towards the end of the 1980s, what had been thought of as a single structural node for the inflectional elements was reanalysed as being made up of a number of separate inflectional heads, TP, AgrP, etc., each of which projects its own phrase. There was argument over whether the agreement phrase was above or below the tense phrase, until Chomsky (1995a) suggested that there are two agreement phrases, one above and one below the tense phrase.

3.6.4 The VP Shell Hypothesis

Finally we turn to developments concerning the structure of the VP. All of the structures drawn so far have involved either a transitive or an intransitive verb: i.e. verbs with either zero or one complement. However, verbs may take more than one complement - so how can two or more complements be accommodated within an X-bar structure? Two obvious solutions come readily to mind. First, if all complements are sisters to the head, multiple complements should all be at the same structural level, under the first projection of the head:

(145) [V' [V receive] [NP a letter] [PP from John]]

However, this structure involves a node with three branches, something not needed previously.
The fact that binary branching trees are sufficient for most structures reflects a restriction on possible phrase structures known as the Binary Branching Condition, which states that structures are at most binary branching. This would then rule out a structure such as (145), because it has three branches. An alternative would be to reject the assumption that all complements are sisters to the head and allow some to be generated in what we have been calling the adjunct position:

(146) [V' [V' [V receive] [NP a letter]] [PP from John]]

This structure has a number of advantages over (145) apart from its conformity to the Binary Branching Condition. Most importantly, if we give up the definition of complements as sisters of the head and adjuncts as those elements which sit in adjoined positions, we reduce a redundancy in the previous accounts, as the notions "complement" and "adjunct" can also be defined in terms of their semantic relations to the head: complements are selected by the head and therefore appear in the head's subcategorization frame, whereas adjuncts are non-selected modifiers and do not appear in the subcategorization frame. The Sisterhood Condition on thematic role assignment ensures that complements are inserted into a structure closer to the head than adjuncts, though we must assume that thematic roles that cannot be assigned directly to the sister of the head will be passed on to its projection to be assigned to its sister. This does not add any extra grammatical mechanisms, as we already assume the same thing for the external θ-role. Despite the advantages of (146) over (145), there are reasons to believe that even this structure is not correct. As pointed out by Larson (1988), several phenomena seem to show that the first complement is structurally higher than the second, whereas (146) has the second argument in the structurally higher position. Here we will demonstrate just two of these phenomena.
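The Binary Branching Condition can be stated procedurally: no node in a tree may have more than two daughters. As a minimal sketch (the nested-tuple encoding of trees and the function name are our own illustrative choices, not part of the theory), the ternary structure in (145) fails the check while the layered structure in (146) passes it:

```python
# A tree node is a pair (label, list_of_children); a leaf has no children.
# This encoding is illustrative only, not a claim about GB theory itself.

def is_binary_branching(node):
    """Return True if no node in the tree has more than two daughters."""
    label, children = node
    if len(children) > 2:
        return False
    return all(is_binary_branching(child) for child in children)

# The ternary-branching V' of (145), ruled out by the condition:
flat = ("V'", [("V", [("receive", [])]),
               ("NP", [("a letter", [])]),
               ("PP", [("from John", [])])])

# The same material with only binary branching, as in (146):
layered = ("V'", [("V'", [("V", [("receive", [])]),
                          ("NP", [("a letter", [])])]),
                  ("PP", [("from John", [])])])

print(is_binary_branching(flat))     # False
print(is_binary_branching(layered))  # True
```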
The first involves the referential properties of reflexive pronouns, the details of which will be provided in the next chapter. For now, all that is important is to note that a reflexive pronoun can be referentially dependent on an argument which is higher than itself, but not lower. So a reflexive object can refer to a subject, but not vice versa. In the following examples, we again use co-indexation to represent co-reference:

(147) a. John_i congratulated himself_i.
b. *Himself_i congratulated John_i.

The relationship involved in this phenomenon is known as c-command, which can be defined in the following way:

(148) c-command
An element c-commands its sister and everything dominated by its sister.

To clarify, consider the following structure:

(149) [IP John [I' I [VP congratulated himself]]]

The sister of the subject John in this structure is the I'. As the object himself is included in the I', the subject c-commands the object. The sister of the object is the V, and therefore the object c-commands the V. The subject is obviously not included in the V and so the object does not c-command the subject. Thus we can say that a reflexive pronoun can only refer to a c-commanding element. When a verb has two objects, the second object can be referentially dependent on the first, but not vice versa:

(150) a. John showed Bill_i himself_i (in the mirror).
b. *John showed himself_i Bill_i (in the mirror).

This suggests that the same structural relationship holds between the two objects as between a subject and an object. As the subject c-commands the object, the first object therefore must c-command the second.
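Definition (148) translates directly into a procedure over trees: a node c-commands its sister and everything its sister dominates. The sketch below (again with our own illustrative tree encoding, assuming unique labels for simplicity) computes the set of c-commanded nodes for the structure in (149), confirming that the subject c-commands the reflexive object but not vice versa:

```python
# A node is (label, list_of_children). Per (148), X c-commands its
# sister and everything the sister dominates.

def dominated_labels(node):
    """All labels in the subtree rooted at node, including the node itself."""
    label, children = node
    result = [label]
    for child in children:
        result += dominated_labels(child)
    return result

def c_commanded(tree, target):
    """Labels c-commanded by the node labelled `target` (labels assumed unique)."""
    label, children = tree
    if target in [child[0] for child in children]:
        commanded = []
        for child in children:
            if child[0] != target:           # the target's sister(s)
                commanded += dominated_labels(child)
        return commanded
    for child in children:
        found = c_commanded(child, target)
        if found is not None:
            return found
    return None

# The structure in (149): [IP John [I' I [VP congratulated himself]]]
tree = ("IP", [("John", []),
               ("I'", [("I", []),
                       ("VP", [("congratulated", []), ("himself", [])])])])

print(c_commanded(tree, "John"))     # the I' and everything in it
print(c_commanded(tree, "himself"))  # only the sister V, congratulated
```

The subject's sister is the I', so John c-commands the object himself inside it; the object's sister is only the V, so himself does not c-command John, matching the contrast in (147).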
Going back to structure (146), this makes exactly the opposite claim: here the second complement c-commands the first. We might therefore conclude that this structure cannot be correct. A second observation which shows that the first object of a double object predicate c-commands the second concerns negative polarity items, such as anyone. Such elements can appear if they are c-commanded by a negative element. Thus anyone can function as the object of a sentence with a negative subject, but anyone cannot occur in subject position with a negative object:

(151) a. No one likes anyone.
b. *Anyone likes no one.

In a double object construction, a negative polarity item can appear in the second object position with a negative element in the first object position, but not vice versa, demonstrating that the first object c-commands the second:

(152) a. I told no one anything.
b. *I told anyone nothing.

However, for the first object to c-command the second would require a structure something like the following:

(153) [V' [V receive] [? [DP a letter] [?' ? [PP from John]]]]

The part of this structure with the labels '?' looks suspiciously like a phrase with the DP in its specifier position and the PP in its complement position:

(154) [V' [V receive] [XP [DP a letter] [X' X [PP from John]]]]

The question is: what is the head of this phrase? Larson (1988) proposed that the head is the verb and that this gets into its preceding position via a movement. Thus, underlyingly the phrase looks like the following:

(155) [VP [DP a letter] [V' [V receive] [PP from John]]]

This analysis additionally proposes an empty shell of a VP generated on top of the contentful VP to provide a landing position for the verb to move to: hence it is called the VP Shell Hypothesis. Others have suggested that the head of the VP shell is not always just an empty position for the verb to move to, but may contain an abstract verbal element which the overt verb attaches to via movement.
Thus consider the following observations:

(156) a. The ship sank.
b. They sank the ship.
c. They made the ship sink.

Verbs such as sink are sometimes called ergative verbs and appear to have both an intransitive and a transitive use, as in (156a) and (156b). The difference is that the transitive use involves causation - something made the ship sink - as shown by the fact that (156b) and (156c) mean similar things. The subject here is interpreted as the one that causes the sinking to happen. In the intransitive example, however, the subject is the thing that undergoes the sinking. The word order facts and the near synonymy of (156b) and (156c) can be accounted for under the following assumptions. First, the basic position for the element that undergoes the sinking is the specifier of the verb, thus accounting neatly for (156a) and (156c):

(157) a. [VP The ship [V' sank]].
b. [VP They made [VP the ship [V' sink]]].

(157a) shows a partial structure, ignoring the functional structure that would be built above this VP and the subsequent movement of the subject to the specifier of the IP. The causative verb make is assumed to take a VP complement identical in all relevant respects to the VP in (157a). Sentence (156b) raises a number of questions: why is this structure interpreted as a causative when there appears to be no causative verb? And why is the theme argument the ship after the verb in (156b) but before it in (156a)? One answer to these questions assumes there is a causative verb in (156b) which is phonologically empty and the overt verb moves to attach to it:

(158) a. [VP They e [VP the ship [V' sank]]].
b. [VP They sank-e [VP the ship [V' - ]]].

Note that this is exactly the same kind of movement found in the VP Shell Hypothesis, with a verb moving from a lower verbal position to a higher one. Here, however, the upper verbal position is not just an empty position for the verb to move to, but contains a meaningful verb of its own. More recently, this line has been pursued further, suggesting that the verb of the VP shell is what is responsible for assigning the external θ-role in general, particularly the agent. This verb, known as a light verb (typically represented as a lower-case v), is present only when an agent role is assigned, and hence one possible analysis of the active/passive distinction might be in terms of the presence or absence of the light verb:

(159) a. [vP Mark v [VP the money [V' gave [PP to Sarah]]]]
b. [VP the money [V' given [PP to Sarah]]]

The structure in (159a) represents the core proposition of a sentence that would be realized as:

(160) Mark gave the money to Sarah.

whereas (159b) represents the proposition of the passive sentence:

(161) The money was given to Sarah.

VP SHELLS
In order to maintain the Binary Branching Condition in the case of verbs with multiple complements, and to capture the observations that show that the first complement c-commands (is structurally higher than) the second, Larson (1988) proposed that the VP structure should be articulated into two parts: an upper VP shell, empty of lexical elements, and a contentful VP containing the verb and its complements in specifier and complement positions. The verb is then assumed to move to the head position of the VP shell.

We will conclude this section, and this chapter, by briefly mentioning that just as the IP and VP have undergone an 'articulation' process, analysing them into a series of functional projections headed by more specific heads, such as tense, agreement and causative verbs, the CP has also been subject to a similar analysis. Rizzi (1997) has argued that what has been referred to as a single projection headed by the complementizer is better seen as a number of separate projections of heads which introduce notions such as topic and focus.
The suggestion remains contentious, especially in the light of the recent move towards simplification of the clausal architecture, almost returning to a simple IP analysis. However, some linguists are still following the line of research started in the late 1980s, which has become known as the 'functional explosion'. As Chomsky himself has advocated a more minimal approach, we will not pursue this further.

3.7 Summary

This chapter has concentrated on issues of structure that developed throughout the 1980s within GB Theory. Many of these serve as precursors to the Minimalist Program, to be discussed in chapters 7 and 8. Most of the discussion so far concerns the development of X-bar Theory, which became increasingly general throughout the period of GB Theory, starting off as a theory of the structure of phrases headed by lexical elements, N, V, A and P, and ending up as a completely general theory of all syntactic structure. The most important development in this respect was extending X-bar Theory to cover functional elements, such as complementizers, inflections and determiners. From these the interest in functional elements and their role in syntax increased, and later developments such as the Articulated INFL Hypothesis followed. There were also developments in thinking about the structure of the VP, in particular the VP-Internal Subject Hypothesis and the VP Shell Hypothesis, both of which contribute to a view of the VP as a hierarchical structure with all but the lowest argument being included as specifiers of VP shells built one on top of another. Other modules of the theory with special relevance for D-structure, i.e. Theta Theory and Control Theory, have also been discussed.
Theta Theory deals mainly with the assignment of thematic roles from predicates to arguments and contains two basic principles: the Theta Criterion and the Sisterhood Condition, which play a role in regulating how and where θ-roles can be assigned and therefore in determining which arguments appear where. Control Theory introduced the notion of phonologically empty arguments, which in many ways behave like pronouns in terms of their distribution and referential properties. Control Theory itself is concerned with the referential properties of PRO, an empty category which always sits in ungoverned positions, according to the PRO theorem (79). We contrasted this with another empty pronoun, pro, which sits in governed positions in certain languages. Due to the highly interactive nature of the modules of GB, it is virtually impossible to discuss any single module without reference to the others. This chapter has made numerous references to modules of the grammar which will be more fully discussed in the next chapter. The modules that are relevant for S-structure (and beyond) come into their own as conditions on what movements can or must take place.

Discussion topics
1 To what extent do you think deconstructing the theory into a proliferation of modules simplifies or obfuscates?
2 Does the distinction between deep and surface structure strike you as obvious and familiar, or do you find it novel and startling?
3 Do you feel any of the claims about English reflected in the rewrite rules are discoveries about English or are formalizations of already known grammatical rules?
4 How would you now define the subject of the sentence? How does this compare to your definition before reading this chapter?
5 To what extent do you think that the fact that the generation of the sentence is now driven by the lexicon diminishes the unique status of syntax?
6 Recently Hauser et al. (2002) seem to be suggesting that the only unique aspect of the 'narrow' language faculty is recursion.
Can this really be so important?
7 The most widely used artificial language in the world, Klingon, allegedly with one native-speaker baby (http://www.tvwiki.tv/wiki/D%27Armond_Speers), has no tense forms yet appears to be non-pro-drop, as in naDev tlhInganpu' tu'lu' '(There) are Klingons here': http://stp.ling.uu.se/~zrajm/nerd/klingon/TKDAddenda/kliadd4.html#4. Is this because its devisers were chiefly English speakers or because it is actually an invented language, i.e. not subject to UG?

4 Movement in Government/Binding Theory

The previous chapter introduced the idea that the structure of an expression in any language can be described at two levels: D-structure and S-structure. These levels are linked by transformational rules, which mostly involve moving elements from one place to another - movement, alias dislocation. The levels are also the places at which various principles, assembled into modules, apply. The principles that apply at D-structure govern the basic organization of the structure, as we have seen in the last chapter, while the principles that apply at S-structure regulate movement. For example, an S-structure principle may require an element to be in a certain position which it does not occupy at D-structure. So, for the sentence to be grammatical, the element will have to vacate its D-structure position and move to the position it is required to be in at S-structure. There are also occasions where S-structure principles prevent certain movements from happening, thus explaining why some movements are possible, others not. An interesting aspect of Government/Binding (GB) grammar is that many principles are not motivated solely by considerations of movement, but also have roles to play in other phenomena. Before these can be discussed in detail, however, we should first look more thoroughly at how the notion of movement has developed in Chomskyan linguistics over the years.
4.1 An overview of movement

4.1.1 A-movements

As discussed in the previous chapter, the notion of a transformation started off as a powerful device that could alter structures in unconstrained ways. However, it was clear from the start that, to attain the goal of explanatory adequacy, this flexibility could not be maintained, as no real theory of language acquisition would be possible without restrictions on the mechanisms that grammars could potentially utilize, i.e. children would simply be unable to learn language if any structure at all was possible. Building in restrictions meant that the transformational rules themselves could be simplified and generalized. By the late 1970s three types of movement transformations were recognized, which played a role in the generation of numerous structures (Chomsky, 1977). One of these types involves the movement of a Determiner Phrase into a subject position. Several constructions involve this kind of movement, the most obvious being the passive. In a typical passive, a DP originates in the object position and then moves to the subject position, which is vacant:

(1) a. - was amended the contract.
b. The contract was amended -.

This analysis, which dates back to the first transformational analyses of the 1950s, implicitly assumes a principle made explicit by Baker (1988), called the Uniform Theta-Role Assignment Hypothesis, or UTAH for short: arguments which bear the same θ-role are generated in the same D-structure position. Thus, if arguments which are interpreted similarly occupy different surface positions, the difference must be attributed to movement. This is the case for instance with the comparison of passive and active sentences, where the object of the active and the subject of the passive bear the same θ-role:

(2) a. The lawyer amended the contract.
b. The contract was amended by the lawyer.
A similar movement can be seen in the so-called middle construction:

(3) These potatoes mash well.

This construction is restricted to a subset of transitive verbs, and has other restrictions which need not concern us here. The point is that an element which is interpreted as the object of the verb (these potatoes) ends up in the subject position. We might propose therefore that this movement is similar to the passive, with the object moving to the vacant subject position:

(4) a. - mash these potatoes well.
b. These potatoes mash - well.

A further example of this type of movement involves the so-called unaccusative verbs such as arrive. While these appear to be intransitive at the surface, they have some properties of transitive verbs. In English the most obvious difference is that unaccusative verbs can appear in the 'there construction' but intransitive verbs cannot:

(5) a. There arrived a party from Cricklewood.
b. *There telephoned a party from Cricklewood.

In this construction, the subject position is taken by the pleonastic element there. The element that would normally be interpreted as the subject (a party from Cricklewood) appears after the verb, in what seems to be an object position. Another difference is that intransitive verbs can take a special object which in some sense duplicates the meaning of the verb, known as a cognate object:

(6) a. He died a terrible death.
b. He smiled a rueful smile.
c. He danced a merry dance.

Unaccusative verbs, however, do not take cognate objects, a property they share with transitive verbs:

(7) a. *He arrived an arrival.
b. *They destroyed a destruction.

Presumably the reason why transitive verbs cannot take cognate objects is that these objects cannot really bear θ-roles and, according to the θ-Criterion seen in chapter 3, a verb which has a θ-role to assign to an object must do so.
Unless the same is true for unaccusative verbs, i.e. that they have a θ-role to assign to an object, it is unclear why they should not be able to take a cognate object. Data from other languages also show that unaccusative verbs have certain things in common with transitive verbs. For example, in Italian, the pronoun ne can be cliticized in front of the verb if it is associated with the object but not with a postposed subject (recall from the previous chapter that Italian is a pro-drop language, and one property of some pro-drop languages is that the subject can follow the VP):

(8) a. Luigi ne ha insultati molti.
(Luigi of-them has insulted many)
Luigi has insulted many of them.
b. *Ne telefonano molti.
(of-them telephone many)
Many of them telephone.

However, we can get ne-cliticization from an apparently postposed subject of an unaccusative verb:

(9) Ne arrivano molti.
(of-them arrive many)
Many of them arrive.

Thus in (9) there is no postposed subject, for this element is really an object. All this adds up to the conclusion that unaccusative verbs are really transitive verbs, selecting for a single complement. However, this complement may move to the subject position, just like the objects of passive and middle verbs:

(10) a. - arrived a party of linguists.
b. A party of linguists arrived -.

The movement of a Determiner Phrase to a subject position is also found with raising verbs. The difference is that the Determiner Phrase moves from another subject position rather than from an object position:

(11) a. - seems [John to have lost his passport].
b. John seems [ - to have lost his passport].

At first sight, an S-structure such as (11b) looks identical to those introduced in the previous chapter which involve a controlled empty pronoun, PRO. Compare the following:

(12) a. John seems to have lost his passport.
b. John expects to have lost his passport.
However, the many differences between these two constructions lead us to believe that (12a) involves a movement while (12b) does not. First of all, the subject in (12a) is not thematically related to the verb seem: it is not John that seems, so to speak, but 'that John has lost his passport', as seen clearly in the following sentence, which, while synonymous with (12a), does not involve the movement of the subject:

(13) It seems that John has lost his passport.

Here the subject of seems is an expletive it; the DP John is semantically related to lost and so must be generated in the lost clause at D-structure in both (12a) and (13), according to the Sisterhood Condition of Theta Theory (chapter 3, section 3.4). Turning to (12b), in this case the S-structure subject is thematically related to the verb of the main clause: it is John that is doing the expecting. This DP must then have been generated in the main clause at D-structure and so did not move there from the embedded clause. Indeed there is no clause similar to (13) in this case:

(14) *It expects that John has lost his passport.

This clause is only grammatical if the subject is interpreted as referential, i.e. linked to an actual thing, and not as an expletive, showing that the verb expect is lexically different from the verb seem: while expect has a thematic subject of its own, seem does not. Their lexical entries can then be represented as follows:

(15) a. seem: [CP]
b. expect: [experiencer, CP]

While both verbs subcategorize for a clausal complement, seem has no other arguments while expect has an experiencer subject. Verbs like seem, which have no subject of their own but allow the subject of their clausal complement to move to their own subject position, are known as raising verbs. So far we have seen that passive, middle, unaccusative and raising verbs all allow DPs to move to their subject position and that the processes involved are very similar.
In fact, if we accept the VP-Internal Subject Hypothesis, this type of movement seems to happen in virtually every clause. The assumption is that the subject originates in the specifier of VP at D-structure and then moves to the 'canonical subject position', i.e. the specifier of IP:

(16) a. - will [the princess kiss the frog].
b. The princess will [ - kiss the frog].

This involves the movement of a DP (the princess) into a subject position and therefore has much in common with all the other movements we have discussed. While this movement seems to involve only DPs, the restriction turns out to be tighter than this: only argument DPs can be involved. Thus an adjunct DP is never moved in any of these constructions:

(17) a. The lawyer amended the contract last night.
*Last night was amended the contract.
b. We mashed the potatoes well every time.
*Every time mash the potatoes well.
c. There arrived a package last week.
*Last week arrived a package.
d. It seems that each day John loses his passport.
*Each day seems John to lose his passport.

Moreover, these arguments always move to the subject position, a typical place for arguments. The positions in which arguments tend to appear can be referred to as Argument positions or A-positions. 'An A-position is one in which an argument such as a name ... may appear in D-structure; it is a potential θ-position' (Chomsky, 1981a, p. 47). All the movements seen so far move arguments from one A-position into another, and so this kind of movement has been called A-movement.

EXERCISE 4.1
Typically the pleonastic subject that appears with a raising verb like seems is it, while that which appears with an unaccusative is there:
It seems that Celia is sewing socks. (*There seems that ...)
There arrived an architect from Angola. (*It arrived ...)
However, in the following sentence there is a there subject of the raising verb:
There seems to be a problem with the pancakes.
How is this possible?
4.1.2 A'-movements

Let us contrast the above movements with another type: wh-movement. The previous chapter described how a wh-element moves to the front of the clause in wh-questions:

(18) a. - the lawyer has amended which contract?
b. Which contract has the lawyer amended -?

No details were mentioned of the structural position that the wh-element moves to. It is clearly to the left of the subject in the IP specifier position and so presumably outside the IP, i.e. within the domain of the CP above the IP. The earlier discussion of the CP claimed that the CP is headed by a complementizer which takes the IP as its complement:

(19) [CP [C' C IP]]

Given that the C is a head position and the wh-element is a phrase, we would not expect the wh-element to move to C. However, as yet we have found no use for the specifier of CP. As the specifier position is one where phrases can go, this might well be where the wh-element moves. This solution has the additional advantage of leaving the C position available to accommodate the auxiliary, also fronted by another movement to be discussed later. Thus, a sentence with a fronted wh-element will have the following structure:

(20) [CP Which contract [C' has [IP the lawyer - amended - ]]]

Now that these details are in place, this movement can be compared with others. Although the moved wh-element here happens to be a DP, wh-movement does not always involve DPs. For example APs and PPs can undergo the same type of movement:

(21) a. How bad did the news seem?
b. By whom was the murder victim discovered?

Moreover, not only does the moved wh-element not have to be a DP, it doesn't even have to be an argument; wh-adjuncts can undergo the same movement:

(22) a. When did the maid change the linen?
b. For how long was the miscalculation kept from the public?

Thus, this kind of movement seems different from the previous movements, which only involved DP arguments.
The position to which the wh-element moves also differs from that in A-movement, in that the specifier of the CP is not a position where we tend to find arguments and hence it is not an A-position. Those positions that are not A-positions are called A'-positions (the bar on 'A' standing for 'non'); this kind of movement is therefore referred to as A'-movement. 'Other positions we will call "A'-positions", in particular, the clause-external position occupied by operators such as who' (Chomsky, 1986a, p. 80). There are other movements which move elements into A'-positions and so fall into the same category as wh-movement. English, for example, has a handful of constructions that go against its standard SVO order by moving elements in front of the subject. For example, both topicalization and negative-fronting (as in Never again would I fly British Airways) move elements to some position to the left of the subject (and so out of the IP). But this time the landing site of the movement is to the right of the complementizer. In:

(23) I thought that this exam, everyone would pass.

this exam has been moved in front of the subject, while in:

(24) I vowed that never again would I fly British Airways.

never again has been fronted before the subject (together with compulsory inversion of the auxiliary would). Some have taken this evidence to support the claim that there is further functional structure built on top of the IP than the CP alone, as mentioned in chapter 3. Rather like the idea of an articulated IP, where various functional projections are built on top of the VP headed by functional heads expressing tense and agreement, an 'articulated CP' could have projections headed by functional heads expressing such notions as topic and focus. These would provide specifier positions for various elements to move to, each being an A'-position.
However, given that the Articulated INFL Hypothesis is still highly debatable, not everyone is willing to accept the Articulated CP Hypothesis either.

EXERCISE 4.2
Many relative clauses are formed by what looks to be the same process utilized in wh-interrogatives:
(i) The man [who you met].
(ii) I asked [who you met].
What arguments can you give that who in (i) is moved to the specifier of the CP?

4.1.3 Head movements

The last type of movement to be discussed differs from those seen so far in involving the movement of heads of phrases rather than of the phrases themselves, and hence is known as head movement. The most obvious kind of head movement is subject-auxiliary inversion, which has already slipped into the discussion without comment. Head movement can be found in certain interrogative clauses, either accompanying wh-movement or by itself in yes-no questions:

(25) a. When will they - arrive?
b. Will they - arrive on time?

As seen in (25), an auxiliary verb such as will generated in the I position ends up to the left of the subject they in these kinds of interrogatives. Given that the wh-element moves to the specifier of the CP, it is natural to assume that the auxiliary moves to the actual C position:

(26) [CP [C' [C will] [IP they - arrive on time]]]

In English, movement from I to C is restricted to certain clause types, such as interrogatives, and only involves auxiliary verbs. English main verbs never move to the C position. So:

(27) *Arrived they - on time?

is ungrammatical as it involves movement of the head arrive. The grammatical alternative is:

(28) Did they - arrive on time?

where the auxiliary do is in the C position. Because the main verb seems stuck inside the VP in English, inversion structures which do not contain an underlying auxiliary verb demonstrate do-insertion, involving the insertion of the expletive auxiliary do which undergoes the movement to C.
In other words, in English, one way of distinguishing a main verb from an auxiliary is whether it can be fronted in interrogatives:
(29) a. May I go?
b. *Go I?
c. Will he arrive?
d. *Arrive he?
leaving some interesting borderline cases in many people's usage:
(30) a. Dare I go?
b. Do I dare to go?
c. Have I to see him?
d. Do I have to see him?
However, languages that allow main verbs to move to C do not require do-insertion. For example, French forms questions in a similar way to English, except that French main verbs behave like auxiliaries by moving to the front:
(31) a. Avez-vous chanté dans la classe?
(have you sung in the class)
Have you sung in class?
b. Chantez-vous bien?
(sing you well)
Do you sing well?
In German, it has been argued, all finite verbs move to the C position, not just those in interrogatives. This reflects a property common to other languages, such as the Scandinavian languages, called Verb Second (V2): some element, whether subject, object or adverbial, is placed in the first position of the clause, followed by the finite verb in the second position:
(32) a. Ich sang ein Lied gestern. (1st 2nd)
I sang a song yesterday.
b. Ein Lied sang ich gestern. (1st 2nd)
c. Gestern sang ich ein Lied. (1st 2nd)
A well-accepted analysis of the German clause has the element at the front of the clause moving to the specifier of the CP and the verb moving to C:
(33) [CP Ein Lied [C sang] [IP ich - gestern -]]
Support for this analysis comes from the fact that in embedded contexts where the C position is filled with a complementizer the verb does not appear in the second position, but stays at the end of the IP, as the German IP is head final:
(34) Ich sagte [CP dass [IP ich ein Lied gestern sang]].
(I said that I a song yesterday sang)
I said that I sang a song yesterday.
Head movement is also involved when verbs move out of the VP into the I position.
In the previous chapter, movement was used to show that verbal inflections are generated as the head of their own phrase and that they become attached to verbs via a movement. In many cases this movement involves the verb. For example, English aspectual auxiliaries have and be are generated in their own VP projection (as they are not in complementary distribution with modal auxiliaries and so are not generated in I). Their movement to I is shown by the following observations:
(35) a. I should [VP never [VP have met him]].
b. I had [VP never [VP - met him]].
The adverb never is presumably adjoined to the left of the VP and, when the I position is filled by a modal auxiliary, the aspectual have is inside the VP to the right of the adverb. However, in the absence of a modal, the aspectual auxiliary is finite and sits to the left of the adverb. So the auxiliary moves from within the VP to join with the tense and agreement inflections. Again, languages differ over which verbs can move from V to I. We have seen that English main verbs cannot move to C, undergoing inversion, though French verbs can. It turns out that English main verbs cannot move to I either, though French verbs can, as we might expect:
(36) Jean embrasse souvent Marie.
(John kisses often Mary)
John often kisses Mary.
Sentence (36) shows that the French main verb embrasse precedes the VP adverb souvent, stranding its object Marie inside the VP by moving to the inflection position. This is not possible in English:
(38) a. * John kisses often Mary.
b. John often kisses Mary.
Instead, the verb kisses remains inside the VP, to the right of the left-adjoined adverb often. However it should be noted that the verb is still finite, bearing inflections for tense and agreement.
Thus the verb must still get together with the inflections somehow, even though it does not leave the VP. The most obvious solution to this puzzle is to assume that the inflections in this case move down onto the verb:
(39) [IP John [I -] [VP often [VP kisses Mary]]]
This is more or less the analysis that Chomsky had proposed in 1957, known as affix hopping (Chomsky, 1957). This assumed that inflections were generated to the left of the verbal element that bears them at the surface and are then subject to a transformation which shifts them one step to the right:
(40) a. John -es often kiss Mary.
b. John - often kisses Mary.
The contentious aspect of this analysis is that it involves a rightwards and downwards movement, which is not only very unusual (all of the movements we have so far considered are leftwards and upwards), but also seems to violate a number of principles to be discussed later in this chapter. The final case of head movement to be mentioned involves the movement from one verbal position to another. In the previous chapter our analysis of causative constructions involved an abstract causative verb with a VP complement headed by a thematic verb:
(41) [VP they [V (make)] [VP the ship [V sink]]]
To get the right word order, it is proposed that, when the causative verb is abstract, the thematic verb moves from its own V position to that of the causative verb. This is rather like V to I movement, especially if the abstract causative verb is taken as a bound morpheme which must attach to the main verb. Indeed, in some languages the causative can be expressed as an overt morpheme which attaches to the verb, as in the following Hungarian example:
(42) Janos el-gur-it-at-ta a labdat.
(Janos away-roll-cause-past-3.sing. the ball-acc.)
John rolled the ball.
This example shows how the causative morpheme sticks to the end of the verb, as do the tense and agreement morphemes.
The morphologically complex verb is created by successive movement of the verb, first to the causative head, then to the tense head and finally to the agreement, picking up the morphemes as it goes:
(43) a. [AgrP ta [TP at [VP it [VP el-gur a labdat]]]]
b. [AgrP ta [TP at [VP el-gur-it [VP - a labdat]]]]
c. [AgrP ta [TP el-gur-it-at [VP - [VP - a labdat]]]]
d. [AgrP el-gur-it-at-ta [TP - [VP - [VP - a labdat]]]]

EXERCISE 4.3 Koopman (1984) notes the following word order patterns in Vata, a Kru language of the Ivory Coast of Africa:
A li saka. (we eat rice) We ate rice.
A la saka li. (we have rice eat) We have eaten rice.
What is the underlying position of the verb in this language and how can head movement help us to account for the surface word order?

TYPES OF MOVEMENT
A-movement: movement of DPs to argument positions, e.g. movement to subject:
[IP will [VP the Labour Party win again]]
[IP the Labour Party will [VP - win again]]
Ā-movement: movement of phrases to non-argument positions, e.g. movement to specifier of C:
[CP [IP they will win the election when]]
[CP when will [IP they win the election -]]
Head movement: movement of heads of phrases rather than whole phrases, e.g. subject-auxiliary inversion:
[CP [IP they will win the election]]
[CP will [IP they - win the election]]

4.2 Further developments to the theory of movement
4.2.1 The Projection Principle and Trace Theory
No details of the restrictions on transformations have so far been provided. Some basic restrictions originated in the 1970s but continued to be assumed in GB Theory. One aimed to curtail the power of transformations so that they could not make just any kind of change at all to a structure. D-structures should be left intact by a movement transformation, which simply moved things about. This restriction was known as the Structural Preservation Principle.
Effectively this ensures that the landing sites of a moved element should already be present at D-structure and not be created just for the purposes of the movement itself. This is clearly demonstrated in A-movements, which use the specifier of IP as their landing site. This canonical subject position is obligatorily present in all clauses, as dictated by the Extended Projection Principle (EPP), introduced in the previous chapter. Thus A-movements move NPs into a position which is present, but vacant, at D-structure. We can assume the same is true for other movements we have reviewed. So the specifier of CP, the I and the C position are all potentially present at D-structure, but are vacant so that they can be moved into by the relevant element. As a consequence of structural preservation, the extraction site of a moved element cannot disappear when it is vacated, as this would alter the structure from its D-structure configuration. So extraction sites are left empty at S-structure. This has already been hinted at above by marking the extraction site with a '-'. However, the empty slot created by moving an element out of one position is very different from the empty slot which exists at D-structure and can be filled by something moving into it. Nothing for instance can move into an empty position from which an element has moved, as demonstrated by the following example:
(44) * Will he haven't - read the paper?
In (44) the modal will moves to C, vacating the I position. In English, aspectual auxiliaries such as have can move from V to I, and only in I can the contracted negation n't attach to the auxiliary. The ungrammaticality of (44) indicates that the movement of have is impossible. The reason is that when an element moves, the position it vacates is not entirely empty but is filled by a trace of the moved element.
A trace retains the syntactic and semantic properties of the moved element, to which it is linked, but has no phonological realization; it exists as an abstract entity that is never physically present in speech - a kind of invisible place holder. '[W]hen a category is moved by a transformation, it leaves behind an empty category, a "trace"' (Chomsky, 1986a, p. 66). So the trace left behind when a DP moves from object position to subject position in a passive structure will be a DP which shares the same θ-role and referential properties as the moved element, but will not be pronounced:
(45) The paperᵢ was read tᵢ.
Here the trace is represented by a t which is co-indexed (i) with the moved element to show the link between them. In the previous chapter we encountered the phonologically empty pronouns PRO and pro, which share with traces the fact that they are unpronounced. Together these elements are called empty categories - elements necessary to the D-structure of the sentence and its interpretation that never occur visibly in the S-structure. 'There is, in fact, substantial evidence of various sorts to support the hypothesis that empty categories do appear in representations at various syntactic levels' (Chomsky, 1986a, p. 67). For some purposes traces can be considered independent elements. However, in other ways we might not want to view a trace as a separate entity from the moved element. For one thing, they share the same θ-role as the moved element; yet the Theta Criterion insists that θ-roles can only be assigned to one argument and cannot be shared out. This is solved by the notion of a chain. A movement chain consists of the moved element and its trace (in fact, as we will see, an element can move more than once and so there can be more than one trace in a chain).
'A chain is the S-structure reflection of a "history of movement", consisting of the positions through which an element has moved from the A-position it occupied at D-structure' (Chomsky, 1986a, p. 95). So the sentence:
(46) When₁ is₂ the concert t₂ t₁?
has two chains. One:
(47) (when₁, t₁)
links when to its original place at the end of the sentence. The other:
(48) (is₂, t₂)
links is to its original position. Each chain acts as a single entity so far as principles such as the Theta Criterion (chapter 3, section 3.4) are concerned. Thus this principle can be restated as:
(49) Theta Criterion
All theta roles must be assigned to one and only one argument chain.
All argument chains must bear one and only one theta role.
In GB Theory structural preservation was more neatly tied into the way X-bar Theory determined D-structures under the Projection Principle. As seen in the previous chapter, GB Theory states that syntactic levels of representation are projected from the lexicon, in the sense that information in the lexicon determines major aspects of structure. In D-structure the principles of X-bar Theory create category-neutral templates for structures; all the details of particular structures are 'projected' from the lexical items inserted into these templates. Thus the category of the head itself determines the category of the phrase, and subcategorization properties of the head determine properties of the complements. That is, the lexical entry:
(50) drive: [V, DP]
for a verb determines that the phrase to be generated has to be a Verb Phrase and that it has to have an object; someone drives something:
(51) Sarah drives a Ford.
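The notions of trace, chain and the chain-based Theta Criterion in (49) can be given a rough computational sketch. The following Python fragment is an illustration under invented names and representations, not a claim about how the theory is formalized: a chain pairs a moved element with its co-indexed traces, and the check verifies that theta roles and argument chains match one-to-one.

```python
# A sketch of movement chains and the chain-based Theta Criterion (49).
# Theta roles are assigned to chains, not to individual positions.

class Chain:
    def __init__(self, head, traces, theta_role=None):
        self.head = head              # the moved element, e.g. 'the paper'
        self.traces = traces          # positions vacated by movement
        self.theta_role = theta_role  # the single role borne by the chain

def satisfies_theta_criterion(chains, theta_roles):
    """Every theta role goes to exactly one argument chain, and every
    argument chain bears exactly one theta role."""
    assigned = [c.theta_role for c in chains if c.theta_role is not None]
    return sorted(assigned) == sorted(theta_roles) and \
        len(set(assigned)) == len(assigned)

# 'The paper was read t': one chain bearing the single role of 'read'
passive = [Chain("the paper", ["object of read"], theta_role="PATIENT")]
print(satisfies_theta_criterion(passive, ["PATIENT"]))  # True
```

Treating the chain as the unit of theta assignment is exactly what lets the moved element and its trace "share" a role without violating the one-role-per-argument requirement.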
If S-structures are also linked to lexical information, from which they cannot deviate, it follows that no movement can radically alter a structure: if a verb is transitive at D-structure, it remains transitive even if its object moves at S-structure, as the trace of the object will still remain in the complement position. To take another example, transforming:
(52) John will get the sack.
to:
(53) Will₁ John get the sack?
leaves the trace t in the same position that will originally occupied:
(54) [CP [C will₁] [IP John [I t₁] [VP get the sack]]]
Furthermore, head movements do not change the categorial status of the positions the elements move to. When an inflectional element such as a modal auxiliary moves to C, C does not become I projecting an IP, but remains C projecting a CP. So in (54) the auxiliary will is still attached to a C in a CP even if it originated from an I in an IP.

EXERCISE 4.4 In the following sentences insert traces into the appropriate positions and co-index them with the relevant moved element:
They should arrive on time.
John saw Mary.
Who could Henry fight?
This room, what colour would you paint?

4.2.2 Substitution and adjunction
The majority of the movements considered so far have involved taking an element from one position to another which is vacant. We can envisage this as filling the empty position with the element undergoing the movement, known as substitution: one element simply substitutes for the empty slot by moving into it. However, some movements are not like this. For example, V to I movement, I to V movement and causative V to V movement all move an element to a position which is already occupied, resulting in the formation of a morphologically complex element.
This movement is assumed to proceed in the following way: when an English inflection moves to the verb, it 'sticks' itself onto the verb, creating a morphologically complex verb containing both the original verb and the inflection:
(55) [V [V like] [I -ed]]
So here the past tense -ed has moved from the head of I to the V already filled by like and has become joined to it to get the combined element liked, in the spoken language getting the [t] pronunciation, in the written language losing an 'e'. The structure that is created is essentially an adjunction structure with the inflection adjoined to the verb. For this reason this kind of movement is known as adjunction.

4.2.3 Move α
The structure-preserving nature of movement discussed above imposes severe restrictions on what can move where. For example, phrases can only move to phrasal positions and heads can only move to head positions. Ultimately there is no need to state such restrictions as part of the movement rules themselves as they follow from the general Projection Principle. This enables us to simplify the movement rules further, thus carrying out the aim of making more and more general statements rather than structure-specific rules. The restrictions introduced in the following sections allow even more simplification, so that, at the start of the 1980s, it could be proposed that there was only need of one movement rule, Move α, where α stands for any element. This simply stated that elements can move about in a tree - 'move any category anywhere' (Chomsky, 1982, p. 15). The details of movements - what actually moves where - were controlled by restrictions placed on movements in general. The model of GB Theory developed throughout the previous chapter can therefore be modified as follows:
Figure 4.1 Government/Binding Model, amplified from chapter 3

CONCEPTS OF MOVEMENT
The Projection Principle (Chomsky, 1981a, p. 29): Representations at each syntactic level (i.e.
D- and S-structure) are projected from the lexicon, in that they observe the subcategorization properties of lexical items.
Trace: the invisible marker left behind in the structure when some element moves:
Mary has written a novel. Has₁ Mary t₁ written a novel?
Chain: an object created by movement, consisting of the moved element and all its traces, for example the chain (where₁, t₁) in the sentence Where₁ did you put it t₁?
Adjunction: moving an element to another element already in the landing site, with the moved element adjoining to the other: -ed + like → liked
Move α (i.e. move any element anywhere): the maximally generalized transformational rule which simply licenses movement - the details of actual movements themselves are settled by the interaction of other modules of the grammar.

The following sections extend this model by adding yet more modules which interact with Move α in complex ways to provide a detailed analysis of movements.

EXERCISE 4.5 By definition a substitution movement can apply only once, as when an element is moved into a vacant position, it is no longer vacant. Adjunction, on the other hand, can in principle be applied any number of times. Topicalization, such as:
My mother, I always listen to.
has been analysed by some as a substitution movement, moving the topic into the specifier of an abstract Top head of a phrase (TopP) which is part of the articulation of the C-system. Others, however, have claimed it to be an adjunction movement, adjoining the topicalized element to either the IP or the CP. Considering the following data, how many TopPs can there be and where do they come in relation to the CP? Is it better to view topicalization as an adjunction movement?
Today, I will meet my brother in Edinburgh.
Today, in Edinburgh, I will meet my brother.
Today, in Edinburgh, my brother, I will meet.
I said that today, in Edinburgh, my brother, I will meet.
Today, in Edinburgh, this man, who would want to meet?

4.3 Bounding, Barriers and Relativized Minimality
Chapter 2 introduced the notion that grammatical movements are short as a consequence of restrictions imposed directly on movement by the independent module of Bounding Theory. Bounding has a long history in transformational grammar, dating back to work in the 1960s by Chomsky (1964) and Ross (1967). We have already mentioned Ross's wh-Island Constraint (p. 71), which formed part of a general programme of research aimed at identifying several constructions which did not allow anything to move out of them, collectively called islands. But this approach was very construction specific and hence descriptive rather than explanatory in nature. In the 1970s a more general approach to bounding was introduced, which lasted well into the 1980s, involving a main principle known as Subjacency. This section introduces Subjacency and its successors, Barriers and Relativized Minimality.

4.3.1 Subjacency
The rationale for Subjacency is to prevent movements being overly long by identifying certain bounding nodes in a structure that cannot be crossed. A more general theory can be built in this way instead of by identifying the specific constructions that prevent movement, as the Islands approach attempted to do. However, it is impossible to identify a single node which is never crossed by some movement. Many movements involve shifting an element from one clause to the next, clearly crossing the clausal nodes IP and CP. These would have been obvious nodes to nominate as absolute bounding nodes in order to prevent long-distance movement. Yet, while clause to clause movement is a distinct possibility, it seems to be impossible to move out of two clauses in one go. Recall the restrictions on A- and Ā-movement discussed in the previous chapter (pp. 70-3).
Long distances can be achieved by an element moving successively in short hops, leaving traces behind, rather than by moving in one giant leap:
(56) a. [CP Whoᵢ did [IP he think [CP tᵢ [IP she saw tᵢ]]]]?
b. * [CP Whoᵢ did [IP he ask [CP when [IP she saw tᵢ]]]]?
(57) a. [IP Heᵢ seems [IP tᵢ to be likely [IP tᵢ to leave]]].
b. * [IP Heᵢ seems [CP that [IP it is likely [IP tᵢ to leave]]]].
So instead of there being a single node which is an absolute bounding node, preventing anything moving over it, bounding nodes seem to 'gang up' to prevent movement: while one of them can be crossed by a single movement, more than one cannot be crossed in one go. Thus the principle of Subjacency can be stated as follows:
(58) Subjacency
No movement can cross more than one bounding node.
Looking at the examples in (56) and (57), it seems that IP can be identified as the bounding node blocking movement across it, as both the ungrammatical movements cross only one CP but two IPs. Ross identified a number of constructions as islands. For example, a nominal phrase which contains a clause also behaves as an island:
(59) * [CP Whoᵢ did [IP he hear [DP the rumour [CP tᵢ that [IP she murdered tᵢ]]]]]?
In this example, the wh-element makes two movements: one to the specifier of the noun complement CP, and one to the specifier of the matrix CP. As both of these movements cross only a single IP, the Bounding Theory so far discussed cannot account for the ungrammaticality. To rectify this, we might claim that DP is also a bounding node, noting that the second movement moves over a DP and an IP, violating Subjacency if both are bounding nodes. There is some parameterization of what counts as a bounding node in any given language.
For example, in Italian, some grammatical sentences appear to violate the wh-Island Constraint:
(60) Tuo fratello, a cui mi domando che storie abbiano raccontato.
(your brother, to whom myself ask-1.sing. which stories have-3.pl. told)
* Your brother, to whom I wonder which stories they have told.
In this example, the wh-phrase a cui (to whom) has moved out of the 'have told' clause to form a relative clause, despite the fact that this clause begins with another wh-phrase, che storie (which stories). As we can see from the translation, this sentence, though grammatical in Italian, is ungrammatical in English. The movement, like other wh-Island violations, crosses two IPs which, as we have established, are bounding nodes for English. The grammaticality of (60) means that IP cannot be a bounding node in Italian, but this does not mean that Italian has no bounding nodes and hence allows unrestricted long-distance movement. For example, Italian, like English, does not allow a movement out of a complex nominal phrase:
(61) * Tuo fratello, a cui temo la possibilità che abbiano raccontato tutto.
(your brother, to whom fear-1.sing. the possibility that have-3.pl. told everything)
* Your brother, to whom I fear the possibility that they have told everything.
Rizzi (1982) has claimed that these data are accounted for by assuming that the bounding nodes for Italian are CP and DP rather than IP and DP: in other words what counts as a bounding node is a parameter of variation between languages. The Italian setting of the Subjacency parameter allows elements to be extracted out of two IPs but not out of two CPs. Thus, a moved wh-element can skip one specifier of CP, allowing a violation of the wh-Island Constraint:
(62) [CP whᵢ [IP ... [CP wh [IP ... tᵢ]]]]
The movement in (62) crosses two IPs but only one CP; so it is ungrammatical if IP is a bounding node, as in English, but grammatical if CP is the bounding node, as in Italian.
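Because Subjacency is a simple counting condition over a parameterized set of nodes, it lends itself to a direct sketch. The following Python fragment is an illustration only: a movement step is modeled as a bare list of the node labels it crosses, abstracting away from the tree structure, and the parameter settings follow the text.

```python
# A sketch of the Subjacency check with parameterized bounding nodes.
# English takes IP and DP as bounding nodes; Italian takes CP and DP.

BOUNDING_NODES = {
    "English": {"IP", "DP"},
    "Italian": {"CP", "DP"},
}

def subjacent(step, language):
    """A single movement step may cross at most one bounding node."""
    crossed = [node for node in step if node in BOUNDING_NODES[language]]
    return len(crossed) <= 1

# Schematic wh-Island violation as in (62): one hop over two IPs and one CP
step = ["IP", "CP", "IP"]
print(subjacent(step, "English"))  # False: two IPs crossed
print(subjacent(step, "Italian"))  # True: only one CP crossed
```

The contrast in the two print lines reproduces the English/Italian difference in (60): the same movement crosses two bounding nodes under the English setting but only one under the Italian setting.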
EXERCISE 4.6 1 Here are some sentences involving movement out of islands. Identify where the bounding nodes are and how many are moved over by each movement, and conclude whether Subjacency accounts for all island phenomena:
* Whoᵢ did you meet a man who introduced tᵢ to Bill?
* Whoᵢ is the fact tᵢ that she met tᵢ worrying?
* Whoᵢ is it that she met tᵢ worrying?
* Whoᵢ did you see Bill and tᵢ?
* Whoᵢ is a picture of tᵢ on your desk?
2 In Hungarian, wh-elements move to the front of the clause, but cannot normally move out of a clause. So cases of long-distance wh-movement are produced by the use of a dummy wh-element in the scope position and the wh-element at the front of its own clause:
Mit mondott Janos, kit szeret Mari?
(what said John, who (acc.) likes Mary)
Who did John say Mary likes?
What setting of the Subjacency parameter might produce this strategy of question formation?

SUBJACENCY
The Principle of Subjacency: any movement can cross at most a single bounding node.
Bounding Nodes: these can be CP, IP or DP, subject to parameterization specific to the language.
The Subjacency parameter: this selects the bounding nodes for a particular language; for example English selects IP and DP while Italian selects CP and DP.

4.3.2 Barriers
A number of the islands identified by Ross are, however, not accounted for so readily by Subjacency. For example, Ross noted that a clause that sits in subject position is an island:
(63) * [CP Whoᵢ wasⱼ [IP [CP tᵢ that [IP you met tᵢ]] tⱼ unexpected]]?
Here the object within the sentential subject moves first to the CP specifier position of its own clause and then to the specifier of the matrix CP. Both of these movements cross only one bounding node and hence Subjacency does not account for the ungrammaticality. Yet the phenomenon seems part of a wider set of observations which indicate that it is easier to move out of a complement than a subject.
For example, a wh-element can be moved out of an object DP, but not out of a subject:
(64) a. [CP Whoᵢ did [IP you draw [DP a picture of tᵢ]]]?
b. * [CP Whoᵢ did [IP [DP a picture of tᵢ] fall off the wall]]?
In this case the grammatical (64a) causes problems for Subjacency as the movement appears to cross a DP and an IP. Yet these observations fall in line with those which demonstrate it is easier to move out of a complement clause than a clausal subject. In fact, this can be extended to show that it is easier to move out of a complement construction than out of any other kind of construction. Thus, for example, a wh-element cannot move out of a clause or a DP that functions as an adjunct:
(65) a. * [CP Whoᵢ did [IP he leave [CP because he met tᵢ]]]? (cf. He left because he met Mary.)
b. * [CP Whenᵢ did [IP he meet Mary [DP the day before tᵢ]]]? (cf. He met Mary the day before Wednesday.)
Whether a construction causes problems for movement depends on where that construction is situated rather than on absolute properties of the construction itself. In this case, both the Island approach and Subjacency are mistaken as they both assume that it is properties of specific constructions which block movements. In his Barriers monograph, Chomsky (1986b) initiated a new approach to bounding by proposing that certain constructions become barriers to movement not because of what they are but because of where they sit in a structure. 'A potential barrier may be exempt from barrierhood by an appropriate relation to a lexical head' (Chomsky, 1986b, p. 12). Given that it is structures other than complements which create problems, Chomsky proposed that complements have a special property which prevents them from being barriers to movement. This special property he called L-marking, related in some direct way to a lexical head. Obviously complements are related to lexical heads in that they are selected by them.
Thus a complement is an L-marked construction while a subject and an adjunct are not L-marked. It would be simple to claim that constructions which are not L-marked constitute barriers to movement, but unfortunately one non-L-marked element is never a barrier to movement, namely IP. To get round this, Chomsky proposed that something is a Blocking Category, i.e. something which is a potential barrier, if it is not L-marked; barriers are then defined as Blocking Categories which are not IPs. This is incorporated in the following set of definitions:
(66) Bounding Principle
A movement which crosses a barrier is ungrammatical.
(67) A construction is a barrier if it is a Blocking Category but not IP.
(68) A construction is a Blocking Category if it is not L-marked.
(69) A construction is L-marked if it is selected by a lexical head.
This theory, however, is still too simple as it predicts that a wh-element can be extracted directly out of a complement clause, instead of having to move first to the CP specifier position. If this were true then we would not get wh-Island effects. What is needed is to define any CP as a barrier, unless an element moves into its specifier position. At this point the CP will remain a barrier if it is not L-marked, but will cease to be a barrier if it is. Barrierhood is not an absolute property, but is also defined taking into consideration where the moved element is with respect to it. To achieve this means defining barriers for elements in particular positions which will not necessarily be barriers for elements in other positions. Chomsky achieves this by making the CP a barrier for an element inside the IP by 'inheritance'. 'Let us suppose that CP inherits barrierhood from IP, so that CP will be a barrier for something within IP but not for something in the pre-IP position' (Chomsky, 1986b, p. 12). Simply put, the CP inherits its barrier status from the IP, which is a Blocking Category for an element that it contains. So a CP can become a barrier, even if it is L-marked, by dominating a Blocking Category for a moving element. An element inside IP would then be allowed to move out of the IP, as this is a Blocking Category but not a barrier, but it could not move out of the CP. Therefore only movement to the specifier of the CP is possible. Once the element has moved out of IP, the IP is no longer relevant for it, and hence the CP ceases to inherit its barrier status. From this point on the element can move out of the CP as long as this is L-marked. We can schematize this in the following way, where italics indicate Blocking Category status, bold face indicates barrier status and arrows represent L-marking:
(70) a. He thinks [CP [IP she likes who]]?
b. He thinks [CP whoᵢ [IP she likes tᵢ]]?
c. Whoᵢ does he think [CP tᵢ [IP she likes tᵢ]]?

EXERCISE 4.7 Using the definitions given above, say whether the bold elements in the structures below count as a barrier for an element in the position marked X:
believe [CP that [IP ... X ... ]]
believe [CP that [IP ... X ... ]]
believe [CP X that [IP ... ]]
left [CP because [IP ... X ... ]]
left [CP X because [IP ... ]]

4.3.3 Relativized Minimality
Clearly the Barriers framework is very complicated. Despite its importance, it was not long before simpler approaches to boundedness were proposed. The most influential was Rizzi's Relativized Minimality approach (Rizzi, 1990). Like Barriers, this theory of boundedness does not set up absolute barricades for movement, but relativizes the conditions in which movement is blocked to whatever it is that is moving. The general idea is simple: all movements should be to the nearest relevant position, where what constitutes a relevant position depends on the moved element. If it is a head that is moving, then the nearest relevant position is the nearest head position.
If an argument is undergoing A-movement, then the nearest relevant position is the nearest A-position; and if an element is undergoing A′-movement, then the nearest relevant position is the nearest A′-position, as seen in the following examples:

(71) a. [CP Couldi [IP he ti [VP have seen the message]]]?
b. * [CP Havei [IP he could [VP ti seen the message]]]?
c. [CP Hadi [IP he ti [VP ti seen the message]]]?

(72) a. [CP Whati did [IP you think [CP ti [IP he saw ti]]]]?
b. * [CP Whati did [IP you ask [CP where [IP he saw ti]]]]?

(73) a. [IP Hei seemed [IP ti to be likely [IP ti to win]]].
b. * [IP Hei seemed [IP it is likely [IP ti to win]]].

Sentence (71) shows the effects of bounding on head movement. The modal auxiliary in I can move to C as C is the next head position up from I. The aspectual auxiliary have cannot move to C, however, as this entails moving over the I which is nearer to the auxiliary but filled by the modal. Only if there is no modal, allowing the aspectual auxiliary to move first to I, can this head then move to C. In the early 1980s GB Theory these observations were accounted for by the Head Movement Constraint (Travis, 1984), which stated that heads cannot move over the top of other heads. If we compare this situation to those in sentences (72) and (73), which represent familiar bounding effects for A′- and A-movements, we can see a similar effect: a wh-element can move to a higher specifier of CP as long as it does not move over another wh-element in a lower CP specifier, and a subject can undergo raising as long as it is not raised over another subject position. Rizzi's insight was to see these restrictions as instances of one general restriction:

(74) Relativized Minimality
No X can move over another X, where X is either a head, an argument or a wh-element.

This simpler restriction on movement has been very influential for research to the present day.
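Because (74) reduces all three movement types to a single intervention check, it is easy to state as a function. The sketch below is an illustrative assumption of my own, not the book's formalism: it simply asks whether any crossed position is of the same type as the mover.

```python
# A toy statement of Relativized Minimality (74): movement of an element
# over an intervening position of the same type is blocked.

def rm_ok(moved_type, intervener_types):
    """moved_type: 'head', 'A' (argument) or 'A-bar' (wh-element).
    intervener_types: the types of the filled positions crossed on the
    way to the landing site. Licit only with no same-type intervener."""
    return moved_type not in intervener_types

print(rm_ok('head', ['head']))      # False: (71b), *Have he could seen...
print(rm_ok('A-bar', []))           # True:  (72a), wh-movement via empty spec-CP
print(rm_ok('A-bar', ['A-bar']))    # False: (72b), the wh-island effect
print(rm_ok('A', ['A']))            # False: (73b), raising over another subject
```

Each call corresponds to one of the examples above: the same two-line check captures the Head Movement Constraint, wh-islands and super-raising.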
Yet it is not entirely obvious how it can account for the observations which motivated the Barriers approach, namely that complements are easier to move out of than non-complements. Indeed, within GB Theory there was no one approach to bounding that could straightforwardly account for all boundedness restrictions on movement. We will see, however, that many of the ideas discussed in this section re-emerge in more current treatments.

RELATIVIZED MINIMALITY
An element must move to the nearest relevant position, defined in relation to what movement is involved: head movement, A-movement or A′-movement.

4.4 Case Theory

The previous chapter introduced Case phenomena but gave no details of the principles which govern them. Case Theory is in fact a central part of GB Theory which plays an important role in the analysis of certain movements, as we shall see after some other principles that belong to this module have been introduced.

4.4.1 Abstract and morphological Case

It is important to establish first what phenomena are covered by Case. Traditionally the term Case refers to the form that nominals take, say in Latin mensa, mensam, mensas etc. (table), often depending on their function in a sentence. For example, in Hungarian the subject noun is typically in the Nominative form which is unmarked, but the object noun is in the Accusative form marked by a -t affix:

(75) a. Janos elment. (John-nom. away-went)
John left.
b. Latom Janost. (see-1.sing. John-acc.)
I see John.

As we see from the translations, English does not mark Case distinctions on its nominals: where Hungarian has two forms, Janos and Janost, the same form of the noun John appears in both subject and object position in English. Only in pronouns is there a formal difference between subjects and objects in English:

(76) He/she/they admired him/her/them.

Thus English rarely marks Case distinctions morphologically.
Nevertheless this does not mean that the notion of Case is redundant for languages without morphological Case distinctions, as Case plays an important role in many languages in determining the distribution of DPs, not just the form of nominals. 'In some languages, Case is morphologically realized, in others not, but we assume that it is assigned in a uniform way whether morphologically realized or not' (Chomsky, 1986a, p. 76). We therefore need a more general notion of Case distinct from the traditional notion of nominal morphological form, to be called abstract Case (as opposed to morphological Case). Abstract Case is a property which is borne by a nominal element as a result of occupying certain positions. Languages vary as to whether abstract Case actually has morphological consequences; nevertheless its presence is universal. Case Theory is the module in the GB grammar which addresses issues of abstract Case.

4.4.2 The principles of Case Theory

One of the main principles of Case Theory is the Case Filter, briefly mentioned in the previous chapter. This ensures that all overt DPs occupy positions to which Case is assigned (bear in mind that PRO does not sit in a Case position):

(77) Case Filter
'Every phonetically realized [DP] must be assigned (abstract) Case' (Chomsky, 1986a, p. 74).

This naturally raises the question of what determines which is a Case position and which is not. The answer lies in the principles of Case assignment. The first thing to note is that certain elements are assumed to have the ability to assign certain Cases. The most straightforward instance of this is Accusative Case:

(78) John invited them.

As Accusative Case is associated with objects, the most obvious assigner of this Case is the verb. This is supported by the fact that in some languages not all verbs take Accusative objects. For example, in German the objects of some verbs have Dative or Genitive Case:

(79) a. Sie hilft ihm. (she helps him-dat.)
She helps him.
b. Er konnte sich des Lachens nicht enthalten. (he could himself the-gen. laughter not refrain)
He couldn't stop himself from laughing.

As the Case of the object depends on which verb is used, the verb is the ultimate source of the object's Case. In English, all verbs assign Accusative Case and in fact Accusative is the usual Case for verbs to assign in most languages. Accusative Case is known as a structural Case, as it is generally assigned to the structural position of the verbal object. Dative and Genitive objects depend on particular verbs and are said to bear inherent Case, which differs in a number of ways from structural Case:

We distinguish the "structural Cases", objective and nominative, assigned in terms of S-structure position, from the "inherent Cases" assigned at D-structure. ... Inherent Case is associated with θ-marking while structural Case is not, as we should expect for processes that apply at D-structure and S-structure, respectively. Thus, we assume that inherent Case is assigned by α to [DP] if and only if α θ-marks [DP], while structural Case is assigned independently of θ-marking. (Chomsky, 1986a, p. 193)

The objects of prepositions can also be Accusative:

(80) I sent a letter to him.

though in some languages some prepositions may have Dative or Genitive objects:

(81) a. Ich habe kein Geld bei mir. (I have no money with me-dat.)
I have no money with me.
b. Das Dorf liegt diesseits des Flusses. (the village lies on-this-side-of the-gen. river)
The village lies on this side of the river.

Prepositions can then be treated in a similar way to verbs, as assigners of structural Accusative Case, or, in some languages, as inherent Case assigners. Nominative Case is typically assigned to subjects. However, not all subjects have Nominative Case. For one thing, PRO sits in the subject position of non-finite clauses and this DP is considered not to bear Case.
When infinitival subjects are overt, they are in the Accusative:

(82) a. For him to leave now would be unacceptable.
b. I believe him to be rich.

All examples of Accusative subjects involve non-finite clauses; it is then the finite clause which has Nominative subjects. So the Nominative Case assigner must have something to do with the finite inflections for tense and agreement. It is often claimed that agreement is responsible for Nominative Case assignment on the basis that subjects of Portuguese infinitives are Nominative when the infinitive bears agreement morphology, as noted by Raposo (1987):

(83) Sera dificil eles aprovarem a proposta. (will-be difficult they to-approve-3.pl. the proposal)
It will be difficult for them to approve the proposal.

From these observations we can conclude the following about Case assigners:

(84) a. Verbs and prepositions assign Accusative Case.
b. Agreement assigns Nominative Case.

The next question to arise is where these Case assigners assign their Cases to. There are strict restrictions on Case assignment. Just because a verb, for example, has an Accusative Case to assign does not mean that it can assign it to any DP in a structure. Typically verbs and prepositions assign their Cases to their DP complements, and so the restrictions on Case assignment are similar to those on theta-role assignment discussed in the previous chapter (section 3.4). However, the restrictions on Case assignment are slightly looser than those on θ-roles as Case may be assigned to DPs that are not arguments of the Case assigner under special circumstances. For one thing, Case is not always assigned by a thematic element. Agreement is a functional element, yet it is responsible for assigning Nominative Case to the subject. Other functional elements also assign Case. Consider the Accusative subject in (82a), repeated as:

(85) For him to leave now would be unacceptable.
As there is no agreement element on the non-finite inflection, it is assumed not to be a Case assigner. This clause begins with a for complementizer, in the absence of which the sentence would be ungrammatical:

(86) * Him to leave now would be unacceptable.

If sentence (86) is ungrammatical because the infinitive subject lacks Case and hence violates the Case Filter, we must attribute the grammaticality of (85) to the ability of the complementizer for to Case-mark the subject. In addition, this complementizer for has the form of a preposition (indeed it is often called the prepositional complementizer) and that it assigns Accusative Case is therefore not surprising. Thus this complementizer is another functional category with the ability to assign Case. A further difference between Case and θ-role assignment is that Case can be assigned by a verb to a DP which is the argument of another predicate. Consider the Accusative subject in (82b), repeated as:

(87) I believe him to be rich.

Clearly this him is thematically related to the be rich predicate and as such can be taken as the subject of the infinitival clause him to be rich. As we have said, the non-finite inflection cannot assign Case and so the Accusative Case borne by the subject must come from elsewhere; the obvious choice is the verb believe, as there is no for complementizer. If true, this is radically different from θ-role assignment as no predicate can assign a θ-role to another predicate's argument.
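The Case Filter (77) together with the inventory of assigners just discussed amounts to a lookup-and-check procedure. The sketch below is a toy illustration under my own assumptions (the dictionary encoding and the dp record format are not from the text); it includes the for complementizer and believe-type verbs alongside the assigners in (84).

```python
# Toy sketch of the Case Filter (77) with the assigners discussed above.
# The labels and data representation are illustrative assumptions.

CASE_ASSIGNERS = {
    'V': 'Accusative',      # (84a): verbs ...
    'P': 'Accusative',      # ... and prepositions
    'Agr': 'Nominative',    # (84b): finite agreement
    'for': 'Accusative',    # the prepositional complementizer
    'ECM-V': 'Accusative',  # believe-type verb governing into its complement
}

def case_filter_ok(dp):
    """(77): every phonetically realized DP must be assigned abstract Case.
    dp = {'overt': bool, 'governor': an assigner label or None}"""
    if not dp['overt']:
        return True         # PRO does not fall under the filter
    return dp['governor'] in CASE_ASSIGNERS

print(case_filter_ok({'overt': True, 'governor': 'for'}))   # (85): True
print(case_filter_ok({'overt': True, 'governor': None}))    # (86): False
print(case_filter_ok({'overt': False, 'governor': None}))   # PRO subject: True
```

The three calls mirror (85), (86) and a PRO subject respectively: only the overt, ungoverned subject violates the filter.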
Examining the instances of Case assignment discussed so far reveals at least three different configurations within which Case is assigned:
• from a verb or preposition to its complement
• from a finite inflection to its specifier and
• from a complementizer to the specifier of its IP complement
(setting aside believe for the moment):

(88) [VP [V' V DP]]

The structural notion of government (chapter 3, section 3.6) was construed so as to capture these three cases of Case assignment. If Case is assigned under government, it follows that in the above configurations the Case assigners govern the positions to which Case is assigned. One way in which government resembles θ-role assignment is that it appears to be a relationship between a head and a related phrase. Thus, the set of governors is selected from the set of heads. Not all heads are governors, however. Recall that PRO sits only in ungoverned positions, yet we find this empty pronoun in the specifier of non-finite IPs. It follows that the non-finite I is not a governor. The definition of possible governors is then:

(89) α is a governor if α = X° (not non-finite I).

Although government is not as restrictive as sisterhood, which is the condition relevant for θ-role assignment, it is still very local. For one thing the governors govern elements that are within their own maximal projections and cannot govern beyond this. To account for this, it is useful to define a structural relation which holds between elements within the same maximal projection. This relationship is often called m-command, related to the c-command seen in chapter 3 (p. 114):

(90) m-command
α m-commands β if the first maximal projection dominating α also dominates β.

(91) [CP C [IP DP [I' I [VP V DP]]]]

In the tree in (91), the I m-commands everything inside the IP but not anything immediately in the CP, i.e. the complementizer or its specifier.
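Definition (90) can be checked mechanically on a toy version of tree (91). The sketch below is an illustrative assumption of my own: the node representation is simplified (the IP is flattened, omitting the I′ level, which does not affect m-command since I′ is not a maximal projection).

```python
# Toy tree for (91) with m-command (90). Representation is a simplifying
# assumption: IP directly contains its specifier, I and VP.

class N:
    def __init__(self, label, children=(), maximal=False):
        self.label = label
        self.children = list(children)
        self.maximal = maximal        # is this a maximal projection (XP)?
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    return any(c is b or dominates(c, b) for c in a.children)

def m_commands(a, b):
    # (90): the first maximal projection dominating a also dominates b
    node = a.parent
    while node is not None and not node.maximal:
        node = node.parent
    return node is not None and dominates(node, b)

obj = N('DP', maximal=True)
v = N('V')
vp = N('VP', [v, obj], maximal=True)
subj = N('DP', maximal=True)
i = N('I')
ip = N('IP', [subj, i, vp], maximal=True)
c = N('C')
cp = N('CP', [c, ip], maximal=True)

print(m_commands(i, subj))   # True: I m-commands everything inside IP
print(m_commands(i, c))      # False: but nothing immediately in CP
print(m_commands(v, obj))    # True: V m-commands its object
print(m_commands(v, subj))   # False: but nothing outside VP
```

The four results restate the prose description of (91): each head m-commands material within its own maximal projection and nothing beyond it.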
The subject DP also m-commands everything inside the IP. The verb m-commands everything inside the VP (the object), but nothing higher. In terms of locality it is also important that the governed element is not too deeply embedded within the maximal projection of the governor. One way to achieve this is to propose that government cannot cross certain domains. For example, if maximal projections block government, then a governor will be able to govern its own complement and specifier, but not to govern any element contained within these. This would be too strong, however, as we want the complementizer to be able to govern the specifier of its complement. Note that this complement is an IP and we have already seen that IP is exceptional in not counting as a barrier. This suggests that the notion of 'barrier' could be unified for the purposes of bounding and government. If there was just one definition of a barrier which blocked both movement and government relationships, then our theory would be simplified. Indeed, this was one of the aims of Chomsky (1986b) in proposing the notion of a barrier in the first place. From this perspective we can define government as follows:

(92) Government (Barriers version)
α governs β if:
(i) α is a governor
(ii) α m-commands β
(iii) there is no barrier between α and β.

Rizzi's notion of Relativized Minimality also attempted to unify locality restrictions on movement and government, but this followed a different tradition of viewing government, dating back to Stowell (1981), in which government is a unique relationship holding between two close elements so that if X governs Y then Z cannot govern Y if Z is further from Y than X is. Diagrammatically, we can represent this situation thus:

(93) Z ... X ... Y

So it is the presence of X, the nearer governor, that blocks the possibility of Z governing Y.
This fits into the Relativized Minimality pattern as, if X and Z are potential governors for Y, they will both be of the same type. Viewed from the point of view of movement, an element cannot move from Y to Z over the top of X, where X and Z are the same type, and from the point of view of government, something cannot govern from Z to Y over the top of X, where X and Z are of the same type. In movement, Z and X are potential landing sites, and Y must move to the nearer. For government, Z and X are potential governors and Y can only be governed by the nearer. This gives us the following definition of government:

(94) Government (Relativized Minimality version)
α governs β if:
(i) α is a governor
(ii) α m-commands β
(iii) there is no closer governor to β than α.

EXERCISE 4.8
1 Which Cases are assigned to the bold DPs in the following sentences? (Hint: what form of the pronoun replaces them?) Are the results all totally expected and were there any difficult cases? Can you explain why it is difficult to determine which Case is assigned in these cases?
I know that John is hiding.
I never wanted John upset.
John, I really don't understand.
John smoking is unpleasant.
I expect John has left.
Q: Who left? A: John.
I expect John to leave.
There's a fly in my soup.
2 The text has briefly alluded to Genitive Case, which is borne by DP possessors:
John's (his) bodyguard
What could assign this Case? Is the observation that some DPs can have a PRO possessor problematic for the treatment of the assignment of Genitive Case under the assumptions made above?:
John's smoking is unpleasant.
PRO smoking is unpleasant.

CASE THEORY AND GOVERNMENT
'Case Theory deals with assignment of abstract Case and its morphological realisation' (Chomsky, 1981a, p. 6).
Case Filter: all overt DPs must be assigned Case.
Case is assigned to all DPs by Case assigners:
- Nominative is assigned by agreement: He disappeared.
- Accusative is assigned by the verb or preposition: I liked him. She gave the book to him.
Case is assigned under government, which means that the Case assigner m-commands the DP that it assigns Case to and:
(i) there is no barrier between the two (Barriers version) or
(ii) there is no closer governor of the DP (Relativized Minimality version).

4.4.3 Exceptional Case-Marking

Let us briefly return to the case of the verb believe, which takes an infinitival clause complement with an Accusative subject.

(95) I believe him to be rich.

As this infinitival clause is not introduced by a for complementizer, the Accusative Case cannot come from the C-position. In fact, this clause cannot be introduced by a complementizer:

(96) * I believe for him to be rich.

So these kinds of clauses have no C-system and verbs like believe can take IP complements. Such verbs are known as exceptional verbs precisely because they are exceptions in being able to have IP non-finite complements. Given that IP is not a barrier to government, it follows that exceptional verbs can Case-mark the subjects of their IP complements, in exactly the same way that the for complementizer Case-marks this subject. Such constructions are called Exceptional Case-Marking constructions, or ECM constructions for short.

In general, a verb selects a full clause [CP] not [IP]; [CP] not [IP] is the normal canonical structural realisation of proposition. Thus try, not believe, illustrates the general case; such examples [involving believe] are often called "exceptional Case-marking" constructions. In languages very much like English (French and German, for example) these constructions do not exist, and the counterpart of believe behaves like try in English in this respect. (Chomsky, 1986a, p. 190)

4.4.4 Of-insertion

As mentioned in chapter 3, according to one view, nouns and adjectives are not Case assigners and thus cannot have bare DP objects:

(97) a.
* a picture George
b. * very fond his mother

Overcoming this problem involves inserting an expletive preposition, which has the role of assigning Accusative Case to these objects:

(98) a. a picture of George
b. very fond of his mother

Chomsky (1986a) was concerned about the repercussions of this theory, however, and wondered why an of could not be inserted to overcome the Case Filter in other cases in which DPs occupied Caseless positions, say the subjects of non-finite clauses:

(99) * of John to leave now would be rude

Moreover, we do not get of-insertion on the subject of a non-finite complement of a noun formed from an exceptional verb:

(100) a. I believe him to be rich.
b. * my belief him to be rich
c. * my belief of him to be rich

If nouns do not assign Case, then we can account for the ungrammaticality of (100b), as the nominalized verb would not be able to Case-mark the subject. But again, if of-insertion is a way to circumnavigate the Case Filter in these cases, why should it not be possible in (100c)? Chomsky's answer to these problems was to suggest that nouns and adjectives do assign Case:

Suppose we revise ... Case theory ..., regarding nouns and adjectives as Case-assigners along with verbs and prepositions. We distinguish the "structural Cases" objective and nominative, assigned in terms of S-structure position, from "inherent Cases" assigned at D-structure. The latter include oblique Case assigned by prepositions and now also Genitive Case, which we assume to be assigned by nouns and adjectives just as verbs normally assign objective Case. (Chomsky, 1986a, p. 193)

Of is simply the realization of the Genitive Case that is assigned by these heads and therefore it is not free to appear anywhere we find a Caseless DP as the theory of of-insertion would suggest.
This would still not account for the ungrammaticality of (100c), however, as, if the noun is able to assign Genitive Case to its object, it is unclear why it cannot do so to the subject of the non-finite IP in an ECM construction. Chomsky's solution is that Genitive and Accusative Cases differ in that the former is an inherent Case while the latter is a structural Case. As we have mentioned, structural Cases are assigned to elements which occupy certain structural positions: i.e. those governed by Case assigners. Inherent Cases, on the other hand, seem more restricted in that they are only assigned to specific arguments and as such the assignment of inherent Case is much more like the assignment of θ-roles. If this is the case then inherent Cases cannot be assigned to DPs which are not arguments of the Case assigner, and hence we cannot get ECM constructions involving the assignment of inherent Case.

INHERENT AND STRUCTURAL CASE
Structural Case is assigned by a Case assigner (verb, agreement) to the position that it governs and thus any DP which occupies this position will receive the assigned Case no matter what its thematic relationship to the Case assigner.
Inherent Case is assigned by a particular Case assigner to a lexically specified argument and therefore it can only be assigned to a DP which is thematically related to the Case assigner.

4.4.5 Adjacency and conditions of Case assignment

One last issue remains to be discussed concerning the basic principles governing Case assignment. The original idea, on which the above discussion has been based, was that all Cases are assigned under the same condition, namely government. However, there are differences between Accusative and Nominative Case assignments.
The most obvious concerns the direction that these Cases are assigned in: verbs and prepositions assign Accusative Case to their complements which are to their right, while agreement assigns Nominative Case to its specifier which is to its left:

(101) [VP V → DP]   [IP DP ← I']

It turns out that the direction in which Cases are assigned is parametric and does not depend on the positions that DP arguments happen to occupy according to the principles of Theta Theory. For example, Koopman (1984) analyses Chinese as having a head-final VP, as most complements and adjuncts tend to precede the verb. However, Chinese objects tend to follow the verb ('aspect' is abbreviated to 'asp.'):

(102) Zhangsan zuotian zai xuexiao kanjian-le Lisi. (Zhangsan yesterday at school saw-asp. Lisi)
Zhangsan saw Lisi at school yesterday.

Koopman accounts for this by assuming that Chinese objects are generated in front of the verb, but then move behind it, as the verb assigns its Case to the right. If Chinese assigned Case to the left, then the object would not have to move and the sentence would have SOV word order. Given that this word order is possible in many languages, it follows that languages differ with respect to the direction in which Accusative Case is assigned. English obviously chooses to assign Accusative Case, and Genitive Case, if nouns and adjectives assign this, to the right. But, from this point of view, Nominative Case is exceptional in being assigned to the left. Another difference between Nominative and Accusative Case can be seen in the following observations. Generally PP complements can come in any order, but DP complements must come immediately after the verb. Consider the following contrast:

(103) a. I spoke to the students about the assignment.
b. I spoke about the assignment to the students.

(104) a. He put the book on the shelf.
b. * He put on the shelf the book.
This contrast can be accounted for in terms of a restriction on Case assignment. If Case assignment is not only limited structurally, under the notion of government, but also is limited linearly, i.e. the Case assigner must be adjacent to the DP that it Case-marks, then it follows that PP complements do not have to be adjacent to their predicate but DP complements do. This is known as the principle of Adjacency. Adjacency is also at work in the following examples:

(105) a. * We were anxious for tomorrow him to arrive early.
b. * I believe sincerely him to be rich.

Note that with finite complement clauses the complementizer or the governing verb does not have to be adjacent to the subject:

(106) a. We were anxious that tomorrow he should arrive early.
b. I believe sincerely that he is rich.

The difference lies in the fact that with the finite clause the subject is assigned Case by the inflection, not by the verb or the complementizer. Therefore there is no Adjacency requirement between these elements and the subject. In (105), however, the subject is dependent on the complementizer and the verb, respectively, for its Case; hence it must be adjacent to these elements. The Adjacency requirement is parametric since not all languages insist that objects be adjacent to their verbs, as the following Hungarian example shows:

(107) Lattam tegnap Janost. (saw-1.sing. yesterday John-acc.)
I saw John yesterday.

If Nominative Case is assigned from the finite inflection, this differs from Accusative Case assignment as the inflection and the Nominative subject do not have to be adjacent:

(108) He obviously will correct the error.

In this case, the finite inflection (will) is separated from the subject to which it assigns Nominative Case by the adverb obviously.
To account for the differences between Nominative and Accusative Case assignment, it has been proposed that they are assigned under different conditions: Accusative Case is assigned under government, as we have described, but Nominative Case is assigned under the condition of specifier-head (spec-head) agreement, a relationship assumed to hold between all heads and specifiers which results in both elements sharing features. Morphologically this relationship is overtly realized in cases of subject agreement, where the subject and the agreement head obviously have matching features.

(109) a. A lanyok elmentek. (the girls away-went-3.pl.)
The girls left.
b. En elmentem. (I away-went-1.sing.)
I left.

We can also see it more abstractly in wh-movement, where the complementizer head shares the interrogative feature of the wh-element in its specifier position, thus determining the whole CP as an interrogative clause. In English, Case assigned under government is oriented to the right and is subject to Adjacency, while Case assigned under spec-head agreement is oriented to the left and is not subject to Adjacency:

(110) government: rightward, adjacent
agreement: leftward, not necessarily adjacent

This then accounts for the differences between Nominative and Accusative Case assignment.

ADJACENCY
Some languages such as English require Case assigners to be adjacent to the DP that receives Case:
I liked him very much. versus * I liked very much him.
Others, such as French, have no such requirement:
J'aime beaucoup la France. (I like very much France.)

Let us recapitulate the principles of Case Theory before moving on to the role of Case in the analysis of movement. Perhaps the most basic is the Case Filter which determines that all overt DPs must have Case. This universal principle governs the distribution of DPs in all languages, even though they may differ in terms of the amount of Case that is actually realized morphologically.
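The parameter table in (110) lends itself to a small worked example. The following sketch is my own illustrative encoding (the dictionary keys and the assigns_case function are assumptions, not the book's formalism): it checks whether a given assignment condition licenses Case for a DP on a given side, adjacent or not.

```python
# Toy encoding of the English settings summarized in (110): condition,
# direction and adjacency. The representation is an illustrative assumption.

PARAMS = {
    'government': {'direction': 'right', 'adjacent': True},   # Accusative
    'spec-head':  {'direction': 'left',  'adjacent': False},  # Nominative
}

def assigns_case(condition, dp_side, dp_adjacent):
    """Can Case be assigned under this condition to a DP on dp_side of the
    assigner, given whether the DP is string-adjacent to it?"""
    p = PARAMS[condition]
    if p['direction'] != dp_side:
        return False
    return dp_adjacent or not p['adjacent']

print(assigns_case('government', 'right', True))    # 'I liked him very much'
print(assigns_case('government', 'right', False))   # '* I liked very much him'
print(assigns_case('spec-head', 'left', False))     # 'He obviously will ...'
```

Swapping the parameter values (e.g. turning off adjacency for government) would model a French- or Hungarian-type language in which the verb and its object need not be adjacent.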
We then define a set of Case assigners, with verbs and prepositions generally being seen as Accusative Case assigners and finite inflections, or more specifically the agreement element, being the Nominative Case assigner. Languages may differ in terms of whether they have inherent Case-assigning elements and, if so, which of these assign which Case. Since these differences concern the properties of individual verbs or prepositions, they are lexically determined. Finally come the general principles governing how Case is assigned. Some Cases are assigned under the restriction of government and may be left- or right-oriented. This will determine the position of the DP with respect to the Case assigner. For example, a Verb Object language assigns Accusative Case from the verb towards its right while an Object Verb language assigns Accusative Case to the left. This kind of Case assignment may or may not be subject to the Adjacency requirement. Other Cases, such as Nominative, can be assigned under the specifier-head agreement relationship, which may differ in terms of its direction from the parameter setting for the Cases assigned under government and will not require adjacency.

EXERCISE 4.9
1 Are the following constructions ungrammatical for the same reason?
* John hit he
* a picture him
2 What problems for the Adjacency condition are presented by the following observations?
He criticized severely every proposal the board made.
Whom did you want to see?
That kind of attitude, I really can't put up with.
I gave Sarah the money.

4.4.6 Case and movement

Now we turn to the role Case Theory plays in structures involving movement. The relevance of Case for movement stems from the fact that Case Theory is a module of S-structure and so the principles of Case Theory are irrelevant for D-structure: DPs are positioned only with respect to the requirements placed on them from Theta Theory and the X-bar modules.
Some of the positions that these modules require DP arguments to sit in will be positions to which Case is assigned, being governed by a Case-assigning head. Others, however, will not. This will not matter at D-structure as the principles of Case Theory do not apply there. But at S-structure it will matter; unless the Caseless DP is moved to a position where it can get Case, the structure will be ruled out as ungrammatical by the Case Filter. Thus Case plays a motivating role for certain movements. This last statement needs some qualification, however. Saying that Case motivates movement is to speak in metaphors: it is not as though there is some homunculus sitting in the language faculty who moves things around for particular reasons. Movement in GB Theory, viewed as Move α, is something which may or may not take place with no restrictions on the actual operation itself. Some of these movements will produce grammatical structures, conforming to all the principles of other modules, and some will not. Similarly, if a movement does not take place, this may sometimes result in a grammatical structure, sometimes not, depending on whether other principles are satisfied. In the cases under discussion, if the movement does not take place, the result will be ungrammatical as the result is a DP sitting in a Caseless position. On the other hand, if the relevant movement does take place, i.e. moving the DP to a Case position, the result will be a grammatical structure. Obviously the only movements that Case Theory has anything to do with involve DPs. There is one kind of movement we have discussed above which exclusively concerns DPs: A-movement. Let us consider how Case is involved in this kind of movement. Recall that in raising structures, the subject of the non-finite clause moves to the subject of the raising verb:

(111) a. Johni seems [ti to like Mary].
b. It seems [John likes Mary].
As we have seen, the subject position of an infinitival clause is generally a Caseless position, unless it is Case-marked by a for complementizer or an exceptional verb. As neither of these is present in (111a), the infinitival subject is Caseless and, if it were to remain in this position, the result would be ungrammatical:

(112) * It seems John to like Mary.

In (111b), on the other hand, the complement clause is finite, and hence its subject receives Case from the finite inflection. Therefore the structure is grammatical with the subject remaining in the complement clause. Another straightforward instance of Case-motivated movement concerns the movement of the subject out of the specifier of VP, where it originates according to the VP-Internal Subject Hypothesis, into the canonical subject position, the specifier of IP:

(113) [IP He_i will [VP t_i write a letter]].

From what we have said about Case assignment in English, the specifier of VP is obviously a Caseless position: the verb cannot assign its Accusative Case to this position, as Accusative Case is assigned under government to the right; the inflection cannot assign its Nominative Case to this position, as this is assigned under spec-head agreement to the specifier of the IP. Therefore, if the subject were to remain inside the VP, it would violate the Case Filter at S-structure and the resulting structure would be ungrammatical:

(114) * [IP It will [VP he write a letter]].

In passive structures, the object moves to the subject position and cannot remain in object position:

(115) a. John_i was identified t_i.
b. * It was identified John.

We could account for this phenomenon if the object position of a passive verb were Caseless. In this case, the object would be ungrammatical if it did not move; moving it to the subject position, where it could receive Nominative Case, would overcome the problem. But verbs are generally seen as Accusative Case assigners.
Why would a passive verb be different? There are a number of possible answers to this question. We know that when a verb is passivized its argument structure is altered so that it no longer has an external θ-role to assign to its subject. Clearly this change happens in the lexicon, before the verb is inserted into the structure. It is possible that at the same time the verb is altered in terms of its Case-assigning abilities: its Accusative Case is absorbed as a result of passivization. Another possibility is that passivization alters the category of the verb to something more like an adjective. This is supported by the fact that passive verbs have a very similar distribution to adjectives:

(116) a. John was interrogated. / John was tall.
b. the interrogated man / the tall man

If adjectives do not assign Case, then the change of category of the verb would explain why passivized verbs do not assign Case. However, this theory does not work under the assumption that adjectives assign Genitive Case, as passive verbs cannot take objects marked as genitive by the preposition of:

(117) * It was interrogated of John.

Jaeggli (1986) proposed that the changes that passive verbs undergo do not take place in the lexicon but are syntactic in nature. He suggested that both the external θ-role and the Case of the verb are assigned in passive structures, but not to their usual places. Instead, these elements are assigned to the passive morpheme itself. Whatever the actual analysis of passivization, passive verbs follow a robust generalization concerning verbs and Case assignment. Burzio (1986) noted that verbs which fail to assign an external θ-role also fail to assign structural Case. We can see this in the case of unaccusative and middle verbs. Again, both of these involve the object obligatorily moving to the subject position:

(118) a. Three men arrived. / * It arrived three men.
b. The bread cut easily. / * It cut the bread easily.
Both these constructions fail to have external arguments and, given that the object has to move, they both fail to assign Accusative Case to their objects. Burzio's generalization is quite mysterious, as it is not at all clear why the assignment of θ-roles and Case is linked in this manner. However, it does seem to hold true for a number of constructions in a wide variety of languages. An interesting observation concerns the difference between structural and inherent Cases in passive contexts. We have said that inherent Cases are assigned in a similar way to θ-roles, in that they are assigned to specific arguments and not to structural positions. That Nominative Case is structural is obvious from the fact that whatever argument sits in the subject position will get Nominative Case. Thus an object that moves to the subject position or a subject of a lower clause that raises to a subject position will all bear nominative. If we passivize a verb which assigns an inherent Case, however, it does not lose this Case:

(119) a. Sie hilft ihm. (she helps him-dat.)
'She helps him.'
b. Ihm wird geholfen. (him-dat. was helped)
'He was helped.'

For this reason it is usually assumed that inherent Case, unlike structural Case, is assigned at D-structure and hence is not affected by movement.

CASE AND MOVEMENT
Case is an S-structure module, which means that DPs do not have to occupy Case positions at D-structure. If a DP generated in a Caseless position at D-structure remains there, however, the Case Filter will apply at S-structure and rule the structure ungrammatical. If the offending DP moves from its Caseless position to one that is Case-marked, the Case Filter will be satisfied. This analysis applies to a wide number of structures, including:
subject movement: [John_i will [t_i meet his bank manager]].
passivization: [The bank manager_i will [be met t_i]].
unaccusatives: [The bank manager_i [arrived t_i late]].
middles: [The bank manager_i [scared t_i easily]].
subject raising: [John_i seems [t_i to have intimidated the bank manager]].
All of these movements are A-movements and so we can conclude that Case motivation is a property of this type of movement.

EXERCISE 4.10
A-movements used to be called NP-movements (DP-movements, under current assumptions). Why was this?

The Case Theory module can now be added to the GB Model we have been constructing over two chapters.

Figure 4.2 Government/Binding Theory (Control Theory added)

4.5 Binding Theory

While the principles of Case Theory determine the S-structure positions of DPs, there are other requirements that certain DPs have to satisfy, which have to do with their referential properties rather than with Case. Take, for example, the following sentences:

(120) a. John said that Bill admires himself.
b. John said that Bill admires him.
c. He said that Bill admires John.

The difference between (120a) and (120b) is merely that the object of the embedded clause is expressed as a reflexive pronoun in the first and as a personal pronoun in the second, yet the two sentences cannot mean the same thing. In (120a) the reflexive pronoun must be interpreted as referring to Bill, and John cannot be its antecedent, whereas in (120b) the personal pronoun cannot refer to Bill but may take John as its antecedent, or alternatively it may refer to someone not mentioned in the sentence. Note, however, that while personal pronouns may have antecedents, in (120c) the pronoun cannot be taken as referring to either John or Bill. What these examples demonstrate is that elements with certain referential properties must appear in structural positions defined with respect to their antecedents. The collected set of principles which govern these phenomena is known as Binding Theory. Below we shall introduce these principles and show how they have a regulating role to play with respect to movement.
4.5.1 Binding Theory and overt categories

4.5.1.1 Anaphors, pronominals and r-expressions

Let us start by looking at the referential properties of reflexive pronouns such as himself, themselves, etc. As seen in (120a), a reflexive pronoun takes a nearby antecedent and cannot refer to something that is too far from it. Thus Bill is a possible antecedent but John is not. Furthermore, a reflexive pronoun must have an antecedent and cannot refer directly to something not mentioned in the sentence, as personal pronouns can:

(121) a. * Himself is tall.
b. He is tall.

Reflexive pronouns are not the only kind of pronoun which behave like this. Reciprocal pronouns such as each other must also have antecedents, which need to be close by:

(122) a. * Each other met.
b. The boys said the girls know each other.

In (122b) the reciprocal pronoun each other must refer to the girls and cannot be taken as referring to the boys. Elements which have these particular referential properties are known as anaphors. Personal pronouns behave differently from anaphors. Not only can they take more distant antecedents, but they do not have to have an antecedent at all:

(123) a. George thinks the girls like him.
b. He left.

The only impossible antecedent for a personal pronoun is one which is too close to it:

(124) George likes him.

In sentence (124), him cannot refer to George but must be taken as referring to someone not mentioned in the sentence. Personal pronouns therefore have exactly the opposite referential possibilities to anaphors. Elements which behave like this are known as pronominals. Pronouns are not the only elements which have referential properties, however. DPs such as John, that girl and the island just off the coast of France all refer to something in the world.
But such DPs have different referential properties from either anaphors or pronominals in that they can never have an antecedent, no matter how far away it is:

(125) He said she likes Hillary.

In this sentence, the DP Hillary can be taken as co-referential with neither of the pronominal subjects and must be interpreted as an independent element in the sentence. These elements are called r-expressions.

4.5.1.2 The role of indices

When we speak of reference, we are obviously talking about a semantic phenomenon. Yet it is clear that reference plays a role in determining the distribution of certain DPs, and distribution is a syntactic matter. It has been well accepted in generative grammar since Chomsky (1957) that syntax and semantics are independent systems, related only by the semantic component interpreting the syntactic representation. It follows therefore that there must be something syntactically represented which is interpreted as reference. In GB Theory, reference is partially represented by the syntactic device of an index. For example, we have seen that a pronominal cannot refer to something which is too close, but it may take an antecedent which is further from it. These two situations can be represented in the following way:

(126) a. John_i thinks Bill_j admires him_i.
b. * John_i thinks Bill_j admires him_j.

What the indices represent is not so much the actual reference of an element as whether two elements are co-referential or disjoint in reference. Thus, by giving John the index i and Bill the index j, we claim that these two elements do not refer to the same entity. More importantly, by giving the pronoun him the index i or j we claim that it is co-referential with whichever element bears that index.
Thus, in (126a) the pronoun is co-referential with John and in (126b) it is co-referential with Bill. Note that these two examples represent two different sentences with different grammatical statuses: the first one is grammatical, the second is not. The use of indices shows how anaphors and pronominals stand in complementary distribution with each other:

(127) a. John_i thinks Bill_j admires him_i/*himself_i.
b. John_i thinks Bill_j admires *him_j/himself_j.
c. John_i thinks Bill_j admires him_k/*himself_k.

Viewing the examples with different indexing as different structures, in all those structures in which the pronominal is grammatical, the anaphor is ungrammatical in the same structural position, and vice versa. But how do indices come to be part of a structure? One thing is clear: they are not lexically determined. It is not a lexical fact about the pronoun him that it must be co-referential with John, or any other DP for that matter. Moreover, as indices are structural elements which are interpreted semantically, not semantic elements themselves, they should be created along with the structure. In GB Theory it was proposed that indices are freely assigned to DPs, presumably at D-structure. Each indexation gives rise to a different grammatical entity, which is then subject to the relevant principles and deemed grammatical or ungrammatical accordingly. There is a certain amount of controversy over the use of indices to represent co- and disjoint reference. The main problem is that they are themselves too simple to represent some of the more complex situations. Simple indices can represent the situations in which two DPs refer to exactly the same bit of the world or where they refer to entirely different bits. But it is possible for references to overlap or for one reference to be included in another, wider one, as in the following examples:

(128) a. English men generally like cricketers.
b. I think we should leave.
In (128a) the set of English men includes some of the set of cricketers and vice versa. Thus these two sets have overlapping references, which would not be described if they had either the same indices or different ones. In (128b) the reference of I is obviously a part of the reference of we, and again this is captured neither by co-indexation nor by disjoint indexation. The issue is, however, whether or not these considerations play a role in determining grammaticality. If they do not, then there is no point in trying to establish complicated systems of indexing to represent them more accurately. The issues are complex and we will not go into them here, as they tend to be ignored in most literature on binding phenomena. But some have argued that such issues should not be ignored, and indeed that the assumptions of standard Binding Theory founder exactly on this point.

4.5.1.3 c-command and binding

Let us now turn to the details concerning the referential behaviour of anaphors, pronominals and r-expressions. These are the foundation for the principles of Binding Theory, the relevant module of the grammar. We have already established that anaphors must have close antecedents and that pronominals cannot. However, this needs to be modified in the light of sentences like the following:

(129) a. * John_i's mother likes himself_i.
b. John_i's mother likes him_i.

The ungrammaticality of (129a) is hard to account for in terms of the distance between the anaphor and its antecedent, as there is little difference in these terms between this structure and the grammatical John_i likes himself_i. The main difference lies not in the distance between the anaphor and its antecedent, but in the structural position of the antecedent. In the grammatical case the antecedent is in the subject position and in the ungrammatical case it is in the possessor position within the subject.
As a result, the structural relationship holding between the anaphor and the antecedent is different, and in GB Theory it is assumed that this structural relationship is important for the principles of Binding Theory. A closer look at these structures readily shows the structural relationship which must hold between an anaphor and its antecedent:

(130) [tree diagrams of (129a) and (129b), not reproduced]

The previous chapter introduced the structural relationship of c-command, defined as follows:

(131) c-command
An element c-commands its sister and everything dominated by its sister.

From this we can see that the subject DP c-commands the object DP, as the subject's sister, the I', dominates the object. However, the possessor DP inside the subject does not c-command the object, which is not included in the D', the sister to the possessor. Therefore the condition on anaphors is that they must have close c-commanding antecedents. The grammaticality of (129b) demonstrates again that pronominals behave in a complementary way to anaphors and cannot have close c-commanding antecedents, though they can have close antecedents that do not c-command them. It is clear from the above that co-indexation and closeness do not play much of a grammatical role by themselves: it is the combination of the two that is important. For this reason we define the relationship of binding on the basis of both:

(132) Binding
α binds β if and only if:
(i) α and β are co-indexed, and
(ii) α c-commands β.

4.5.1.4 Binding principles

The structural definition of binding given above allows us to establish easily the principles of Binding Theory that govern the behaviour of referential elements. Let us start with anaphors. These must have a close c-commanding antecedent.
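For readers who think procedurally, the definitions of c-command and binding are mechanical enough to sketch in code. The following is a toy illustration of my own, not part of the GB formalism itself: trees are built from an invented Node class, and the two functions implement the c-command and binding definitions directly.

```python
class Node:
    """Toy phrase-structure node; `index` marks a referential index."""
    def __init__(self, label, *children, index=None):
        self.label, self.children, self.index = label, list(children), index
        self.parent = None
        for c in self.children:
            c.parent = self

def descendants(n):
    """Everything dominated by n."""
    for c in n.children:
        yield c
        yield from descendants(c)

def c_commands(a, b):
    # c-command: an element c-commands its sister and everything
    # dominated by its sister
    if a.parent is None:
        return False
    return any(s is not a and (s is b or any(d is b for d in descendants(s)))
               for s in a.parent.children)

def binds(a, b):
    # binding: a binds b iff they are co-indexed and a c-commands b
    return a.index is not None and a.index == b.index and c_commands(a, b)

# "John likes himself": [IP [DP John_i] [I' likes [DP himself_i]]]
john, himself = Node("DP", index="i"), Node("DP", index="i")
ip = Node("IP", john, Node("I'", Node("V"), himself))
print(binds(john, himself))    # True: the subject binds the object anaphor

# "* John's mother likes himself": the possessor inside the subject DP
# does not c-command the object, so no binding obtains
john2, himself2 = Node("DP", index="i"), Node("DP", index="i")
subject = Node("DP", john2, Node("D'", Node("N")))
ip2 = Node("IP", subject, Node("I'", Node("V"), himself2))
print(binds(john2, himself2))  # False
```

The sketch reproduces the contrast between the grammatical and ungrammatical cases discussed above: co-indexation alone is not enough, because the possessor fails the c-command half of the binding definition.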
An antecedent is by definition an element with which the anaphor is co-indexed, and so an anaphor must be bound within a certain local domain. This is stated as a simple principle of Binding Theory:

(133) Principle A
An anaphor must be bound locally.

Now consider pronominals. These cannot have a local c-commanding antecedent and therefore they cannot be bound locally. If we define the notion free to mean 'not bound', then the principle governing pronominals can be stated thus:

(134) Principle B
A pronominal must be free locally.

The fact that this principle is the exact opposite of Principle A, governing the behaviour of anaphors, accounts for the complementary behaviour of the two. R-expressions do not have antecedents, locally or otherwise. However, once again, the notion of c-command can be seen as important in determining the allowable relationships between an r-expression and a co-indexed element:

(135) a. His_i own mother distrusts John_i.
b. * He_i distrusts John_i.

The indexation in (135a) is possible because neither the pronominal nor the r-expression John c-commands the other. Therefore there is no binding relationship established here and the structure cannot violate any binding principle. In (135b), on the other hand, the pronominal c-commands and binds the r-expression. R-expressions, then, can never be bound:

(136) Principle C
An r-expression must be free everywhere.

We now have three very simple principles governing the behaviour of each of the three types of DP. What remains in doubt is what 'locally' means in Principles A and B.

4.5.1.5 The governing category

From the examples seen so far, the local domain within which the anaphor must be bound and the pronominal must be free could well be taken to be the clause which immediately contains the pronoun. However, this definition turns out to be both too broad and too narrow.
Apparently not all clauses count as local domains for binding purposes and, what is more, not all local domains are clauses. An example of a clause that does not count as a local domain is the following:

(137) John_i believes [himself_i to be discreet].

There are several noteworthy things concerning the embedded clause, which obviously is not a local domain, as the anaphor can take an antecedent outside of it. First, the clause is non-finite. Finite clauses, on the other hand, are always local domains:

(138) * John_i believes [himself_i is discreet].

However, it is not the case that non-finite clauses are never local domains, as seen in the following:

(139) * John_i believes [Mary to like himself_i].

In this case, John cannot be the anaphor's antecedent, as it is too far from it, making the non-finite clause a local domain. The difference between (137) and (139) is that in the grammatical case (137) the anaphor is in the subject position. Thus, the non-finite clause does not count as a local domain for its own subject, though it does for its object. Recall that the subject of a non-finite clause is not governed and Case-marked from within the clause, but depends on the exceptional verb for its Case. Thus, unlike the subject of a finite clause and objects in general, the subject of a non-finite clause is governed from outside its clause. From the above discussion it seems that it is the presence of a governor that is important for marking the local binding domain for a pronoun. For this reason this local domain is often called the governing category and may, for now, be defined as follows:

(140) Governing category
β is a governing category for α if β is the smallest clause which contains α and the governor of α.

This definition of a governing category allows us to revise the three principles of the Binding Theory:

(141) Principle A: An anaphor must be bound within its governing category.
Principle B: A pronominal must be free within its governing category.
Principle C: An r-expression must be free everywhere.

An example of a binding domain which is not a clause, and which will force us to update the definition of the governing category in (140), is as follows:

(142) * Bill_i saw [DP Mary's picture of himself_i].

In this example, the anaphor himself cannot take Bill as its antecedent, despite the fact that both are within the same clause. So there must be a relevant binding domain which is smaller than this clause. The most obvious candidate would be the DP. Note, however, that not all DPs count as binding domains:

(143) Bill_i saw [DP a picture of himself_i].

Clearly it is the presence of the possessor that makes the difference, and hence this must be included in the definition of the governing category. The possessor of a DP is in many ways like the subject of a sentence: they sit in structurally similar positions - the specifier of DP and of IP respectively - and the subject is often translated as a possessor in nominalizations:

(144) a. The enemy destroyed the city.
b. the enemy's destruction of the city

For this reason, the possessor is often called the subject of the DP. By the EPP, clauses always have subjects, but DPs only have subjects when there is a possessor. A clause will therefore always be a governing category, but only possessed DPs will be. It follows that governing categories are defined by the presence of a subject. The definition of the governing category can be updated:

(145) Governing category
β is a governing category for α if β is the smallest constituent with a subject which contains α and the governor of α.

In standard Binding Theory (Chomsky, 1981a), further additions to this definition accounted for more complex data, though we shall not go into these details here. Work on Binding Theory throughout the 1980s tried either to simplify the approach or to rationalize the complexities.
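The governing-category definition can also be read as a procedure: walk outward from the pronoun through the constituents containing it and stop at the first one that has a subject and contains the pronoun's governor. The sketch below is purely illustrative (the function and dictionary names are my own invention, not the book's machinery), and deliberately simplifies constituents to flags.

```python
def governing_category(domains):
    """domains: constituents containing the pronoun, innermost first.
    Return the smallest one with a subject that contains the pronoun's
    governor; None if the pronoun is ungoverned."""
    for d in domains:
        if d["contains_governor"] and d["has_subject"]:
            return d
    return None

def satisfies_binding(pronoun_type, domains, antecedent_in):
    """Principle A: an anaphor must be bound within its governing category.
    Principle B: a pronominal must be free within its governing category."""
    gc = governing_category(domains)
    if gc is None:
        return True  # vacuously satisfied when there is no governing category
    bound_locally = antecedent_in is gc
    return bound_locally if pronoun_type == "anaphor" else not bound_locally

# "* Bill saw [DP Mary's picture of himself]": the possessed DP has a
# subject (Mary) and contains himself's governor, so it is the governing
# category; the antecedent Bill lies outside it and Principle A fails.
possessed_dp = {"has_subject": True, "contains_governor": True}
clause = {"has_subject": True, "contains_governor": True}
print(satisfies_binding("anaphor", [possessed_dp, clause], clause))  # False

# "Bill saw [DP a picture of himself]": no possessor, so the DP has no
# subject and the clause itself is the governing category; Principle A holds.
bare_dp = {"has_subject": False, "contains_governor": True}
print(satisfies_binding("anaphor", [bare_dp, clause], clause))       # True
```

Note how the minimal-DP contrast falls out of the single "has a subject" test, mirroring the role the possessor plays in the revised definition.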
For our purposes, however, the above will suffice.

BINDING THEORY
'The theory of binding is concerned with the relations, if any, of anaphors and pronominals to their antecedents' (Chomsky, 1982, p. 6).
Principles of Binding Theory:
Principle A: An anaphor must be bound within its governing category.
Principle B: A pronominal must be free within its governing category.
Principle C: An r-expression must be free everywhere.
The governing category:
β is a governing category for α if β is the smallest constituent with a subject which contains α and the governor of α.
So in:
Peter thought himself to be overpaid.
the anaphor himself is bound to Peter by Principle A; while in:
Peter thought him to be overpaid.
the pronominal him must be bound outside the sentence to someone whose name we don't know; and in:
Peter thought the managers were overpaid.
the managers is bound outside the sentence.

EXERCISE 4.11
It has long been known that certain movement phenomena and certain referential phenomena are restricted by similar conditions. Thus, while it is not possible for an anaphor to refer out of a finite clause, it is also not possible for a DP to move out of a finite clause:
* John_i thinks [himself_i is good looking].
* John_i seems [t_i is good looking].
Furthermore, while an anaphor object cannot refer to the subject of a higher clause, movement cannot take place from the object position to the subject of a higher clause:
* John_i believes [Mary to like himself_i].
* John_i seems [Bill to like t_i].
Under the assumptions of Binding Theory, the facts about the reference of the anaphor follow from Principle A. How can we account for the similarity of the restrictions placed on movement?

4.5.2 Binding Theory and empty categories

Having outlined the rudiments of Binding Theory as they apply to overt referential DPs, we now turn to the relationship between Binding Theory and movement.
This is made possible by the fact that empty categories, including the traces left behind by moved elements, can, like referential DPs, be seen as having binding properties. In other words, some empty categories behave like anaphors, some like pronominals and some like r-expressions; as such, these classifications are relevant for all DPs, whether overt or covert. In what follows we will go through each empty category to see how it behaves with respect to the binding principles.

4.5.2.1 DP traces as anaphors

It had been noted in the 1970s that the reference of certain pronouns and movement were connected phenomena. Note the similarities in the following sets of data:

(146) a. John_i likes himself_i.
b. * John_i believes Mary likes himself_i.
c. John_i believes himself_i to be honest.
d. * John_i believes himself_i is honest.

(147) a. John_i was hurt t_i.
b. * John_i was believed Mary likes t_i.
c. John_i was believed t_i to be strong.
d. * John_i was believed t_i likes Mary.

As we see in (146a) and (147a), a reflexive pronoun in object position can take the subject of its own clause as its antecedent, and a DP can move from object position to the subject position of its own clause. This contrasts with (146b) and (147b), which show that it is impossible for an object reflexive to refer to the next highest subject and for a DP object to move to this position. The (c) and (d) examples show further similarities: a reflexive subject can refer to the next subject up from a non-finite clause but not from a finite one, and a DP can move from the subject of a non-finite clause to the next subject up but not from a finite clause. The data in (146) are captured under the Binding Theory presented above. A reflexive pronoun is an anaphor and hence subject to Principle A of the Binding Theory, which states that it must be bound in its governing category.
For an object, as in (146a) and (146b), the governing category is the minimal clause and, as the anaphor is bound within this in the first but not in the second, the first is grammatical and the second ungrammatical. For a subject, as in (146c) and (146d), the governing category differs depending on where the governor is. In a finite clause the governor is the finite inflection of that clause, and so this is the governing category. For an exceptional clause, the governor is the exceptional verb, which is outside this clause. Thus the exceptional clause will not be a governing category and its subject can be bound by an element outside it. The data in (147) can be captured in exactly the same way through one simple assumption: the trace left behind by an A-movement is an anaphor which is bound by the moved DP. This effectively restricts movement to positions within the governing category of the trace and hence imposes further limitations on this kind of movement, over and above those already placed on it by Bounding Theory.

4.5.2.2 wh-traces as r-expressions

If all traces were anaphors, the same restrictions would show up for all movements. However, a wh-element can be moved out of a finite clause from either the subject or the object position:

(148) a. Who_i did John think [CP t_i [IP Mary liked t_i]]?
b. Who_i did John think [CP t_i [IP t_i liked Mary]]?

Clearly the traces in subject and object positions in the above examples do not behave like anaphors, as they are not bound within the finite IP. Indeed, these traces are not bound in the usual way at all. In all the other cases of binding considered so far, the antecedent has been in an A-position. But in (148) the antecedents of these traces sit in the specifier of CP, an A'-position. Thus, if these traces are bound at all, they are bound in a very different way to anything else we have come across. Furthermore, the following data demonstrate that such traces are not allowed to be bound from an A-position:

(149) a. Who_i [IP t_i thinks [IP he_i should win]]?
b. * Who_i does [IP he_i think [IP t_i should win]]?

In (149a) the trace is bound by the wh-element and the pronominal is bound by the trace. As none of these facts violates any principle of the Binding Theory, the sentence is grammatical and is interpreted as meaning 'who is the person x, such that x thinks that x should win?'. In (149b), however, the trace is bound by the pronominal in the subject position. The meaning of this sentence would be identical to the first and, as this meaning is perfectly well formed, we cannot account for the unacceptability of this sentence in semantic terms. Given that the pronominal is not bound within its governing category, there is no violation of Principle B, and so the ungrammaticality of the sentence must have something to do with the trace. The trace of a wh-element cannot be bound from an A-position, suggesting that these traces behave in many respects like r-expressions.

TRACES AND BINDING THEORY
Traces left by A-movements are anaphors and must be bound within the governing category by the moved element. Therefore no A-movement can cross a governing category:
John_i was believed [t_i to like Bartok].
* John_i was believed [t_i liked Bartok].
Traces left behind by A'-movement are r-expressions and therefore cannot be bound by anything in an A-position:
Who_i did he_j say [t_i likes Bartok]?
* Who_i did he_i say [t_i likes Bartok]?

4.5.2.3 pro and the PRO Theorem

So far we have found empty-category equivalents to anaphors and r-expressions. This leads us to expect that there should also be an empty pronominal, as indeed appears to be true. The empty subject in pro-drop languages, pro, behaves very much like an empty personal pronoun. This empty category does not have to have an antecedent, though it can have one as long as this is far enough away, as the following Hungarian sentences show:

(150) a. pro el-ment. (away-went-3.sing.)
'He left.'
b.
János_i mond-ta hogy pro_i el-ment. (John said-3.sing. that away-went-3.sing.)
'John said that he left.'

This means that there are empty-category equivalents to all three types of overt referential elements:

         anaphor     pronominal        r-expression
overt    reflexive   personal pronoun  proper noun
covert   A-trace     pro               A'-trace

Figure 4.3 Types of referential elements

But there appears to be one empty category omitted from this table, namely PRO, which does not fit neatly into any of the boxes. On the one hand, this empty category is a little like an anaphor in that it often must have an antecedent, in cases of obligatory control. But, unlike the binding of an anaphor, the control of PRO is often restricted to a particular controller, whether a subject or an object, depending on the control verb. Furthermore, in cases of arbitrary control, PRO has no controller and so appears to be unbound. In the cases where PRO is bound, the question of whether it is bound within its governing category is also difficult to answer. This is because PRO always sits in ungoverned positions; this part of Control Theory is often called the PRO Theorem. As PRO is ungoverned, its governing category cannot be determined, since this is defined by the presence of a governor. Thus PRO appears never to have a governing category. This observation allows us to explain the PRO Theorem in terms of Binding Theory. As we have discovered, pronominals and anaphors have complementary binding properties - one must be locally bound while the other cannot be. Nothing can be both an anaphor and a pronominal at the same time, as such an element would be a contradiction: something that must simultaneously be bound and not bound within its governing category! Yet there is one way out of the contradiction that provides a possible analysis for PRO. An element would be able to conform to both Principles A and B of the Binding Theory if it could do so vacuously.
Two contradictory regulations can be upheld if they are always inapplicable. For example, imagine a religion which decreed both that hats must be worn in church and that hats were not permitted in church. Followers of this religion could uphold both decrees simply by never going to church! In the same way, a linguistic element can satisfy the requirement that it must be bound in its governing category and the requirement that it must be free in its governing category if it never has a governing category. Thus, if PRO is categorized as both an anaphor and a pronominal, we can explain its non-appearance in a governed position. Were PRO ever to be governed, it would automatically have a governing category and therefore be forced to conform to contradictory requirements. "Since PRO is a pronominal anaphor, it is subject to Principles A and B of the binding theory, from which it follows that PRO lacks a governing category and is therefore ungoverned" (Chomsky, 1982, p. 21).

This treatment of PRO allows us to see the elements to which Binding Theory applies in a simpler light. It seems that there are things which are anaphors, things which are pronominals, things which are both anaphors and pronominals, and things which are neither anaphors nor pronominals (r-expressions). This system requires just two binary features, [± anaphor] and [± pronominal]. The table below demonstrates how this works out:

            +pronominal             -pronominal
+anaphor    PRO                     reflexive, A-trace
-anaphor    personal pronoun, pro   proper noun, A'-trace

Figure 4.4 Elements of Binding Theory

EXERCISE 4.12
Most of the cells of figure 4.4 are filled with both an overt and a covert example, the one exception being the pronominal anaphor, for which there is only a covert exemplar: PRO. However, it is fairly obvious why there is no overt exemplar of this type of DP. All overt DPs must be assigned Case, according to the Case Filter, and Case is assigned under government.
Thus all overt DPs must be governed and so have governing categories. It follows that there can never be an overt pronominal anaphor, as such an element would always be a contradiction. The following examples all contain empty categories which have a role in their ungrammaticalities. Which of them are ungrammatical due to Binding Theory violations and which are not?

  *Who_i does his_i mother love expect for PRO_i to leave.
  *John_i seems for t_i to like Mary.
  *Who_i did you try t_i to win?
  *John_i was promised PRO_i to leave.
  *John_i thinks Bill wants pro_i to leave.
  *John_i likes pro_i.
  *It_i seems t_i makes sense.

4.5.2.4 The place of Binding Theory in the Government/Binding Model: the need for more levels

Where does Binding Theory sit in the general GB Model we have been developing? The fact that Binding Theory has an effect on movement indicates that it must apply after movement has taken place, i.e. at S-structure. Unfortunately, however, the matter is not quite as simple as this, as some binding facts seem to be established before movement. Take the following sentence:

(151) [Which picture of himself_i]_j did John_i display t_j?

Here the wh-phrase including the anaphor has been moved to the specifier of CP at S-structure. In this situation the subject does not c-command the anaphor and therefore should not be able to bind it. Nevertheless the sentence is grammatical. Note that at D-structure the subject does c-command the anaphor, as it is part of the phrase generated in object position at this level. But we cannot assume that Binding Theory applies at D-structure, as otherwise it would not be able to interact with movement in the ways demonstrated above. One possible solution to this problem would be to propose that Binding Theory applies at neither D- nor S-structure, but at another level of representation separate from both, at which all the relevant binding relations can be made to hold.
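The [± anaphor, ± pronominal] typology of figure 4.4, together with the PRO Theorem reasoning, has the shape of a small formal system, and can be sketched as a toy program. This is purely an illustration and not part of the theory itself: the class and function names below are my own, and the "test" is just the feature check the text describes, assuming the simplification that being [+anaphor, +pronominal] is the only route to obligatory ungovernedness.

```python
# Toy sketch (illustrative only) of the [±anaphor, ±pronominal] typology.
# Names and representations are the sketch's own, not from the theory.

from dataclasses import dataclass

@dataclass
class Element:
    name: str
    anaphor: bool      # Principle A: must be bound in its governing category
    pronominal: bool   # Principle B: must be free in its governing category
    overt: bool

ELEMENTS = [
    Element("reflexive",        anaphor=True,  pronominal=False, overt=True),
    Element("A-trace",          anaphor=True,  pronominal=False, overt=False),
    Element("personal pronoun", anaphor=False, pronominal=True,  overt=True),
    Element("pro",              anaphor=False, pronominal=True,  overt=False),
    Element("proper noun",      anaphor=False, pronominal=False, overt=True),
    Element("A'-trace",         anaphor=False, pronominal=False, overt=False),
    Element("PRO",              anaphor=True,  pronominal=True,  overt=False),
]

def must_be_ungoverned(e: Element) -> bool:
    """The PRO Theorem: a [+anaphor, +pronominal] element satisfies
    Principles A and B only vacuously, i.e. by never having a governing
    category, and hence by never being governed."""
    return e.anaphor and e.pronominal

# Only PRO comes out as obligatorily ungoverned...
assert [e.name for e in ELEMENTS if must_be_ungoverned(e)] == ["PRO"]

# ...and since overt DPs need Case, assigned under government, no overt
# element in the table has this feature combination.
assert all(not e.overt for e in ELEMENTS if must_be_ungoverned(e))
```

The two assertions simply restate the conclusion of exercise 4.12: the pronominal anaphor cell can only ever be filled covertly.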
The next section shows that there is independent empirical motivation for such an extra level of representation, as well as conceptual arguments. The inclusion of the Binding Theory into the overall GB Model will therefore be delayed until these arguments have been presented.

4.6 Beyond S-structure and the Empty Category Principle

4.6.1 That-trace phenomena, proper government and the Empty Category Principle

Most of the restrictions on movement seen so far seem to be concerned with the locality of movement operations, constraining how far an element can be moved. However, some phenomena, first noted in the 1970s, suggest that it is also important to take into account the position from which a movement is made, the extraction site, for constraining movement. For example, movement from the object position seems to be easier than from subject position:

(152) a. What_i did he say (that) Mary wanted t_i?
b. Who_i did he say (*that) t_i wanted a beer?

In (152a), as usual, the complementizer introducing a finite clause is optionally present, as represented by the parentheses, when there is wh-movement from object position. In contrast, when a movement takes place from subject position, the complementizer must be absent. This is represented in (152b) by including an asterisk within the parentheses, stating that in this case the option of having the complementizer is ruled out. The phenomenon was first noted in Chomsky and Lasnik (1977), who termed it the that-trace effect. Nothing so far discussed explains this. The two structures are otherwise identical; their different grammatical statuses derive solely from factors affecting the extraction sites of the movements. Of course, in the extraction site of a movement sits a trace, suggesting that the conditions under consideration here concern the licensing of traces in certain positions.
It is perhaps no surprise that empty categories should need to be licensed in some way, given that their presence is otherwise not marked overtly. That-trace phenomena indicate that traces are licensed in object position fairly freely, whereas they are licensed in subject position only in the absence of a complementizer. GB Theory proposed that empty categories are licensed by a notion of proper government, which we will define shortly, due to a condition known as the Empty Category Principle (ECP):

(153) Empty Category Principle
Traces must be properly governed.

As we have seen on several occasions, the government relationship holds between a head and elements within its local sphere of influence. Proper government was seen as a more restrictive version of government, limited to a set of 'proper governors'. Given that objects appear to be more easily licensed than subjects, and also given that one difference between them is that objects are governed by a lexical head (the verb) whereas subjects are governed by a functional head (the inflection), the obvious first step is to claim that lexical heads are proper governors but functional heads are not. Consequently the trace in an object position will always be properly governed; movement from object position might be expected to be relatively free, subject to other conditions on movements in general. However, if this were all there was to say on the matter, given that subjects are not governed by a lexical head, a trace in the subject position would never be properly governed and hence movement from this position would be impossible. But, while movement from subject position is subject to more stringent conditions, it is not impossible, and so these restrictions need to be loosened slightly. Considering the data in (152b) again, movement from the subject position is possible, and hence the trace is licensed, when there is no complementizer.
A closer look at the movement shows more clearly what is happening. Due to the boundedness of movement, the first movement to take place is from the subject position of the embedded clause to the specifier of CP of that clause. From there the wh-element will move to the higher spec CP:

(154) [CP Who_i did [IP he say [CP t_i' (*that) [IP t_i wanted a beer]]]]?

Given that the intermediate trace faces exactly the same conditions as any trace involved in spec CP to spec CP movement, the conditions placed on this are irrelevant in accounting for that-trace phenomena. Obviously, what we want to say is that the original trace in subject position is properly governed in the absence of the complementizer, but that the complementizer blocks this government when it is present. The complementizer intervenes between the original trace and the intermediate trace, and it could be the intermediate trace that acts as the proper governor. If this is so, the set of proper governors includes the set of lexical heads, plus any antecedent. The upshot is that traces in object position will always be properly governed by the lexical head they are the object of, but subjects can only be properly governed by their antecedents. Government by the antecedent will be possible only if the antecedent is relatively close and the relationship with its trace is not interrupted by a closer element, such as the complementizer. While there is much else that could be said, this basic account will serve our purposes. The next section introduces observations which at first seem to be the exact opposite of the that-trace effect, suggesting that movement from the subject position is sometimes easier than from object position. The attempt to unify these observations under one set of grammatical principles leads us to extend the overall model, as already hinted at in the section on binding above.
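The proper-government account of the that-trace effect can be summarized as a small decision procedure: a trace is licensed if it is head governed by a lexical head, or antecedent governed with no overt complementizer in the way. The following toy checker is an illustration only, assuming a drastically flattened clause model with just two extraction sites; the function names are the sketch's own, not standard terminology.

```python
# Toy checker (illustrative only) for the ECP account of that-trace effects.
# Clause structure is flattened to two extraction sites; real GB analyses
# are considerably richer.

def properly_governed(extraction_site: str, complementizer_present: bool) -> bool:
    """A trace is properly governed if it is governed by a lexical head
    (object position) or by its antecedent in spec CP (subject position),
    the latter only when no overt complementizer intervenes."""
    if extraction_site == "object":
        return True                        # head governed by the verb
    if extraction_site == "subject":
        return not complementizer_present  # antecedent government from spec CP
    return False

def ecp_ok(extraction_site: str, complementizer_present: bool) -> bool:
    """(153): traces must be properly governed."""
    return properly_governed(extraction_site, complementizer_present)

# (152a): What did he say (that) Mary wanted t? -- fine with or without 'that'
assert ecp_ok("object", complementizer_present=True)
assert ecp_ok("object", complementizer_present=False)

# (152b): Who did he say (*that) t wanted a beer? -- the that-trace effect
assert ecp_ok("subject", complementizer_present=False)
assert not ecp_ok("subject", complementizer_present=True)
```

The asymmetry in the two pairs of assertions is exactly the asymmetry between (152a) and (152b).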
EXERCISE 4.13
The original account of the that-trace effect suggested by Chomsky and Lasnik (1977) was that structures in which the complementizer was immediately followed by a trace were filtered out. This is clearly more stipulative than the ECP account of the phenomena, but consider the following observations and decide which of the two approaches is the more empirically accurate:

  Who does John think that these days should be conscripted?
  *Who does John think that should be conscripted?
  Who does John think that these days they should conscript?
  Who does John think that they should conscript?

THE EMPTY CATEGORY PRINCIPLE AND PROPER GOVERNMENT
The Empty Category Principle (ECP): Traces must be properly governed.
Proper government: α properly governs β if and only if:
(i) α governs β, and
(ii) α is lexical or an antecedent.
Gloss: Traces are 'licensed' by being governed by either a lexical head or an antecedent. Thus, traces in object position are always licensed, as they are always governed by the head they are an object of. Traces in subject position are never head governed and therefore must be antecedent governed. This gives rise to the that-trace effect, as antecedent government is blocked by the presence of an overt complementizer.

4.6.2 Superiority and invisible movement

In all of the examples of wh-movement looked at so far there has been only one wh-element per clause. However, some clauses can have more than one wh-element, forming multiple wh-questions. In English multiple wh-questions, only one of the wh-elements moves to the front of the clause and all others remain in their D-structure positions:

(155) a. [CP Who_i [IP t_i saw what]]?
b. [CP How_i did [IP they travel where t_i]]?

Often multiple wh-questions are interpreted as asking for an answer which pairs up possible referents for the wh-elements.
For example, (155a) could be answered as follows:

(156) John saw a black car draw up, Bill saw a shadowy figure enter the house and Mary saw someone running down the drive.

This shows that the unmoved wh-element is still interpreted as an interrogative operator and not like the wh-element found in echo questions, which are also unmoved:

(157) a. You saw what?
b. They travelled where?

In this type of question the wh-elements are interpreted as slot fillers for information that was either misheard or disbelieved: something far more specific than the interpretation of a typical interrogative operator. This is slightly puzzling, as it seems that the unmoved wh-elements in multiple questions are interpreted as moved wh-elements, though they clearly do not undergo the same movement. Another interesting thing about multiple wh-questions is that although in principle either wh-element could move, often it is only grammatical to move one of them. For example, compare the following:

(158) a. Who_i t_i saw what?
b. *What_i did who see t_i?

Note that in this case, given the choice between moving the subject or the object, it is the subject that moves more easily than the object. This phenomenon is called the superiority effect, as it is the structurally superior (i.e. higher) wh-element that is able to move. At first sight this appears to be exactly the opposite of that-trace effects, where we saw that the subject is more difficult to move. From this point of view it is difficult to see how superiority might be accounted for by the ECP, as the trace of the object is always properly governed by its verb. We thus face two puzzles: why should the unmoved wh-element be interpreted as though it were moved in multiple questions; and why should the movement of an object cause an ungrammaticality? The solution to both puzzles lies in the assumption that there is a level of structural representation beyond S-structure and D-structure.
Let us justify the need for this extra level by considering the difference between the pronunciation of an expression and its semantic interpretation. So far we have assumed two levels of representation. Clearly S-structure is more closely associated with the pronunciation of an expression than D-structure, as the phonetic string reflects the order of elements at the post-movement level. However, S-structure is not a phonetic representation but a syntactic one, as it contains all sorts of elements that have no bearing on pronunciation, such as tree nodes and branches. Thus, although pronunciation is more closely related to S-structure than to D-structure, we need to suppose a further representation which reflects pronunciation even more closely. This is Phonetic Form, or PF for short, as introduced in chapter 1. This would fit into the grammatical model thus:

(159) D-structure → S-structure → PF

Now let us consider where semantic interpretation enters the picture. An early idea was that semantic interpretation was more associated with D-structure, as this represents in an obvious way the thematic relations which hold between a predicate and its arguments. However, not all semantic relations are represented at D-structure. For example, many binding relationships are formed after movement rather than before:

(160) John_i believes [himself_i to have been promoted t_i].

In this example, the antecedent John could not bind the anaphor in its D-structure position, as only the subject of the non-finite clause is accessible according to the binding principles discussed in the previous section. Therefore the binding relation between John and himself can only be properly established after the movement. Similarly, the interpretation of a wh-element as an interrogative operator is established only after movement, as at D-structure the wh-elements in interrogative and echo question structures are indistinguishable.
Therefore we need to assume that semantic interpretation is taken from S-structure. Thematic relationships established at D-structure are still represented at S-structure even after movement, as traces keep track of D-structure positions, so this is not problematic for this assumption. However, certain problems remain if both semantic and phonetic interpretation are associated with the same syntactic level. For example, consider the following:

(161) Every performer sang a song.

There are several ways in which this sentence can be interpreted. In one, each performer sings a different song. In this case we say that the quantifier every takes wide scope over the indefinite determiner, as semantically we determine the meaning of every performer for each of its possible referents and then determine the referent for a song for each performer. On the other hand, we might interpret (161) as meaning that there was a single song that every performer sang. In this case, the DP a song has wide scope, as its reference is determined first. In English, such scope differences tend not to be overtly reflected syntactically and the same S-structure can be interpreted in different ways. We might think therefore that the difference is purely semantic, nothing to do with the structure of the sentence. But this assumption is challenged by a number of observations. The first is that in some languages these semantic differences are indeed reflected grammatically. For example, the English sentence equivalent to (161) translates into two different Hungarian sentences depending on the interpretation:

(162) a. Minden előadó elénekelt egy dalt. (every performer sang a song)
        Every performer sang a song. (wide scope on minden előadó)
b. Egy dalt elénekelt minden előadó. (a song sang every performer)
   Every performer sang a song. (wide scope on egy dalt)

Basically, the quantifier with the widest scope comes first and hence is higher in the structure.
Thus scope facts are reflected syntactically in Hungarian. The second observation is that even in English there is an interaction between scope interpretation and certain grammatical facts. Consider the difference between the following:

(163) a. Some professor believes [every student to have failed].
b. Some professor believes [every student has failed].

Sentence (163a) is ambiguous in a similar way to (161): in one interpretation there is a single professor who thinks that all the students have failed, and in the other, for every student there is a professor who thinks he or she has failed. However, (163b) is not ambiguous, and specifically the reading in which every student has wide scope is missing. Thus, the subject of a non-finite clause can have scope over a higher subject, but a subject of a finite clause cannot. It seems therefore that scope phenomena are not solely semantic in nature and that they should be represented syntactically as well. But while in Hungarian scope is obviously represented at S-structure, it is clear that this is not true for English. Therefore we need to propose another level of representation beyond S-structure at which these facts about English can be represented. This level of representation is often called Logical Form, or LF, deriving from S-structure in the same way that S-structure derives from D-structure:

(164) D-structure → S-structure → LF
                        ↓
                        PF

It has, however, become clear that other features of semantic interpretation having to do with anaphora, scope and the like are represented not at the level of D-structure but rather at some level closer to surface structure, perhaps S-structure or a level of representation derived directly from it - a level sometimes called 'LF' to suggest 'logical form'. (Chomsky, 1986a, p.
67)

At LF, we can assume that scope facts in English are reflected in a similar way to those in Hungarian: presumably the quantified DP with the wide scope is moved to a higher position. However, as PF reflects the syntactic arrangement at S-structure, any movement that takes place between S-structure and LF will have no effect on pronunciation, and in effect such movements will be invisible. One advantage of these assumptions is that certain differences between languages can be accounted for in a rather simple way. Take the difference between Hungarian and English with regard to scope phenomena. In Hungarian, scope facts are overtly represented and therefore the movements involved must take place between D-structure and S-structure. In English, however, the same movements are covert and so must take place between S-structure and LF. The two languages are similar in their use of the same movement processes, but they differ over where these processes take place. We can see other such differences cross-linguistically. Chinese, for example, does not appear to move its wh-elements in interrogatives, yet the grammar of Chinese is not so different from that of English if both languages make use of wh-movement, only English does so overtly and Chinese covertly. We can now return to superiority phenomena. Recall that in multiple wh-questions, the unmoved wh-element is interpreted as though it moves. This can now be handled by assuming that the apparently unmoved wh-element undergoes covert movement at LF, just like Chinese wh-elements. Presumably the covert movement is similar to the overt movement in that both move a wh-element to the CP:

(165) D-structure: [CP [who saw what]]?
      S-structure: [CP Who_i [t_i saw what]]?
      LF: [CP What_j who_i [t_i saw t_j]]?

This also allows superiority to be explained in terms of the ECP, but with respect to the covert movement, not the overt one.
Consider the examples in (158) again, along with their respective LFs:

(166) a. [Who_i [t_i saw what]]? → [CP What_j who_i [t_i saw t_j]]?
b. *[What_j did [who see t_j]]? → [CP Who_i what_j [t_i see t_j]]?

In (166a), both traces are properly governed at LF: the subject's trace is properly governed by its antecedent, as it is at S-structure, and the object's trace is properly governed by the verb. However, in (166b), while the trace of the object is properly governed, as always, the trace of the subject is separated from its antecedent by the object wh-element, which moved to the CP first, at S-structure. In other words, with superiority effects, it is the covert movement of the subject that causes the problem, not the overt movement of the object, and hence this phenomenon is not unlike the that-trace effect. It is clear from the above discussion that the ECP applies to LF representations, not at D-structure or S-structure. The model of the grammar can then be extended as follows:

Figure 4.5 Government/Binding Theory (Logical Form and Phonetic Form added)

SUPERIORITY
Who saw what?
*What did who see?
The explanation of this is provided by the assumption that the in situ wh-element undergoes a covert movement at LF to the place of the other wh-element for interpretative reasons. When the subject moves first, its trace will be properly governed by its antecedent. A subsequently moved object will leave behind a trace that is properly governed by the verb:
  what_2 who_1 [t_1 saw t_2]
However, when the object is moved first, although its trace is properly governed by the verb, the subsequently moved subject will be unable to govern its trace properly, as the moved object will be in the way:
  who_1 what_2 [t_1 saw t_2]

4.6.3 Binding Theory revisited

The discussion of Binding Theory was left without reaching a conclusion about the level at which binding applies. There are strong conflicting arguments for it applying both before and after movement.
The introduction of LF yields yet another possibility: that binding applies at this level. Indeed this solves a number of problems. Consider the puzzling data once more:

(167) a. [Which picture of himself_i]_j did John_i display t_j?
b. [The men]_i seem to each other_i t_i to be smart.

In (167a) the anaphor is properly bound in its D-structure position, suggesting that binding principles apply before movement takes place. However, in sentence (167b) the binding of the reciprocal anaphor each other is only achieved after the subject of the lower clause is raised into the higher clause, suggesting that Binding Theory applies after movement. These two apparently contradictory observations can be reconciled if Binding Theory applies at LF and if a further covert movement re-establishes the proper relationships in the case of (167a). This movement must in effect reverse the wh-movement by placing the structure containing the anaphor back into its D-structure position. Considering example (167a) more closely, we can see that the overtly moved wh-phrase consists of a wh-determiner which and an NP containing the rest of the phrase. It is possible that the semantic interpretation of this sentence is served if only the determiner is in the specifier of CP, but that the NP is moved along with the DP for syntactic reasons holding at S-structure. In this case, we can envisage a movement which 'reconstructs' the non-wh part of the DP (i.e. the NP) back into its original position, creating the following LF:

(168) [Which t_j] did John display [picture of himself_i]_j?

If Binding Theory applies to this structure, the sentence will be correctly predicted to be grammatical, as the antecedent of the anaphor binds it within the local governing category. Although this is probably the simplest version of the reconstruction approach to this problem, it does leave us with a trace inside the wh-DP that is not properly governed and so in violation of the ECP.
In order to get round this problem some more complicated assumptions would have to be made. These will not be pursued here, as it will turn out in subsequent chapters that the problem does not arise under current assumptions.

LOGICAL FORM
Logical Form (LF) is another level of syntactic representation like D-structure and S-structure. It is produced from S-structure in the same way that S-structure is produced from D-structure: via movements. However, as Phonetic Form (PF) is taken from S-structure, the movements which form LF are not phonetically realized and have purely semantic repercussions. Languages may differ as to where certain movements take place: at S-structure or at LF. If the former, then the movement will be visible (e.g. wh-movement in English or quantifier movement in Hungarian), and if the latter, then the movement will be invisible (e.g. wh-movement in Chinese or quantifier movement in English). LF is also the level of representation at which reconstruction movements take place, putting material back into a position where it is more readily semantically interpreted from a position it was forced to move to by S-structure requirements.

Discussion topics

1 Movement is seen in GB Theory as a relationship between D- and S-structure. Do you think this works satisfactorily for apparent variants like 'I gave him a book', 'I gave a book to him' and (in the speech of one of the authors) 'I gave it him' versus 'I gave him it', or 'Whom did you give it to?' and 'To whom did you give it?'?

2 Given that one goal of the grammar is to achieve explanatory adequacy, what does the GB Theory of movement imply for children's acquisition of language?

3 Move α is a very liberal principle, essentially allowing anything to move anywhere. Take a simple sentence such as 'the choir sang a song' and attempt to move any element into any other position. Do we know why not all of these movements produce grammatical sentences?

4 The Case Filter states that all DPs must have Case.
Is it possible for one DP to have more than one Case? Can a DP, for instance, move from one Case position to another?

5 Sometimes we use reflexive pronouns without apparent antecedents, as in:
  Only John and myself knew the answer.
  What's a good kid like yourself doing in a place like this?
  Address the envelope to yourself.
  Lies about yourself can be very upsetting.
Are all of these uses of reflexives problematic for Binding Theory?

6 Some languages assign inherent Case (dative or genitive) to the subjects of certain verbs, and in these cases the object shows up in the nominative. This has been claimed to be another example of Burzio's generalization (p. 161). Can you see why?

7 Suppose that all quantifiers move to the front of the clause by LF. Would the ECP predict possible scope ambiguities between two quantifiers by having either move higher than the other?

8 In Government/Binding Theory, how many principles prevent an element from moving too far? Does the fact that many principles of the grammar appear to have similar functions detract from the elegance of the theory?

9 Towards the end of the 1980s Chomsky suggested that certain movements were made only as a last resort. Thus if something didn't have to move, it wouldn't. For example, an inflection will not move to a verb if the verb can move to the inflection: English auxiliaries, which are not stuck inside the VP, always move to I. Does this kind of observation fit well with the general conception of Move α?