Author's Response

Modularity, language, and the flexibility of thought

Peter Carruthers

Department of Philosophy, University of Maryland, College Park, MD 20742.

pcarruth@umd.edu

www.philosophy.umd.edu/people/faculty/pcarruthers/

Abstract: The present paper elucidates, elaborates, and defends the main thesis advanced in the target article: namely, that natural-language sentences play a constitutive role in some human thought processes, and that they are responsible for some of the distinctive flexibility of human thinking, serving to integrate the outputs of a variety of conceptual modules. Section R1 clarifies and elaborates this main thesis, responding to a number of objections and misunderstandings. R2 considers three contrasting accounts of the mechanism of inter-modular integration. R3 discusses objections to some of the empirical data supporting my main thesis. R4 criticizes some competing theories of the role of language in cognition. And R5 considers some proposed supplementary cognitive roles which language might play.

R1 Clarifying and elaborating the main thesis

The main thesis of the target article is that natural language sentences, constructed by the language module, serve to integrate and combine the outputs of a variety of central-process conceptual modules, hence enabling some of the flexibility which is distinctive of human cognition. In the present section this thesis will be both clarified and elaborated upon. Only by casting the motivation behind my main thesis in a new light, in fact, and by adding some later developments, can I put myself in position to reply to the detailed criticisms of my commentators.

R1.1 On mental modularity

A number of commentators challenge the assumption of mental modularity which forms the backdrop to my proposal (Bickerton, Dale & Spivey, Hurford & Dessalles, MacWhinney), and Hermer-Vazquez complains of the difficulty of defining what modularity is. Otherwise most are neutral or express some degree of support. Only two, however (Atran, Bryson), identify the primary reason for believing in mental modularity: computational tractability. Let me elaborate on the positive before replying to the negative. Part of the point of the discussion will be to clarify exactly how I think the modularity thesis should be construed.

The most important argument in support of mental modularity derives from Fodor (2000). It goes like this:

(1) The mind is computationally realized.

(2) A-modular, or holistic, processes are computationally intractable.

(C) So the mind must consist wholly or largely of modular systems.

Premise (1) is the guiding assumption which lies behind all work in computational psychology, hence gaining a degree of inductive support from the successes of the computationalist research program. And many of us believe that this form of psychology represents easily our best hope (perhaps our only hope) for understanding how mental processes can be realized in physical ones (Rey, 1997).

Just about the only people who reject premise (1) are those who endorse an extreme form of distributed connectionism, believing that the brain (or significant portions of it, dedicated to central processes) forms a vast collection of connectionist networks, in which there are no local representations (O'Brien & Opie). The successes of the distributed connectionist program have been limited, however, mostly being confined to various forms of pattern-recognition; and there are principled reasons for thinking that such models cannot explain the kinds of structured thinking and one-shot learning of which humans and other animals are manifestly capable (Fodor & McLaughlin, 1990; Fodor & Pylyshyn, 1988; Gallistel, 2000; Horgan & Tienson, 1996; Marcus, 2001). Indeed, even the alleged neurological plausibility of connectionist models is now pretty thoroughly undermined, as more and more discoveries are made concerning localist representation in the brain (e.g. Rizzolatti et al., 2001).

Premise (2), too, is almost universally accepted, and has been since the early days of computational modeling of vision. You only have to begin thinking in engineering terms about how to realize cognitive processes in computational ones to see that the tasks will need to be broken down into separate problems which can be processed separately (and preferably in parallel). And indeed this is, for example, exactly what we find in the organization of the visual system.

What this second premise then does is to impose on proposed modular systems quite a tight encapsulation constraint. For any processor which had to access the full set of the agent's background beliefs (or even a significant sub-set thereof) would be faced with an unmanageable combinatorial explosion. We should therefore expect the mind to consist of a set of processing systems which are not only modular in the sense of being distinct isolable components, but which also operate in isolation from most of the information which is available elsewhere in the mind.

I should emphasize that this point concerns, not the range of inputs which are available to modular systems, but rather the processing data-bases which are accessed by those systems in executing their algorithms (Carruthers, 2003; Sperber, 2002). Failure to appreciate this point leads Varley & Siegal to complain that language can't be a module, since it receives inputs from many other modules. While we might expect some modules to receive only a restricted range of inputs, this is by no means a requirement of modularity.[2] And indeed, the practical reasoning module envisaged in the target article is hypothesized to receive inputs from all belief-forming and desire-forming modules (see also R1.3 below). This is quite consistent with its modular status, provided that the information which can be accessed when processing any given set of inputs is strictly limited.

I should also emphasize that modularity is perfectly consistent with an acknowledgement of the importance of attention in organizing the direction of modular resources; and also with a variety of kinds of top/down processing. (Both are found in the operations of the human visual system; Kosslyn, 1994.) Some kinds of attention can be thought of as directing the inputs to a module. And top/down processing can all be internal to a wider modular system. What is important, from a modularist perspective, is just that the processing within the system shouldn't require habitual access to significant quantities of information held elsewhere in the mind.

Modularism is now routinely assumed by just about everyone working in artificial intelligence, in fact (Bryson, 2000; McDermott, 2001). So anyone wishing to deny the thesis of massive modularity is forced to assume a heavy burden. It must be claimed, either that minds aren't computationally realized, or that we haven't the faintest idea how they can be. And either way, it becomes quite mysterious how minds can exist in a physical universe. (This isn't to say that modularism doesn't have its own problems, of course. The main ones will be discussed briefly in R1.4 below. And one of the major purposes of my target article was to make a start on addressing them.)

Any computational psychologist should predict, then, that the brain will contain a whole host of modular processing systems, connected with one another in a variety of complex ways, and many of which can be decomposed into networks of smaller modules. (See any of the familiar and highly complex wiring diagrams for visual cortex to get the flavor of the kind of thing I have in mind.) And nothing in this account need prevent one sub-module (e.g. for processing faces) from forming a common part of a number of distinct higher-level modules, either (e.g. not only the visual system, but also an emotional system).

So far as I am aware, such a picture is consistent with (and to some degree supported by) everything that we currently know about the organization of the brain. Certainly it is no problem for this sort of modularist account that brain-imaging studies should show dispersed networks of neurons engaged in any given task, to which Bickerton draws our attention. For first, most tasks are likely to involve the activity of a variety of modules, both input and output; and second, many modules may well be composed out of sub-modules which are located in different brain regions. (See, e.g., Baron-Cohen & Ring, 1994, for a tentative brain map of the sub-systems involved in our theory of mind module, or ToMM.)

Nor does the elegant eye-tracking data described by Dale & Spivey count against this sort of modularist account.[3] That language comprehension should be subserved by a complex system involving a language module receiving inputs from both speech and vision and an attentional system for coordinating the two sorts of input (as well as a number of other parts besides, probably) is exactly what a modularist of my stripe might predict (see also Chomsky, 2000; Pietroski, 2003).

Is a modularist position implausible on evolutionary grounds, in the way that both Bickerton and Hurford & Dessalles allege? In fact not. What we know from biology generally is that evolution is apt to operate by constructing specific systems in response to specific adaptive pressures (Tooby & Cosmides, 1992). And often this will happen by co-opting and reconfiguring resources which were antecedently available, in such a way that multi-functionality becomes rife (Hermer-Vazquez). What would really be biologically implausible would be the evolution of a big equipotent brain (given the high costs of sustaining large brain size), or the evolution of an unstructured central processing arena, of the sort to which many anti-modularists are committed (Tooby & Cosmides, 1992).

R1.2 Integration of module outputs without language

A great many commentators object to my thesis that language is the medium of inter-modular integration by pointing out cases where the outputs of different modular systems are integrated without language (Bryson, Chater, Dale & Spivey, Hampton, Hermer-Vazquez, Hurford & Dessalles, Nelson, Robbins, Slobin, Varley & Siegal, Wynn & Coolidge). For example, the shape and space modules in the visual system are integrated without language (Wynn & Coolidge), and the mirror neurons discovered by Rizzolatti et al. (2001) integrate both visual and proprioceptive information (Bryson, Varley & Siegal). But as will be manifest from the account of mental modularity sketched in R1.1, I too believe that modular systems will frequently receive inputs from two or more distinct modules, and hence that integration of modular outputs will frequently occur in the absence of language. Nothing that I said in the target article was intended to deny this. Rather, the thesis that I there advanced assumed a certain sort of overall mental architecture, and was intended to be restricted to just one level of that architecture.

What I envisage is a layer of perceptual / input modules (including language) which generate outputs to be taken as input by a layer of conceptual belief-forming and desire-forming modules. The input modules may bear complex relations to one another (e.g. both hearing and vision feed into language, as noted in R1.1 above), and may themselves decompose into a variety of sub-modules. And the conceptual modules, similarly, will bear complex relations to one another as well as being further decomposable.

The conceptual modules in turn (besides generating memory; see R1.3 below) feed their outputs to a limited-channel practical reasoning system. I envisage that the latter is capable of only a very limited range of inferences (e.g. selecting the currently strongest desire from amongst its inputs, and conditional inference; I presume that it has no capacity to conjoin inputs). And I envisage that it can only draw on a limited data-base of information (e.g. a set of abstract motor-schemas of performable actions and action sequences). So the practical reasoning system, too, deserves to be regarded as a module (Carruthers, forthcoming b).

The thesis that language is the vehicle of inter-modular integration is intended to apply only at the interface between the conceptual belief-forming and desire-forming modules and the practical reasoning system. The claim is that it is those contents (the outputs of the conceptual modules, fit to be taken singly as input by the practical reasoning system) which can only be conjoined with one another for purposes of further inference (the contrast being: for purposes of storage, or memory; see R1.3 below) through the offices of the language faculty. The fact that modular outputs can be integrated at other levels of cognition is simply irrelevant.

The following puzzle arises, however. Given that modular outputs can be integrated at other levels in cognition, how is it that (prior to the evolution of the language faculty) this wasn't the case at the interface between conceptual modules and the practical reasoning system? The answer is that cognitive systems only evolve the powers that they need to, whereas cross-module talk is computationally expensive (Bryson); and it isn't clear that there was any need to integrate module-specific beliefs and desires.

Consider the powers that a practical reasoning module might have in the absence of language. (For a more elaborate and complex account than figures in the following sketch, see Carruthers, forthcoming b.) It takes as input whatever is the currently strongest desire, P. It then initiates a search for beliefs of the form, Q → P, cueing a search of memory for beliefs of this form and/or keying into action a suite of belief-forming modules to attempt to generate beliefs of this form. When it receives one, it checks its database to see if Q is something for which an existing motor-schema exists. If so, it initiates a search of the contents of current perception to see if the circumstances required to bring about Q are actual (i.e. to see, not only whether Q is something doable, but doable here and now). If so, it goes ahead and does it. If not, it initiates a further search for beliefs of the form, R → Q, and so on. Perhaps the system also has a simple stopping rule: if you have to go more than n conditionals deep, stop and move on to the next strongest desire.
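The control loop just described can be rendered as a toy program. What follows is only an illustrative sketch under stated assumptions: the class and field names, the representation of contents as plain strings, and the example beliefs and desire strengths are all hypothetical and are not drawn from the account itself. Each conditional is consulted singly, so nothing in the sketch conjoins the outputs of different modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditional:
    """A belief of the form antecedent -> consequent (Q -> P in the text)."""
    antecedent: str
    consequent: str


@dataclass
class PracticalReasoner:
    beliefs: list        # conditionals delivered singly by memory or belief-forming modules
    motor_schemas: set   # steps for which a motor schema exists (assumed representation)
    doable_now: set      # steps whose enabling circumstances are currently perceived
    max_depth: int = 3   # the n of the stopping rule

    def plan_for(self, goal, depth=1):
        """Return a chain of steps leading to `goal`, or None if none is found."""
        if depth > self.max_depth:
            return None  # more than n conditionals deep: give up on this desire
        for belief in self.beliefs:
            if belief.consequent != goal:
                continue
            step = belief.antecedent
            if step in self.motor_schemas and step in self.doable_now:
                return [step]
            # Not doable here and now: search for a conditional R -> Q instead.
            earlier = self.plan_for(step, depth + 1)
            if earlier is not None:
                return earlier + [step]
        return None

    def act(self, desires):
        """Try desires from strongest to weakest; return (desire, plan) or None."""
        for goal in sorted(desires, key=desires.get, reverse=True):
            plan = self.plan_for(goal)
            if plan is not None:
                return goal, plan
        return None


# Hypothetical example: the berries can only be eaten once the bush is reached.
reasoner = PracticalReasoner(
    beliefs=[Conditional("eat berries", "hunger sated"),
             Conditional("walk to bush", "eat berries")],
    motor_schemas={"eat berries", "walk to bush"},
    doable_now={"walk to bush"},
)
result = reasoner.act({"hunger sated": 0.9, "warmth": 0.4})
```

In this example the strongest desire is served by a two-step chain, found by searching one conditional further back when the first step is not doable here and now. Setting max_depth to 1 makes the same search fail, which is the stopping rule at work.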

In addition, one thing which immediately-ancestral practical reasoning systems would have had, surely, is the power to initiate episodes of mental rehearsal. Once some sort of initial plan has been hit upon (I want P; if I do Q I'll get P; Q is something I can do right now), there would be considerable value in feeding the supposition that I do Q back through the various central modular systems once again as input, to see if such an action would have other as-yet-unforeseen consequences, whether beneficial or harmful. This doesn't seem to require anything a-modular to be built into the system yet; for both the desire for P, and the belief that if Q then P, can be the product of individual modules.[4], [5]

The point is that earlier hominid species, armed with some such practical reasoning module and a bunch of powerful belief-forming and desire-generating modules, would already have been pretty smart. They could have generated flexible and appropriate plans of action, with the different elements of the plan drawn from different conceptual-modular domains. I thus agree with MacWhinney that our hominid ancestors would have had sophisticated planning abilities prior to the evolution of language. But in contrast with him, I don't think that they would have needed to be capable of integrating the outputs of conceptual modules in order for this to be possible. For it simply isn't clear why they would have needed to do cross-modular integration of contents. And on my account this only became possible as a by-product once a language faculty had evolved for purposes of communication, given that such a faculty would have been set up in such a way as to receive, and to combine, contents from each of the conceptual modules.

R1.3 Mental modularity and domain-integrating concepts

As Robbins rightly notices, there is a sense in which my thesis is only intended to apply at the level of process or inference, not at the level of stored content. (Indeed, modularism, too, is a thesis about processing and not about storage. Modules are learning mechanisms, charged with generating information and appropriate goals. Whether memory systems are also fractionated along modular lines, or whether their patterns of organization need to follow the same fault-lines, is quite another matter. See below.) So the fact that there exist domain-integrating concepts, in the way that both Hurford & Dessalles and Nelson argue, poses no special threat to that thesis.

For example, it is plausible that pre-linguistic representations of individual people should be organized in folders, with some sort of identifying label (perhaps a recognition schema), and with information about the individual drawn from a variety of modular domains (physical, biological, psychological, social, and so forth) contained within the folder, and so in one sense conjoined. But I would claim that in order to do anything much with the information in a folder in the absence of language (in order to draw further inferences from it of a non-practical sort) one would have to take each item of domain-specific information and process it separately within the appropriate conceptual module.

Similarly, then, for the pre-linguistic event-concepts which combine information from various domains, discussed by Nelson: I can allow their existence, consistently with my thesis, by claiming that the drawing of inferences from such concepts in the absence of language will always be a piecemeal and module-specific affair.

What exactly a modularist should say about any given domain-integrating concept will depend upon the details of the case, of course. In addition to the possibility just canvassed in connection with concepts of individuals and of event-types, there are at least two further possibilities. Each of these can be illustrated through discussion of the abstract concept three, raised as an example against me by Hurford & Dessalles. Let me elaborate.

The concept three can of course be applied across all domains, wherever there are individuals to be numbered. So one can have three rocks, three cats, and three thoughts. But on some views that concept might still be proprietary to a more abstract numerical module (Butterworth, 1999), whose operation is dependent upon classifications effected by other modular systems.[6]

An alternative view would be that the abstract and exact numerical concept three is dependent upon language (Spelke & Tsivkin, 2001). All that would be available in pre-linguistic infants and monkeys, underpinning the behavior which seems to approximate to possession of that concept, would be the object-file system. This can hold in working memory a representation of three separate concrete individuals, enabling subjects to track the movements and absences of those individuals (Carey, 2001).

R1.4 Mental modularity and distinctively human thought

The central challenge for any strongly modularist account of human cognition, of course, is to account for the distinctive flexibility and creativity of the human mind, which seems to mark us off so sharply from all other species of creature on earth (past or present) in the way that Bickerton emphasizes. The main thesis of the target article is intended as a start on this problem: it is cross-modular contents formulated in language, and cycles of processing activity involving language in inner speech, which are supposed to account for some of that flexibility. But this was, of course, only a start, and cannot by itself be the whole story. Let me briefly mention two other components.

One (not mentioned in the target article) is a supposition-generator or supposer (Nichols & Stich, 2000), which in my view is built on the back of the language faculty, picking up on weak associations and analogies to generate novel sentences for consideration. It is this which is responsible for the distinctive creativity of human thinking, on my account. And I claim that it is the function of the species-specific disposition to engage in pretend play in childhood to build and hone the activity of this supposition-generator (Carruthers, 2002b).

At this point I differ from Bickerton (and also, I think, from Chomsky, 1988). He seems to believe that the human language-faculty by itself is sufficient to explain our disposition to engage in creative thought. But this can't be quite right. It is one thing to be capable, by virtue of the combinatorial powers of one's language, of entertaining novel and creative sentences. With this I agree. And it is quite another thing to have any disposition to generate such sentences. This is what the supposer confers on us, on my account.

The other major element in an account of human creative powers is some sort of faculty of abductive inference, or faculty of inference to the best explanation, which can receive these novel sentences and evaluate their worth. This was mentioned briefly in the target article, where I gave the misleading impression that I thought it would be embodied in a separate module. And to this Dale & Spivey rightly object that a faculty of abductive inference would seem to be the archetype of a radically a-modular, or holistic, system (Fodor, 1983).

In fact my hope is that a kind of virtual faculty of abductive inference can be built out of elements already present in module-based cognition. For details see Carruthers (forthcoming a). But very roughly the idea is that principles of simplicity, relevance and coherence with existing belief which are already employed in the interpretation of speech and the evaluation of speech-dependent testimony from other people, would become a faculty of inference to the best explanation when directed inwards, in cycles of inner speech, giving rise to a preference for simple but fecund theories constrained by background belief.

The overall goal of the research project (of which the target article is one component) is to try to explain distinctively-human cognition by seeing it as built out of modular components. And the motivation for adopting that goal, in turn, is to make it possible for us to see human cognitive processes as computationally realized, given that computational processes have to be encapsulated if they are to be tractable.

R1.5 LF and logical inference

The hypothesis tentatively advanced in the target article is that it is linguistic representations of a certain specific kind which serve as the medium of inter-modular integration. These are, namely, representations at the level of logical form, or LF (Chomsky, 1995; Hornstein, 1995). In the present sub-section I respond to a number of different objections to this idea. But first let me stress that such a claim isn't strictly essential to the main thesis of the target article, which is just that representations in a modular language faculty (whatever form they might take) serve to integrate contents across conceptual modules.

Dale & Spivey challenge the very existence of LF, or at least its psychological reality. But in doing so they present a challenge to a significant portion of contemporary linguistics. I take my stand with the latter. For given Chomsky's current construal of LF, in which LF and PF (phonological form) are the only linguistic levels (Chomsky, 1995), to deny LF is tantamount to denying abstract syntax.

In fact, evidence of the psychological reality of LF is legion. For example, not only adults but children as young as three will only accept the contraction of "want to" to "wanna" in those cases (statistically by far the most common) where there isn't a covert wh-trace between the verb and the preposition (Pietroski & Crain, 2001; Thornton, 1996). Thus subjects will accept the contraction of [1] to [2], but not of [3] to [4].

[1] I don't want to make breakfast for John.

[2] I don't wanna make breakfast for John.

[3] Who does John want to make breakfast?

[4] *Who does John wanna make breakfast?

This is because the LF representation of [3] contains a covert trace of the subject of "make breakfast", as in: "Who_t does John want t to make breakfast?"

Ironically, indeed, the very data cited by Dale & Spivey in their attack on LF, concerning the partial and incomplete nature of semantic processing (Ferreira et al., 2002; Sanford & Sturt, 2002), are fully in accord with Chomsky's own thinking (Chomsky, 2000; Pietroski, 2003). According to Chomsky, the semantic contribution of the language faculty and of LF stops short of a full specification of truth-conditions. Thus the full LF representation of the sentence, "He saw the book in the window and bought it" would leave open whether it was the very token book seen which was bought, or just an instance of the same type. It is then supposed to be the work of other faculties of the mind to resolve such ambiguities. And presumably such resolutions are only sought when needed, just as Sanford & Sturt (2002) suggest.

Molina wonders why I chose LF to play the integrative role, and doubts whether it is plausible to say that LF representations could constitute the locus of thought (where thinking is really happening). But of course I don't believe that only LF thoughts are real! I believe that the individual outputs of the conceptual modules, prior to integration in LF, are also thoughts; and that thinking is going on within each of those modular systems, and also in the practical reasoning module. LF plays a constitutive role in only a very limited range of thoughts, in my view; viz. those which combine together the outputs of conceptual modules for further processing.[7]

But why LF? Partly because my thesis is that it is the natural language faculty which does the integrative work. And partly because, according to our best theories, LF is the place where the language faculty interfaces with other systems in such a way as to fix external reference, and these systems would presumably include the various conceptual modules whose outputs stand in need of integration.

Chater wonders whether there is really any difference between my views concerning the role of LF and the traditional thesis that there is a language of thought (LoT, or Mentalese) over which logical and other forms of inference are defined. And he wonders whether the difference between my view and that of, say, Fodor (1975) might be merely terminological.

One difference is that LF representations will contain lexical items drawn from a specific natural language (and might also contain some language-specific structural elements; see the discussion in R5.6 below), whereas LoT sentences are supposed to be in some sort of universal brain language. Put differently, the point is that the LF representations of a speaker of English will contain lexical items drawn from that very language; whereas LoT sentences won't (in general) do so (except in the case where a subject is thinking about English sentences).

Another difference between LF and LoT is that I certainly don't envisage that there is any kind of global logical capability operating over sentences of LF. A language module should, of course, be capable of certain limited forms of inference defined over LFs (see below). But such inferences need not be manifestations of any sort of general-purpose inference engine. In fact, I very much doubt whether humans have a global logical faculty (Evans & Over, 1996; Gigerenzer & Todd, 2000). Rather, a set of heuristics and implicit procedures (together with the limited inferential powers of the language faculty) could collectively provide a sufficiently good approximation of logical ability.

Of course what we do also have, if dual process theories of cognition are correct (Evans & Over, 1996; Frankish, forthcoming; Stanovich, 1999; and see R4.4 below), is the capacity to form explicit beliefs about which forms of inference are and aren't valid, and to train ourselves to follow only valid patterns in our explicit thinking. But this capacity, in my view (while also involving natural language sentences as vehicles of thought), supervenes upon the more basic role of LF in integrating domain-specific thoughts.

Robbins correctly notices that the integrative role of the language module, on my account, depends upon it having the capacity for certain kinds of inference. Specifically, it must be capable of taking two LF sentences, constructed from the outputs of two distinct conceptual modules, and of combining them appropriately into a single LF representation. He claims that such a view is highly implausible, however, and that it puts me into conflict with the views of most contemporary linguists. But: not so.

Robbins, like Chater, mistakenly thinks that what is in question is the existence of a general-purpose inference engine located within the language faculty. For indeed, such a claim is not only intrinsically unbelievable, but certainly wouldn't be believed by any working linguist. However, just about every linguist does think (with the possible exception of Fodor and those closely influenced by Fodor) that some inferences are valid, and are known to be valid, purely in virtue of our linguistic competence.

Most linguists believe, for example, that the inference from "John ran quickly" to "John ran" is endorsed in virtue of semantic competence (Parsons, 1990); and many would claim that such competence is embodied in the language faculty. In contemporary parlance, this amounts to saying that the inference is made by the language module transforming sentences of LF. Indeed, it is worth noting that one way in which such inferences might be handled is in terms of covert quantification over events and conjunction reduction (Pietroski, 2004). Thus on this sort of account, "John ran quickly" really has the form, ∃e (e is a running & e is by John & e is quick). And then the inference to "John ran" is just an instance of conjunction elimination.
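The event-based treatment of this inference can be illustrated with a small program. This is merely a sketch under assumptions of my own choosing for the illustration: a logical form is encoded as a set of conjuncts predicated of a single existentially bound event variable e, and the tuple encoding and function name are hypothetical rather than any linguist's formalism.

```python
def follows_by_conjunction_elimination(premise, conclusion):
    """True if every conjunct of `conclusion` already occurs in `premise`.

    Both arguments are sets of conjuncts about the same event variable,
    understood as existentially quantified over that variable.
    """
    return len(conclusion) > 0 and conclusion <= premise


# "John ran quickly": there is an e such that e is a running, by John, and quick.
john_ran_quickly = frozenset({("running", "e"), ("by", "e", "John"), ("quick", "e")})

# "John ran": there is an e such that e is a running and e is by John.
john_ran = frozenset({("running", "e"), ("by", "e", "John")})

valid = follows_by_conjunction_elimination(john_ran_quickly, john_ran)
invalid = follows_by_conjunction_elimination(john_ran, john_ran_quickly)
```

The check licenses the move from "John ran quickly" to "John ran" but not the converse, and it needs nothing beyond dropping a conjunct, which is why no general-purpose inference engine is required for it.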

In conclusion, then, so far from being implausible, the idea that certain limited forms of inference (notably conjunction introduction and conjunction elimination, among others) are handled internally within the language faculty, in transformations of LF sentences, is pretty much the orthodox view in contemporary linguistics.

R2 Mechanisms of modular integration

In this section three distinct proposals made by commentators will be discussed, concerning the mechanisms for integrating central-modular domain-specific contents.

R2.1 The relative pronoun

Bermudez appears to endorse both the spirit and the letter of my proposal, but offers a friendly elaboration. Instead of saying merely that it is sentences of LF which serve to integrate modular outputs, why not say that it is the availability of the relative pronoun, in particular, which makes this possible? I would be happy to accept such an amendment if I thought that it was well-motivated. But I can't see that it is.

Consider the simplest possible example arising from the Hermer-Vazquez et al. (1999) re-orientation experiments: suppose that the sentences to be conjoined are, "The toy is near a long wall" and, "The toy is near the blue wall" (in a case where the blue wall is also a long wall). My own proposal is just that, with the referent of "wall" indexed as the same, LF structures entitle us to multiple-embed the predicates "long" and "blue" to form the sentence, "The toy is near the long blue wall."
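The combination step in this example can be sketched as a toy program. The sketch rests on assumptions introduced purely for illustration: a drastically simplified stand-in for an LF representation (a subject, a relation, and a noun carrying a referential index and a list of modifiers), with hypothetical names throughout. It makes no claim about how LF structures are actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Located:
    """Simplified stand-in for an LF such as 'The toy is near the blue wall'."""
    subject: str
    relation: str
    noun: str
    index: int         # referential index on the noun
    modifiers: tuple   # predicates attached directly to the noun


def conjoin(first, second):
    """Embed both sets of modifiers on one noun, if the nouns are co-indexed."""
    same_frame = (first.subject, first.relation, first.noun, first.index) == (
        second.subject, second.relation, second.noun, second.index)
    if not same_frame:
        return None  # different referents: nothing licenses the combination
    extra = tuple(m for m in second.modifiers if m not in first.modifiers)
    return Located(first.subject, first.relation, first.noun, first.index,
                   first.modifiers + extra)


def render(lf):
    """Spell the representation out as an English sentence."""
    phrase = " ".join(lf.modifiers + (lf.noun,))
    return f"The {lf.subject} is {lf.relation} the {phrase}."


geometry_output = Located("toy", "near", "wall", 1, ("long",))
landmark_output = Located("toy", "near", "wall", 1, ("blue",))
combined = conjoin(geometry_output, landmark_output)
sentence = render(combined)
```

Here the two predicates attach to the noun directly once the indices match, with no relative clause formed along the way.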

Bermudez's proposal amounts to the claim that the first of the above sentences really has the form, "The toy is near a wall that is long" and the second has the form, "The toy is near the wall that is blue"; and then (since the referents of "wall" are indexed as the same), we are entitled to move to the single sentence, "The toy is near the wall that is long and that is blue." In effect, the proposal seems to be that all complex predicates are relative clauses. But I doubt whether many linguists would consider such a suggestion worth pursuing. For what could be wrong with allowing "blue" and "long" to attach to a noun-phrase directly, without the formation of a relative clause?

Bermudez's worry (on my behalf) would seem to be this: if the predicates "blue" and "long" express concepts drawn from different modular domains, then there is nothing to guarantee that it will even make sense to attach them to objects drawn from the other domain. And the abstract apparatus of relative clauses is somehow supposed to help with this. I don't really see how the story here is supposed to go, in fact. But in any case the original worry is ill-motivated, and betrays a misunderstanding of the sense in which conceptual modules are domain specific.

Many modules can process information concerning the same object or situation; and so far as I can see, this will be the general case. But then we automatically have a guarantee that predicates produced by the one module will apply to the objects of the other. This should have been manifest from the original example. The thoughts produced by the geometry module and the landmark module both concern the location of the toy; and it is the toy which is the subject of both resulting sentences. Equally, one and the same wall can both be long (spatial module) and colored (color module).

The problem to which the language faculty provides a solution, in fact, isn't that of generating abstract concepts which can apply across modular boundaries (for this already comes for free with the modules themselves); it is rather to provide an abstract framework within which concepts of the different kinds can be combined into a single thought for purposes of inference.

R2.2 Metaphor and structure-mapping

Bryant & Gibbs and Dominey each propose that the human cognitive system contains a pre-linguistic capacity to map one domain onto another, giving rise to metaphorical conceptual schemes. For example, the domain of debate and argument can be mapped onto the domain of warfare, giving rise to an elaborate system of concepts summarized by the slogan, "Argument is war" (Lakoff & Johnson, 1980). And it is suggested that such conceptual schemes are non-linguistic in character, receiving expression in metaphorical language, rather than being reflections of it. So on such an account the basic mechanism of inter-modular integration will be this domain-general structure-mapping capability, rather than language.

The fact that there exists a correlation between cross-domain mapping and metaphorical language is one thing, however; the question of the direction of causation here is quite another. It might be possible to claim that such conceptual mappings begin initially with the integrative functions of language. Or so, at least, I shall argue.

For these purposes it is important to distinguish between live metaphors and dead ones. Live metaphors are fresh creations and strike us (if they are effective) with the force of insight; dead metaphors are once-live metaphors which have been used so commonly that it takes hardly any effort to process them, and their metaphorical status is barely discerned. For notice that the metaphorical conceptual systems studied by Lakoff & Johnson (1980, 1999) and others are dead ones, enshrined in common patterns of speech. If someone says, for example, "Don't worry; I'm on top of the situation", we probably won't even notice that a metaphor has been used.

Recall from R1.4 above that my account presupposes the existence of a language-dependent supposer. It might work somewhat as follows. Sometimes when two previously unconnected concepts A and B are co-activated (albeit one of them weakly) by the same stimulus, situation, or thought, then this leads to the supposition that A is B being explicitly formulated. This supposition is then fed as input to the various (mostly modular) reasoning systems, and the results evaluated. At the outset of an episode of pretend play in childhood, for example, the sight of a curved banana might also activate the concept telephone (because of the banana's similarity in shape to a telephone handset). The child then entertains the supposition, "The banana is a telephone." From this supposition various inferences can be drawn. If the banana is a telephone, then you can call grandma on it, and it will have buttons down its middle section for dialing. The child can then access these consequences, pretend to dial, and subsequently begin a pretend conversation with grandma.[8]

Generalizing from this example, then, one might propose that a live metaphor is first created when some sort of structural correspondence or similarity between two phenomena leads to at least weak co-activation of the concepts applying to each. The resulting metaphorical thought is formulated in language by the suppositional system, and fed back to the various modular inferential systems. Consequences which are evaluated as true or useful are retained, and the rest discarded. (For example, the playing child is likely to discard from its game the inference, "If the banana is a telephone, then it won't squish if I squeeze it.") This account presupposes no domain-integrating capacity beyond the language-dependent supposer.

In terms of such an account it is easy enough to see how culturally-entrenched dead metaphors might become established. Wherever two domains are structurally alike in some significant respect, this is likely to lead to weak co-activation of the corresponding concepts, and hence people will sometimes entertain the supposition that the one domain (or an aspect of the one domain) is the other. If there are enough structural similarities between the two domains for the one to serve as a viable partial model of the other (as in the case of, "Argument is war") then the corresponding modes of speech are likely to become entrenched. Hence we now talk of winning an argument and attacking someone's premises without giving it a second thought.

In conclusion of this sub-section, then, I claim that one can explain the deeply metaphorical character of much of our language and speech consistently with my main hypothesis, without having to suppose that people possess some sort of language-independent domain-general structure-mapping capability.

R2.3 Meta-module or language module?

Atran presents the following important challenge to my account: Why should we think that it is language which does inter-modular integration, rather than some sort of meta-representational faculty (or a theory of mind mechanism, or ToMM)? Atran agrees with the modularist framework adopted in my paper; and he agrees that domain-general flexible cognition somehow has to be built out of modular components. But he sees nothing yet to discriminate between my proposal and the idea previously advanced by Sperber (1996) that ToMM is the module which has the power to combine together the outputs of all the others.

This alternative proposal is worth replying to at some length. So consider, then, the competitor hypothesis that it is ToMM which has access to the outputs of all the various central / conceptual modules, and which also has the power to combine those contents together to form inter-modular thoughts.

One difficulty for such an hypothesis is that there is evidence that ToMM doesn't, in fact, have any such access. Instead of automatically receiving as input the various beliefs which influence our own behavior (which would then make self-explanation relatively trivial), there is considerable evidence that what ToMM does instead is engage in self-interpretation, ascribing thoughts and goals to oneself to explain manifest behavior in something very much like the way that thoughts and goals are ascribed to other people (Gopnik, 1993; Gazzaniga, 1998).

This emerges most clearly in those many cases where it can be demonstrated that people's attributions of thoughts to themselves are really instances of confabulation, rather than mere reporting of contents to which ToMM has some sort of direct access (Gazzaniga, 1998; Gopnik, 1993; Nisbett & Ross, 1980; Nisbett & Wilson, 1977; Wilson, 1985; Wilson, 2002; Wilson & Stone, 1985; Wilson et al., 1981). Indeed, a strong case can be made that the only circumstances in which ToMM has non-inferential access to a subject's own thoughts are those where the thoughts are linguistically expressed in inner speech (Carruthers, 1998b; see also the discussion in R5.2 below).

Conclusion: if ToMM doesn't have access to the subject's beliefs (i.e. the outputs of the subject's belief-forming conceptual modules), then it cannot, of course, be ToMM which serves to bind together and integrate the contents of those beliefs on a regular basis. In contrast, we know that the language faculty must have access to (a significant subset of) the subject's beliefs, in order that those beliefs should be expressible in speech.

Another difficulty for the Sperber / Atran proposal (even if we set aside the first) is to explain why any such meta-representational inter-modular architecture should have evolved. For the main business of ToMM is presumably to predict and explain the behavior of conspecifics. This capacity would only require the construction of inter-modular thoughts if others were already entertaining such thoughts. Otherwise attributions of domain-specific thoughts alone would do perfectly well. In contrast, in the case of language the demands of swift and efficient communication would have created a significant selection pressure for inter-modular integration, allowing the outputs of distinct central modules concerning the same object or event to be combined into a single spoken sentence. So instead of saying separately, "The object is near a long wall" and, "The object is near the blue wall", one can say much more succinctly, "The object is near the long blue wall."

It might be replied that the pressure for ToMM to integrate contents across modules could have come, not from the demands of predicting and explaining the behavior of oneself and others, but rather from the benefits which such integration can bring for other areas of activity (such as solving the re-orientation problem). This is possible. After all, it is common enough in biology for a system initially selected for one purpose to be co-opted and used for another. And it might be claimed that ToMM would be ideally placed to play the integrative role, taking inputs from all the other central modules (allegedly anyway; see above) and providing outputs to practical reasoning.

But actually, it is hard to see how one could get from here to anything resembling the full flexibility of human cognition. For there is no reason to suppose that ToMM would have been set up so as to provide its output as input to the other central modules. In which case there would be no scope for cycles of reasoning activity, with ToMM combining outputs from central modules and then harnessing their resources for further reasoning. In contrast, since language is both an output and an input module, it is well positioned for just this role, as I argued at length in the target article.

What of the experimental evidence? Atran attributes to me the claim that dual-task data of the sort collected by Hermer-Vazquez et al. (1999) cannot discriminate between the idea that it is language, on the one hand, or ToMM, on the other, which performs the integrative function across modules. But this is a misreading of the article. What I said, rather, is that dual-task studies cannot easily be used to test whether it is language which plays the role of integrating the outputs of ToMM and other modular systems, because of the likelihood that ToMM itself sometimes accesses and deploys the representations of the language faculty in the course of its normal operation. For then any task which disrupts language is likely to disrupt ToMM as well. But this is by no means the same as saying that dual-task studies can't discriminate between ToMM and language as candidates for playing the integrative role in respect of other modular systems (i.e. systems other than language and ToMM).

Indeed, the dual-task data collected by Spelke and colleagues seem to me to support my own proposal quite strongly, when matched against the Sperber / Atran one. For if subjects fail at the task in the speech-shadowing condition (but not in the rhythm-shadowing condition) because they cannot then integrate geometrical and landmark information, it is very hard to see how it can be a disruption to ToMM which is responsible for this failure. For there is no reason to think that shadowing of speech should involve the resources of ToMM.

Admittedly, a good case can be made for saying that normal speech comprehension and production will implicate a sub-system of ToMM, at least, where that sub-system is charged with figuring out the conditions for relevance in communication (Sperber & Wilson, 2002). So normal speaking and comprehending would surely disrupt any other ToMM-involving task (and so any task requiring inter-modular integration, on the hypothesis that it is ToMM which performs this role). But there is no reason to think that this should be true of speech shadowing. For in this case there is no requirement of comprehension, and no attempt at communication. (See also R3.2 below.)

In conclusion of this sub-section, then, I claim (on theoretical, evolutionary, and experimental grounds) that language is a much more plausible candidate for integrating the contents of other central modules than is ToMM.

R3 The re-orientation data reconsidered

Several commentators raise doubts about the re-orientation data collected by Spelke and her colleagues (Hermer & Spelke, 1996; Hermer-Vazquez et al., 1999). Responding to these doubts is the task of the present section.

R3.1 Language as an aid to memory

Bryson and Clark each provide very similar alternative explanations of the Spelke data, albeit from quite different theoretical perspectives (Bryson is a modularist, Clark a connectionist). Their idea is that linguistic terms might serve, not as vehicles of inter-modular integration, but rather as aids to memory, helping us to focus our attention in tasks which we would otherwise find difficult.

What this suggestion overlooks, however, is the datum that the only predictor of success in the re-orientation tasks which could be discovered amongst young children was productive use of "left" and "right" (Hermer & Spelke, 1996). For we already know that pre-linguistic children and rats rely on geometric information in attempting to solve these tasks. So their problem is not to recover the spatial relationships involved when they have been disoriented: we know that they can already make use of those. Rather, their problem is to combine spatial information with landmark (e.g. color) information.

Since it looks like it is colors which children have trouble remembering in these circumstances, a memory-enhancement theory should predict that it would be the productive use of color-vocabulary which would be the best indicator of success. But this isn't what we find. What enables children eventually to succeed in the task isn't some way of remembering the relevance of color (memory-enhancement), but rather a way of combining into a single explicit goal both items of relevant information (i.e. module-integration).

But then why, on the module-integrating account, is it spatial vocabulary which is crucial? Shouldn't color vocabulary be just as important? Indeed so; but the fact is that color vocabulary is much more easily acquired. So the crucial water-shed, at which both types of lexical item become available together for the first time, is the acquisition of "left" and "right".

Wynn & Coolidge put forward a related counter-proposal for explaining the Spelke data. Relying upon Baddeley's (1986) model of the working-memory system as being made up of a central executive and two slave-systems (the phonological loop and the visuo-spatial sketch-pad), they suggest that the reason why performance is disrupted in the speech-shadowing task might be because of interference with the central executive component of working memory. And since the central executive isn't supposed to be language-dependent, this would mean that integration of contents across domains wouldn't be dependent upon language.

Without a lot more detail, however, this proposal doesn't look well-motivated. For Hermer-Vazquez et al. (1999) go to great lengths to argue that the rhythm-shadowing task is just as demanding of the resources of working memory as is the speech-shadowing one. So Wynn & Coolidge need to show us how there is something about the speech-shadowing task which is especially demanding of the resources of the central executive in particular.

R3.2 A contrast otherwise explained?

Samuels raises an objection which, if it were successful, might provide just the materials which Wynn & Coolidge need.[9] He points out that on orthodox theories of speech production, two additional kinds of cognitive resource are involved, neither of which would be required for the rhythm-shadowing task. First, extra-linguistic concepts have to be recruited and combined together into a representation of the message to be communicated; and second, communicative intentions have to be formed. In which case speech-shadowing might differentially disrupt performance because forming communicative intentions places greater demands on working memory than do mere motor intentions, and/or because the resources used for combining together concepts are otherwise occupied by shadowing, not because language per se is the medium of inter-modular integration.

Unfortunately for Samuels' proposal, however, there is no reason to think that speech shadowing utilizes the same cognitive resources as does normal speech production; and there is good reason to think that it doesn't. For first, when shadowing speech people don't have any intention to communicate; so no communicative intentions need to be formed. And second, since there isn't any message to be communicated, either, there is no reason to think that extra-linguistic concepts would have to be combined together with one another.

It looks as if speech-shadowing should only need to utilize some sort of fast-track loop between phoneme perception and phoneme production, indeed, by-passing the conceptual systems altogether. And the speed of speech shadowing might seem to suggest that this is in fact the case. Speech can be shadowed with latencies as small as 250 milliseconds, with subjects beginning to pronounce each phoneme as soon as it has been perceived. Indeed, these were amongst the data used by Fodor (1983) to argue for the modularity of the language faculty, suggesting that such shadowing can be carried out independently of conceptual thought.

Admittedly, there is evidence from instances of speech repair that shadowers are sometimes influenced by syntactic and semantic context, in cases where they are shadowing speech which contains errors (or at any rate, this occurs when they are concentrating exclusively on the shadowing task; Marslen-Wilson, 1975).[10] But given the speed with which repair is conducted, it seems highly likely that the syntactic and logical forms implicated are all of the narrow sort which are internal to the language faculty (see R1.5 above). In which case it is unlikely that conceptual systems external to the language faculty are being accessed, of the sort that (on the orthodox account) would be needed to solve the re-orientation task.

R3.3 Do fish integrate across modules?

Vallortigara & Sovrano remind us of the fact that a range of species other than humans and rats (specifically, monkeys, chickens, and fish) are able to solve the re-orientation task in the absence of language. Many have drawn from this fact the conclusion that integration of information across the relevant modules can't, therefore, be dependent upon language.

In the target article I pointed out that this conclusion is too hasty, since sequential use of domain-specific information would also be sufficient to solve the task. I suggested that the practical reasoning system in fish, say, might be set up in such a way as to embody the rule, "When disorientated, look for landmark information first [e.g. red wall], and then geometric information second [e.g. long wall on left]." This would generate the correct solution. In human children and rats, in contrast, the rule would appear to be, "When disorientated, look for geometric information, period."
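The contrast between the two rules can be made concrete with a minimal sketch. It is offered only as an illustration under stated assumptions, not as a model drawn from the experimental literature: the room layout, the Corner type, and both function names are hypothetical. The room is rectangular with one short red wall, so that geometry alone leaves two diagonally opposite corners indistinguishable. The point to notice is that the sequential rule consults one kind of information at each step, and never forms a combined representation such as "left of the red wall".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Corner:
    name: str
    long_wall_on_left: bool   # geometric property (hypothetical encoding)
    touches_red_wall: bool    # landmark property (hypothetical encoding)

# Hypothetical rectangular room: north and south walls long, east wall short and red.
ROOM = [
    Corner("NE", long_wall_on_left=True,  touches_red_wall=True),
    Corner("SE", long_wall_on_left=False, touches_red_wall=True),
    Corner("SW", long_wall_on_left=True,  touches_red_wall=False),
    Corner("NW", long_wall_on_left=False, touches_red_wall=False),
]

def geometry_only(corners, target):
    """Rule: when disorientated, look for geometric information, period."""
    return [c.name for c in corners
            if c.long_wall_on_left == target.long_wall_on_left]

def landmark_then_geometry(corners, target):
    """Rule: look for landmark information first, then geometric information."""
    # Step 1 uses landmark information only.
    near_landmark = [c for c in corners
                     if c.touches_red_wall == target.touches_red_wall]
    # Step 2 uses geometric information only, applied to what step 1 left over.
    return [c.name for c in near_landmark
            if c.long_wall_on_left == target.long_wall_on_left]

target = ROOM[0]  # the toy was hidden in the NE corner
ambiguous = geometry_only(ROOM, target)          # two geometrically equivalent corners
unique = landmark_then_geometry(ROOM, target)    # a single corner
```

On this toy encoding the geometry-only rule returns both the correct corner and its diagonal opposite, which is the characteristic error pattern, while the sequential rule returns the correct corner alone.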

Vallortigara & Sovrano acknowledge that this is an abstract possibility; but they complain that it doesn't appear to be empirically testable. And they conclude that in such circumstances my proposal would be "a supposed difference [that] would not make any difference." But this is far too strong. Even if one thought that real theoretical differences have to be testable in principle, it is quite another matter to insist that they be testable now. And scientists frequently articulate genuine theoretical contrasts which no one can see how to test at the time.

What I do claim on behalf of my sequential-ordering proposal is that it is theoretically well-motivated. First, task-analysis of what it is that a practical reasoning system has to be able to do, suggests just such a picture (articulated briefly in R1.2). Given a goal, the practical reasoning system has to be able to call up a sequence of conditional beliefs which will guide the organism towards achieving that goal.

Second, consider the significance of the fact that human children and rats are perfectly capable of utilizing both geometric and landmark information in other circumstances. If the use of two forms of information in the same task means that they are routinely integrated, in the way that Vallortigara & Sovrano's interpretation of their data implies, then it would be puzzling why such integration should suddenly become impossible under conditions of disorientation. If we suppose that the information is only drawn upon sequentially, however, then it makes perfectly good sense that some organisms might have an innate preference for just one of these kinds of information when disorientated.

R3.4 Success in a different task

Varley & Siegal describe a piloted experiment in which aphasic patients were tested on a version of the above task, but with a scale model of a rectangular disorientation-room being spun in front of the patient, rather than the patient being disoriented in such a room. And they report that all subjects were capable of solving the task, utilizing both geometric and landmark information in the absence of language.

Unfortunately, this is just a different task, and no conclusions can be drawn. There is no reason to think that subjects become disoriented in such circumstances. And there are no data from rats or young children to suggest that the task should be a difficult one. As I remarked in R3.3 above, we already know that rats and young children without language can utilize both geometric and landmark information when not disorientated; and I have just argued that sequential use of such information might be what enables fish and chickens to solve the re-orientation tasks. That may be how the aphasic patients are solving the spinning-model tasks, too.

R4 Competing claims for the involvement of language in thought

A number of commentators advance claims for the positive involvement of language in thought which are inconsistent with the main theoretical idea (central-module-integration) defended in the target paper. These will be discussed in the present section.

R4.1 No thought without language?

Gauker defends the traditional philosophical thesis that there can be no conceptual thought without language. (This also seems to be tentatively endorsed by de Villiers & de Villiers as a generalization of their claims concerning higher-order thought; see R4.3 below.) I claim, on the contrary, that there can be a great deal of conceptual thought in the absence of language, viz. the outputs of the conceptual modules, and the inputs to practical reasoning.

Gauker proceeds by defining conceptual thought to mean: having the capacity to represent a particular thing as belonging to some general kind; and he then tries to explain away the behaviors which might seem to suggest that non-linguistic animals have any such capacity (e.g. by appealing to elaborate mental movie making). There is nothing wrong with the definition; but the attempts at explaining away limp badly, in my view.[11]

Gauker tries to explain animal capacities for categorization in terms of an ability to discern similarities amongst perceptual images. But this surely isn't sufficient. All prey species have the capacity to distinguish between predator and non-predator, for example. But if all such judgments were made in terms of mere perceptual similarity, one would expect wart-hogs and hyenas to be put together in one category, with lions in another.

Many social species (e.g. monkeys and apes) can recognize individuals, and are capable of tracking the changing properties of those individuals. They behave appropriately towards an individual depending upon what happened at their last encounter, or depending upon observed one-off encounters with others. Why this shouldn't count as representing a particular thing as belonging to a general kind escapes me. And I have no idea how one might account for such abilities in terms of sequences of sensory images, without this tacitly amounting to the claim that the animals in question engage in conceptual thought.[12]

R4.2 No off-line thought without language?

In the target article I interpret Bickerton (1995) to be claiming that all imaginative thinking which doesn't simply reflect current or remembered circumstances requires language. In his comment Bickerton corrects me, saying that he had meant that language is the vehicle of genuinely creative thought, including the capacity to conceive of truly novel objects or circumstances. With such a claim I can agree. But I deny that language is sufficient for us actually to entertain such thoughts. Rather, as pointed out in R1.4 above, I think that we need to postulate a distinct language-dependent supposer, whose activities are honed in childhood through the disposition to engage in pretend play (Carruthers, 2002b).

Bickerton goes on to suggest that the capacities to entertain conditional thoughts of the form, "If x then y" and to entertain causal thoughts of the form, "x because y" are dependent upon language. With this I strongly disagree. It will be the business of the conceptual modules (existing prior to and independently of language) to generate such thoughts. For example, the folk mechanics module can surely generate thoughts of the form, "The rock striking the core caused the sharp edge." And such thoughts would have been available to hominids long before the evolution of language. (Indeed, rats, too, seem to be capable of causal thinking; see Dickinson & Shanks, 1995.)

On the model sketched in R1.2 above, similarly, it is the main business of all belief-forming modules to generate conditional information for input to the practical reasoning system. For example, a pre-linguistic hominid engaged on a hunt might have needed some way to attract the attention of a co-hunter without alerting the prey. The theory of mind module (or a precursor thereof) might have generated the conditional, "If a pebble were to drop just in front of him, it would attract his attention", hence yielding an appropriate plan (to throw a pebble). I see no reason to assume that such routine planning should require language.

R4.3 No higher-order thought without language?

De Villiers & de Villiers present their claim that language is the vehicle of higher-order thought, or thoughts about the thoughts (e.g. the false beliefs) of another. I allow one aspect of this claim. Specifically, I allow that thought about domain-integrating thoughts is dependent upon language. For if the very existence of such thoughts is dependent upon language, as I argue, then thoughts about them would have to involve language too. But I deny that language is necessary for higher-order thought in general.

I make this denial, both because I think it very plausible that the evolution of the language faculty would have presupposed the prior existence of a capacity for higher-order thought (Origgi & Sperber, 2000), and because I think that the experimental evidence supports it. Let me elaborate a little upon the latter point.

As de Villiers & de Villiers note, I appeal for support to the experimental finding of Varley (1998), that a man with severe agrammatic aphasia (and no mentalistic vocabulary) was nevertheless easily able to solve the usual battery of false-belief tasks (explained to him by pantomime; itself an indicator of his sophistication in this domain). De Villiers & de Villiers point out the possibility that he might have deficits only of comprehension and production, however, leaving his underlying knowledge of LF intact. But in fact his input and output deficits match one another very precisely, suggesting a deficit in underlying competence. And indeed brain-scans show that he has lost the use of almost his entire left hemisphere!

The data that de Villiers & de Villiers cite in their own support are that competence with verbs of communication (e.g. "says that") and verbs of thought (e.g. "believes that") predicts false-belief performance. But as Hale & Tager-Flusberg (2003) point out, these data are confounded. That competence with verbs of thought predicts competence with false-belief might be because the requisite mental concepts need to be in place first in order to understand those verbs, rather than because those verbs are constitutive of thought about thought.

Hale & Tager-Flusberg (2003) undertook their own training study, coaching pre-schoolers on verbs of communication and on false belief. They found that training on sentential complements did indeed boost false belief performance, just as de Villiers & de Villiers would predict. But they also found that training on false belief tasks boosted performance, too, yet had no reciprocal influence upon language. This result counts strongly against the view of de Villiers & de Villiers.

Another recent study, by Woolfe et al. (2002), also counts against the view that natural language syntax is crucially implicated in mental state representation. It found that late-signing deaf children were very significantly weaker on a range of theory of mind tasks, even when matched for syntactic ability with a group of early-signing (and younger) deaf children. (The two groups couldn't be matched for ability with sentential complements, since British Sign Language has no sentential-complement construction!) Woolfe et al. argue that exposure to and engagement in conversation serves to boost theory of mind ability, rather than language per se being directly implicated in theory of mind reasoning.

R4.4 Dual-process theories and language

Evans & Over discuss the relationship between the main ideas of the target article and so-called dual process theories of the sort developed by themselves (1996), by Reber (1993), and by Stanovich (1999). (Dual process theories are also endorsed by Frankish and Nelson.) They find several deep connections between the two approaches.

They object, however, to the modularism which forms an important part of my approach. While they allow that implicit, System 1, processes are generally domain-specific, they assert that much of this sort of cognition results from domain-general learning. But there is no reason why a modularist should insist that all modules are innate, in fact. We can allow that some modules are built in the course of development (Karmiloff-Smith, 1992). This will be correct if chess-playing skill counts as modular, for example (possible), or if reading ability is so (very likely).

It is one thing to say that many implicit processes result from learning, however (and I agree); to say that they result from domain-general learning is quite another. For some modules might be built through the operations of other learning modules. And there might not really be any such thing as general learning, anyway. For example, a strong case has now been made that even classical conditioning is best explained, not as resulting from general associationist processes, but rather as arising from the computations carried out by a specialized and innately structured foraging module, which calculates rates of return from different activities (Gallistel, 2000; Gallistel & Gibson, 2001).

Moreover, while Evans & Over accept that explicit, System 2, thinking is closely related to language, they doubt whether it is conducted in language, citing evidence for the involvement of mental models in System 2 thinking. But they forget that natural language sentences only form one part of the overall process of explicit domain-integrative thinking, on my account. And they forget that mental models play an important role in my own views, enabling the natural language sentences which are being looped back in inner speech to be cast in a form accessible to the various conceptual modules once again.

I conclude, therefore, that the correspondence between dual-process theories and the main claim defended in the target article (the module-integrating role of language) may be much closer than Evans & Over suggest.

R4.5 Language and connectionism

Clark and O'Brien & Opie put forward accounts of the role of natural language in cognition from a connectionist perspective.[13] Both comments focus on the role of inner speech in enabling different parts of the brain to communicate with one another, and in serving as a locus of attention and memory.

This might be all well and good if connectionism were viable as an overall model of cognition. But it isn't, for the reasons sketched in R1.1. The successes of distributed connectionist models have been restricted to various forms of pattern recognition and feature extraction. And insofar as connectionist nets can approximate to anything resembling systematic thought and one-off learning, it is only through the imposition of an architecture which is tantamount to symbol processing (Marcus, 2001).

It might be urged that it is interesting, at least, to see that my thesis that language has an important cognitive function can have a close analogue within such a different theoretical framework. Well yes, to a degree. But it isn't news that connectionism can find a place for natural language in cognition. Such views have been proposed since the early days of the movement (Rumelhart & McClelland, 1986). And this is perhaps no accident, since natural language structures provide connectionists with their only means of approaching the serial and structured nature of conceptual thought. What is news is that language can be accorded a central role in the human mind within a framework which is realist and representationalist about structured thought, and which is committed to a strongly modularist account of our overall mental architecture.

R4.6 Language as an invention for encoding platonic meanings

Baumeister & Vohs deny that language evolved. Rather, they claim, it was invented. But this ignores the very significant body of evidence built up over the last half-century for the thesis that humans have an innately structured natural language faculty (for reviews, see Chomsky, 1988; Laurence and Margolis, 2001; Pietroski and Crain, 2001). Some such view was taken for granted in the target article, and I stand by that.

According to Baumeister & Vohs, too, language is a tool for processing meanings; where meanings are abstract mind-independent platonic entities, discovered by humans when their brains reached a certain level of sophistication. In effect, this is just Popper's (1959) third realm doctrine all over again: (1) there is the physical world of objects and physical processes, (2) there is the mental world of experiences and ideas, and (3) there is the abstract world of propositions and meanings; and the mental world is supposed somehow to grasp and make use of the abstract world in representing the physical one. (See also Frege, 1918.)

The consensus amongst naturalistically-inclined philosophers, however (once again for much of the last half-century) is that Platonism of this sort places insuperable obstacles in the way of achieving a scientific understanding of mind (Papineau, 1993). We have reasonable hopes of understanding how minds can represent the physical world by being realized in computational processes in brains which engage in the right sorts of causal commerce with that world (Rey, 1997). And abstract meanings might surely supervene on such processes. But we have no hope at all of understanding, in causal and scientifically acceptable terms, how mind/brains might succeed in grasping elements of an abstract mind-independent reality (meanings) directly, through which they would then succeed in representing the physical world.

R5 Supplementary claims for the involvement of language in thought

A number of commentators present proposals which are supplementary to, and/or largely consistent with, the main theoretical idea (central-module-integration) defended in the target article. These will be discussed in the present section.

R5.1 Language and concept formation

Both Hampton and Henser claim that natural language lexical items serve as the vehicles of our more abstract concepts, serving to chunk otherwise unwieldy amounts of data into discrete and manipulable packages. This proposal strikes me as plausible, to a degree; especially in connection with concepts which have emerged out of many years of collective labor by scientists, and which can only be acquired by immersion in the relevant scientific theories: such as electron, neutrino, and DNA.

If Hampton and Henser intend their thesis to be very much stronger than this, however, then it seems to me almost certain to be wrong, because of the phenomenon of fast mapping (Bloom, 2000). Children acquire lexical items at such a prodigious rate that the only plausible account of how they do it is that they are mapping those items onto previously existing non-linguistic concepts, rather than constructing a set of language-involving concepts from scratch.

Xu introduces a fascinating body of data which suggests that natural language kind-terms are essence place-holders. As I understand it, the thesis is exclusively diachronic: there is no suggestion that lexical items are constitutive of possession of the corresponding concepts. Nor does the thesis imply that language-learning is the manner in which children acquire many new concepts, I think. Rather, use of kind-terms is taken by children to be evidence of the existence of an underlying essence to the kind. So the use of kind-terms is what enables children to select, from amongst the wider set of concepts available to them, the sub-set to which they are going to attach a belief in an underlying essence.

That children should possess some such disposition as this makes good sense, on the assumption that the language surrounding the child will embody the hard-earned wisdom of the society as a whole, concerning those kinds in the environment (especially biological kinds) which can underpin robust forms of inductive generalization. So in this sense, it may be that to learn a language is to tap into the beliefs of a culture (see also R5.7 below).

R5.2 Conscious thinking and inner speech

Both Chater and Clark are happy to accede to one of the subsidiary claims defended in my target article, namely that inner speech is constitutive of conscious propositional thinking. But each also puts forward a stronger thesis, namely that this is the only way in which we can entertain conscious propositional thoughts. Now as it happens, this is a thesis I am inclined to endorse, and have defended elsewhere (Carruthers, 1998b). But it requires a bit more work to establish than either Chater or Clark seem to allow.

The problem is that, besides reporting the occurrence of inner speech, many people also report engaging in pure wordless (non-linguistic, non-imagistic) conscious thinking (Hurlburt, 1990, 1993). So it needs to be argued somehow that these reports are illusory. And as it turns out, the confabulation data mentioned in R2.3, which show that humans lack direct access to (most of) their thought processes, can be deployed to plug this gap (Carruthers, 1998b); but it does need to be plugged, in order to get the conclusion which Chater and Clark both want.

Frankish, too, is happy to allow my claim that linguistically-mediated thinking occurs consciously, in inner speech. But he challenges my suggestion that such thinking might also occur non-consciously, through the manipulation of LF representations alone, without any phonological component.

I grant Frankish that the evidence of non-conscious linguistic thinking (e.g. the occurrence of "eureka thoughts") is far from conclusive. And the thesis isn't one that I need to place any weight upon. But I still think that a case of sorts can be made. In particular, the well-known phenomenon of "sleeping on it" seems to me to support such a view. As we all know, when stumped by some difficult (conscious and linguistically formulated) problem, it often helps to sleep on it, or to turn our conscious thinking to other matters. And as a result, the solution sometimes just comes to us on waking, or in the midst of thinking about something else. These seem to be cases in which we can be confident that no conscious thinking about the problem has been going on in the interim (contra Frankish's suggestion); which would imply that the thinking in question has been non-conscious.

Frankish also argues that it is implausible to postulate two feed-back loops: a phonological one, for conscious linguistic thinking, and an LF one for non-conscious linguistic thinking (citing the work of Levelt, 1989, in his support). But actually, Levelt himself is committed to the existence of two feed-back loops: a phonological one, for on-line speech repair; and a "message to be communicated" one, for monitoring one's own speech intentions. So it is open to me to claim, either that this second loop runs on LF representations, or (more plausibly) that it can be exploited by LF representations in such a way as to subserve non-conscious module-integrating thinking.

In contrast, both Bickerton (in passing) and Slezak (at length) seem to challenge the very coherence of the idea that some thought might be conducted in inner speech. Bickerton offers no arguments. But Slezak claims that I have been seduced by the image of hearing with the mind's ear (cf. seeing with the mind's eye). He claims that my position commits me to endorsing one particular side of the imagery debate (Kosslyn, 1994; Pylyshyn, in press), and that my account ends in incoherence as a result. There is much that goes awry here.

First, there is nothing fundamentally incoherent in Kosslyn's (1994) position on images. The claim that images are like percepts (indeed, perhaps are percepts of a self-induced sort), occupying the same cognitive resources as does perception, is a perfectly sensible (albeit controversial) one. And nothing about it need commit one to the idea of a little homunculus who perceives the image (i.e. to the existence of some sort of pernicious and question-begging Cartesian Theatre; see Dennett, 1991). Rather, the idea is that images will be consumed and interpreted in whatever way perceptual states are consumed and interpreted.

Second, I don't in fact need to come down on one side or the other of the imagery debate. For my purposes, it matters little whether inner speech is relevantly like heard speech, or whether it is realized in complex Mentalese descriptions of heard sentences instead. Either way, it can still be the case that the states in question bear a semantic interpretation. And either way, it can still be the case that certain kinds of cognitive process (viz. domain-integrative processes) are dependent upon their existence.

R5.3 Language and memory

Schrauf makes an interesting connection between the main thesis of the target article and evidence of the role of language in effortful cases of episodic memory building. For episodic memories will often need to link narrative information across different modular domains, and there is considerable evidence of the role of natural language in the process of reconstructing such memories.

Especially intriguing is the evidence that bilinguals find it easier to reconstruct a memory in the language which they would have been using at the time of the original events. I don't think Schrauf's claim here is that memories are stored in one natural language rather than another; instead, it seems to be that there are associations of various sorts created between the currently-used language and the various non-linguistic components of episodic memory. This seems very plausible.

Wynn & Coolidge complain that it would have been helpful if my proposal had been compared with the model of working memory developed by Baddeley (1986),[14] and they then proceed to provide a very useful comparison of this sort for themselves. And I basically agree with their account of the connections between Baddeley's views and my own, including the role of the phonological loop, and their suggestion that Baddeley's central executive might be equated with the sort of virtual executive which gets appealed to on my own proposal.

I also think that Wynn & Coolidge are right to stress the importance of visual imagination (Baddeley's visuo-spatial sketch-pad) in the co-ordination of complex skilled action, as the work of Keller & Keller (1996) illustrates. I would suggest that visual imagination is one of the main vehicles of the sort of pre-linguistic mental rehearsal discussed in R1.2, indeed. But I would claim that pre-linguistic visual imagery needs to be driven by the outputs of specific conceptual modules and/or by the contents of the practical reasoning module; and I would also claim that the process of generating images which link together the outputs of different conceptual modules is dependent upon language.

R5.4 Perspective-taking and language

MacWhinney stresses the importance for central cognitive processes of perspective-taking in production and comprehension of discourse. It is this, he says, which enables language to integrate information from many different parts of the brain. Taken in one way, this might seem to be a competitor to the hypothesis that it is sentences in LF which are the vehicles of inter-modular integration. And this might indeed be what MacWhinney has in mind, since he (like Baumeister & Vohs) thinks that grammatical language is a late human invention, rather than an aspect of our innate human endowment (MacWhinney, 2003). But if so, then the account founders on the evidence of an innately structured language faculty, mentioned in R4.6.

MacWhinney's proposal might be better seen as an account of how mental models get built, I think; emphasizing their role in the comprehension process especially. If so, then it would emerge as a friendly supplement to my own account. For I, too, have stressed the importance of mental-model building in the comprehension of language. And it is this, in my view, which enables linguistic representations, in LF, to be taken as input and consumed by the various conceptual modules.

R5.5 Vygotskian processing

Both Frawley and Pléh rightly point out that there are strong connections between my main thesis and the Vygotskian tradition in psychology and cognitive science. For this, too, stresses the importance of language (specifically, silent speech), not just in development, but also in mature cognition; especially when task demands increase. My inter-modular integration proposal can be seen as one way of fleshing out what "hard" amounts to, in this context.[15]

Both Frawley and Pléh point out, as well, that the use of inner speech in cognition isn't hard-wired (so to speak), but is rather variable and opportunistic. With this, too, I agree. On my account, distinctively human creative thinking arises out of the interaction of a range of hard-wired systems which were selected for other purposes. (These include: a set of sophisticated conceptual modules for generating beliefs and desires; a language module which both takes inputs from the conceptual modules and provides inputs to them, and which forms LF sentences integrating the outputs of the conceptual modules; feed-back loops for mental rehearsal, which routinely run the contents of intended actions back through the conceptual modules for further evaluation; and a set of principles of relevance and coherence used in interpreting speech and evaluating the testimony of others.) What then had to be added were a set of dispositions: to generate novel linguistic contents as mere suppositions, for input to conceptual modules; to evaluate the consequences of those suppositions as if they were testimony; and to form explicit (language-involving) beliefs about appropriate patterns of reasoning and decision making.

I would predict that these dispositions, while to some degree universal amongst humans, are highly variable in their strength. Indeed, they might constitute the main components of the kind of cognitive style which Stanovich (1999) argues is the major determinant of differential success in batteries of reasoning tasks and measures of IQ.

R5.6 Weak Whorfianism

In the target article I was concessive about the powers of different natural languages to sculpt cognition differently during development. And Slobin points out that, in the light of Levinson's results concerning the ways in which space is handled differently in different natural languages (Levinson, 1996, 1997, 2002), I am therefore not entitled to assume that concepts such as "long wall on the left" will be cross-culturally available, existing pre-linguistically and awaiting translation into language.[16]

Now, in one respect this is quite right. I very much doubt that the output of the geometry module will include the concepts of left and right. Acquiring the vocabulary of left and right would surely be a great deal easier for children if it did! So in this respect the way in which I developed my hypothesis was misleading. Rather, the output of the geometry module might take the form, The toy is at the corner of this [diagram] shape, which then has to be mapped laboriously into the language of left and right.

However, I now also believe that I may have been too concessive towards the "language sculpts cognition" approach (as Atran, too, points out). Each set of data will have to be examined on a case-by-case basis, of course. But in respect of Levinson's spatial results, Li & Gleitman (2002) provide an elegant demonstration that those results would seem to depend on the spatial cues which are salient in the testing environment. (However, see Levinson et al., 2002, for a vigorous reply.) Similarly, Papafragou et al. (2002) conduct a test of Slobin's (2002) claims concerning the ways in which different encodings of manner of motion in different languages will create patterns of attention which will lead, in turn, to effects on non-verbal memory. They are able to find no such effects.

I think the jury is still out on the question whether language sculpts cognition, then. Certainly the weak forms in which this thesis is currently being pursued are consistent with the strong modularism adopted in my target article, and also with my main thesis. But no such weak Whorfian claims are supported or entailed by my views. And the empirical data are still subject to a variety of interpretations.

Both Frawley and Henser claim that my view of the cognitive role of LF commits me to some sort of synchronic linguistic relativity thesis. They reason that if some thought is conducted in natural language, then (since languages differ in significant ways) people's thought processes, too, must differ in significant ways, depending upon the language in which their thoughts are entertained. Such a claim is surely correct in respect of what Slobin (2002) calls "thinking for speaking". For if your language requires you to encode manner of motion, then of course you have to think about manner in describing a motion event; and if your language uses cardinal directions to describe space, then of course you will have to think in terms of such directions in order to tell someone where something is. But it is another matter to claim that speakers of different languages will differ in their non-linguistic thoughts. And as indicated above, I regard such claims as currently unproven.

Perhaps Frawley and Henser have in mind a stronger thesis, however. Perhaps their idea is that, by virtue of speaking different languages, certain thoughts will be inaccessible to one group of speakers which are available to the other. But it is highly controversial to claim that languages differ in their expressive powers. Indeed, the working hypothesis adopted by most linguists in the Chomskian tradition is that, at the deepest level, the LFs of any language are determined by UG (Universal Grammar) together with the relevant lexicon and parameter settings (Higginbotham, 1995; Hornstein, 1995). And it is generally assumed that there is no incommensurability between languages in respect of the thoughts which they can carry. (See Baker, 2002, for extended and detailed discussion.)

R5.7 Language and the transmission of culture

A number of commentators stress the importance of language in the generation and transmission of culture (Baumeister & Vohs, Bryson, Hampton, Pléh). I agree entirely. I didn't emphasize this in the target article, since my focus was on the synchronic cognitive functions of language. But nothing that I said was intended to be inconsistent with it.

Of course language is the primary means by which we construct and communicate elements of culture, and it is also the primary means by which cultures are transmitted from one generation to the next, whether informally, through gossip and narrative, or formally, through education and training. And as Baumeister & Vohs point out, language is the vehicle for multi-generational accumulation and transmission of knowledge. Moreover, it is through language that most social processes are conducted and negotiated. And as Hampton highlights, too, language is crucial for the development of socially shared, explicit, logical and scientific thinking. There is nothing here that I should want to disagree with.

R6 Conclusion

Modular models of mind are well-motivated by the need to understand mental processes as computationally realized. But such models give rise to a problem: namely, to comprehend how flexible and creative human cognition could emerge out of the interactions of a set of modular components. The main thesis of the target article (that the language module serves to conjoin the contents of a suite of conceptual modules) is one aspect of a solution to that problem. And both that thesis and the evidence for it have, I believe, survived the commentary process clarified and further elaborated, but essentially undamaged.

ACKNOWLEDGEMENT

I am grateful to the following for help and advice received while constructing this response to my commentators: Scott Atran, Stephen Crain, Paul Harris, Stephen Laurence, Gary Marcus, Paul Pietroski, Georges Rey, Michael Siegal, Barry Smith, and Helen Tager-Flusberg.

References (if not already contained in the original target article references or in the commentary references)

Baker, M. 2002. The Atoms of Language. Basic Books.

Baron-Cohen, S. & Ring, H. 1994. A model of the mind-reading system: neuro-psychological and neuro-biological perspectives. In C. Lewis & P. Mitchell (eds.), Children's Early Understanding of Mind, Erlbaum.

Bloom, P. 2000. How Children Learn the Meanings of Words. MIT Press.

Bryson, J. 2000. Cross-paradigm analysis of autonomous agent architecture. Journal of Experimental and Theoretical Artificial Intelligence, 12:165-190.

Butterworth, B. 1999. What Counts: How Every Brain is Hardwired for Math. Simon & Schuster.

Carey, S. 2001. Cognitive foundations of arithmetic: evolution and ontogenesis. Mind and Language, 16:37-55.

Carruthers, P. 2002b. Human creativity: its evolution, its cognitive basis, and its connections with childhood pretence. British Journal for the Philosophy of Science, 53:1-25.

Carruthers, P. 2003. Moderately massive modularity. In A. O'Hear (ed.), Minds and Persons, Cambridge University Press.

Carruthers, P. forthcoming a. On Fodor's Problem.

Carruthers, P. forthcoming b. Practical reasoning in a modular mind.

Chomsky, N. 2000. New Horizons in the Study of Language and Mind. Cambridge University Press.

Dickinson, A. & Shanks, D. 1995. Instrumental action and causal representation. In D. Sperber, D. Premack & A. Premack (eds.), Causal Cognition, Oxford University Press.

Evans, G. 1982. The Varieties of Reference. Oxford University Press.

Fodor, J. 1975. The Language of Thought. Harvester.

Fodor, J. 2000. The Mind Doesn't Work That Way. MIT Press.

Fodor, J. & McLaughlin, B. 1990. Connectionism and the problem of systematicity. Cognition, 35.

Fodor, J. & Pylyshyn, Z. 1988. Connectionism and cognitive architecture. Cognition, 28.

Frege, G. 1918. The thought. In Posthumous Writings, trans. P. Long & R. White, Blackwell, 1979.

Gallistel, R. 2000. The replacement of general-purpose learning models with adaptively specialized learning modules. In M. Gazzaniga (ed.), The New Cognitive Neurosciences (second edition), MIT Press.

Gallistel, R. & Gibson, J. 2001. Time, rate and conditioning. Psychological Review, 108.

Gazzaniga, M. 1998. The Mind's Past. California University Press.

Gigerenzer, G. & Todd, P. 2000. Simple Heuristics that Make Us Smart. Oxford University Press.

Gopnik, A. 1993. How we know our minds: The illusion of first-person knowledge of intentionality. Behavioral and Brain Sciences, 16:1-14.

Hale, C. & Tager-Flusberg, H. 2003. The influence of language on theory of mind. Developmental Science, 7.

Higginbotham, J. 1995. Sense and Syntax. Oxford University Press. An inaugural lecture delivered before the University of Oxford on 20 October 1994.

Hornstein, N. 1995. Logical Form. Blackwell.

Karmiloff-Smith, A. 1992. Beyond Modularity. MIT Press.

Lakoff, G. & Johnson, M. 1980. Metaphors We Live By. Chicago University Press.

Laurence, S. & Margolis, E. 2001. The poverty of the stimulus argument. British Journal for the Philosophy of Science, 52:217-276.

Levinson, S., Kita, S., Haun, D. & Rasch, B. 2002. Returning the tables: language effects and spatial reasoning. Cognition, 84:155-188.

Li, P. & Gleitman, L. 2002. Turning the tables: language and spatial reasoning. Cognition, 83:265-294.

Marcus, G. 2001. The Algebraic Mind. MIT Press.

Marslen-Wilson, W. 1973. Sentence perception as an interactive parallel process. Science, 189:226-8.

McDermott, D. 2001. Mind and Mechanism. MIT Press.

Nichols, S. & Stich, S. 2000. A cognitive theory of pretense. Cognition, 74:115-147.

Nisbett, R. & Ross, L. 1980. Human Inference. Prentice-Hall.

Nisbett, R. & Wilson, T. 1977. Telling more than we can know. Psychological Review, 84:231-259.

Origgi, G. & Sperber, D. 2000. Evolution, communication and the proper function of language. In P. Carruthers & A. Chamberlain (eds.), Evolution and the Human Mind, Cambridge University Press.

Papineau, D. 1993. Philosophical Naturalism. Blackwell.

Papafragou, A., Massey, C. & Gleitman, L. 2002. Shake, rattle, 'n' roll: the representation of motion in language and cognition. Cognition, 84:189-219.

Parsons, T. 1990. Events in the Semantics of English. MIT Press.

Pietroski, P. 2003. The character of natural language semantics. In A. Barber (ed.), Epistemology of Language, Oxford University Press.

Pietroski, P. 2004. Events and Semantic Architecture. Oxford University Press.

Pietroski, P. & Crain, S. 2001. Nature, nurture and universal grammar. Linguistics and Philosophy, 24:139-186.

Popper, K. 1959. The Logic of Scientific Discovery. Routledge.

Rey, G. 1997. Contemporary Philosophy of Mind. Blackwell.

Rumelhart, D. & McClelland, J. 1986. Parallel Distributed Processing. MIT Press.

Smith, P. 1996. Language and the evolution of mind-reading. In P. Carruthers & P. Smith (eds.), Theories of Theories of Mind, Cambridge University Press.

Sperber, D. 2002. In defense of massive modularity. In E. Dupoux (ed.), Language, Brain and Cognitive Development, MIT Press.

Sperber, D. & Wilson, D. 2002. Pragmatics, modularity and mind-reading. Mind and Language, 17:3-23.

Thornton, R. 1996. Elicited production. In D. McDaniel, C. McKee & H. Cairns (eds.), Methods for Assessing Children's Syntax, MIT Press.

Tooby, J. & Cosmides, L. 1992. The psychological foundations of culture. In J. Barkow, L. Cosmides & J. Tooby (eds.), The Adapted Mind, Oxford University Press.

Wilson, T. 1985. Strangers to ourselves. In J. Harvey & G. Weary (eds.), Attribution: Basic Issues and Applications, Academic Press.

Wilson, T. 2002. Strangers to Ourselves. Harvard University Press.

Wilson, T. & Stone, J. 1985. Limitations of self-knowledge. In P. Shaver (ed.), Self, Situations and Social Behavior, Sage.

Wilson, T., Hull, J. & Johnson, J. 1981. Awareness and self-perception: Verbal reports on internal states. Journal of Personality and Social Psychology, 40:53-71.

Woolfe, T., Want, S. & Siegal, M. 2002. Signposts to development: theory of mind in deaf children. Child Development, 73:768-778.

Authors Response