Categorization in Cognitive
Computer Science

John F. Sowa

Abstract.  In cognitive science, computers can be used in three ways:  to simulate cognition for artificial intelligence, to enhance cognition by assisting human intellectual activity, and to help scientists understand cognition by testing theories on large amounts of data. These three approaches are not mutually exclusive, since specialists in any of these areas frequently adopt techniques designed for the others. This article surveys theories of categorization and reasoning in cognitive science that have been implemented and tested in computer systems. Most of the ideas originated long before modern computers were invented, but computers provide an opportunity for developing them in greater detail than was previously possible. Section 2 surveys top-down and bottom-up approaches to categorization; Section 3 analyzes the implications of structure, context, and purpose on the choice of categories and the methods for recognizing individuals that belong to those categories; and Section 4 considers the interactions of categorization and reasoning. The concluding Section 5 discusses the levels of cognition and some successes and failures in simulating those levels by AI.

This paper was published in H. Cohen & C. Lefebvre, eds., Handbook of Categorization in Cognitive Science, Elsevier, 2006, pp. 141-163.

1. Computation in Cognitive Science

Theories of categorization in artificial intelligence, information retrieval, data mining, and other computational fields are no different in kind from theories that predate modern computers. The computer, however, introduces two important elements:  it enables theories to be tested on large amounts of data, and it enforces precision, since no program running on a digital computer can ever be vague or ambiguous. Both elements can be helpful in formulating and testing theories, but neither can guarantee truth, relevance, or usefulness. Sometimes, as Lord Kelvin observed, precision can be a distraction:  "Better a rough answer to the right question, than an exact answer to the wrong question." To avoid a bias toward answers that are easy to program, it is important to consider questions posed before computers were invented.

This paper surveys a variety of computational methods that have been applied to categorization and related methods for reasoning with and about the categories. These methods can be used for three different purposes:

  1. Artificial intelligence.  From the earliest days, computers were considered "giant brains", which had the potential to mimic and perhaps surpass human intelligence. Good (1965) predicted "It is more probable than not that, within the twentieth century, an ultraintelligent machine will be built and that it will be the last invention that man need make." Except for chess playing, that prediction has not come to pass, but attempts to achieve it have contributed to a better appreciation of the richness and complexity of human intelligence.

  2. Intelligence enhancement.  Computer capabilities are very different from and complementary to human abilities. That difference has led to a wide range of tools that supplement human cognition:  information storage, management, search, query, and retrieval; text editing, analysis, translation, formatting, and distribution; calculation in graphics, spreadsheets, statistical packages, and formula manipulation; and computer-aided human communication and collaboration.

  3. Hypothesis testing.  Psychology, linguistics, and philosophy deal with complex phenomena that cannot be described by the elegant mathematics used in physics. Without computers, theorists are often limited to studying an unrepresentative sample of oversimplified examples or to collecting statistics that show trends, but not causal connections. A computer, however, can analyze large volumes of realistic data and test hypotheses about causal mechanisms that could generate or interpret such data.
These three approaches differ primarily in their goals:  simulation, enhancement, or understanding of human cognition. Computational methods designed for any one of them can usually be adapted to the others.

Neural networks illustrate the way theories introduced before modern computers have contributed to computer-based techniques. In his study of learning, Thorndike (1932) developed a stimulus-response theory, which he called connectionism:  rewards strengthen the S-R connections, and punishments weaken them. Meanwhile, McCulloch and Pitts (1943), a neurophysiologist collaborating with a logician, designed a theoretical model of neural nets that are capable of learning and computing any Boolean function of their inputs. To implement that model, Rosenblatt (1958) built a machine called a perceptron, which simulated the nodes and links of a neural net. Later versions combined connectionism with neural nets, which were simulated on digital computers instead of being implemented directly in hardware. Figure 1 shows a typical neural net with stimuli entering the layer of nodes at the left; the links propagate signals to nodes in one or more hidden layers in the middle; finally, the responses appear at the layer of nodes on the right. The nodes represent neurons, and the links represent dendrites that pass signals from one neuron to another. In computer simulations, each link has an associated weight, which increases or decreases the strength of any signal that passes through the link. Each node combines the strength of all its inputs to determine the value of its output.

Figure 1:  A neural net for connecting stimuli to responses

Behaviorist methods of operant conditioning suggested neural-net methods of learning by backpropagation. Whenever a network generates incorrect responses for given stimuli, it is, in effect, "punished" by adjusting the weights on the links, starting from the response layer and propagating the adjustments backwards toward the stimulus layer. Over the years, many variations have been proposed for functions that combine inputs to determine outputs, for methods of backpropagation, for algorithms that use and change the weights, for the number of nodes and hidden layers, and for possible feedback from response layers to earlier stimulus layers. All these variations are incremental modifications of an older theoretical foundation.

The example of neural nets is interesting in its own right, but it also shows how a theory can be used for different purposes and how the same aspects may sometimes be an advantage or a disadvantage:

The remainder of this paper examines various ideas, themes, and theories of cognitive science that have been adapted and applied in computer systems. In most cases, the ideas originated long before modern computers, but the ability to process large volumes of data with great precision has led to a depth and sophistication that was previously unattainable.

2. The Great Categorization Debates

Aristotle invented the two dominant methods of classification, which in different variations have been used in all branches of cognitive science. In his logical works, he presented a top-down method for defining concepts based on a genus or supertype and one or more differentiae, which distinguish the new type from others of the same genus. But in his biological works, he criticized the limitations of the top-down approach and recommended a bottom-up approach beginning with a detailed description of individuals, classifying them in species, and grouping species in genera. He considered the top-down method appropriate for presenting the results of analysis and reasoning about them, but he recommended the bottom-up method as a better discovery procedure for investigating a new subject.

Tree of Porphyry

Figure 2:  Tree of Porphyry, translated from a version by Peter of Spain (1239)

In the third century AD, the philosopher Porphyry wrote a commentary on Aristotle's categories, which contained the first recorded tree diagram. The version in Figure 2 illustrates the categories and their relationships to syllogisms, which are Aristotle's rules for reasoning about types and subtypes. With the differentia material, the supreme genus Substance becomes the subtype Body, and with the differentia immaterial, becomes Spirit. The technique of inheritance is the process of merging the differentiae along the path above any category:  LivingThing is defined as animate material Substance, and Human is rational sensitive animate material Substance.

The goal of mechanically relating concepts to primitives was first proposed by Ramon Lull in the thirteenth century. His Ars Magna was a system of disks inscribed with primitive concepts, which could be combined in various ways by rotating the disks. Lull's system inspired Leibniz (1679) to develop his Universal Characteristic, which represented primitive concepts by prime numbers and compound concepts by products of primes. Then statements of the form All A is B are verified by checking whether the number for A is divisible by the number for B. If Plant is represented by 17 and Deciduous by 29, their product 493 would represent DeciduousPlant. If Vine is represented by 20,213, the statement All vines are deciduous is judged to be true because 20,213 is divisible by 493. Leibniz envisioned a universal dictionary for mapping concepts to numbers and a calculus of reasoning that would automate the syllogisms. To simplify the computations, he invented the first calculating machine that could do multiplication and division.

With the advent of electronic computers, computational linguists set out to implement Leibniz's universal dictionary. For her machine translation system, Masterman (1961), a former student of Wittgenstein's, defined a semantic net of 15,000 words, organized as a lattice based on 100 primitive concepts, such as Folk, Stuff, Change, Go, Talk. For the sentence This man is greedy, but pusillanimous, her system would generate the representation

     (THIS: MAN:) (HE: (CAN/ DO/ (MUCH: EAT:)))
     (BUT: NOT:) (HE: (CAN/ DO/ (MUCH: FIGHT:))).
For conceptual dependency graphs, Schank (1975) reduced the number of primitive acts to 11. The phrase x bought y, for example, could be expanded to x obtained possession of y in exchange for money. Transforming high-level concepts to primitives may show that two different phrases are synonymous. But many deductions are shorter and simpler in terms of a single concept like Liar than a graph or formula for one who mentally transfers information that is not true. In general, a system should allow high-level concepts to be expanded in terms of lower ones, but such expansions should be optional, not obligatory.

Modern dictionaries analyze thousands of words into more primitive ones, but they are not limited to a fixed set of categories. They also allow circular definitions, such as defining attribute as characteristic and characteristic as attribute. In linguistics, Katz and Fodor (1963) introduced primitives called semantic markers with projection rules for combining them. Many linguists adopted variations of the Katz-Fodor method, but even those who used it raised serious criticisms:

All logic-based methods ranging from the Tree of Porphyry to the latest formal ontologies are examples of an Aristotelian top-down approach, but Aristotle himself recommended bottom-up methods for analyzing empirical data. Whewell (1858) went further in claiming that top-down definitions are useless in biology:

Natural groups are given by Type, not by Definition. And this consideration accounts for that indefiniteness and indecision which we frequently find in the descriptions of such groups, and which must appear so strange and inconsistent to anyone who does not suppose these descriptions to assume any deeper ground of connection than an arbitrary choice of the botanist. Thus in the family of the rose tree, we are told that the ovules are very rarely erect, the stigmata usually simple. Of what use, it might be asked, can such loose accounts be? To which the answer is, that they are not inserted to distinguish the species, but in order to describe the family, and the total relations of the ovules and the stigmata of the family are better known by this general statement....

Though in a Natural group of objects a definition can no longer be of any use as a regulative principle, classes are not therefore left quite loose, without any certain standard or guide. The class is steadily fixed, though not precisely limited; it is given, though not circumscribed; it is determined, not by a boundary line without, but by a central point within; not by what it strictly excludes, but by what it eminently includes; by an example, not by a precept; in short, instead of a Definition we have a Type for our director.

Mill (1865) dropped the assumption of necessary and sufficient conditions, but he still believed that a closed-form definition was possible. He advocated a weaker criterion based on all of the necessary characteristics and a majority of the optional ones:
Whatever resembles the genus Rose more than it resembles any other genus, does so because it possesses a greater number of the characters of that genus, than of the characters of any other genus. Nor can there be the smallest difficulty in representing, by an enumeration of characters, the nature and degree of the resemblance which is strictly sufficient to include any object in the class. There are always some properties common to all things which are included. Others there often are, to which some things, which are nevertheless included, are exceptions. But the objects which are exceptions to one character are not exceptions to another: the resemblance which fails in some particulars must be made up for in others. The class, therefore, is constituted by the possession of all the characters which are universal, and most of those which admit of exceptions.

In his later philosophy, Wittgenstein (1953) repudiated his earlier logic-based approach by showing that ordinary words like game cannot be defined by necessary and sufficient conditions. Competition is present in ball games, but absent in solitaire or ring around the rosy; organized sport follows strict rules, but not spontaneous play; and serious games of life or war lack the aspects of leisure and enjoyment. Instead of differentiae that distinguish them from all other activities, games share a family resemblance:  baseball is a game because it resembles the family of activities that people call games. Except for technical terms in mathematics, Wittgenstein maintained that most words are defined by family resemblances, and the meaning of a word is its use within a language. A word, he said, is like a chess piece, whose meaning is not determined by its physical shape, but by the rules for using it in the game.

All three of the 19th-century methods of definition have been refined and implemented in modern computer systems:

After two millennia of philosophical debate and fifty years of computer implementations, the modern consensus is not much different from Kant (1800):

Since the synthesis of empirical concepts is not arbitrary but based on experience, and as such can never be complete (for in experience ever new characteristics of the concept can be discovered), empirical concepts cannot be defined. Thus only arbitrarily made concepts can be defined synthetically. Such definitions... could also be called declarations, since in them one declares one's thoughts or renders account of what one understands by a word. This is the case with mathematicians.
For any particular application, a top-down hierarchy of concepts can be legislated, but attempts to force all concepts into a universal, globally consistent hierarchy have failed.

3. From Local Features to Global Structures

Local features are useful for both top-down and bottom-up methods of classification, but the criteria for determining what features are significant depends more on purpose than on prominent, but superficial details. A rug and a sofa, for example, might be similar in color and texture, but not in size, shape, and function. A bird and a bat might be similar in flying ability, but in anatomical details a bat is more similar to a mouse than a bird. The criteria that determine what features to select and how to evaluate them depend on global considerations of structure, context, and purpose. Following is a summary of the kinds of criteria and the computational methods used or proposed for handling them:

As this list shows, the criteria for recognizing, classifying, and interpreting things and events are complex and varied. A feature-based classifier might be useful as the first stage of perception, but context and purpose are essential for focusing attention on what to observe, determining its relevance, and relating it to background knowledge. In the sentence, "Yojo batted an eraser across the desk," the words play and toy do not occur. With the background knowledge that Yojo is a cat, cats are playful creatures, and an eraser is a mouse-like object, one might interpret the action as playing and the eraser as a toy. This issue is not limited to natural language understanding, since interpreting a movie or a photograph would require the same kind of analysis. Such interpretations are necessary for reporting and classifying any observations. Luria (1968) wrote a book about Shereshevskii, a man with a phenomenal memory for the exact words and images he observed. Because of his memory, Shereshevskii got a job as a newspaper reporter, but he was totally unsuited. His memory for detail was perfect, but he could not interpret the detail, determine its relevance, or produce a meaningful summary. In effect, Shereshevskii behaved like a feature-based perceiver attached to a vast database:  he stored everything he saw or heard, but he could not interpret the relevance of anything.

Every classifier, logical or fuzzy, feature-based or structural, depends on some model for relating instances to categories. Figure 1, for example, illustrates the underlying model for most neural nets, and Figure 2 illustrates the model for most logic-based systems. Data models incorporate general assumptions about patterns typically found in well-behaved data; the popular bell-shaped curve, for example, is called "normal." Algorithmic models, such as neural nets, are based on assumptions about the mechanisms that occur in some natural processes. Breiman (2001), a statistician who had designed and used a wide range of models, emphasized the need for models that accurately reflect the nature of the subject:

But when a model is fit to data to draw quantitative conclusions, the conclusions are about the model's mechanism, and not about nature's mechanism. It follows that if the model is a poor emulation of nature, the conclusions may be wrong. Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems. The best available solution to a data problem might be a data model; then again it might be an algorithmic model. The data and the problem guide the solution. To solve a wider range of data problems, a larger set of tools is needed.
Every model is an approximation that extracts a simplified, computable mechanism from the overwhelming complexity of nature. As the statistician George Box observed, "All models are wrong, but some are useful."

Features represent local information, which could be treated as independent variables, or they could be organized in a global pattern by a schema or Gestalt. In their original papers on chunks and frames, Newell, Simon, and Minsky tried to incorporate the full richness of the psychological models. In later implementations, however, the word frame was applied to data structures that do little more than package a few pointers. Those packages are useful, but they don't model Selz's schematic anticipation, Bartlett's active organizations, or Wertheimer's structural laws. Minsky (1987) continued to argue for a more globally organized society of mind, and Newell (1990) proposed a unified theory of cognition called SOAR. The pioneers in AI realized from the beginning that human intelligence depends on global mechanisms, but the challenge of bridging the gap between local features and global structure has not been met.

Chess playing illustrates the importance of the Gestalt effects. In applying Selz's methods to chess, de Groot (1965) had chessplayers study positions and select a move while saying whatever came to mind, what moves or lines of play they were considering. He found no significant difference between masters and experts in the number of moves considered, depth of analysis, or time spent. The only significant difference was that the masters would usually focus on the best move at their first glance, while the nonmasters were distracted by moves that would dissipate their advantage. Former world champion Emanuel Lasker said that chess is a highly stereotyped game. Instead of exhaustively analyzing all options, the master simply looks at the board and "sees" which moves are worth considering.

After 40 years of chess programming, a computer was finally able to defeat the world champion, but only by a brute-force search. To discover what the human could see at a glance, the computer had to analyze the details of billions of chess positions. The Gestalt effect, which is significant on an 8×8 chess board with a maximum of 32 pieces, becomes even more pronounced in the oriental game of Go, which can have over 300 stones scattered around a 19×19 board. The number of possible patterns in Go is many orders of magnitude greater than in chess, and a brute-force search cannot overcome the human advantage of "seeing" the patterns. As a result, no program today can play Go beyond a novice level.

The schema or Gestalt theories appear in two different forms that are sometimes used in combination:  discrete versions use patterns of graphs, and continuous versions are based on geometric models. The Gestalt psychologists emphasized image-like geometric patterns, but Selz's graph patterns have been easier to implement. The term image schema (Lakoff & Johnson 1980; Oakley 2004) has been used for geometric models mapped to patterns of discrete words expressed in natural languages. The psychological and linguistic evidence for such patterns is strong, and conceptual spaces (Gärdenfors 2000) are a promising geometrical model that is designed to represent abstract concepts as well as physical images.

4. Categorization and Reasoning

Categorization and reasoning are interdependent cognitive processes:  every multistep reasoning process depends on categorization at each step; every one-step reasoning process is also a one-step categorization process, and vice-versa; and every multistep categorization process involves multiple reasoning steps. As an example, consider the rule of deduction called modus ponens:

     Premise:     If P then Q.
     Assertion:   P.
     Conclusion:  Q.
This rule depends on the most basic technique of categorization:  a comparison of two instances of A to determine whether they are identical. If the P in the assertion is not identical to the P in the premise, then a preliminary process of unification is required, which uses another technique of categorization:  the specialization of both Ps to a common form. If the Ps are complex expressions, multiple categorization steps may be necessary.

Deduction is one of Peirce's three methods of reasoning, each of which has a corresponding method of categorization. The following examples show how each of the three methods for reasoning about propositions has a corresponding method for dealing with categories. (For these examples, the words principle and fact are used as informal synonyms for proposition; a fact is considered more specialized than a principle, but either one could be arbitrarily complex.)

  1. Deduction.  Apply a general principle to infer some fact.
    Propositions: bird(x)->flies(x). bird(Tweety).
         Therefore, flies(Tweety).
    Categories: Birds⊂Flyers. Tweety∈Birds.
         Therefore, Tweety∈Flyers.

  2. Induction.  Assume a general principle that explains many facts.
    Propositions: bird(Tweety). bird(Polly). bird(Hooty). bat(Fred).
         flies(Tweety). flies(Polly). flies(Hooty). flies(Fred).
         Assume, bird(x)->flies(x).
    Categories: Birds={Tweety, Polly, Hooty}. Bats={Fred}.
         Flyers={Tweety, Polly, Hooty, Fred}.
         Assume, Birds⊂Flyers.

  3. Abduction.  Guess a new fact that implies some given fact.
    Propositions: bird(x)->flies(x). flies(Tweety).
         Guess, bird(Tweety).
    Categories: Birds⊂Flyers. Tweety∈Flyers.
         Guess, Tweety∈Birds.
In his pioneering work on symbolic logic, Boole (1854) used the same algebraic symbols for both propositions and categories. Since then, the notations have diverged, but the correspondences remain.

The methods of categorization and reasoning used in AI and related branches of computer science can be grouped in three broad areas, all of which have been under development since John McCarthy coined the term artificial intelligence in 1956. Most new developments may be considered incremental improvements, even though many of the "increments" are sufficiently radical to be important breakthroughs. Each of the three areas is characterized by its primary form of knowledge representation:  features, structures, or rules.

  1. Feature analysis.  The oldest methods of both categorization and formal reasoning are based on collections of features, which may be processed by a variety of logical, statistical, and algorithmic techniques. The features could be monadic predicates, as in Aristotle's differentiae, or they could be functions or dyadic predicates, in which the second argument (or slot) is a number or other value. A collection of features may be called a set, a list, a vector, a frame, or a logical conjunction. Among the feature-based methods are neural nets, decision trees, description logics, formal concept analysis, and a wide variety of vector-space methods, such as LSA.

  2. Structural analysis.  A collection of features, by themselves, cannot distinguish "blind Venetians" from "Venetian blinds" or "Dog bites man" from "Man bites dog." Instead of unordered collections, structural methods represent ordered dependencies and interconnections with directed graphs, including trees, forests, and nests of graphs within the nodes of other graphs. Such graphs are commonly used to represent the syntax and semantics of languages, both natural and artificial. Some versions represent rigorous logic-based formalisms, and others use informal heuristics. Among the structural methods are grammar-based parsers and pattern recognizers, constraint-satisfaction and path-finding systems, spreading-activation systems that pass messages through graphs, and graph-matching systems for finding patterns and patterns of patterns.

  3. Rule-based systems.  Rules are often used to process features or structures, but in a rule-based system, the rules themselves are the knowledge representation. The rules may be formulas in logic, or they may be less formal heuristic rules in an expert system. The more formal rule processors are called theorem provers, and the more informal ones are called inference engines. In general, the rules of a rule-based system may be used for multistep categorization in diagnostics and pattern recognition or for multistep reasoning in planning and problem solving.
Large systems are often hybrids that combine more than one of these methods. A natural-language parser, for example, may use features for the syntactic and semantic aspects of words and build a parse tree to represent the grammatical structure of a sentence. A reasoning system may combine a T-box for terminology defined by a description logic with an A-box for assertions and rules.

The results of classification may be a static category or a dynamic control structure. Decision-tree systems, for example, are often used for robots because they learn quickly and generate a decision tree that can be compiled into an efficient control mechanism. To train a robot, a teacher can take it "by the hand" and guide it through a task. At each step, the system records a vector of features that represents the current state of the robot together with the response that would take it to the next step. After that training pass, the system builds a decision tree that determines the response for any combination of features. If the robot makes a mistake on subsequent trials, the teacher can stop it, back it up to the point where the mistake occurred, and guide it through the next few steps. Then the system would modify the decision tree to prevent the robot from making the same mistake twice. It might, however, make similar mistakes in slightly changed conditions. Since the early days (Hunt 1962), incremental improvements in the algorithms have enabled decision-tree systems to generalize more quickly and reduce the number of "similar mistakes".

The boundaries between the three groups of AI systems are blurred because some features may be defined in terms of structures and some structures may be rule-like in their effect. As an example, a neural net for optical character recognition (OCR) might use a feature defined as "having an acute angle at the top." That feature can be encoded in a single bit, derived from information about angles (geometrical structures) and isTopOf (a dyadic relation). As another example, analogical reasoning, which is based on structure mapping, can be used to derive the same kinds of conclusions as a rule-based system that uses induction to derive rules followed by deduction to apply the rules. That approach is the foundation for case-based reasoning (Riesbeck & Schank 1989), but the principle was first stated by Ibn Taymiyya in his comparison of Aristotle's logic to the analogies used in legal reasoning (Hallaq 1993).

Figure 3:  Comparison of logical and analogical reasoning

Ibn Taymiyya admitted that deduction in mathematics is certain. But in any empirical subject, universal propositions can only be derived by induction, and induction must be guided by the same principles of evidence and relevance used in analogy. Figure 3 illustrates his argument:  Deduction proceeds from a theory containing universal propositions. But those propositions must have earlier been derived by induction with the same criteria used for analogy. The only difference is that induction produces a theory as intermediate result, which is then used in a subsequent process of deduction. By using analogy directly, legal reasoning dispenses with the intermediate theory and goes straight from cases to conclusion. If the theory and the analogy are based on the same evidence, they must lead to the same conclusions.

The question in Figure 3 asks for information Q about some case P. Answering that question by deduction requires a theory that contains some rule or combination of rules of the form If P′ then Q′, where P′ in the theory is a generalization of P in the given case. By a version of structure mapping called unification, P′ in the theory is specialized to match P. The same specialization, when applied Q′, produces the answer Q. If the question Q has one or more unknowns, as in Selz's method of schematic anticipation, the unknowns trigger a search to find the missing information. That search may take many steps, which may apply different rules of the same theory or even rules from different theories. But before any theory can be applied, it must have been derived by induction:  analyze the evidence, apply a reliable methodology for determining what commonalities indicate significant causal connections, and state those connections as if-then rules.

In analogical reasoning, the question Q leads to the same schematic anticipation, but instead of triggering the if-then rules of some theory, the unknown aspects of Q lead to the cases from which a theory could have been derived. The case that gives the best match to the given case P may be assumed as the best source of evidence for estimating the unknown aspects of Q; the other cases show possible alternatives. For each new case P′, the same principles of evidence, relevance, and significance must be used. The same kinds of operations used in induction and deduction are used to relate the question Q to some corresponding part Q′ of the case P′. The closer the agreement among the alternatives for Q, the stronger the evidence for the conclusion. In effect, the process of induction creates a one-size-fits-all theory, which can be used to solve many related problems by deduction. Case-based reasoning, however, is a method of bespoke tailoring for each problem, yet the operations of stitching propositions are the same for both.

Creating a new theory that covers multiple cases typically requires new categories in the type hierarchy. To characterize the effects of analogies and metaphors, Way (1991) proposed dynamic type hierarchies, in which two or more analogous cases are generalized to a more abstract type T that subsumes all of them. The new type T also subsumes other possibilities that may combine aspects of the original cases in novel arrangements. Sowa (2000) embedded the hierarchies in an infinite lattice of all possible theories. Some of the theories are highly specialized descriptions of just a single case, and others are very general. The most general theory at the top of the lattice contains only tautologies, which are true of everything. At the bottom is the contradictory or absurd theory, which is true of nothing. The theories are related by four operators:  analogy and the belief revision operators of contraction, expansion, and revision (Alchourrón et al. 1985). These four operators define pathways through the lattice (Figure 4), which determine all possible ways of deriving new theories from old ones.

Figure 4:  Pathways through the lattice of theories

To illustrate the moves through the lattice, suppose that A is Newton's theory of gravitation applied to the earth revolving around the sun and F is Niels Bohr's theory about an electron revolving around the nucleus of a hydrogen atom. The path from A to F is a step-by-step transformation of the old theory to the new one. The contraction step from A to B deletes the axioms for gravitation, and the expansion step from B to C adds the axioms for the electrical force. The result of both moves is the equivalent of a revision step from A to C, which substitutes electrical axioms for gravitational axioms. Unlike contraction and expansion, which move to nearby theories in the lattice, analogy jumps to a remote theory, such as C to E, by systematically renaming the types, relations, and individuals:  the earth is renamed the electron; the sun is renamed the nucleus; and the solar system is renamed the atom. Finally, the revision step from E to F uses a contraction step to discard details about the earth and sun that have become irrelevant, followed by an expansion step to add new axioms for quantum effects. One revision step can replace two steps of contraction and expansion, but one analogy step can transfer a complete theory from one domain to another just by relabeling the symbols.

Different methods of walking through the lattice can produce the same results as induction, deduction, abduction, learning, case-based reasoning, or nonmonotonic reasoning. As a method of creative discovery, Fauconnier and Turner (2002) defined conceptual blending, which also corresponds to a walk through the lattice:  given two cases A and B, derive a generic case T, which corresponds to the least common supertype in the lattice (as in Way's method); then use the expansion operator to specialize T to a blend B. Although all possible forms of reasoning, learning, and discovery can be reduced to walks in a lattice, the challenge is to find the correct path in an infinite lattice. In AI, search methods can be guided by a heuristic function, which estimates the distance from any given point to any desired goal. The term optimality, which Fauconnier and Turner adopted, is used in linguistics and psychology for the constraints that characterize a desirable goal; any computable definition of such constraints could be programmed as a heuristic function.

5. Levels of Cognition

The methods of studying and modeling cognition differ from one branch of cognitive science to another, but all of them are abstractions from nature. As Breiman (2001) cautioned, any conclusions derived from a model are "about the model's mechanism, and not about nature's mechanism." Furthermore, none of the models yet proposed in any branch of cognitive science can explain the full range of phenomena observed in humans and other animals in their natural environments. Figure 5 shows some models that have been postulated for animals at various levels of evolution.

Figure 5:  Evolution of cognition

The cognitive systems of the animals at each level of Figure 5 build on and extend the capabilities of the earlier levels. The worms at the top have rudimentary sensory and motor mechanisms connected by ganglia with a small number of neurons. A neural net that connects stimulus to response with just a few intermediate layers might be an adequate model. The fish brain is tiny compared to the mammals, but it already has a complex structure that receives inputs from highly differentiated sensory mechanisms and sends outputs to just as differentiated muscular mechanisms, which support both delicate control and high-speed propulsion. Exactly how those mechanisms work is not known, but the neural evidence suggests a division into perceptual mechanisms for interpreting inputs and motor mechanisms for controlling action. There must also be a significant amount of interaction between perceptual and motor mechanisms, and a simple neural net is probably an inadequate model.

At the next level, mammals have a cerebral cortex with distinct projection areas for each of the sensory and motor systems. If the fish brain is already capable of sophisticated interpretation and control, the larger cortex must add something more. Figure 5 labels it analogy and symbolizes it by a cat playing with a ball of yarn that serves as a mouse analog. Whether the structure-mapping mechanisms of computer analogies can explain the rich range of mammalian behavior is not known, but whatever is happening must be at least as complex and probably much more so. The human level is illustrated by a typical human, Sherlock Holmes, whose is famous for his skills at induction, abduction, and deduction. Those reasoning skills may be characterized as specialized ways of using analogies, but they work seamlessly with the more primitive abilities.

Whatever the neural mechanisms may be, the functions of human intelligence operate at multiple levels. The instant reaction to touching a hot stove is controlled by a wormlike ganglion well before the brain perceives what happened. Although muscles can be controlled by conscious attention, they operate most efficiently when the details are left to the fishlike parts of the brain. In his subsumption architecture for mobile robots, Brooks (1986) distinguished eight levels of competence, each with increasingly more sophisticated kinds of processing:

  1. Avoiding.  Avoid contact with other objects, either moving or stationary.

  2. Wandering.  Wander around aimlessly without hitting things.

  3. Exploring.  Look for places in the world that seem reachable and head for them.

  4. Mapping.  Build a map of the environment and record the routes from one place to another.

  5. Noticing.  Recognize changes in the environment that require updates to the mental maps.

  6. Reasoning.  Identify objects, reason about them, and perform actions on them.

  7. Planning.  Formulate and execute plans that involve changing the environment in some desirable way.

  8. Anticipating.  Reason about the behavior of other objects, anticipate their actions, and modify plans accordingly.
Each of these levels depends on and subsumes the competence attained by the earlier levels. Each level responds to signals from the input sensors and generates output for the motor mechanisms. The first few levels by themselves could support an insectlike intelligence that responds directly without complex reasoning. For more sophisticated or intelligent behavior, the later levels may inhibit and take control from the earlier levels. But for danger signals, the primitive levels can override the inhibitions with a reflexive response.

The competence levels can be mapped to the six levels of the psyche defined by Aristotle (Hamlyn 1993):  nutrition, perception, desire, locomotion, imagery, and thought. These levels can help clarify the kinds of computation required for robots and their parallels to animal cognition. Nutrition and desire, for example, are two important levels that are missing in Brooks's hierarchy.

  1. Nutrition. For a mobile robot, nutrition corresponds the act of recharging its batteries from time to time. For a software agent, it would correspond to the procurement of computer time and space from some host.

  2. Perception. For a robot, perception depends on input sensors and the ability to interpret the inputs. A television camera may provide a stream of data, but to see, the robot must convert the data to a representation of objects in the environment.

  3. Desire. Aristotle's general word for desire is orexis, which he further distinguished as appetite (epithymia), passion (thymos), and will (boulêsis). He classified appetite and passion as feelings shared with beasts and will as the result of rational thought. Moffat and Frijda (1995) made a similar distinction:  the built-in equivalent of appetite or passion gives a robot a preference for certain kinds of states, but its will is determined by a logically derived plan for reaching a preferred state.

  4. Locomotion. For mobile robots, locomotion includes the first three levels of Brook's competence levels — avoiding, wandering, and exploring.

  5. Imagery. Aristotle's word phantasia means the appearance, presentation, or representation of images, either real or illusory. It would include Brooks's levels of mapping and noticing.

  6. Thought. Aristotle reserved the last level of the psyche for rational thought, which would inlcude Brooks's levels of reasoning, planning, and anticipating. Some aspects of those functions, however, are found in animals without speech.
The psyche of an agent is its functional organization, and its level of sophistication depends on how much of the Aristotelian range of function it is able to support.

In AI systems, deduction has been developed in much greater depth than any other method of reasoning. Although methods of learning have been studied for years, no comparable level of success has been achieved. As an example, the following sentence was spoken by a child named Laura at 34 months of age (Limber 1973):

When I was a little girl, I could go "geek, geek" like that;
but now I can go "This is a chair."
That single sentence incorporates a surprising amount of logical complexity:  subordinate and coordinate clauses; earlier time contrasted with "now"; the modal auxiliaries can and could; the quotations "geek, geek" and "This is a chair"; metalanguage about her own linguistic abilities; contrast shown by but; and parallel stylistic structure. No program today can learn language at that level of complexity, and even some of the best language-generation programs have difficulty in producing sentences with such style.

Chomsky and his followers, such as Jackendoff (2002), maintain that children learn language so rapidly because the human brain has an innate module that facilitates the acquisition of syntax. After studying human evolution, Deacon (1997) agreed that the brain has a predisposition to language, but that the evidence shows a slow evolution of the brain and vocal tract toward modern forms. It suggests that early hominids already had a rudimentary language, which gave individuals with larger brains and better speaking ability a competitive advantage. To analyze human and animal communication, Deacon used Peirce's semiotic categories of icon, index, and symbol to classify the kinds of signs they could recognize and produce. He found that higher mammals easily recognize the first two kinds, icons and indexes, but only after lengthy training could a few talented chimpanzees learn to recognize symbolic expressions. Deacon concluded that if chimpanzees could make the semiotic transition from indexes to symbols, early hominids could. Once that transition had been made, language was possible, and the use of language would have promoted the co-evolution of both language and brain.

Although Jackendoff diverged from Chomsky by assuming graph-like conceptual structures, he replied to Deacon by repeating the claims for innate syntax. But if language depended on an innate syntactic module, a simulation of that module might enable computers to learn and understand language as easily as children. Yet syntax is not what makes language difficult for computers, and it's not essential for human understanding. People speaking a foreign language can make themselves understood even with badly fractured syntax, which is notoriously difficult for computers to parse. That supports Deacon's hypothesis that symbols are more important than syntax:  at first, children learn words as symbols for things; their predisposition to mimic adult speech enables them to learn patterns of words; and syntax results from learning to map speech patterns to cognitive patterns. Contrary to Chomsky, universal grammar is not the prerequisite, but the result of many generations of children preferring patterns that are easy to learn. Further research in semiotics may clarify these issues; promising techniques include algebraic semiotics (Goguen 1999), a dynamic logic of semiosis (Deacon 2004), and a combination of Peirce's semiotics with conceptual structures (Sowa 2000).

In summary, computer cognition gives us a better appreciation for the full range of human abilities. The greatest challenge for cognitive science is to integrate thousands of separate studies into a working model of the brain. In AI, Newell (1990) and Minsky (1987) integrated thirty or more years of their own research into models of cognition that have shown some promise, but are no more successful overall than Lenat's Cyc. Although AI is not yet able to simulate all aspects of human intelligence, it has contributed many valuable methods for enhancing intelligence and for testing hypotheses about intelligence.


Alchourrón, Carlos, Peter Gärdenfors, & David Makinson (1985) "On the logic of theory change: partial meet contraction and revision functions," Journal of Symbolic Logic 50:2, 510-530.

Baader, Franz, Diego Calvanese, Deborah McGuinness, Daniele Nardi, Peter Patel-Schneider (2003) Description Logic Handbook, Cambridge University Press, Cambridge.

Bartlett, Frederic C. (1932) Remembering, Cambridge University Press, Cambridge.

Boole, George (1854) An Investigation into the Laws of Thought, reprinted by Dover Publications, New York.

Breiman, Leo (2001) "Statistical Modeling: The Two Cultures," Statistical Science 16:3, 199-231.

Brooks, Rodney A. (1986) "A robust layered control system for a mobile robot," IEEE Journal of Robotics and Automation RA-2:1, 14-23.

Deacon, Terrence W. (1997) The Symbolic Species: The Co-evolution of Language and the Brain, W. W. Norton, New York.

Deacon, Terrence W. (2004) "Memes as Signs in the Dynamic Logic of Semiosis: Beyond Molecular Science and Computation Theory," in K. E. Wolff, H. D. Pfeiffer, & H. S. Delugach, Conceptual Structures at Work, LNAI 3127, Springer, Berlin, pp. 17-30.

de Groot, Adriaan D. (1965) Thought and Choice in Chess, Mouton, The Hague.

Evans, Thomas G. (1963) A Heuristic Program to Solve Geometric-Analogy Problems, abridged version in Minsky (1968), pp. 271-353.

Falkenhainer, B., Kenneth D. Forbus, Dedre Gentner (1989) "The Structure mapping engine: algorithm and examples," Artificial Intelligence 41, 1-63.

Fauconnier, Gilles, & Mark Turner (2002) The Way We Think:  Conceptual Blending and the Mind's Hidden Complexities, Basic Books, New York.

Ganter, Bernhard, & Rudolf Wille (1999) Formal Concept Analysis: Mathematical Foundations, Springer-Verlag, Berlin.

Gärdenfors, Peter (2000) Conceptual Spaces: The Geometry of Thought, MIT Press, Cambridge, MA.

Goguen, Joseph (1999) "An introduction to algebraic semiotics, with applications to user interface design," in C. Nehaniv, ed., Computation for Metaphors, Analogy and Agents, LNAI 1562, Springer, Berlin, 242-291.

Good, Irving John (1965) "Speculations concerning the first ultraintelligent machine" in F. L. Alt & M. Rubinoff, eds., Advances in Computers 6, Academic Press, New York, 31-88.

Hallaq, Wael B. (1993) Ibn Taymiyya Against the Greek Logicians, Clarendon Press, Oxford.

Hamlyn, D. W. (1993) Translation, introduction, and notes to Aristotle's De Anima, second edition, Clarendon Press, Oxford.

Hunt, Earl B., (1962) Concept Learning: An Information Processing Problem, Wiley, New York.

Jackendoff, Ray (2002) Foundations of Language:  Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford.

Kamp, Hans, & Uwe Reyle (1993) From Discourse to Logic, Kluwer, Dordrecht.

Kant, Immanuel (1787) Kritik der reinen Vernunft, translated by N. Kemp Smith as Critique of Pure Reason, St. Martin's Press, New York.

Kant, Immanuel (1800) Logik: Ein Handbuch zu Vorlesungen, translated as Logic by R. S. Hartmann & W. Schwarz, Dover Publications, New York, 1988.

Katz, Jerrold J., & Jerry A. Fodor (1963) "The structure of a semantic theory," Language 39, 170-210.

Lakoff, George, and Mark Johnson (1980) Metaphors We Live By, University of Chicago Press, Chicago.

Leibniz, Gottfried Wilhelm (1666) "Dissertatio de arte combinatoria," Leibnizens mathematische Schriften, vol. 5, Georg Olms, Hildesheim.

Lenat, Douglas B. (1995) "Cyc: A large-scale investment in knowledge infrastructure," Communications of the ACM 38:11, 33-38.

Limber, John (1973) "The genesis of complex sentences," in T. Moore, ed., Cognitive Development and the Acquisition of Language, Academic Press, New York, 169-186.

Luria, Alexandr Romanovich (1968) The Mind of a Mnemonist, Basic Books, New York.

Masterman, Margaret (1961) "Translation," Proceedings of the Aristotelian Society, 169-216.

McCulloch, Warren S., & Walter Pitts (1943) "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics 5, 115-133.

Mill, John Stuart (1865) A System of Logic, Longmans, London.

Minsky, Marvin, ed. (1968) Semantic Information Processing, MIT Press, Cambridge, MA.

Minsky, Marvin (1975) "A framework for representing knowledge," in P. Winston, ed., The Psychology of Computer Vision, McGraw-Hill, New York, 211-280.

Minsky, Marvin (1987) The Society of Mind, Simon & Schuster, New York.

Moffat, David, & Nico H. Frijda (1995) "Where there's a Will there's an agent," in M. J. Wooldridge & N. R. Jennings (1995), Intelligent Agents, LNAI 890, Springer-Verlag, Berlin.

Narayanan, Srini (1999) "Reasoning about actions in narrative understanding," Proc. IJCAI'99, pp. 350-358.

Newell, Allen (1990) Unified Theories of Cognition, Harvard University Press, Cambridge, MA.

Newell, Allen, & Herbert A. Simon (1972) Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ.

Oakley, Todd (2004) "Image schema," in D. Geeraerts & H. Cuyckens, eds., Handbook of Cognitive Linguistics, Oxford University Press, Oxford.

Peirce, Charles Sanders (1903) "Harvard lectures on pragmatism," The Essential Peirce, ed. by N. Houser, C. Kloesel, et al., Indiana University Press, Bloomington, vol. 2, pp. 133-241.

Peter of Spain or Petrus Hispanus (circa 1239) Summulae Logicales, ed. by I. M. Bochenski, Marietti, Turin, 1947.

Riesbeck, Christopher K. & Roger C. Schank, (1989) Inside Case-Based Reasoning, Erlbaum, Hillsdale, NJ.

Rosch, Eleanor (1975) "Cognitive representations of semantic categories," J. of Experimental Psychology, General 104, 192-253.

Rosenblatt, Frank (1958) "The perceptron: a probabilistic model for information storage and organization in the brain," Psychological Review 65:6, 386-408.

Schank, Roger C., ed. (1975) Conceptual Information Processing, North-Holland Publishing Co., Amsterdam.

Schank, Roger C., & Robert P. Abelson (1977) Scripts, Plans, Goals and Understanding, Lawrence Erlbaum Associates, Hillsdale, NJ.

Selz, Otto (1913) Über die Gesetze des geordneten Denkverlaufs, Spemann, Stuttgart.

Selz, Otto (1922) Zur Psychologie des produktiven Denkens und des Irrtums, Friedrich Cohen, Bonn.

Sowa, John F. (2000) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Publishing Co., Pacific Grove, CA.

Sowa, John F., & Arun K. Majumdar (2003) "Analogical reasoning," A. de Moor, W. Lex, & B. Ganter, eds. (2003) Conceptual Structures for Knowledge Creation and Communication, LNAI 2746, Springer, Berlin, pp. 16-36.

Thorndike, Edward Lee (1932) The Fundamentals of Learning, Teachers College Press, New York.

Way, Eileen C. (1991) Knowledge Representation and Metaphor, Kluwer Academic Publishers, Dordrecht.

Wertheimer, Max (1925) Über Gestalttheorie translated as "Gestalt Theory," by W. D. Ellis, Source Book of Gestalt Psychology, Harcourt, Brace and Co, New York 1938.

Whewell, William (1858) History of Scientific Ideas, J. W. Parker & Son, London.

Wittgenstein, Ludwig (1922) Tractatus Logico-Philosophicus, Routledge & Kegan Paul, London.

Wittgenstein, Ludwig (1953) Philosophical Investigations, Basil Blackwell, Oxford.