ITR: A Biologically Inspired Adaptive Working Memory System for Efficient Robot Control and Learning

NSF Grant EIA-0325641

Research Objectives and Approach

In the primate brain, the tension between the desire for flexibility and the need for efficiency is thought to be largely addressed by the interaction between working memory and executive control faculties in the prefrontal cortex (PFC) and systems supporting relatively automatic forms of behavior in more posterior areas. The efficient reactive processes of posterior brain areas typically drive behavior in common situations, and they are modulated by frontal systems when special circumstances arise. The frontal cortex is well equipped to maintain task-relevant information in a kind of working memory, and it is well positioned to guide action selection when flexibility is needed.


Inspired by the utility of the PFC in biological systems, we are implementing an adaptive working memory system for efficient robot control and learning based on computational neuroscience models, and we will assess the contribution of such a system to the successful performance of robot navigation and object manipulation tasks in dynamically changing environments.



Our objectives include:

  • To develop a software toolkit that encapsulates computational neuroscience models of the working memory circuits of the PFC, optimized for use in robot control systems.
  • To develop powerful perception algorithms capable of encoding sensory events as abstract and compact working memory chunks, highlighting salient features such as category membership of perceived objects and novelty of events.
  • To develop mechanisms for relating metric spatial representations to representations of a more qualitative and linguistic nature, allowing for the efficient recording of spatially distributed events and goals in working memory.
  • To develop cognitive architectures for flexible motor control, using an adaptive working memory to guide the search for situation-appropriate motor programs to integrate.
  • To demonstrate the utility of an adaptive working memory for robot control and, at the same time, provide evidence that current computational neuroscience accounts of PFC function will indeed scale to real-world tasks and situations.

Learning from experience is central to this effort. Perception algorithms being explored use a variety of machine learning methods. The proposed motor control architecture is grounded in adaptive models of the cerebellum and basal ganglia. The working memory system learns to identify informational chunks worthy of retention using a model of the interactions between the brain's dopamine (DA) system and the PFC. Together, these systems will learn both routine behaviors and flexible strategies for novel situations.

Broader Impacts

  • Computational methods facilitating the development of robots and other cognitive systems capable of learning from experience to respond flexibly in novel situations.
  • Refined computational neuroscience models of human memory, tested in the real world.
  • A freely disseminated open source software toolkit for robotic adaptive working memory systems – computational neuroscience results packaged for use by technologists.
  • Strengthened partnerships between researchers in the fields of robotics, machine learning, artificial intelligence, and cognitive neuroscience.

Preliminary Results After Initial 6 Months

  • Paper on initial project results to be presented at the American Association for Artificial Intelligence (AAAI) Fall 2004 Symposium Series.
  • Working memory toolkit designed and implemented, awaiting initial testing.
  • Perception algorithms undergoing tuning. Basic novelty detection demonstrated.
  • Sensory EgoSphere (SES) integrated with spatial language system.
  • Cognitive control architecture for flexible motor control designed and being implemented.


Figure 1: Prefrontal Cortex of the Human Brain





Figure 2: Actor-Critic TD-Learning Framework for Working Memory




Figure 3: Sensory EgoSphere and Spatial Reasoning




Figure 4: SIFT Object Recognition Algorithm




Figure 5: The ISAC Humanoid Robot Collaborating with the Segway RMP




Figure 6: ISAC and the SES



Project Overview

In the design of robotic systems, an essential tension exists between the desire for flexibility and the need for efficiency. A robot should exhibit robust performance in a wide range of environmental circumstances, including those that are unusual or difficult, while acting promptly and fluidly in common situations. Often, flexible responding is seen as requiring a rich internal representation of the current environment and, perhaps, a detailed model of how that situation would change as a result of considered actions. Deliberation over such rich representations can be a bane to efficiency, however, encouraging designers to instead identify and implement computationally simple relationships between perceptual features and enacted responses (Brooks, 1986).

In primates, this tension is thought to be at least partially addressed by the interaction between working memory and executive control faculties in the prefrontal cortex (PFC) and the systems supporting relatively automatic forms of behavior in more posterior brain areas (Bianchi, 1922; Luria, 1969; Shallice, 1982; Eslinger and Damasio, 1985; Fuster, 1989). The efficient reactive processes of posterior areas typically drive behavior in common situations, and they are modulated by frontal systems when special circumstances arise. According to some theories, this top-down modulation can take many forms, including the highlighting of specific perceptual features, the mediation of conflicts between competing automatic behaviors, and the specification of the current set of appropriate responses (Cohen et al., 1996). In particular, the frontal cortex appears to be well equipped to actively maintain task-relevant information in a kind of working memory, and it is well positioned to allow this maintained information to guide attention and influence action selection (O'Reilly et al., 1999).

Inspired by the apparent utility of prefrontal cortex in biological systems, we are designing and implementing an adaptive working memory system for efficient robot control and learning, and we intend to assess the contribution of such a system to the successful performance of navigation and object manipulation tasks in dynamically changing environments. This working memory will be distinguished from other memory systems by its rapid accessibility to action selection systems and by its retention of only the situation-specific information that is essential to the successful performance of the current task. The task-sensitive nature of this working memory system introduces a need for adaptability. The memory will be expected to learn, from experience, to discriminate between information that needs to be actively maintained in a given task context and information that can be safely ignored. The system may choose to ignore information because it is irrelevant to the current task or because it is expected to be easily retrievable, either from a long-term memory store or from future perceptual acts. Such an adaptive working memory system is expected to make deliberative reasoning more efficient by narrowing attention to task-relevant spatial locations, sensory features, and objects, allowing deliberation to modulate or preempt reactive processes in a timely manner.

A mobile robot equipped with such a working memory system will not waste time processing most of the sensory information available to it as it navigates through space. Reactive systems will monitor the path ahead, implementing behaviors like obstacle avoidance using basic sensory cues. The working memory system will contain only information about the next few expected landmarks. The robot may retain information about the expected locations of such landmarks or about their expected sensory properties, and this information will be used to direct attention in perceptual processes. When a sought-after landmark is finally discovered, the memory will have learned to free itself of the burden of retaining information about that landmark and to "gate into working memory" information about landmarks expected further down the path, retrieved from a long-term store. Also, the location of a temporarily occluded landmark will be actively retained until, because of the robot's motion or because of motion of the occluding object, it becomes visible once again.
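The landmark scenario above can be sketched as a small bounded buffer that is refilled from a long-term route store. This is only an illustrative sketch: the class name `LandmarkMemory`, its methods, the `capacity` parameter, and the landmark names are assumptions made for the example, and the drop-and-refill rule is hard-coded here, whereas the project intends for this gating behavior to be learned from experience.

```python
from collections import deque


class LandmarkMemory:
    """Hypothetical working memory holding only the next few expected landmarks."""

    def __init__(self, route, capacity=2):
        # The long-term store: the full ordered route, not actively maintained.
        self.long_term = deque(route)
        # The working memory: a small list of landmarks currently being sought.
        self.active = []
        self.capacity = capacity
        self._gate_in()

    def _gate_in(self):
        """Load upcoming landmarks from the long-term store until memory is full."""
        while len(self.active) < self.capacity and self.long_term:
            self.active.append(self.long_term.popleft())

    def observe(self, landmark):
        """Report a perceived landmark.

        If it is one being sought, drop it from working memory and gate in
        the next landmark expected further down the path. Anything else is
        ignored. Returns True when the landmark was being sought.
        """
        if landmark in self.active:
            self.active.remove(landmark)
            self._gate_in()
            return True
        return False


if __name__ == "__main__":
    memory = LandmarkMemory(["door", "stairs", "elevator"], capacity=2)
    memory.observe("window")   # irrelevant percept, ignored
    memory.observe("door")     # sought-after landmark found
    print(memory.active)
```

A temporarily occluded landmark simply stays in `active` in this sketch, since nothing removes an entry until the landmark is actually observed.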

A humanoid robot equipped with an adaptive working memory, manipulating a large collection of tools in a dynamic environment that may include interacting humans and other robots, will learn to remember the location at which it sets down a tool that it will soon need again. When seeking to retrieve the tool, it will not need to search either its environment or its internal long-term "map" of the workspace; instead, it will immediately direct attention to the appropriate spatial location and attempt to perceptually reacquire an object with the remembered sensory properties. Such a robot, when collaborating with human partners, will also learn to discern when the current task facing the group has changed, appropriately loading working memory with useful information related to newly ascertained goals.

We have begun to design and construct an adaptive working memory system appropriate for both standard mobile robots and humanoids, based on the architecture of contemporary computational models of prefrontal cortex. Some of these models have focused on the process of selecting information for retention (or forgetting), and they have attempted to explain how this selection process can be learned from experience. In particular, it is thought that the dopamine neurons of the ventral tegmental area (VTA) of the midbrain, which project extensively to prefrontal cortex, encode a measure of "change in expected reward" (Schultz et al., 1993; Schultz et al., 1997), and this measure is seen as a useful indicator of when a chunk of information should be maintained in working memory (Braver and Cohen, 2000). If the considered retention of a particular chunk causes the dopamine system to predict an increase in future reward, then that chunk should be retained. Computational models of the dopamine system have shown that predictions of "change in expected reward" can be reliably learned from experience through the use of a reinforcement learning technique called Temporal Difference (TD) learning (Sutton, 1988; Montague et al., 1996). Following these models, we are using a temporal difference learning algorithm to adaptively update a working memory of task-relevant spatial locations, sensory features, and objects. Note that, by building on existing neuroscientific models, this work will not only demonstrate the utility of an adaptive working memory for robotic systems, but it will also test the scalability of existing models of prefrontal cortex.
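To make the role of the TD signal concrete, here is a minimal Python sketch of a tabular TD(0) value update used to decide whether a chunk is worth retaining. It is not the project's toolkit: the toy delayed-response task, the state encoding as `(context, chunk)` pairs, the function names, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
import random


def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) update and return the TD error.

    The TD error plays the role of the "change in expected reward" signal.
    A next_state of None marks the end of an episode.
    """
    next_value = 0.0 if next_state is None else values.get(next_state, 0.0)
    delta = reward + gamma * next_value - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * delta
    return delta


def should_retain(values, context, chunk):
    """Gate a chunk in if holding it predicts more reward than holding nothing."""
    return values.get((context, chunk), 0.0) > values.get((context, None), 0.0)


def train(episodes=500, seed=0):
    """Learn values for a toy task in which holding the cue is rewarded.

    On each episode the memory either holds the cue through the delay or
    holds nothing (chosen at random, as exploration); reward arrives only
    when the cue was held.
    """
    rng = random.Random(seed)
    values = {}
    for _ in range(episodes):
        held = "cue" if rng.random() < 0.5 else None
        reward = 1.0 if held == "cue" else 0.0
        td_update(values, ("delay", held), None, reward)
    return values


if __name__ == "__main__":
    learned = train()
    print(should_retain(learned, "delay", "cue"))
```

In this sketch the learned value of the state in which the cue is held rises toward the reward, while the value of the empty-memory state stays at zero, so the comparison in `should_retain` comes to favor retention.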

The constructed adaptive working memory system will be embedded in both the ISAC humanoid robot (Kawamura et al., 2004) at Vanderbilt University and various mobile robots at the University of Missouri - Columbia, all of which will be equipped with innovative systems for supporting and leveraging an adaptive working memory. The system will be initially assessed by presenting the robots with formal analogues to neuroscientific laboratory tasks which are sensitive to prefrontal brain function. Once basic competence is established using these laboratory tasks, the utility of the working memory system will be assessed in the context of some common robot tasks, including navigation tasks (mobile robot) and tool manipulation tasks (humanoid robot). The primary components of this project, which are aligned with the backgrounds and skills of the investigators, are:

  • Adaptive Working Memory Toolkit (Noelle) - The development of software tools for the creation and use of robotic working memory systems, based on computational models of the prefrontal cortex.
  • Cognitive Architecture (Kawamura) - The development of a humanoid robot control architecture which can efficiently deliver appropriate "chunks" of information to the working memory system and can fluently make use of working memory contents in order to guide robot behavior.
  • Spatial Representation & Reasoning (Skubic) - New methods for encoding information about the spatial arrangement of objects in a robot's environment, using quantitative, qualitative, and linguistic codes. These encodings will be used to identify useful working memory chunks, and associated methods for spatial reasoning will be designed to effectively leverage working memory.
  • Perception & Object Recognition (Keller) - The use of powerful statistical object recognition techniques to identify useful informational chunks to be retained in working memory, and the augmentation of those techniques to allow for top-down modulation based on working memory contents.
  • Pre-Attentive Perceptual Processing (Wilkes) - The fabrication of computationally efficient complex visual feature detectors, which may be used either to guide attention for the purpose of object recognition or to indicate salience of a percept (or region of space) to the adaptive working memory system.
  • Integration & Evaluation (All) - The integration of all of these efforts in order to assess the utility of adaptive working memory systems, along with appropriately integrated perceptual and deliberation mechanisms, for robot control and learning.

Each of these project components will require its own conceptual innovations, and those insights must then be transformed into a working integrated system.
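As a simple illustration of what relating quantitative spatial codes to qualitative, linguistic ones (the Spatial Representation & Reasoning component) might involve, the Python sketch below maps a metric object position in a robot-centered frame to a short phrase. The frame convention (x forward, y to the left), the 45-degree sector boundaries, the `near_radius` threshold, and the function names are assumptions made for this example; they do not describe the Sensory EgoSphere or the project's spatial language system.

```python
import math


def describe_bearing(x, y):
    """Map a robot-centered position (x forward, y left) to a direction phrase."""
    angle = math.degrees(math.atan2(y, x))  # 0 is straight ahead, positive is left
    if -45.0 <= angle <= 45.0:
        return "in front"
    if 45.0 < angle < 135.0:
        return "to the left"
    if -135.0 < angle < -45.0:
        return "to the right"
    return "behind"


def describe_position(x, y, near_radius=1.0):
    """Combine a coarse distance term with the direction phrase.

    near_radius is a hypothetical threshold in the same units as x and y.
    """
    distance = math.hypot(x, y)
    proximity = "near" if distance <= near_radius else "far"
    return f"{proximity}, {describe_bearing(x, y)}"


if __name__ == "__main__":
    print(describe_position(0.5, 0.0))
    print(describe_position(0.0, 3.0))
```

A compact phrase of this kind is the sort of chunk that could be held in working memory in place of the full metric coordinates.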

References

Bianchi, L. (1922). The Mechanism of the Brain and the Function of the Frontal Lobes. E. & S. Livingstone, Edinburgh.

Braver, T. S. and Cohen, J. D. (2000). On the control of control: The role of dopamine in regulating prefrontal function and working memory. In Monsell, S. and Driver, J., editors, Control of Cognitive Processes, volume XVIII of Attention and Performance, chapter 31, pages 713-737. MIT Press.

Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1):14-23.

Cohen, J. D., Braver, T. S., and O'Reilly, R. C. (1996). A computational approach to prefrontal cortex, cognitive control, and schizophrenia: Recent developments and current challenges. Philosophical Transactions of the Royal Society of London B, 351:1515-1527.

Eslinger, P. J. and Damasio, A. R. (1985). Severe disturbance of higher cognition after bilateral frontal lobe ablation: Patient EVR. Neurology, 35(12):1731-1741.

Fuster, J. M. (1989). The Prefrontal Cortex. Raven Press, New York, 2nd edition.

Kawamura, K., Peters, II, R. A., Bodenheimer, R., Sarkar, N., Park, J., and Spratley, A. (2004). Multiagent-based cognitive robot architecture and its realization. International Journal of Humanoid Robotics. In press.

Luria, A. R. (1969). Frontal lobe syndromes. In Vinken, P. J. and Bruyn, G. W., editors, Handbook of Clinical Neurology, volume II. Elsevier, New York.

Montague, P. R., Dayan, P., and Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16:1936-1947.

O'Reilly, R. C., Braver, T. S., and Cohen, J. D. (1999). A biologically based computational model of working memory. In Miyake, A. and Shah, P., editors, Models of Working Memory: Mechanisms of Active Maintenance and Executive Control, chapter 11, pages 375-411. Cambridge University Press, Cambridge.

Schultz, W., Apicella, P., and Ljungberg, T. (1993). Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. Journal of Neuroscience, 13:900-913.

Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275:1593-1599.

Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society of London B, 298:199-209.

Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.

The Working Memory project homepage is maintained by the University of Missouri.