Percept-Behavior Association Learning Using Working Memory
To be considered a cognitive robot, ISAC must draw on a variety of aspects of cognition, such as learning, long- and short-term memory, working memory, and internal states. Furthermore, ISAC must possess the ability to act reactively, deliberately, and reflectively. Therefore, as ISAC encounters stimuli in the environment and chooses behavioral responses, the underlying executive and internal processes attempt to identify similarities and regularities that can be extracted and later used as reactive responses.
The current ISAC architecture uses a biologically inspired working memory system to create percept-behavior associations. This working memory system, inspired by the model of [Braver and Cohen, 2000] and implemented using the working memory toolkit (WMtk) developed by [Phillips and Noelle, 2005], attempts to load and maintain task-relevant information during task execution. Chunks of information are loaded from the Sensory EgoSphere (SES), ISAC’s short-term memory system that holds perceptual information, and from procedural memory, which stores the information needed to perform behaviors. A list of candidate chunks is presented to the WMtk, and the appropriate chunks are then selected by a learning network that uses temporal difference learning [Sutton, 1988] to modify its weights. Over time, as percept-behavior pairs are routinely loaded, maintained, and used by the WMtk during task execution, these pairs are learned by the working memory system. They are later used by the CEA and FRA to perform routine tasks automatically, which relieves pressure on ISAC’s deliberative processes [Kawamura, et al., 2007].
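The chunk-selection idea described above can be illustrated with a minimal sketch. It is not the WMtk API and not ISAC's implementation: the class name `ChunkSelector`, the chunk labels, the reward rule, and all parameter values are hypothetical, and a simple lookup table stands in for the learning network. Each trial is treated as a one-step episode, so the temporal difference target reduces to the immediate reward.

```python
import itertools
import random

# Hypothetical candidate chunks: two percepts (as if read from the SES)
# and two behaviors (as if read from procedural memory).
CANDIDATES = ["red_bag", "blue_bag", "reach", "wave"]
# Assumed task: reward is given only when this percept-behavior pair is held.
TARGET = frozenset({"red_bag", "reach"})


class ChunkSelector:
    """Toy working-memory gate trained with TD(0). Not the WMtk API."""

    def __init__(self, capacity=2, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
        self.capacity = capacity   # number of chunks held at once
        self.alpha = alpha         # learning rate
        self.gamma = gamma         # discount factor
        self.epsilon = epsilon     # exploration probability
        self.weights = {}          # one weight per chunk combination
        self.rng = random.Random(seed)

    def value(self, chunks):
        """Estimated value of holding this set of chunks."""
        return self.weights.get(frozenset(chunks), 0.0)

    def select(self, candidates, explore=True):
        """Pick a chunk combination, epsilon-greedily when exploring."""
        options = [frozenset(c)
                   for c in itertools.combinations(candidates, self.capacity)]
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=self.value)

    def update(self, chunks, reward, next_value=0.0):
        """TD(0) update: move the estimate toward reward + gamma * next_value."""
        key = frozenset(chunks)
        td_error = reward + self.gamma * next_value - self.value(key)
        self.weights[key] = self.value(key) + self.alpha * td_error
        return td_error


def train(episodes=1000):
    """Run one-step episodes; next_value stays 0 because each episode ends."""
    selector = ChunkSelector()
    for _ in range(episodes):
        held = selector.select(CANDIDATES)
        reward = 1.0 if held == TARGET else 0.0
        selector.update(held, reward)
    return selector


if __name__ == "__main__":
    selector = train()
    best = selector.select(CANDIDATES, explore=False)
    print(sorted(best), round(selector.value(best), 3))
```

After training, the greedy selection settles on the rewarded percept-behavior pair, which loosely mirrors how regularly used pairs come to be preferred by the working memory system.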
Another key element in learning percept-behavior combinations is the continual monitoring and processing of ISAC’s episodic LTM. Currently, ISAC’s deliberative processes use episodic LTM to create a probability model for choosing a particular action in a particular situation. As the number of similar episodes associated with a situation grows, the likelihood of choosing the same behavior when presented with the same percept increases toward one. Once the probability reaches one, the pair can be considered a percept-behavior combination, and the learned pair can be stored within the FRA.
References
[Braver and Cohen, 2000] T.S. Braver and J.D. Cohen, “On the Control of Control: The Role of Dopamine in Regulating Prefrontal Function and Working Memory”, In S. Monsell and J. Driver (Eds.), Control of Cognitive Processes, Vol. 18 of Attention and Performance, chapter 31, pp. 713-737, MIT Press, 2000
[Kawamura, et al., 2007] K. Kawamura, S. Gordon, P. Ratanaswasd, C. Garber, and E. Erdemir, “Implementation of Cognitive Control for Robots”, Proc. of the 4th COE Workshop on Human Adaptive Mechatronics (HAM), pp. 41-54, Tokyo Denki University, Japan, March 2-3, 2007
[Phillips and Noelle, 2005] J.L. Phillips and D. Noelle, “A Biologically Inspired Working Memory Framework for Robots”, Proc. of the 27th Annual Conf. of the Cognitive Science Society, pp. 1750-1755, 2005