Image Mapping and Visual Attention on a Sensory Ego-Sphere

The Sensory Ego-Sphere (SES) is a biologically inspired short-term memory structure for robots that acts as an interface between sensing and cognition [1]. It can be envisioned as a virtual spherical shell surrounding the robot. Information about a point in space is stored on the shell in the direction of that point from the center of the sphere. Thus, the SES is an egocentric, spherical mapping of the robot's locale. To date, it has been used to recall the locations of discrete objects in the vicinity of a robot, so the SES has been a sparsely populated map. This research focuses on two problems related to the SES: (1) the mapping of high-resolution sensory information, in the form of imagery, onto the SES, and (2) the concurrent processing of visual attention.
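
To make the mapping concrete, the sketch below (in Python) quantizes a point in the robot's egocentric frame to the SES node nearest its direction. The SES of [1] actually tessellates the sphere as a geodesic dome; the uniform azimuth/elevation grid, the class name, and the node keys used here are simplifications introduced only for illustration.

import math

class SensoryEgoSphere:
    """Toy SES: data is stored at discrete directions around the robot.

    The SES of [1] tessellates the sphere as a geodesic dome; the uniform
    azimuth/elevation grid here is a simplification for illustration.
    """

    def __init__(self, step_deg=10.0):
        self.step = step_deg
        self.nodes = {}  # (az_index, el_index) -> list of items posted there

    def direction_of(self, x, y, z):
        """Azimuth/elevation (degrees) of a point in the egocentric frame."""
        az = math.degrees(math.atan2(y, x))
        el = math.degrees(math.atan2(z, math.hypot(x, y)))
        return az, el

    def nearest_node(self, az, el):
        """Quantize a direction to the closest grid node."""
        return round(az / self.step), round(el / self.step)

    def post(self, x, y, z, data):
        """Store data on the node in the direction of (x, y, z)."""
        node = self.nearest_node(*self.direction_of(x, y, z))
        self.nodes.setdefault(node, []).append(data)
        return node

ses = SensoryEgoSphere()
ses.post(1.0, 0.5, 0.2, "object seen at roughly 27 deg azimuth")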

The image sequence was taken from the rotating camera head of the humanoid robot ISAC. An image was captured at each SES node falling inside a predetermined area. Problems such as overlap between adjacent images in the sequence and variable distance between nodes were addressed to obtain a continuous mapping of the robot's visual scene. Although a full image was captured at each node location, only a foveal window was extracted from the center of the image. Figure 1 illustrates this process. The foveal windows were then used to populate the SES and reconstruct the visual scene with minimal overlap. Figure 2 shows all foveal windows posted on the SES, and Figure 3 shows the reconstructed visual scene.

Figure 1. Posting a foveal window onto the SES.


Figure 2. Visual scene posted on ISAC's Sensory Ego-Sphere.


Figure 3. Reconstructed scene from SES foveal windows.
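
The sketch below illustrates the foveal-window extraction step, assuming images are held as NumPy arrays. The frame and window dimensions are placeholders; the sizes actually used depend on ISAC's camera geometry and on the node spacing, which together determine how adjacent windows tile the scene with minimal overlap.

import numpy as np

def foveal_window(image, win_h, win_w):
    """Extract a window of size (win_h, win_w) centered in the image."""
    h, w = image.shape[:2]
    top = (h - win_h) // 2
    left = (w - win_w) // 2
    return image[top:top + win_h, left:left + win_w]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder camera frame
window = foveal_window(frame, 120, 160)
# The window is then posted at the SES node matching the camera's
# pan/tilt direction when the frame was captured.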


A mechanism for attention is necessary once the SES is populated with dense imagery. Because computational resources are limited, only regions of interest can be attended to if a robot is to interact with a human-centered environment in real time. Two methods of selecting visual attention on the SES were examined to address the problem of how this attentional processing should be performed. Both methods used the FeatureGate model of visual attention [2,3].

The first method performed attentional processing on the individual full-size images of the sequence to identify the most salient locations. Because of the overlap between images in the sequence, attentional points found in different images could refer to the same location in space. For this reason, the salient locations found in each image were associated with the SES node closest to their location (rather than with the node corresponding to the optical center of the full-size image). The attentional points were then summed at each node to find the most salient node locations in the entire visual scene. The underlying assumption is that the more often a location is selected in separate images, the more likely it is that an actual relevant feature exists at that location. Consequently, an attentional point that persists across several adjacent images accumulates a higher activation value and is deemed more salient than a point found in only one image. Figure 4 shows the top 12 most salient locations found by this method; in order of decreasing salience, they are marked red, orange, yellow, green, blue, indigo, violet, magenta, black, gray, brown, and hunter green.

Figure 4. Top 12 most salient locations in scene by activation summation.
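
The first method amounts to the voting scheme sketched below. The helpers find_salient_points (standing in for the FeatureGate model [2,3]) and pixel_to_node (mapping a pixel, via the camera pose of its image, to the nearest SES node) are hypothetical and are passed in as parameters rather than defined here.

from collections import Counter

def salient_nodes_by_summation(images, find_salient_points, pixel_to_node, top_k=12):
    """Rank SES nodes by summing attentional points over an image sequence.

    find_salient_points and pixel_to_node are hypothetical helpers: the
    first stands in for the FeatureGate model [2,3], the second maps a
    pixel in a given image to the nearest SES node.
    """
    activation = Counter()
    for image in images:
        for point in find_salient_points(image):
            # Salient points from overlapping images that refer to the same
            # location in space quantize to the same node, so their
            # activations sum.
            activation[pixel_to_node(image, point)] += 1
    # Nodes selected in several adjacent images accumulate higher
    # activation and are therefore ranked as more salient.
    return [node for node, _ in activation.most_common(top_k)]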

The second method of selecting attention on the SES performed attentional processing on the image reconstructed from the foveal windows posted on the SES. Less information is available to this method, since a single image determines the most salient locations in the scene rather than a sequence of overlapping images. Whether an attentional point has persisted across several adjacent images is not known. Consequently, the confidence that a location deemed salient by this method corresponds to an actual salient feature in the environment is lower than with the first method. Figure 5 shows the top 12 most salient locations found by this method.

Figure 5. Top 12 most salient locations by attentional processing on reconstructed scene image.
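
For contrast, a sketch of the second method follows. Here find_salient_points is again a hypothetical stand-in for FeatureGate, assumed to return (activation, pixel) pairs, and pixel_to_node is assumed to map a mosaic pixel back to the SES node whose foveal window contains it.

def salient_nodes_from_reconstruction(mosaic, find_salient_points, pixel_to_node, top_k=12):
    """Rank SES nodes from the single reconstructed scene image.

    With one image there is no cross-image persistence to vote on, so the
    ranking comes directly from the attention model's activation values.
    """
    points = find_salient_points(mosaic)  # hypothetical FeatureGate wrapper
    ranked = sorted(points, key=lambda p: p[0], reverse=True)[:top_k]
    return [pixel_to_node(pixel) for _, pixel in ranked]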



References

1. Peters, R.A. II, Hambuchen, K.A., Kawamura, K., Wilkes, D.M. The Sensory Ego-Sphere as a Short-Term Memory for Humanoids. Proceedings of the IEEE-RAS Conference on Humanoid Robots, 2001, pp. 451-460.

2. Cave, K.R. The FeatureGate model of visual selection. Psychological Research, vol. 62, 1999, pp. 182-194.

3. Driscoll, J.A., Peters, R.A., Cave, K.R. A Visual Attention Network for a Humanoid Robot. Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '98).