This paper proposes a scheme for learning and detecting soccer balls through the combination of the computational attention system VOCUS with a classifier. Recognizing soccer balls as an application in the Robot World Cup Soccer Games and Conferences (RoboCup) [8] has been a tough problem to solve because of the lack of definite characteristics describing a ball. Our solution is reliable, scale-independent and color-adaptable in the sense that it can be applied to balls of any size, surface pattern and color.
Our approach consists of a training phase, an adaptation phase, and a detection phase. In the training phase, the classifier is exhaustively trained using balls of different sizes, colors, and surface patterns from a wide variety of training images. The output of the training is a cascade of classifiers, each of which consists of a set of decision trees. In the adaptation phase, VOCUS is quickly adapted to a special scenario: it learns the properties of the scenario, e.g., the color of the ball and its intensity contrast to the environment, from a few example images (here: 2). This adaptation results in a set of feature weights describing the ball in its surroundings. In the detection phase, VOCUS first computes regions of interest by weighting the image features with the learned weights. The classifier is then applied to these regions, verifying the object hypothesis (Fig. 1). This approach makes the system flexible as well as robust.
The visual attention system VOCUS consists of a bottom-up part computing data-driven saliency and a top-down part enabling goal-directed search. Bottom-up saliency results from the uniqueness of features, e.g., a black sheep among white ones, whereas top-down saliency uses features that belong to a specified target, e.g., red when searching for a red ball. The bottom-up part, also described in [7], is based on the well-known model of visual attention by Koch & Ullman [11], which is used by many computational attention systems [12,1]. It computes saliencies for the features intensity, orientation, and color and combines them in a saliency map. The most salient region in this map yields the focus of attention. The top-down part is new: it uses previously learned feature weights to excite target-specific features and inhibit others.
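The top-down weighting can be illustrated with a minimal Python sketch. It is not the VOCUS implementation: the convention that weights above 1 excite a feature and weights below 1 inhibit it, the function names `top_down_saliency` and `focus_of_attention`, and the toy feature maps are all assumptions made for this example.

```python
def top_down_saliency(feature_maps, weights):
    """Combine feature maps into one top-down saliency map.

    feature_maps: dict mapping a feature name to a 2D list of floats
                  (all maps must have the same shape).
    weights:      dict mapping a feature name to a positive weight.
                  Assumed convention: w > 1 excites, w < 1 inhibits,
                  w == 1 (or a missing weight) leaves the feature out.
    """
    names = list(feature_maps)
    rows = len(feature_maps[names[0]])
    cols = len(feature_maps[names[0]][0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for name in names:
        w = weights.get(name, 1.0)
        if w > 1.0:
            sign, gain = 1.0, w          # excite target-specific features
        elif w < 1.0:
            sign, gain = -1.0, 1.0 / w   # inhibit features untypical for the target
        else:
            continue
        for r in range(rows):
            for c in range(cols):
                saliency[r][c] += sign * gain * feature_maps[name][r][c]
    return saliency


def focus_of_attention(saliency):
    """Return (row, col) of the most salient pixel."""
    best = max((v, r, c) for r, row in enumerate(saliency) for c, v in enumerate(row))
    return best[1], best[2]


# Toy example: a "red" map and a "green" map, searching for a red ball.
maps = {"red": [[0.0, 1.0], [0.0, 0.0]], "green": [[0.0, 0.0], [1.0, 0.0]]}
learned = {"red": 2.0, "green": 0.5}
saliency_map = top_down_saliency(maps, learned)
focus = focus_of_attention(saliency_map)
```

In this toy case the red response is amplified and the green response is suppressed, so the focus of attention falls on the pixel where the red map responds.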
Balls are classified with the Viola-Jones classifier [22]: the shape of the ball is learned from edge-filtered and thresholded images, represented by computationally efficient integral images [22]. The Gentle AdaBoost learning technique [5] is used to learn a selection of Classification and Regression Trees (CARTs) that select an arrangement of Haar-like features to classify the object. Several selections are combined into a cascade of classifiers. This learning phase is relatively time-consuming, but it only needs to be executed once, since the resulting classifier is general enough to apply to any ball-shaped object.
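To make clear why integral images are computationally efficient, the following Python sketch builds an integral image and evaluates one two-rectangle Haar-like feature with a constant number of lookups. It is a generic illustration of the technique from [22], not the classifier used here; the function names and the tiny 2x2 image are assumptions for the example.

```python
def integral_image(img):
    """Return the integral image with one extra row and column of zeros.

    ii[r][c] holds the sum of all pixels img[y][x] with y < r and x < c.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: right half minus left half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


image = [[1, 2],
         [3, 4]]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 2, 2)
feature = haar_two_rect_horizontal(ii, 0, 0, 2, 2)
```

Because every rectangle sum costs four lookups regardless of its size, the same feature can be evaluated at any scale at the same cost, which is what makes scanning a cascade over many window sizes feasible.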
The most common techniques for ball detection in RoboCup rely on color information. In the last few years, fast color segmentation algorithms have been developed to detect and track objects in this scenario [10,19]. The community agreed that in the near future, visual cues like color coding will be removed to come to a more realistic setup with robots playing with a "normal" soccer ball [20].
Treptow and Zell use AdaBoost to learn conglomerations of Haar-like classifiers and arrange them in a cascade to recognize balls without color information [20]. However, in previous work [16] we showed that this approach has problems learning non-symmetric object patterns in differently illuminated environments. To overcome this problem, we preprocessed the input with edge detection and learned classification and regression trees (CARTs) instead of simple conglomerations of feature classifiers, and thereby accomplished color-independent ball detection for various balls. To reduce the significant number of false detections, where the classifier marked various round shapes, e.g., the heads in Fig. 7, we propose here an attention algorithm that is quickly adapted on the spot to a specific ball. It yields several region hypotheses. By combining both systems, we eliminate the false detections and accept only the intersection of the two classified sets as correct. In this way, the ball detector can be applied efficiently to more complex images, without worrying about false detections.
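The idea of accepting only the intersection of the two result sets can be sketched in a few lines of Python. The sketch assumes that both systems report axis-aligned boxes as `(x, y, width, height)` tuples and that "intersection" means a classifier box overlapping at least one attention region; this overlap criterion, the function names, and the example coordinates are assumptions for illustration, not the criterion used in the system.

```python
def overlaps(a, b):
    """True if two (x, y, width, height) boxes share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def confirmed_detections(classifier_boxes, attention_regions):
    """Keep only classifier detections supported by an attention region."""
    return [
        box
        for box in classifier_boxes
        if any(overlaps(box, roi) for roi in attention_regions)
    ]


# Hypothetical output of the two systems on one image.
classifier_boxes = [(10, 10, 20, 20), (100, 100, 20, 20)]  # ball and a round distractor
attention_regions = [(15, 15, 30, 30)]                     # region matching the learned weights
balls = confirmed_detections(classifier_boxes, attention_regions)
```

Here the second classifier box, which has no supporting attention region, is discarded as a false detection.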
The combination of an attention system with classification has also been pursued by Miau, Papageorgiou and Itti, who detect pedestrians in attentionally focused image regions using a support vector machine algorithm [15]. In [23], Walther and colleagues combine an attention system with the object recognizer of Lowe [14] and show that the recognition results are improved by the attentional front-end. Nevertheless, all of these approaches focus on bottom-up attention and do not enable goal-directed search. To our knowledge, this is the first approach combining a top-down modulated attention system with a classifier.
The rest of the paper is structured as follows: First, we describe the attention system VOCUS in section II. We then discuss briefly the process of learning and detecting balls in section III. The results of each algorithm independently as well as in combination are given in section IV and, finally, section V concludes the paper.